diff --git a/src/redis_stack/probabilistic_data_structures.md b/src/redis_stack/probabilistic_data_structures.md deleted file mode 100644 index d9993e9..0000000 --- a/src/redis_stack/probabilistic_data_structures.md +++ /dev/null @@ -1,27 +0,0 @@ -In very broad terms probabilistic data structures (PDS) allow us to get to a "close enough" result in a much shorter time and by using significantly less memory. - -Redis Stack supports 4 of the most famous PDS: -- Bloom filters -- Cuckoo filters -- Count-Min Sketch -- Top-K - -In the rest of this tutorial we'll introduce how you can use a Bloom filter to save many heavy calls to the relational database, or a lot of memory, compared to using sets or hashes. -A Bloom filter is a probabilistic data structure that enables you to check if an element is present in a set using a very small memory space of a fixed size. **It can guarantee the absence of an element from a set, but it can only give an estimation about its presence**. So when it responds that an element is not present in a set (a negative answer), you can be sure that indeed is the case. However, one out of every N positive answers will be wrong. -Even though it looks unusual at a first glance, this kind of uncertainty still has its place in computer science. There are many cases out there where a negative answer will prevent very costly operations; - -How can a Bloom filter be useful to our bike shop? For starters, we could keep a Bloom filter that stores all usernames of people who've already registered with our service. That way, when someone is creating a new account we can very quickly check if that username is free. If the answer is yes, we'd still have to go and check the main database for the precise result, but if the answer is no, we can skip that call and continue with the registration. - -Another, perhaps more interesting example is for showing better and more relevant ads to users. 
We could keep a bloom filter per user with all the products they've bought from the shop, and when we get a list of products from our suggestion engine we could check it against this filter. - - -```redis Add all bought product ids in the Bloom filter -BF.MADD user:778:bought_products 4545667 9026875 3178945 4848754 1242449 -``` - -Just before we try to show an ad to a user, we can first check if that product id is already in their "bought products" Bloom filter. If the answer is yes - we might choose to check the main database, or we might skip to the next recommendation from our list. But if the answer is no, then we know for sure that our user hasn't bought that product: - -```redis Has a user bought this product? -BF.EXISTS user:778:bought_products 1234567 // No, the user has not bought this product -BF.EXISTS user:778:bought_products 3178945 // The user might have bought this product -``` diff --git a/src/redis_stack/working_with_bloom_filter.md b/src/redis_stack/working_with_bloom_filter.md new file mode 100644 index 0000000..7985934 --- /dev/null +++ b/src/redis_stack/working_with_bloom_filter.md @@ -0,0 +1,42 @@ +RedisBloom is developed by Redis Inc., and adds probabilistic data structures, including a Bloom filter, to Redis. To install RedisBloom on top of an existing Redis, download and run [Redis Stack](https://redis.io/docs/stack/get-started/install/), which includes RedisBloom and other capabilities. Also check out [RedisBloom commands](https://redis.io/commands/?group=bf). + +A Bloom filter is a probabilistic data structure that enables you to check if an element is present in a set using a very small memory space of a fixed size. It can guarantee the absence of an element from a set, but it can only give an estimation about its presence. So, when it responds that an element is not present in a set (`false`), you can be sure that this is indeed the case. However, false positive matches are possible.
You can control the tradeoff between a Bloom filter's accuracy and its memory consumption via the `error_rate` argument of the `BF.RESERVE` command. + +If clients send the database unhelpful queries because no keys are matched in Redis, you can use a Bloom filter to filter out such queries. + +These guidelines show you how to use a Bloom filter to reduce heavy calls to the relational database and save memory. + +## Avoiding cache penetration + +* Before checking the cache, implement some logic (e.g., IP range filtering). If the same unacceptable addresses are queried repeatedly, consider storing these addresses in Redis with an empty string value. +* If you need to store millions of invalid keys, consider using a Bloom filter. +* Create a Bloom filter using `BF.RESERVE` and add invalid addresses using `BF.ADD`. To determine if an invalid address has been seen before, use `BF.EXISTS`. The answer `1` means that, with high probability, the value has been seen before. A `0` means that it definitely wasn't seen before. + +## Handling incoming requests + +Because false-positive matches are possible with a Bloom filter (BF), you can use these options to better handle incoming requests. + +### Store all valid keys in a BF upfront + +* Add all valid keys to the BF. +* When a request is received, search in the Bloom filter. +* If found in the BF, it is, with high probability, a valid key. Try to fetch it from the DB. If not found in the DB (low probability), it was a false positive. +* If not found in the BF, it is necessarily an invalid key. + +### Store valid keys in a BF on the fly + +* When a request is received, search in the Bloom filter. +* If found in the BF, it is, with high probability, a valid key that was already seen. Try to fetch it from the DB. If not found in the DB (low probability), it was a false positive. +* If not found in the BF - it is either a first-time valid key or an invalid key. 
Check, and if valid - add to the BF. + + +### Store invalid keys in a BF + +* When a request is received, search in the Bloom filter. +* If found in the BF, it is, with high probability, an invalid key. Note that it may be a valid key (low probability) and you will ignore it, but that's a price you should be ready to pay if you go this way. +* If not found in the BF, it is either a valid key or a first-time invalid key. Check and, if invalid, add it to the BF. + +## Notes + +* You don't need to add an item to a BF more than once. There is no benefit, but also no harm. +* You can't delete keys from a BF, but you can use a Cuckoo filter instead, which supports deletions but has some disadvantages compared to BF. RedisBloom supports Cuckoo filters as well. diff --git a/src/redis_stack/working_with_graphs.md b/src/redis_stack/working_with_graphs.md index 10924dd..000ec03 100644 --- a/src/redis_stack/working_with_graphs.md +++ b/src/redis_stack/working_with_graphs.md @@ -1,10 +1,20 @@ -Redis Stack offers native graph capabilities. You can use graphs for highly interconnected data, like relationships between people, organisations, groups, documents or places they have access to and so on. +Redis Stack offers a labeled property graph data structure. The Labeled Property Graph data model is a modern generic NoSQL data model. +It uses the mathematical structure of a graph to represent and query data. +As a mathematical structure, a graph is a collection of vertices (also called nodes) and _edges_. As a data structure, in a labeled property graph, the graph vertices represent _entities_. +Entities are physical, conceptual, virtual, or fictional things, while the _graph edges_ represent relationships. +Each relationship is an association or an interaction between a pair of entities. 
+Each entity can have a set of labels, for example `Person`, `Police officer`, and `Bank Account`, and each relationship must have a type, for example `owns` or `member of`. +Each node and each relationship can also have a set of properties, where each property is a key-value pair. +For example, you can have a `name` property for a `Person` entity, or a `start date` property for an `owns` relationship. -For our shop, we would like to track which users have bought what so that we can suggest bikes based on the fact that their friends have also bought them. +You can use graphs for highly interconnected data, like relationships between people, organizations, groups, documents, or places they have access to, and so on. -## Creating nodes +Suppose you want to track which bikes users bought so you can suggest them based on the fact that their friends have also bought them. + +## Create nodes + +This query creates a single bike node and sets its properties. -This query will create a single bike node and set its properties ```redis Create a bike node GRAPH.QUERY bikes_graph 'CREATE (b:Bike { Brand:"Velorim", @@ -14,6 +24,8 @@ GRAPH.QUERY bikes_graph 'CREATE (b:Bike { RETURN b' ``` +Now, load more bikes. + ```redis Load more bikes // Let's load some more bikes GRAPH.QUERY bikes_graph 'CREATE (b:Bike { Brand:"Bicyk", Model:"Hillcraft", Price:"1200", Type: "Kids Mountain Bikes" })' @@ -27,6 +39,8 @@ GRAPH.QUERY bikes_graph 'CREATE (b:Bike { Brand:"nHill", Model:"Summit", Price:" GRAPH.QUERY bikes_graph 'CREATE (b:Bike { Brand:"BikeShind", Model:"ThrillCycle", Price:"815", Type: "Commuter Bikes" })' ``` +Let's create some users. 
+ +```redis Create users +// Let's create some user nodes +GRAPH.QUERY bikes_graph 'CREATE (u:User { Name:"Andrea"})' @@ -36,12 +50,15 @@ GRAPH.QUERY bikes_graph 'CREATE (u:User { Name:"Noah"})' +GRAPH.QUERY bikes_graph 'CREATE (u:User { Name:"Mario"})' +``` + -## Adding relationships -We model graph data very similarly to how we would describe it in a human language: -- A user makes a transaction -- That transaction contains a bike -We already have User and Bike nodes, we're only missing the Transactions, so let's create them. -We also need to establish the relationships between all the nodes; we do that by matching the existing nodes, saving them in a variable (b, u, t) and using that variable to create the relationships +## Add relationships + +Model the graph data the way you would describe it in natural language: + +- A user makes a transaction. +- That transaction contains a bike. + +You already have `User` and `Bike` nodes. You're only missing transactions. Let's create them. +You also need to establish the relationships between all the nodes. Match the existing nodes, save them in variables (`b`, `u`, `t`), and use those variables to create the relationships. ```redis Model bike sales GRAPH.QUERY bikes_graph ' @@ -52,13 +69,15 @@ GRAPH.QUERY bikes_graph ' ``` Let's load some more relationships: + ```redis Load more bike sales GRAPH.QUERY bikes_graph 'MATCH (b:Bike { Model: "Hillcraft"}), (u:User {Name: "Alicia"}) CREATE (t:Transaction {Value: 1200 }) CREATE (u)-[r1:MADE]->(t) CREATE (t)-[r2:CONTAINS]->(b)' GRAPH.QUERY bikes_graph 'MATCH (b:Bike { Model: "ThrillCycle"}), (u:User {Name: "Andrea"}) CREATE (t:Transaction {Value: 815 }) CREATE (u)-[r1:MADE]->(t) CREATE (t)-[r2:CONTAINS]->(b)' GRAPH.QUERY bikes_graph 'MATCH (b:Bike { Model: "XBN 2.1 Alloy"}), (u:User {Name: "Mathew"}) CREATE (t:Transaction {Value: 810 }) CREATE (u)-[r1:MADE]->(t) CREATE (t)-[r2:CONTAINS]->(b)' ``` -Let's create a REVIEWED relationship between some users and bikes. 
The relationship will have a "Stars" property that will show the number of stars that the user assigned to the bike and a "ReviewID" property which will point us to the document that contains the review +Let's create a `REVIEWED` relationship between some users and bikes. The relationship has a `Stars` property that shows the number of stars that the user assigned to the bike and a `ReviewID` property that points you to the document that contains the review. + ```redis Model users reviewing bikes GRAPH.QUERY bikes_graph ' MATCH (u:User {Name: "Noah"}), @@ -70,7 +89,8 @@ GRAPH.QUERY bikes_graph 'MATCH (u:User {Name: "Mathew"}), (b:Bike { Model: "XBN GRAPH.QUERY bikes_graph 'MATCH (u:User {Name: "Mario"}), (b:Bike { Model: "Hillcraft"}) CREATE (u)-[r:REVIEWED {ReviewID: 123, Stars: 3}]->(b)' ``` -Users of our bike shop will be able to follow each other so they can get updates on their recent updates +Users of the bike shop are able to follow each other so they can get updates on their recent activity. + ```redis Users can follow each other GRAPH.QUERY bikes_graph 'MATCH (u1:User {Name: "Andrea"}), (u2:User {Name: "Noah"}) CREATE (u1)-[r:FOLLOWS]->(u2)' GRAPH.QUERY bikes_graph 'MATCH (u1:User {Name: "Andrea"}), (u2:User {Name: "Alicia"}) CREATE (u1)-[r:FOLLOWS]->(u2)' @@ -79,9 +99,10 @@ GRAPH.QUERY bikes_graph 'MATCH (u1:User {Name: "Mathew"}), (u2:User {Name: "Mari GRAPH.QUERY bikes_graph 'MATCH (u1:User {Name: "Mario"}), (u2:User {Name: "Andrea"}) CREATE (u1)-[r:FOLLOWS]->(u2)' ``` -## Utilising the graph for discovering how data is related -When a user is viewing a page of a bike, we can increase the probability of a sale by showing the relationships that exist between the bike and the user, for example, someone our user follows might have bought the bike already, or might have reviewed it. -This is very easy to query with a graph database but very tricky with a relational database. 
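The kind of traversal these relationships enable can be mirrored in plain Python to make the data model concrete. This is a toy sketch over hand-built dictionaries (the star ratings and the exact set of follow edges below are hypothetical sample data, not taken verbatim from the queries above), not RedisGraph itself:

```python
# Toy mirror of the follows/reviews graph (hypothetical sample data).
follows = {
    "Andrea": ["Noah", "Alicia"],
    "Mathew": ["Mario"],
    "Mario": ["Andrea"],
}

# (user, bike model) -> stars given in a REVIEWED relationship
reviews = {
    ("Noah", "Hillcraft"): 4,
    ("Mario", "Hillcraft"): 3,
    ("Mathew", "XBN 2.1 Alloy"): 4,
}

def followed_reviewers(user, bike, min_stars=0):
    """Users that `user` follows who reviewed `bike` with at least `min_stars` stars."""
    return [followee for followee in follows.get(user, [])
            if reviews.get((followee, bike), -1) >= min_stars]

print(followed_reviewers("Andrea", "Hillcraft", min_stars=4))
```

A graph database expresses this same pattern matching declaratively and keeps it efficient as the number of hops grows, which is what the Cypher queries in the next section do.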
+## Use graph to discover how data is related + +When a user accesses a bike page, you can increase the probability of a sale by showing the relationships that exist between the bike and the user. For example, someone your user follows might have already bought the bike or might have reviewed it. +This is very tricky to query with a relational database, but you can easily query it using a graph database. ```redis Check user's connection with a bike GRAPH.QUERY bikes_graph 'MATCH p=(u:User {Name: "Andrea"})-[r*1..5]->(b:Bike {Model: "Hillcraft"}) return p' @@ -93,4 +114,21 @@ GRAPH.QUERY bikes_graph 'MATCH p=(u1:User {Name: "Andrea"})-[f:FOLLOWS]->(u2:Use ```redis All users who I follow who reviewed this bike with more than 3 stars GRAPH.QUERY bikes_graph 'MATCH p=(u1:User {Name: "Andrea"})-[f:FOLLOWS]->(u2:User)-[r:REVIEWED]->(b:Bike {Model: "Hillcraft"}) WHERE r.Stars>3 return p' -``` \ No newline at end of file +``` + +## Use Bloom to check if a username is free + +Wondering how a Bloom filter can help your bike shop? For starters, you could keep a Bloom filter that stores all usernames of people who've already registered with your service. That way, when someone creates a new account, you can very quickly check if that username is taken. If the filter says it might be taken, you still have to go and check the main database for the precise result. But, if it says the username is not there, the username is definitely free, so you can skip that call and continue with the registration. + +Another, perhaps more interesting, example is showing better and more relevant ads to users. You can keep a Bloom filter per user with all the products they bought from the shop, and when you get a list of products from your suggestion engine, you can check it against this filter.
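The "definite no, probabilistic yes" behavior that makes this work can be sketched in a few lines of plain Python. This is a toy illustration of the idea only, not RedisBloom's implementation:

```python
import hashlib

class ToyBloomFilter:
    """Toy Bloom filter: k hash positions over a fixed-size bit array."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive k independent positions by salting the hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False is always correct; True can be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

bought = ToyBloomFilter()
for product_id in ("4545667", "9026875", "3178945"):
    bought.add(product_id)

print(bought.might_contain("3178945"))  # True: every added item is reported present
print(bought.might_contain("1234567"))  # False would mean: definitely never added
```

A real deployment sizes the bit array and the number of hashes from the expected item count and the acceptable error rate, which is what the arguments of `BF.RESERVE` control for you.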
+ +```redis Add all bought product ids in the Bloom filter +BF.MADD user:778:bought_products 4545667 9026875 3178945 4848754 1242449 +``` + +Just before you try to show an ad to a user, you can first check if that product id is already in their `bought products` Bloom filter. If the answer is yes, you can choose to check the main database, or you might skip to the next recommendation from your list. But if the answer is no, then you know for sure that your user did not buy that product: + +```redis Has a user bought this product? +BF.EXISTS user:778:bought_products 1234567 // No, the user has not bought this product +BF.EXISTS user:778:bought_products 3178945 // The user might have bought this product +``` diff --git a/src/redis_stack/working_with_path_algorithms.md b/src/redis_stack/working_with_path_algorithms.md new file mode 100644 index 0000000..99127c1 --- /dev/null +++ b/src/redis_stack/working_with_path_algorithms.md @@ -0,0 +1,193 @@ +`algo.SPpaths` and `algo.SSpaths` can solve a wide range of real-world problems, where minimum-weight paths need to be found. `algo.SPpaths` finds paths between a given pair of nodes, while `algo.SSpaths` finds paths from a given source node. Weight can represent time, distance, price, or any other measurement. A bound can be set on another property (e.g., finding a minimum-time bounded-price way to reach from point A to point B). Both algorithms are performant and have low memory requirements. + +For both algorithms, you can set: + +* A list of relationship types to traverse (relTypes). + +* The relationships' property whose sum you want to minimize (weight). + +* An optional relationships' property whose sum you want to bound (cost) and the optional bound (maxCost). + +* An optional bound on the path length - the number of relationships along the path (maxLen). 
+ +* The number of paths you want to retrieve (pathCount): either all minimal-weight paths (pathCount is 0), a single minimal-weight path (pathCount is 1), or n minimal-weight paths with potentially different weights (pathCount is n). + +This tutorial shows you how to use these algorithms. + + +## Create the graph + +Run the following query to create the graph. Note how the nodes and relationships are created: + +```redis Create a graph +GRAPH.QUERY city_graph "CREATE + (a:City{Name:'A'}), + (b:City{Name:'B'}), + (c:City{Name:'C'}), + (d:City{Name:'D'}), + (e:City{Name:'E'}), + (f:City{Name:'F'}), + (g:City{Name:'G'}), + (a)-[r1:ROAD_TO{Time:4, Dist:3}]->(b), + (a)-[r2:ROAD_TO{Time:3, Dist:8}]->(c), + (a)-[r3:ROAD_TO{Time:4, Dist:2}]->(d), + (b)-[r4:ROAD_TO{Time:5, Dist:7}]->(e), + (b)-[r5:ROAD_TO{Time:5, Dist:5}]->(d), + (d)-[r6:ROAD_TO{Time:4, Dist:5}]->(e), + (c)-[r7:ROAD_TO{Time:3, Dist:6}]->(f), + (d)-[r8:ROAD_TO{Time:1, Dist:4}]->(c), + (d)-[r9:ROAD_TO{Time:2, Dist:12}]->(f), + (e)-[r10:ROAD_TO{Time:5, Dist:5}]->(g), + (f)-[r11:ROAD_TO{Time:4, Dist:2}]->(g) + return a,b,c,d,e,f,g,r1,r2,r3,r4,r5,r6,r7,r8,r9,r10,r11" +``` + +## Fastest path between A and G + +What is the fastest path (in minutes) from A to G? + +```redis Find fastest path +GRAPH.QUERY city_graph "MATCH + (a:City{Name:'A'}),(g:City{Name:'G'}) + CALL algo.SPpaths({ + sourceNode: a, + targetNode: g, + relTypes: ['ROAD_TO'], + weightProp: 'Time'}) + YIELD path, pathWeight + RETURN pathWeight, [n in nodes(path) | n.Name] as pathNodes" +``` + +## Find all shortest paths between A and G + +What are all the shortest paths (in kilometers) from A to G? 
+ +```redis Find all shortest paths +GRAPH.QUERY city_graph "MATCH + (a:City{Name:'A'}),(g:City{Name:'G'}) + CALL algo.SPpaths({ + sourceNode: a, + targetNode: g, + relTypes: ['ROAD_TO'], + pathCount: 0, + weightProp: 'Dist'}) + YIELD path, pathWeight + RETURN pathWeight, [n in nodes(path) | n.Name] as pathNodes" +``` + +By specifying `pathCount:0` you're asking for all minimal-weight paths instead of only one, which is the default. + +## Find N shortest paths between A and G + +What are the five shortest paths (in kilometers) from A to G? + +```redis Find N shortest paths +GRAPH.QUERY city_graph "MATCH + (a:City{Name:'A'}),(g:City{Name:'G'}) + CALL algo.SPpaths({ + sourceNode: a, + targetNode: g, + relTypes: ['ROAD_TO'], + pathCount: 5, + weightProp: 'Dist'}) + YIELD path, pathWeight + RETURN pathWeight, [n in nodes(path) | n.Name] + ORDER BY pathWeight" +``` + +## Find N time-bounded shortest paths from A to G + +Which two shortest paths (in kilometers) can you take from A to G, where you can reach G in up to 12 minutes? + +For queries where you need to find the shortest paths bounded by one of the relationships' properties, you can use the `costProp` input argument: + +```redis Find N time-bounded shortest paths from A to G +GRAPH.QUERY city_graph "MATCH + (a:City{Name:'A'}),(g:City{Name:'G'}) + CALL algo.SPpaths({ + sourceNode: a, + targetNode: g, + relTypes: ['ROAD_TO'], + pathCount: 2, + weightProp: 'Dist', + costProp: 'Time', + maxCost: 12}) + YIELD path, pathWeight, pathCost + RETURN pathWeight, pathCost, [n in nodes(path) | n.Name] + ORDER BY pathWeight" +``` + +## Find paths that revert or ignore the relationship direction + +What paths with lengths of up to 4 can you take from D to G, assuming you can traverse each road in both directions? 
+ +```redis Find paths with specific lengths +GRAPH.QUERY city_graph "MATCH + (a:City{Name:'D'}),(g:City{Name:'G'}) + CALL algo.SPpaths({ + sourceNode: a, + targetNode: g, + relTypes: ['ROAD_TO'], + relDirection: 'both', + pathCount: 1000, + weightProp: 'Dist', + maxLen: 4} ) + YIELD path, pathWeight + RETURN pathWeight, [n in nodes(path) | n.Name] as pathNodes ORDER BY pathWeight" +``` + +In the query above, you specified `maxLen: 4`, limiting the number of hops between nodes. + +## Find time-bounded possible shortest paths from A + +Which paths can you take from A if you limit the trip to 8 minutes? + +```redis Find time-bounded shortest paths from A +GRAPH.QUERY city_graph "MATCH + (a:City{Name:'A'}) + CALL algo.SSpaths({ + sourceNode: a, + relTypes: ['ROAD_TO'], + pathCount: 1000, + costProp: 'Time', + maxCost: 8}) + YIELD path, pathCost + RETURN pathCost, [n in nodes(path) | n.Name] as pathNodes ORDER BY pathCost" +``` + +## Find possible shortest paths from a node + +What five shortest paths (in kilometers) can you take from A? + +```redis Find shortest paths from a node +GRAPH.QUERY city_graph "MATCH + (a:City{Name:'A'}) + CALL algo.SSpaths({ + sourceNode: a, + relTypes: ['ROAD_TO'], + pathCount: 5, + weightProp: 'Dist', + costProp: 'Time'} ) + YIELD path, pathWeight, pathCost + RETURN pathWeight, pathCost, [n in nodes(path) | n.Name] as pathNodes + ORDER BY pathWeight" +``` + +## Find N time-bounded shortest paths from A + +What five shortest paths (in kilometers) can you take from A if you limit the trip to 6 minutes? 
+ +```redis Find N time-bounded shortest paths +GRAPH.QUERY city_graph "MATCH + (a:City{Name:'A'}) + CALL algo.SSpaths({ + sourceNode: a, + relTypes: ['ROAD_TO'], + pathCount: 5, + weightProp: 'Dist', + costProp: 'Time', + maxCost: 6}) + YIELD path, pathWeight, pathCost + RETURN pathWeight, pathCost, [n in nodes(path) | n.Name] as pathNodes ORDER BY pathWeight" +``` + diff --git a/src/redis_stack/working_with_tdigest.md b/src/redis_stack/working_with_tdigest.md new file mode 100644 index 0000000..b3cd084 --- /dev/null +++ b/src/redis_stack/working_with_tdigest.md @@ -0,0 +1,285 @@ + +t-digest is a probabilistic data structure that can be used to answer the following questions: + +* Which fraction of the values in the data stream are smaller than a given value? + +* How many values in the data stream are smaller than a given value? + +* Which value is smaller than p percent of the values in the data stream (what is the p-percentile value)? + +* What is the mean value between the p1-percentile value and the p2-percentile value? + +* What is the nth smallest / largest value in the data stream (what is the value with [reverse] rank n)? + +Using t-digest in Redis is simple and straightforward. Follow these examples to learn how. + +## Create a sketch and add observations + +You can simply create a t-digest with `TDIGEST.CREATE` and add observations with `TDIGEST.ADD`. + +`TDIGEST.CREATE key [COMPRESSION compression]` initializes a new t-digest sketch (and returns an error if the key already exists). You use the `COMPRESSION` argument to specify the tradeoff between accuracy and memory consumption. The default is 100. Higher values mean more accuracy. + +`TDIGEST.ADD key value...` adds one or more floating-point values (observations) to the sketch. + +Let’s create a digest named `t` with compression 1000 (very accurate) and add 15 observations. 
+ +```redis Create a digest +TDIGEST.CREATE t COMPRESSION 1000 +TDIGEST.ADD t 1 2 2 3 3 3 4 4 4 4 5 5 5 5 5 +``` + +You can call `TDIGEST.ADD` again whenever new observations are available. + +## Estimate fractions or ranks by values + +Use `TDIGEST.CDF key value...` to retrieve, for each input value, an estimation of the fraction of (observations smaller than the given value + half the observations equal to the given value). + +```redis Estimate fractions +TDIGEST.CDF t 0 1 2 3 4 5 6 +1) "0" +2) "0.033333333333333333" +3) "0.13333333333333333" +4) "0.29999999999999999" +5) "0.53333333333333333" +6) "0.83333333333333337" +7) "1" +``` + +As you can see, all the estimations in this simple example are accurate. + +`TDIGEST.RANK key value...` is similar to `TDIGEST.CDF`, but used for estimating the number of observations instead of the fraction of observations. More accurately, it returns, for each input value, an estimation of the number of observations smaller than a given value + half the observations equal to the given value. + +`-1` is returned when the value is smaller than the value of the smallest observation. + +The number of observations is returned when the value is larger than the value of the largest observation. +Otherwise, an estimation of the number of observations smaller than a given value + half the observations equal to the given value is returned. +The estimations are rounded to the nearest integer before being reported. + +```redis Use TDIGEST.RANK +TDIGEST.RANK t 0 1 2 3 4 5 6 +1) "-1" +2) "1" +3) "2" +4) "5" +5) "8" +6) "13" +7) "15" +``` + +Again, all estimations in this example are accurate. + +And lastly, `TDIGEST.REVRANK key value...` is similar to `TDIGEST.RANK` but returns, for each input value, an estimation of the number of observations larger than a given value + half the observations equal to the given value. + +`-1` is returned when the value is larger than the value of the largest observation. 
+ +The number of observations is returned when the value is smaller than the value of the smallest observation. +Otherwise, an estimation of the number of observations larger than a given value + half the observations equal to the given value is returned. +The estimations are rounded to the nearest integer before being reported. + +```redis Use TDIGEST.REVRANK +TDIGEST.REVRANK t 0 1 2 3 4 5 6 +1) "15" +2) "14" +3) "13" +4) "10" +5) "7" +6) "2" +7) "-1" +``` + +`TDIGEST.RANK(v) + TDIGEST.REVRANK(v)` for any `v` between the minimum and the maximum observation is equal to the number of observations. + +## Estimate values by fractions or ranks + +`TDIGEST.QUANTILE key fraction...` returns, for each input fraction, an estimation of the value (floating point) below which the given fraction of observations lies. + +```redis Use TDIGEST.QUANTILE +TDIGEST.QUANTILE t 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 + 1) "1" + 2) "2" + 3) "3" + 4) "3" + 5) "4" + 6) "4" + 7) "4" + 8) "5" + 9) "5" +10) "5" +11) "5" +``` + +`TDIGEST.BYRANK key rank...` returns, for each input rank, an estimation of the value (floating point) with that rank. + +Let's use `n` to denote the number of observations added to the sketch. `TDIGEST.BYRANK` returns: + +* An accurate result when rank is `0` (the value of the smallest observation). + +* An accurate result when rank is `n-1` (the value of the largest observation). + +* `inf` when rank is equal to or larger than `n`. + +```redis Use TDIGEST.BYRANK +TDIGEST.BYRANK t 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 + 1) "1" + 2) "2" + 3) "2" + 4) "3" + 5) "3" + 6) "3" + 7) "4" + 8) "4" + 9) "4" +10) "4" +11) "5" +12) "5" +13) "5" +14) "5" +15) "5" +16) "inf" +``` + +`TDIGEST.BYREVRANK key rank...` returns, for each input reverse rank, an estimation of the value (floating point) with that reverse rank. + +Let's use `n` to denote the number of observations added to the sketch. 
`TDIGEST.BYREVRANK` returns: + +* An accurate result when rank is `0` (the value of the largest observation). + +* An accurate result when rank is `n-1` (the value of the smallest observation). + +* `-inf` when rank is equal to or larger than `n`. + +```redis Use TDIGEST.BYREVRANK +TDIGEST.BYREVRANK t 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 + 1) "5" + 2) "5" + 3) "5" + 4) "5" + 5) "5" + 6) "4" + 7) "4" + 8) "4" + 9) "4" +10) "3" +11) "3" +12) "3" +13) "2" +14) "2" +15) "1" +16) "-inf" +``` + +## Estimate trimmed mean + +`TDIGEST.TRIMMED_MEAN key lowFraction highFraction` estimates the mean value between the specified fractions. + +This is especially useful for calculating the average value ignoring outliers, for example, the average value between the 20th percentile and the 80th percentile. + +```redis Estimate trimmed mean +TDIGEST.TRIMMED_MEAN t 0.2 0.8 +"3.8888888888888888" +TDIGEST.TRIMMED_MEAN t 0.1 0.9 +"3.7692307692307692" +TDIGEST.TRIMMED_MEAN t 0 1 +"3.6666666666666665" +``` + +## Merge sketches + +Sometimes it is useful to merge sketches. +Suppose you measure latencies for 3 servers, and you want to calculate the 90%, 95%, and 99% latencies for all the servers combined. + +`TDIGEST.MERGE destKey numKeys sourceKey... [COMPRESSION compression] [OVERRIDE]` merges multiple sketches into a single sketch. + +If `destKey` does not exist, a new sketch is created. + +If `destKey` is an existing sketch, its values are merged with the values of the source keys. To override the destination key contents, use `OVERRIDE`. + +When `COMPRESSION` is not specified: + +* If `destKey` does not exist or if `OVERRIDE` is specified, the compression is set to the maximum value among all source sketches. + +* If `destKey` already exists and `OVERRIDE` is not specified, its compression is not changed. 
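Conceptually, the merged sketch approximates rank queries over the pooled raw observations. For small inputs you can compute the exact pooled answer in plain Python and compare it with what the merged t-digest reports (an exact reference computation, not the t-digest algorithm):

```python
# Exact BYRANK analogue over pooled observations: what TDIGEST.MERGE approximates.
s1 = [1, 2, 3, 4, 5]
s2 = [6, 7, 8, 9, 10]
pooled = sorted(s1 + s2)

def value_by_rank(observations, rank):
    """Value with the given rank; inf when rank >= number of observations."""
    ordered = sorted(observations)
    return ordered[rank] if rank < len(ordered) else float("inf")

print([value_by_rank(pooled, r) for r in range(11)])
# -> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, inf]
```

On data this small the merged sketch is exact, so `TDIGEST.BYRANK` on the merged key returns the same values.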
+ +```redis Merge sketches +TDIGEST.CREATE s1 +TDIGEST.ADD s1 1 2 3 4 5 +TDIGEST.CREATE s2 +TDIGEST.ADD s2 6 7 8 9 10 +TDIGEST.MERGE sM 2 s1 s2 +TDIGEST.BYRANK sM 0 1 2 3 4 5 6 7 8 9 10 + 1) "1" + 2) "2" + 3) "3" + 4) "4" + 5) "5" + 6) "6" + 7) "7" + 8) "8" + 9) "9" +10) "10" +11) "inf" +``` + +## Retrieve sketch information + +Use `TDIGEST.MIN` and `TDIGEST.MAX` to retrieve the minimum and maximum values in the sketch, respectively. +Both return `nan` when the sketch is empty. + +```redis Retrieve sketch info +TDIGEST.MIN t +"1" +TDIGEST.MAX t +"5" +``` + +Both commands return accurate results and are equivalent to `TDIGEST.BYRANK key 0` and `TDIGEST.BYREVRANK key 0`, respectively. + +Use `TDIGEST.INFO` to retrieve additional information about the t-digest. + +```redis Retrieve additional info +TDIGEST.INFO t + 1) Compression + 2) (integer) 1000 + 3) Capacity + 4) (integer) 6010 + 5) Merged nodes + 6) (integer) 15 + 7) Unmerged nodes + 8) (integer) 0 + 9) Merged weight +10) "15" +11) Unmerged weight +12) "0" +13) Observations +14) "15" +15) Total compressions +16) (integer) 1 +17) Memory usage +18) (integer) 96168 +``` + +The following values are reported: + +* `Compression` - The compression (controllable trade-off between accuracy and memory consumption) of the sketch, as set in `TDIGEST.CREATE` or `TDIGEST.MERGE`. + +* `Observations` - Number of observations added to the sketch. + +* `Memory usage` - Number of bytes allocated for the sketch. + +In addition, the following _internals_ are reported: + +* `Capacity` - Size of the buffer used for storing the centroids and for the incoming unmerged observations. + +* `Merged nodes` - Number of merged observations. + +* `Unmerged nodes` - Number of buffered nodes (uncompressed observations). + +* `Merged weight` - Weight of values of the merged nodes. + +* `Unmerged weight` - Weight of values of the unmerged nodes (uncompressed observations). 
+ +* `Total compressions` - Number of times this sketch compressed data together. + +## Reset a sketch + +`TDIGEST.RESET key` empties the sketch and re-initializes it. \ No newline at end of file diff --git a/src/tutorials.json b/src/tutorials.json index 9c038e9..ff08997 100644 --- a/src/tutorials.json +++ b/src/tutorials.json @@ -48,12 +48,28 @@ "path": "/redis_stack/working_with_graphs.md" } }, + "algorithms": { + "type": "internal-link", + "id": "working_with_path_algorithms", + "label": "Working with path algorithms", + "args": { + "path": "/redis_stack/working_with_path_algorithms.md" + } + }, "bloom": { "type": "internal-link", - "id": "probabilistic_data_structures", - "label": "Probabilistic data structures", + "id": "working_with_bloom_filter", + "label": "Working with Bloom filter", + "args": { + "path": "/redis_stack/working_with_bloom_filter.md" + } + }, + "tdigest": { + "type": "internal-link", + "id": "working_with_tdigest", + "label": "Working with t-digest", "args": { - "path": "/redis_stack/probabilistic_data_structures.md" + "path": "/redis_stack/working_with_tdigest.md" } } }