Posted to commits@datasketches.apache.org by le...@apache.org on 2022/07/29 20:20:59 UTC

[datasketches-website] branch master updated: Updated tutorial

This is an automated email from the ASF dual-hosted git repository.

leerho pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/datasketches-website.git


The following commit(s) were added to refs/heads/master by this push:
     new cda31263 Updated tutorial
cda31263 is described below

commit cda3126302e0063cfded2c30c7e428453a1cbec8
Author: Lee Rhodes <le...@users.noreply.github.com>
AuthorDate: Fri Jul 29 13:20:40 2022 -0700

    Updated tutorial
---
 .../SketchingQuantilesAndRanksTutorial.md          | 133 ++++++++++++++-------
 1 file changed, 88 insertions(+), 45 deletions(-)

diff --git a/docs/Quantiles/SketchingQuantilesAndRanksTutorial.md b/docs/Quantiles/SketchingQuantilesAndRanksTutorial.md
index 648bb618..cc515b04 100644
--- a/docs/Quantiles/SketchingQuantilesAndRanksTutorial.md
+++ b/docs/Quantiles/SketchingQuantilesAndRanksTutorial.md
@@ -57,7 +57,7 @@ To wit:
 * A quartile is a quantile where the rank domain is divided into fourths. For example, "An SAT Math score of 600 is at the third quartile (rank = 0.75)."
 * The median is a quantile that splits the rank domain in half. For example, "An SAT Math score of 520 is at the median (rank = 0.5)."
 
-## The quantile and rank functions
+## The simple quantile and rank functions
 Let's examine the following table:
 
 | Quantile:       | 10 | 20 | 30 | 40 | 50 |
@@ -65,11 +65,11 @@ Let's examine the following table:
 | Natural Rank    | 1  | 2  | 3  | 4  | 5  |
 | Normalized Rank | .2 | .4 | .6 | .8 | 1.0|
 
-Let's define the functions
+Let's define the simple functions
 
-### ***quantile(rank)*** or ***q(r)*** := return the quantile value ***q*** associated with<br> a given ***rank, r***.
+### ***quantile(rank)*** or ***q(r)*** := return the quantile value ***q*** associated with a given ***rank, r***.
 
-### ***rank(quantile)*** or ***r(q)*** := return the rank ***r*** associated with<br> a given ***quantile, q***.  
+### ***rank(quantile)*** or ***r(q)*** := return the rank ***r*** associated with a given ***quantile, q***.  
 
 Using an example from the table:
 
@@ -128,88 +128,131 @@ One can find examples of the following definitions in the research literature.
 
 These next examples use a small data set that mimics what could be the result of both duplication and sketch data deletion.
 
-## Two search conventions used when finding ranks, r(q)
+## The rank functions with inequalities
 
-### The ***non inclusive*** criterion for ***r(q)*** (a.k.a. the ***LT*** criterion):
+### ***rank(quantile, NON_INCLUSIVE)*** or ***r(q, LT)*** :=<br>Given *q*, return the rank, *r*, of the largest quantile that is strictly *Less Than* *q*.  
 
-<b>Definition:</b>
-Given *q*, return the rank, *r*, of the largest quantile that is strictly less than *q*.
 
 <b>Implementation:</b>
 Given *q*, search the quantile array until we find the adjacent pair *{q1, q2}* where *q1 < q <= q2*. Return the rank, *r*, associated with *q1*, the first of the pair.
 
-<b>NOTES:</b>
+<b>Boundary Notes:</b>
 
 * If the given *q* is larger than the largest quantile retained by the sketch, the sketch will return the rank of the largest retained quantile.
 * If the given *q* is smaller than the smallest quantile retained by the sketch, the sketch will return a rank of zero.
 
-For example *q = 30; r(30) = 5*
+<b>Examples using normalized ranks:</b>
 
-| Quantile[]:     | 10    | 20    | q1=20 | q2=30 | 30    | 30    | 40    | 50    |
-|-----------------|-------|-------|-------|-------|-------|-------|-------|-------|
-| Natural Rank[]: | 1     | 3     | r=5   |  7    | 9     | 11    | 13    | 14    |
+* *r(55) = 1.0* 
+* *r(5) = 0.0*
+* *r(30) = .357* (Illustrated in table)
 
+| Quantile[]:        | 10    | 20    | 20    | 30    | 30    | 30    | 40    | 50     |
+|--------------------|-------|-------|-------|-------|-------|-------|-------|--------|
+| Natural Rank[]:    | 1     | 3     | 5     | 7     | 9     | 11    | 13    | 14     |
+| Normalized Rank[]: | .071  | .214  | .357  | .500  | .643  | .786  | .929  | 1.000  |
+| Quantile input     |       |       |       |  30   | 30    | 30    |       |        |
+| Qualifying pair    |       |       |  q1   | q2    |       |       |       |        |
+| Rank result        |       |       | .357  |       |       |       |       |        |
 
-### The ***inclusive*** criterion for ***r(q)*** (a.k.a. the ***LE*** criterion):
+--------
 
-<b>Definition:</b>
-Given *q*, return the rank, *r*, of the largest quantile that is less than or equal to *q*.
+### ***rank(quantile, INCLUSIVE)*** or ***r(q, LE)*** :=<br>Given *q*, return the rank, *r*, of the largest quantile that is less than or equal to *q*.
 
 <b>Implementation:</b>
 Given *q*, search the quantile array until we find the adjacent pair *{q1, q2}* where *q1 <= q < q2*. Return the rank, *r*, associated with *q1*, the first of the pair. 
 
-<b>NOTES:</b>
+<b>Boundary Notes:</b>
 
-* If the given *q* is larger than the largest quantile retained by the sketch, the sketch will return the rank of the largest retained quantile.
-* If the given *q* is smaller than the smallest quantile retained by the sketch, the sketch will return a rank of zero.
+* If the given *q* is larger than the largest quantile retained by the sketch, the function will return the rank of the largest retained quantile.
+* If the given *q* is smaller than the smallest quantile retained by the sketch, the function will return a rank of zero.
 
+<b>Examples using normalized ranks:</b>
 
-For example *q = 30; r(30) = 11*
+* *r(55) = 1.0*
+* *r(5) = 0.0*
+* *r(30) = .786* (Illustrated in table)
 
-| Quantile[]:     | 10    | 20    | 20    | 30    | 30    | q1=30 | q2=40 | 50    |
-|-----------------|-------|-------|-------|-------|-------|-------|-------|-------|
-| Natural Rank[]: | 1     | 3     | 5     |  7    | 9     | r=11  | 13    | 14    |
+| Quantile[]:        | 10    | 20    | 20    | 30    | 30    | 30    | 40    | 50     |
+|--------------------|-------|-------|-------|-------|-------|-------|-------|--------|
+| Natural Rank[]:    | 1     | 3     | 5     | 7     | 9     | 11    | 13    | 14     |
+| Normalized Rank[]: | .071  | .214  | .357  | .500  | .643  | .786  | .929  | 1.000  |
+| Quantile input     |       |       |       |  30   | 30    | 30    |       |        |
+| Qualifying pair    |       |       |       |       |       |  q1   | q2    |        |
+| Rank result        |       |       |       |       |       | .786  |       |        |
 
 
-## Two search conventions when finding quantiles, q(r)
+## The quantile functions with inequalities
 
-### The ***non inclusive*** criterion for ***q(r)*** (a.k.a. the ***GT*** criterion):
-
-<b>Definition:</b>
-Given *r*, return the quantile of the smallest rank that is strictly greater than *r*.
+### ***quantile(rank, NON_INCLUSIVE)*** or ***q(r, GT)*** :=<br>Given *r*, return the quantile, *q*, of the smallest rank that is strictly Greater Than *r*.
 
 <b>Implementation:</b>
 Given *r*, search the rank array until we find the adjacent pair *{r1, r2}* where *r1 <= r < r2*. Return the quantile associated with *r2*, the second of the pair.
 
-<b>NOTES:</b>
+<b>Boundary Notes:</b>
+
+* If the given normalized rank, *r*, is equal to 1.0, there is no quantile that satisfies this criterion. However, for convenience, the function will return the largest quantile retained by the sketch.
+* If the given normalized rank, *r*, is less than the smallest rank, the function will return the smallest quantile.
+
+<b>Examples using normalized ranks:</b>
+
+* *q(1.0) = 50*
+* *q(0.0) = 10*
+* *q(.357) = 30* (Illustrated in table)
+
+| Quantile[]:        | 10    | 20    | 20    | 30    | 30    | 30    | 40    | 50     |
+|--------------------|-------|-------|-------|-------|-------|-------|-------|--------|
+| Natural Rank[]:    | 1     | 3     | 5     | 7     | 9     | 11    | 13    | 14     |
+| Normalized Rank[]: | .071  | .214  | .357  | .500  | .643  | .786  | .929  | 1.000  |
+| Rank input         |       |       | .357  |       |       |       |       |        |
+| Qualifying pair    |       |       |  r1   | r2    |       |       |       |        |
+| Quantile result    |       |       |       |  30   |       |       |       |        |
+
+--------
 
-* If the given normalized rank, *r*, is equal to 1.0, there is no quantile that satisfies this criterion. This function may choose to return either a *NaN* value, or return the largest quantile retained by the sketch.
+### ***quantile(rank, NON_INCLUSIVE_STRICT)*** or ***q(r, GT_STRICT)*** :=<br>Given *r*, return the quantile, *q*, of the smallest rank that is strictly Greater Than *r*.
 
-For example *r = 5; q(5) = 30*
+In <b>STRICT</b> mode, the only difference is the following:
 
-| Natural Rank[]: | 1     | 3     | r1=5  |  r2=7 | 9     | 11    | 13    | 14    |
-|-----------------|-------|-------|-------|-------|-------|-------|-------|-------|
-| Quantile[]:     | 10    | 20    | 20    | q=30  | 30    | 30    | 40    | 50    |
+<b>Boundary Notes:</b>
 
-### The ***inclusive*** criterion for ***q(r)***  (a.k.a. the ***GE*** criterion):
+* If the given normalized rank, *r*, is equal to 1.0, there is no quantile that satisfies this criterion. The function will return *NaN*.
 
-<b>Definition:</b>
-Given *r*, return the quantile of the smallest rank that is strictly greater than or equal to *r*.
+
+--------
+
+### ***quantile(rank, INCLUSIVE)*** or ***q(r, GE)*** :=<br>Given *r*, return the quantile, *q*, of the smallest rank that is Greater than or Equal to *r*.
 
 <b>Implementation:</b>
-Given *r*, search the rank array until we find the adjacent pair *{r1, r2}* where *r1 < r <= r2*. Return the quantile associated with *r2*, the second of the pair.
+Given *r*, search the rank array until we find the adjacent pair *{r1, r2}* where *r1 < r <= r2*. Return the quantile, *q*, associated with *r2*, the second of the pair.
+
+<b>Boundary Notes:</b>
+
+* If the given normalized rank, *r*, is equal to 1.0, the function will return the largest quantile retained by the sketch.
+* If the given normalized rank, *r*, is less than the smallest rank, the function will return the smallest quantile.
+
+<b>Examples using normalized ranks:</b>
+
+* *q(.786) = 30* (Illustrated in table)
+
+| Quantile[]:        | 10    | 20    | 20    | 30    | 30    | 30    | 40    | 50     |
+|--------------------|-------|-------|-------|-------|-------|-------|-------|--------|
+| Natural Rank[]:    | 1     | 3     | 5     | 7     | 9     | 11    | 13    | 14     |
+| Normalized Rank[]: | .071  | .214  | .357  | .500  | .643  | .786  | .929  | 1.000  |
+| Rank input         |       |       |       |       |       | .786  |       |        |
+| Qualifying pair    |       |       |       |       |   r1  | r2    |       |        |
+| Quantile result    |       |       |       |       |       | 30    |       |        |
+
 
-For example *q(11) = 30*
+## These inequality functions maintain the 1:1 functional relationship
 
-| Natural Rank[]: | 1     | 3     | 5     |  7    | r1=9  | r2=11 | 13    | 14    |
-|-----------------|-------|-------|-------|-------|-------|-------|-------|-------|
-| Quantile[]:     | 10    | 20    | 20    | 30    | 30    | q=30  | 40    | 50    |
+### The non inclusive search for q(r) is the inverse of the non inclusive search for r(q). 
 
-## These conventions maintain the 1:1 functional relationship
+##### Therefore, *q = q(r(q))* and *r = r(q(r))*.
 
-### The non inclusive search for q(r) is the inverse of the non inclusive search for r(q). Therefore, *q = q(r(q))* and *r = r(q(r))*.
+### The inclusive search for q(r) is the inverse of the inclusive search for r(q). 
 
-### The inclusive search for q(r) is the inverse of the inclusive search for r(q). Therefore, *q = q(r(q))* and *r = r(q(r))*.
+##### Therefore, *q = q(r(q))* and *r = r(q(r))*.
 
 
 ## Summary

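The four searches described in the diff above can be sketched in a few lines of Python. This is only an illustration built on the tutorial's eight-entry example arrays, not the DataSketches implementation: the names `rank`, `quantile`, `inclusive`, and `strict` are hypothetical, and the standard-library `bisect` module stands in for the "search until we find the adjacent pair" step.

```python
from bisect import bisect_left, bisect_right
import math

# Example data from the tutorial tables (sorted, with duplicates).
QUANTILES = [10, 20, 20, 30, 30, 30, 40, 50]
NATURAL_RANKS = [1, 3, 5, 7, 9, 11, 13, 14]
N = NATURAL_RANKS[-1]
NORM_RANKS = [n / N for n in NATURAL_RANKS]


def rank(q, inclusive):
    """Hypothetical r(q, LE) if inclusive, else r(q, LT); returns a normalized rank."""
    # Number of retained quantiles that are <= q (LE) or < q (LT).
    i = bisect_right(QUANTILES, q) if inclusive else bisect_left(QUANTILES, q)
    # Boundary note: if no retained quantile qualifies, the rank is zero.
    return 0.0 if i == 0 else NORM_RANKS[i - 1]


def quantile(r, inclusive, strict=False):
    """Hypothetical q(r, GE) if inclusive, else q(r, GT) or q(r, GT_STRICT)."""
    # Index of the first rank that is >= r (GE) or > r (GT).
    i = bisect_left(NORM_RANKS, r) if inclusive else bisect_right(NORM_RANKS, r)
    if i == len(QUANTILES):
        # Boundary note: no rank qualifies. Return the largest quantile,
        # or NaN for the strict variant of the non-inclusive search.
        return math.nan if (strict and not inclusive) else QUANTILES[-1]
    return QUANTILES[i]


print(round(rank(30, inclusive=False), 3))  # 0.357
print(round(rank(30, inclusive=True), 3))   # 0.786
print(quantile(5 / 14, inclusive=False))    # 30
print(quantile(11 / 14, inclusive=True))    # 30
```

The rank inputs are written as exact fractions (`5 / 14`, `11 / 14`) rather than the rounded table values. In this sketch the rounded value `.357` is slightly smaller than `5 / 14`, so a strictly-greater-than search on `.357` would stop one entry earlier and return 20 instead of 30.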
