Posted to commits@datasketches.apache.org by al...@apache.org on 2019/07/03 00:33:14 UTC

[incubator-datasketches-postgresql] branch master updated: KLL and Frequent Strings merge examples

This is an automated email from the ASF dual-hosted git repository.

alsay pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-datasketches-postgresql.git


The following commit(s) were added to refs/heads/master by this push:
     new 524521b  KLL and Frequent Strings merge examples
524521b is described below

commit 524521b8f3a61ade76454debcabbc89ab1a8532d
Author: AlexanderSaydakov <Al...@users.noreply.github.com>
AuthorDate: Tue Jul 2 17:32:57 2019 -0700

    KLL and Frequent Strings merge examples
---
 README.md | 36 ++++++++++++++++++++++++++++++++++--
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 9ada639..5c70b5e 100644
--- a/README.md
+++ b/README.md
@@ -146,8 +146,9 @@ Non-aggregate union:
 <h2>Estimating quantiles, ranks and histograms with KLL sketch</h2>
 
 Table "normal" has 1 million values from the normal (Gaussian) distribution with mean=0 and stddev=1.
-We can build a sketch, which represents the distribution (create table kll\_float\_sketch\_test(sketch kll\_float\_sketch)):
+We can build a sketch, which represents the distribution:
 
+	create table kll_float_sketch_test(sketch kll_float_sketch);
 	$ psql test -c "insert into kll_float_sketch_test select kll_float_sketch_build(value) from normal"
 	INSERT 0 1
 
@@ -190,6 +191,15 @@ In this simple example we know the value of N since we constructed this sketch,
 
 Note that the normal distribution was used just to show the basic usage. The sketch does not make any assumptions about the distribution.
 
+Let's create two more sketches to show merging kll_float_sketch:
+
+	insert into kll_float_sketch_test select kll_float_sketch_build(value) from normal;
+	insert into kll_float_sketch_test select kll_float_sketch_build(value) from normal;
+	select kll_float_sketch_get_quantile(kll_float_sketch_merge(sketch), 0.5) from kll_float_sketch_test;
+	 kll_float_sketch_get_quantile
+	-------------------------------
+	                    0.00332207
+
 <h2>Frequent strings</h2>
 
 Consider a numeric Zipfian distribution with parameter alpha=1.1 (high skew)
@@ -248,4 +258,26 @@ Here is an equivalent exact computation:
 	real	0m18.362s
 
 In this particular case the exact computation happens to be faster. This is
-just to show the basic usage. Most importantly, the sketch can be used as an "additive" metric in a data cube, and can be easily merged across dimensions.
\ No newline at end of file
+just to show the basic usage. Most importantly, the sketch can be used as an "additive" metric in a data cube, and can be easily merged across dimensions.
+
+Merging frequent_strings_sketch:
+
+	create table frequent_strings_sketch_test(sketch frequent_strings_sketch);
+	insert into frequent_strings_sketch_test select frequent_strings_sketch_build(9, value) from zipf_1p1_8k_100m;
+	insert into frequent_strings_sketch_test select frequent_strings_sketch_build(9, value) from zipf_1p1_8k_100m;
+	insert into frequent_strings_sketch_test select frequent_strings_sketch_build(9, value) from zipf_1p1_8k_100m;
+	select frequent_strings_sketch_result_no_false_negatives(frequent_strings_sketch_merge(9, sketch), 3000000) from frequent_strings_sketch_test;
+	 frequent_strings_sketch_result_no_false_negatives
+	---------------------------------------------------
+	 (1,45986859,45627006,45986859)
+	 (2,21468195,21108342,21468195)
+	 (3,13735083,13375230,13735083)
+	 (4,10004424,9644571,10004424)
+	 (5,7825689,7465836,7825689)
+	 (6,6407145,6047292,6407145)
+	 (7,5405883,5046030,5405883)
+	 (8,4672299,4312446,4672299)
+	 (9,4105338,3745485,4105338)
+	 (10,3649596,3289743,3649596)
+	 (11,3294912,2935059,3294912)
+	(11 rows)
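
The README text in this commit says the frequent strings sketch can serve as an "additive" metric that merges across dimensions. As a rough illustration of why counter-based frequency summaries merge cleanly, here is a minimal Python sketch of the classic Misra-Gries summary. This is not the DataSketches implementation, which uses its own algorithm and also reports upper bounds; the names mg_update, mg_build and mg_merge are hypothetical and exist only for this example.

```python
from collections import Counter


def mg_update(counters, item, k):
    """Add one item to a Misra-Gries summary holding at most k counters."""
    if item in counters:
        counters[item] += 1
    elif len(counters) < k:
        counters[item] = 1
    else:
        # Summary is full: decrement every counter, drop those that reach zero.
        for key in list(counters):
            counters[key] -= 1
            if counters[key] == 0:
                del counters[key]


def mg_build(stream, k):
    """Build a summary from an iterable of items."""
    counters = {}
    for item in stream:
        mg_update(counters, item, k)
    return counters


def mg_merge(a, b, k):
    """Merge two summaries into one that again holds at most k counters."""
    merged = Counter(a)
    merged.update(b)  # add counts item by item
    if len(merged) > k:
        # Subtract the (k+1)-th largest count; at most k counters stay positive.
        cut = sorted(merged.values(), reverse=True)[k]
        merged = Counter({key: c - cut for key, c in merged.items() if c > cut})
    return dict(merged)


if __name__ == "__main__":
    part_a = ["x"] * 50 + ["y"] * 30 + [str(i) for i in range(20)]
    part_b = ["y"] * 40 + ["x"] * 10 + [str(i) for i in range(100, 150)]
    merged = mg_merge(mg_build(part_a, 4), mg_build(part_b, 4), 4)
    print(merged)
```

Each stored count is a lower bound on the true frequency, and for a summary with k counters over N items in total the undercount is at most N/(k+1), whether the summary was built in one pass or merged from parts. That property is what makes such a summary usable as a per-partition column that is combined later.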

