You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by fj...@apache.org on 2019/04/03 22:21:39 UTC

[incubator-druid] branch master updated: Update theta/hll sketch doc comparison (#7407)

This is an automated email from the ASF dual-hosted git repository.

fjy pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-druid.git


The following commit(s) were added to refs/heads/master by this push:
     new 0f6cb1e  Update theta/hll sketch doc comparison (#7407)
0f6cb1e is described below

commit 0f6cb1e7e032081a569bbcf0c89220f0e6b53472
Author: Jonathan Wei <jo...@users.noreply.github.com>
AuthorDate: Wed Apr 3 15:21:33 2019 -0700

    Update theta/hll sketch doc comparison (#7407)
---
 docs/content/querying/aggregations.md | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/docs/content/querying/aggregations.md b/docs/content/querying/aggregations.md
index eef4c68..2e253fe 100644
--- a/docs/content/querying/aggregations.md
+++ b/docs/content/querying/aggregations.md
@@ -275,19 +275,28 @@ The [DataSketches Theta Sketch](../development/extensions-core/datasketches-thet
 
 #### DataSketches HLL Sketch
 
-The [DataSketches HLL Sketch](../development/extensions-core/datasketches-hll.html) extension-provided aggregator gives distinct count estimates using the HyperLogLog algorithm. The HLL Sketch is faster and requires less storage than the Theta Sketch, but does not support intersection or difference operations.
+The [DataSketches HLL Sketch](../development/extensions-core/datasketches-hll.html) extension-provided aggregator gives distinct count estimates using the HyperLogLog algorithm.
+
+Compared to the Theta sketch, the HLL sketch does not support set operations and has slightly slower update and merge speed, but requires significantly less space.
 
 #### Cardinality/HyperUnique (Deprecated)
 
 <div class="note caution">
-The Cardinality and HyperUnique aggregators are deprecated. Please use <a href="../development/extensions-core/datasketches-hll.html">DataSketches HLL Sketch</a> instead.
+The Cardinality and HyperUnique aggregators are deprecated. Please use <a href="../development/extensions-core/datasketches-theta.html">DataSketches Theta Sketch</a> or <a href="../development/extensions-core/datasketches-hll.html">DataSketches HLL Sketch</a> instead.
 </div>
 
-The [Cardinality and HyperUnique](../querying/hll-old.html) aggregators are older aggregator implementations available by default in Druid that also provide distinct count estimates using the HyperLogLog algorithm. The newer [DataSketches HLL Sketch](../development/extensions-core/datasketches-hll.html) extension-provided aggregator has superior accuracy and performance and is recommended instead. 
+The [Cardinality and HyperUnique](../querying/hll-old.html) aggregators are older aggregator implementations available by default in Druid that also provide distinct count estimates using the HyperLogLog algorithm. The newer DataSketches Theta and HLL extension-provided aggregators described above have superior accuracy and performance and are recommended instead. 
 
 The DataSketches team has published a [comparison study](https://datasketches.github.io/docs/HLL/HllSketchVsDruidHyperLogLogCollector.html) between Druid's original HLL algorithm and the DataSketches HLL algorithm. Based on the demonstrated advantages of the DataSketches implementation, we have deprecated Druid's original HLL aggregator.
 
-Please note that DataSketches HLL aggregators and `hyperUnique` aggregators are not mutually compatible.
+Please note that `hyperUnique` aggregators are not mutually compatible with Datasketches HLL or Theta sketches.
+
+##### Multi-column handling
+
+Note the DataSketches Theta and HLL aggregators currently only support single-column inputs. If you were previously using the Cardinality aggregator with multiple-column inputs, equivalent operations using Theta or HLL sketches are described below:
+
+* Multi-column `byValue` Cardinality can be replaced with a union of Theta sketches on the individual input columns
+* Multi-column `byRow` Cardinality can be replaced with a Theta or HLL sketch on a single [virtual column]((../querying/virtual-columns.html) that combines the individual input columns.
 
 ### Histograms and quantiles
 


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org