Posted to commits@druid.apache.org by fr...@apache.org on 2022/09/27 02:29:45 UTC

[druid] branch master updated: Add a note to the documentation about pre-built HLLSketches (#13088)

This is an automated email from the ASF dual-hosted git repository.

frankchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git


The following commit(s) were added to refs/heads/master by this push:
     new 0d7bf66578 Add a note to the documentation about pre-built HLLSketches (#13088)
0d7bf66578 is described below

commit 0d7bf66578479afd920ba96db4c44a01e912833b
Author: David Palmer <cl...@users.noreply.github.com>
AuthorDate: Tue Sep 27 15:29:39 2022 +1300

    Add a note to the documentation about pre-built HLLSketches (#13088)
    
    * add a note to the documentation about pre-built HLLSketches
    
    Druid actually supports ingesting a pre-generated sketch column by using
    the HLLSketchMerge aggregator. However, this functionality was
    previously not made clear in the documentation.
    
    * copyedit from the King's English to American English
    
    * add suggested style changes
    
    Co-authored-by: Charles Smith <te...@gmail.com>
---
 docs/development/extensions-core/datasketches-hll.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/docs/development/extensions-core/datasketches-hll.md b/docs/development/extensions-core/datasketches-hll.md
index 334cff16d6..07cc7da8b2 100644
--- a/docs/development/extensions-core/datasketches-hll.md
+++ b/docs/development/extensions-core/datasketches-hll.md
@@ -59,6 +59,9 @@ druid.extensions.loadList=["druid-datasketches"]
  }
 ```
 
+The `HLLSketchBuild` aggregator builds an HLL sketch object from the specified input column. When used during ingestion, Druid stores pre-generated HLL sketch objects in the datasource instead of the raw data from the input column.
+When you apply the aggregator at query time to an existing dimension, the [post-aggregators](#post-aggregators) can use the resulting column as an intermediate dimension.
+
 > It is very common to use `HLLSketchBuild` in combination with [rollup](../../ingestion/rollup.md) to create a [metric](../../ingestion/ingestion-spec.html#metricsspec) on high-cardinality columns.  In this example, a metric called `userid_hll` is included in the `metricsSpec`.  This will perform a HLL sketch on the `userid` field at ingestion time, allowing for highly-performant approximate `COUNT DISTINCT` query operations and improving roll-up ratios when `userid` is then left out of [...]
 >
 > ```
@@ -89,6 +92,8 @@ druid.extensions.loadList=["druid-datasketches"]
  }
 ```
 
+You can use the `HLLSketchMerge` aggregator to ingest pre-generated sketches from an input dataset. For example, you can set up a batch processing job to generate the sketches before sending the data to Druid. You must serialize the sketches in the input dataset to Base64-encoded bytes. Then, specify `HLLSketchMerge` for the input column in the native ingestion `metricsSpec`.
+
 ### Post Aggregators
 
 #### Estimate
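
The `HLLSketchMerge` note added in this commit can be illustrated with a minimal Python sketch. It only shows the two steps the note describes: Base64-encoding sketch bytes for an input row, and naming `HLLSketchMerge` in the `metricsSpec`. The helper names, the sample row, and the reuse of `userid_hll` as the input column name are assumptions for illustration. The placeholder bytes are not a valid sketch; a real batch job would serialize an actual HLL sketch with the DataSketches library before encoding it.

```python
import base64
import json


def encode_sketch(serialized: bytes) -> str:
    """Base64-encode already-serialized sketch bytes for an input row."""
    return base64.b64encode(serialized).decode("ascii")


def hll_merge_metric(name: str, field_name: str) -> dict:
    """Build one metricsSpec entry that merges pre-built sketches."""
    return {"type": "HLLSketchMerge", "name": name, "fieldName": field_name}


# Assumption: placeholder bytes stand in for a real serialized HLL sketch.
placeholder = b"\x00\x01\x02"

# Hypothetical input row whose sketch column is Base64-encoded.
row = {"timestamp": "2022-09-27T00:00:00Z", "userid_hll": encode_sketch(placeholder)}

# Hypothetical metricsSpec: read the pre-built sketch column with HLLSketchMerge.
metrics_spec = [hll_merge_metric("userid_hll", "userid_hll")]

print(json.dumps(row))
print(json.dumps(metrics_spec, indent=2))
```

The design point is that the batch job, not Druid, builds the sketches; Druid only merges them at ingestion time.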

