You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by fr...@apache.org on 2022/09/27 02:29:45 UTC
[druid] branch master updated: Add a note to the documentation about pre-built HLLSketches (#13088)
This is an automated email from the ASF dual-hosted git repository.
frankchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git
The following commit(s) were added to refs/heads/master by this push:
new 0d7bf66578 Add a note to the documentation about pre-built HLLSketches (#13088)
0d7bf66578 is described below
commit 0d7bf66578479afd920ba96db4c44a01e912833b
Author: David Palmer <cl...@users.noreply.github.com>
AuthorDate: Tue Sep 27 15:29:39 2022 +1300
Add a note to the documentation about pre-built HLLSketches (#13088)
* add a note to the documentation about pre-built HLLSketches
Druid actually supports ingesting a pre-generated sketch column by using
the HLLSketchMerge aggregator. However, this functionality was
previously not made clear in the documentation.
* copyedit from the King's English to American English
* add suggested style changes
Co-authored-by: Charles Smith <te...@gmail.com>
Co-authored-by: Charles Smith <te...@gmail.com>
---
docs/development/extensions-core/datasketches-hll.md | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/docs/development/extensions-core/datasketches-hll.md b/docs/development/extensions-core/datasketches-hll.md
index 334cff16d6..07cc7da8b2 100644
--- a/docs/development/extensions-core/datasketches-hll.md
+++ b/docs/development/extensions-core/datasketches-hll.md
@@ -59,6 +59,9 @@ druid.extensions.loadList=["druid-datasketches"]
}
```
+The `HLLSketchBuild` aggregator builds an HLL sketch object from the specified input column. When used during ingestion, Druid stores pre-generated HLL sketch objects in the datasource instead of the raw data from the input column.
+When applied at query time on an existing dimension, you can use the resulting column as an intermediate dimension by the [post-aggregators](#post-aggregators).
+
> It is very common to use `HLLSketchBuild` in combination with [rollup](../../ingestion/rollup.md) to create a [metric](../../ingestion/ingestion-spec.html#metricsspec) on high-cardinality columns. In this example, a metric called `userid_hll` is included in the `metricsSpec`. This will perform a HLL sketch on the `userid` field at ingestion time, allowing for highly-performant approximate `COUNT DISTINCT` query operations and improving roll-up ratios when `userid` is then left out of [...]
>
> ```
@@ -89,6 +92,8 @@ druid.extensions.loadList=["druid-datasketches"]
}
```
+You can use the `HLLSketchMerge` aggregator to ingest pre-generated sketches from an input dataset. For example, you can set up a batch processing job to generate the sketches before sending the data to Druid. You must serialize the sketches in the input dataset to Base64-encoded bytes. Then, specify `HLLSketchMerge` for the input column in the native ingestion `metricsSpec`.
+
### Post Aggregators
#### Estimate
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org