Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2022/09/16 01:16:12 UTC

[GitHub] [druid] techdocsmith commented on a diff in pull request #13088: Add a note to the documentation about pre-built HLLSketches

techdocsmith commented on code in PR #13088:
URL: https://github.com/apache/druid/pull/13088#discussion_r972529744


##########
docs/development/extensions-core/datasketches-hll.md:
##########
@@ -59,6 +59,11 @@ druid.extensions.loadList=["druid-datasketches"]
  }
 ```
 
+The `HLLSketchBuild` aggregator builds a datasketch from the input column specified. If used during ingestion, this
+will result in Druid storing pre-generated HLL Sketch objects in the datasource, rather than the original value itself.
+If used at query time on an existing dimension, the resulting column can be used as an intermediate dimension by the
+post-aggregators below.
+

Review Comment:
   ```suggestion
   The `HLLSketchBuild` aggregator builds a datasketch from the specified input column. When used during ingestion, Druid stores pre-generated HLL Sketch objects in the datasource instead of the original values.
   When you apply it at query time to an existing dimension, you can use the resulting column as an intermediate dimension in the [post-aggregators](#post-aggregators).
   
   ```
   Thanks for the clarification/contribution, @cloventt! I've suggested some stylistic changes.
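
To illustrate the query-time case, here is a minimal sketch of a native timeseries query that builds a sketch from an existing dimension and passes it to the `HLLSketchEstimate` post-aggregator. The datasource `events`, the dimension `user_id`, the output names, and the helper `hll_build_query` are all hypothetical; treat this as an illustration under those assumptions, not as text for the docs page.

```python
import json


def hll_build_query(datasource: str, dimension: str, interval: str) -> dict:
    """Build a native timeseries query (hypothetical helper, not a Druid API)."""
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": "all",
        "intervals": [interval],
        "aggregations": [
            # Builds an HLL sketch from the raw dimension values at query time.
            {"type": "HLLSketchBuild", "name": "user_sketch", "fieldName": dimension}
        ],
        "postAggregations": [
            {
                # Reads the sketch produced above and returns a distinct-count estimate.
                "type": "HLLSketchEstimate",
                "name": "unique_users",
                "field": {"type": "fieldAccess", "fieldName": "user_sketch"},
            }
        ],
    }


# Assumed names: "events" datasource and "user_id" dimension.
query = hll_build_query("events", "user_id", "2022-09-01/2022-09-16")
print(json.dumps(query, indent=2))
```

The post-aggregator refers to the aggregator by its `name`, which is how the sketch serves as an intermediate value.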



##########
docs/development/extensions-core/datasketches-hll.md:
##########
@@ -89,6 +94,11 @@ druid.extensions.loadList=["druid-datasketches"]
  }
 ```
 
+The `HLLSketchMerge` aggregator can be used to ingest pre-generated sketches from an input dataset. For example, an
+earlier batch processing job can be used to generate the sketches before the data is sent to Druid. To support this
+behaviour, the sketches in the input dataset must be serialised to base64-encoded bytes. Then, in the native ingestion
+`MetricsSpec` the `HLLSketchMerge` must be specified for the input column as shown above.
+

Review Comment:
   ```suggestion
   You can use the `HLLSketchMerge` aggregator to ingest pre-generated sketches from an input dataset. For example, you can set up a batch processing job to generate the sketches before sending the data to Druid. You must serialize the sketches in the input dataset to base64-encoded bytes. Then, specify `HLLSketchMerge` for the input column in the native ingestion `MetricsSpec`.
   
   ```
   Stylistic suggestions. I also wonder whether it would be helpful to include an example of the `MetricsSpec`.
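
As a starting point, here is a minimal sketch of what such an example could cover: base64-encoding the serialized sketch bytes in the upstream job and declaring a matching `HLLSketchMerge` entry in the metrics spec. The column name `user_sketch`, the metric name `unique_users`, and both helper functions are hypothetical, and the bytes are a placeholder rather than a real serialized sketch. The `lgK` and `tgtHllType` values shown are the documented defaults.

```python
import base64
import json


def encode_sketch(sketch_bytes: bytes) -> str:
    """Base64-encode serialized sketch bytes so they fit in a text input row."""
    return base64.b64encode(sketch_bytes).decode("ascii")


def hll_merge_metric(name: str, field_name: str, lg_k: int = 12,
                     tgt_hll_type: str = "HLL_4") -> dict:
    """Build one HLLSketchMerge entry for the metrics spec (hypothetical helper)."""
    return {
        "type": "HLLSketchMerge",
        "name": name,
        "fieldName": field_name,
        "lgK": lg_k,
        "tgtHllType": tgt_hll_type,
    }


# Placeholder bytes only: a real job would serialize an actual HLL sketch here.
placeholder = b"\x00\x01\x02"

# One input row as the upstream batch job might emit it (assumed column names).
row = {"timestamp": "2022-09-16T00:00:00Z", "user_sketch": encode_sketch(placeholder)}

# The metrics spec points fieldName at the column holding the encoded sketch.
metrics_spec = [hll_merge_metric("unique_users", "user_sketch")]

print(json.dumps(row))
print(json.dumps({"metricsSpec": metrics_spec}, indent=2))
```

The `fieldName` in the aggregator has to match the input column that carries the encoded sketch.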



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

