Posted to commits@druid.apache.org by "vtlim (via GitHub)" <gi...@apache.org> on 2023/03/01 01:02:36 UTC

[GitHub] [druid] vtlim commented on a diff in pull request #13819: Add Post Aggregators for Tuple Sketches

vtlim commented on code in PR #13819:
URL: https://github.com/apache/druid/pull/13819#discussion_r1120990146


##########
docs/development/extensions-core/datasketches-tuple.md:
##########
@@ -207,3 +207,39 @@ Returns a human-readable summary of a given ArrayOfDoublesSketch. This is a stri
   "field"  : <post aggregator that refers to an ArrayOfDoublesSketch (fieldAccess or another post aggregator)>
 }
 ```
+
+
+### Constant ArrayOfDoublesSketch 
+
+This post aggregator adds a Base64-encoded constant ArrayOfDoublesSketch value that you can use in other post aggregators.
+```json
+{
+  "type": "arrayOfDoublesSketchConstant",
+  "name": DESTINATION_COLUMN_NAME,
+  "value": CONSTANT_SKETCH_VALUE
+}
+```
+
+### Base64 output of ArrayOfDoublesSketch 
+
+This post aggregator outputs an ArrayOfDoublesSketch as a Base64-encoded string storing the constant tuple sketch value that you can use in other post aggregators. 
+
+```json
+{
+  "type": "arrayOfDoublesSketchToBase64String",
+  "name": DESTINATION_COLUMN_NAME,
+  "field": <post aggregator that refers to a ArrayOfDoublesSketch (fieldAccess or another post aggregator)>
+}
+```
+
+### Estimated metrics values for each column of ArrayOfDoublesSketch
+
+This post aggregator returns a list of estimated sum for each metric value from a given ArrayOfDoublesSketch. The result is _N_ double values, where _N_ is the number of double values kept in the sketch per key.

Review Comment:
   ```suggestion
   For each key-value pair in the given ArrayOfDoublesSketch, this post aggregator estimates the sum of the values associated with the key. The post aggregator returns _N_ double values, where _N_ is the number of double values associated with each key.
   ```
   
   Based on the example in https://github.com/apache/druid/pull/13819#discussion_r1115305144, the right wording depends on how the sum is computed. If the sum is taken for each key (row-wise), I think we should say `_N_ is the number of keys in the ArrayOfDoublesSketch`. If the sum is taken across different keys (column-wise), the description of _N_ is fine as is.
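
To make the two readings concrete, here is a minimal Python sketch. It is only an illustration: `entries`, `column_wise_sums`, and `row_wise_sums` are hypothetical names, the data is made up, and a plain dict stands in for the sketch. It ignores the sampling and estimation a real ArrayOfDoublesSketch performs, so it shows only the shape of the result under each reading, not how Druid computes it.

```python
# Toy stand-in for a tuple sketch: each key maps to N double values (N = 2 here).
# Hypothetical data; a real ArrayOfDoublesSketch samples keys and scales estimates.
entries = {
    "key_a": [1.0, 10.0],
    "key_b": [2.0, 20.0],
    "key_c": [3.0, 30.0],
}


def column_wise_sums(entries):
    """Sum each value position across all keys; the result has N elements."""
    return [sum(column) for column in zip(*entries.values())]


def row_wise_sums(entries):
    """Sum the values within each key; the result has one element per key."""
    return [sum(values) for values in entries.values()]


print(column_wise_sums(entries))  # [6.0, 60.0]
print(row_wise_sums(entries))     # [11.0, 22.0, 33.0]
```

With the column-wise reading the output length equals the number of values per key (2 here); with the row-wise reading it equals the number of keys (3 here).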



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

