Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/03/14 23:40:05 UTC

[GitHub] [pinot] Jackie-Jiang opened a new pull request #8351: Add more aggregations that can be solved with dictionary

Jackie-Jiang opened a new pull request #8351:
URL: https://github.com/apache/pinot/pull/8351


   Add the following aggregation functions, which can be solved with the dictionary alone when there is no filter:
   - MINMV
   - MAXMV
   - MINMAXRANGEMV
   - DISTINCTCOUNTMV
   - DISTINCTCOUNTHLL
   - DISTINCTCOUNTHLLMV
   - DISTINCTCOUNTRAWHLL
   - DISTINCTCOUNTRAWHLLMV
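
A rough illustration of why these can be answered without scanning the rows: when there is no filter, every document matches, so the result depends only on the set of distinct values, which is what the dictionary holds. The sketch below assumes a sorted dictionary modeled as a plain `long[]`; `SortedDictionarySketch` and its methods are hypothetical names for illustration, not Pinot's `Dictionary` API. The MV variants are modeled by flattening the multi-value rows before building the dictionary.

```java
import java.util.Arrays;

// Hypothetical stand-in for a sorted segment dictionary; not Pinot's Dictionary API.
final class SortedDictionarySketch {
  private final long[] sortedDistinctValues;

  // Single-value column: one value per row.
  SortedDictionarySketch(long[] columnValues) {
    if (columnValues.length == 0) {
      throw new IllegalArgumentException("Sketch assumes a non-empty column");
    }
    // The dictionary keeps each distinct value once, in ascending order.
    this.sortedDistinctValues = Arrays.stream(columnValues).distinct().sorted().toArray();
  }

  // Multi-value column: flatten all entries, then build the same dictionary.
  static SortedDictionarySketch fromMultiValue(long[][] rows) {
    return new SortedDictionarySketch(Arrays.stream(rows).flatMapToLong(Arrays::stream).toArray());
  }

  // MIN / MINMV: first entry of the sorted dictionary.
  long min() {
    return sortedDistinctValues[0];
  }

  // MAX / MAXMV: last entry of the sorted dictionary.
  long max() {
    return sortedDistinctValues[sortedDistinctValues.length - 1];
  }

  // MINMAXRANGE / MINMAXRANGEMV: derived from the two ends.
  long minMaxRange() {
    return max() - min();
  }

  // DISTINCTCOUNT / DISTINCTCOUNTMV: the dictionary size.
  int distinctCount() {
    return sortedDistinctValues.length;
  }

  public static void main(String[] args) {
    SortedDictionarySketch sv = new SortedDictionarySketch(new long[] {7, 3, 7, 10, 3});
    System.out.println(sv.min() + " " + sv.max() + " " + sv.minMaxRange() + " " + sv.distinctCount());
  }
}
```

With a filter present this shortcut no longer holds, because some dictionary values may belong only to rows the filter excludes.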


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a change in pull request #8351: Add more aggregations that can be solved with dictionary

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #8351:
URL: https://github.com/apache/pinot/pull/8351#discussion_r826732654



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/query/DictionaryBasedAggregationOperator.java
##########
@@ -76,18 +80,52 @@ protected IntermediateResultsBlock getNextBlock() {
       int dictionarySize = dictionary.length();
       switch (aggregationFunction.getType()) {
         case MIN:
+        case MINMV:
           aggregationResults.add(toDouble(dictionary.getMinVal()));
           break;
         case MAX:
+        case MAXMV:
           aggregationResults.add(toDouble(dictionary.getMaxVal()));
           break;
         case MINMAXRANGE:
+        case MINMAXRANGEMV:
           aggregationResults.add(
               new MinMaxRangePair(toDouble(dictionary.getMinVal()), toDouble(dictionary.getMaxVal())));
           break;
         case DISTINCTCOUNT:
+        case DISTINCTCOUNTMV:
           aggregationResults.add(getDistinctValueSet(dictionary));
           break;
+        case DISTINCTCOUNTHLL:
+        case DISTINCTCOUNTHLLMV:
+        case DISTINCTCOUNTRAWHLL:
+        case DISTINCTCOUNTRAWHLLMV: {
+          HyperLogLog hll;
+          if (dictionary.getValueType() == FieldSpec.DataType.BYTES) {
+            // Treat BYTES value as serialized HyperLogLog
+            try {
+              hll = ObjectSerDeUtils.HYPER_LOG_LOG_SER_DE.deserialize(dictionary.getBytesValue(0));
+              for (int dictId = 1; dictId < dictionarySize; dictId++) {
+                hll.addAll(ObjectSerDeUtils.HYPER_LOG_LOG_SER_DE.deserialize(dictionary.getBytesValue(dictId)));
+              }
+            } catch (Exception e) {
+              throw new RuntimeException("Caught exception while merging HyperLogLogs", e);
+            }
+          } else {
+            int log2m;
+            if (aggregationFunction instanceof DistinctCountHLLAggregationFunction) {
+              log2m = ((DistinctCountHLLAggregationFunction) aggregationFunction).getLog2m();
+            } else {
+              log2m = ((DistinctCountRawHLLAggregationFunction) aggregationFunction).getLog2m();
+            }
+            hll = new HyperLogLog(log2m);
+            for (int dictId = 0; dictId < dictionarySize; dictId++) {
+              hll.offer(dictionary.get(dictId));
+            }
+          }
+          aggregationResults.add(hll);
+          break;
+        }

Review comment:
       By virtue of it taking 25 lines, I think we can say this logic is nontrivial. Embedding this logic here makes it harder for the reader to quickly see which keys are handled by this switch statement (they need to search through this block of code for the `break`). Since which keys are actually handled in this switch statement is informative to the reader, we should factor out nontrivial blocks into well-named methods to separate contexts. I suggest doing the same with the `DISTINCTCOUNTSMARTHLL` below, so this switch statement just switches on the function type, creates the function at a high level, and adds it to the list. Curious readers can drill into the more complex construction routines if they ever need to.
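
A minimal sketch of the shape being suggested, assuming a toy setup: the enum, class, and helper names are hypothetical, a sorted `long[]` stands in for the dictionary, and a `HashSet` stands in for the HyperLogLog so the example runs without extra dependencies. The point is only that the switch lists the handled cases and delegates the nontrivial one to a named method.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative only: names are hypothetical, and a HashSet stands in for the
// HyperLogLog used in the actual diff.
final class SwitchDispatchSketch {
  enum FunctionType { MIN, MINMV, MAX, MAXMV, DISTINCTCOUNTHLL, DISTINCTCOUNTHLLMV, SUM }

  // The switch only dispatches, so the handled cases are visible at a glance.
  static Object aggregate(FunctionType type, long[] sortedDictionary) {
    switch (type) {
      case MIN:
      case MINMV:
        return sortedDictionary[0];
      case MAX:
      case MAXMV:
        return sortedDictionary[sortedDictionary.length - 1];
      case DISTINCTCOUNTHLL:
      case DISTINCTCOUNTHLLMV:
        return buildDistinctSketch(sortedDictionary);
      default:
        throw new IllegalStateException("Not a dictionary-based function: " + type);
    }
  }

  // The longer construction logic lives behind a descriptive name.
  static Set<Long> buildDistinctSketch(long[] sortedDictionary) {
    Set<Long> sketch = new HashSet<>();
    for (long value : sortedDictionary) {
      sketch.add(value);
    }
    return sketch;
  }

  public static void main(String[] args) {
    long[] dictionary = {3, 7, 10};
    System.out.println(aggregate(FunctionType.MINMV, dictionary));
    System.out.println(aggregate(FunctionType.DISTINCTCOUNTHLL, dictionary));
  }
}
```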






[GitHub] [pinot] Jackie-Jiang commented on a change in pull request #8351: Add more aggregations that can be solved with dictionary

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #8351:
URL: https://github.com/apache/pinot/pull/8351#discussion_r827247517



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/query/DictionaryBasedAggregationOperator.java
##########
@@ -76,18 +80,52 @@ protected IntermediateResultsBlock getNextBlock() {
       int dictionarySize = dictionary.length();
       switch (aggregationFunction.getType()) {
         case MIN:
+        case MINMV:
           aggregationResults.add(toDouble(dictionary.getMinVal()));
           break;
         case MAX:
+        case MAXMV:
           aggregationResults.add(toDouble(dictionary.getMaxVal()));
           break;
         case MINMAXRANGE:
+        case MINMAXRANGEMV:
           aggregationResults.add(
               new MinMaxRangePair(toDouble(dictionary.getMinVal()), toDouble(dictionary.getMaxVal())));
           break;
         case DISTINCTCOUNT:
+        case DISTINCTCOUNTMV:
           aggregationResults.add(getDistinctValueSet(dictionary));
           break;
+        case DISTINCTCOUNTHLL:
+        case DISTINCTCOUNTHLLMV:
+        case DISTINCTCOUNTRAWHLL:
+        case DISTINCTCOUNTRAWHLLMV: {
+          HyperLogLog hll;
+          if (dictionary.getValueType() == FieldSpec.DataType.BYTES) {
+            // Treat BYTES value as serialized HyperLogLog
+            try {
+              hll = ObjectSerDeUtils.HYPER_LOG_LOG_SER_DE.deserialize(dictionary.getBytesValue(0));
+              for (int dictId = 1; dictId < dictionarySize; dictId++) {
+                hll.addAll(ObjectSerDeUtils.HYPER_LOG_LOG_SER_DE.deserialize(dictionary.getBytesValue(dictId)));
+              }
+            } catch (Exception e) {
+              throw new RuntimeException("Caught exception while merging HyperLogLogs", e);
+            }
+          } else {
+            int log2m;
+            if (aggregationFunction instanceof DistinctCountHLLAggregationFunction) {
+              log2m = ((DistinctCountHLLAggregationFunction) aggregationFunction).getLog2m();
+            } else {
+              log2m = ((DistinctCountRawHLLAggregationFunction) aggregationFunction).getLog2m();
+            }
+            hll = new HyperLogLog(log2m);
+            for (int dictId = 0; dictId < dictionarySize; dictId++) {
+              hll.offer(dictionary.get(dictId));
+            }
+          }
+          aggregationResults.add(hll);
+          break;
+        }

Review comment:
       Good point, extracted the logic into a function






[GitHub] [pinot] codecov-commenter commented on pull request #8351: Add more aggregations that can be solved with dictionary

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #8351:
URL: https://github.com/apache/pinot/pull/8351#issuecomment-1067447164


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/8351?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#8351](https://codecov.io/gh/apache/pinot/pull/8351?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (3b56370) into [master](https://codecov.io/gh/apache/pinot/commit/91c2ebbf297c4bf3fecb5f98413e9f00e324e2dc?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (91c2ebb) will **increase** coverage by `42.22%`.
   > The diff coverage is `63.15%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/8351/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/8351?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #8351       +/-   ##
   =============================================
   + Coverage     27.62%   69.84%   +42.22%     
   - Complexity        0     4258     +4258     
   =============================================
     Files          1624     1636       +12     
     Lines         85450    85820      +370     
     Branches      12882    12924       +42     
   =============================================
   + Hits          23604    59940    +36336     
   + Misses        59631    21721    -37910     
   - Partials       2215     4159     +1944     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `28.90% <42.10%> (?)` | |
   | integration2 | `?` | |
   | unittests1 | `66.95% <63.15%> (?)` | |
   | unittests2 | `14.17% <0.00%> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/8351?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ator/query/DictionaryBasedAggregationOperator.java](https://codecov.io/gh/apache/pinot/pull/8351/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9vcGVyYXRvci9xdWVyeS9EaWN0aW9uYXJ5QmFzZWRBZ2dyZWdhdGlvbk9wZXJhdG9yLmphdmE=) | `78.31% <53.33%> (+38.60%)` | :arrow_up: |
   | [...rg/apache/pinot/core/plan/AggregationPlanNode.java](https://codecov.io/gh/apache/pinot/pull/8351/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9wbGFuL0FnZ3JlZ2F0aW9uUGxhbk5vZGUuamF2YQ==) | `91.08% <100.00%> (+38.61%)` | :arrow_up: |
   | [.../function/DistinctCountHLLAggregationFunction.java](https://codecov.io/gh/apache/pinot/pull/8351/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9EaXN0aW5jdENvdW50SExMQWdncmVnYXRpb25GdW5jdGlvbi5qYXZh) | `43.96% <100.00%> (+26.48%)` | :arrow_up: |
   | [...nction/DistinctCountRawHLLAggregationFunction.java](https://codecov.io/gh/apache/pinot/pull/8351/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9EaXN0aW5jdENvdW50UmF3SExMQWdncmVnYXRpb25GdW5jdGlvbi5qYXZh) | `100.00% <100.00%> (+100.00%)` | :arrow_up: |
   | [...apache/pinot/common/helix/ExtraInstanceConfig.java](https://codecov.io/gh/apache/pinot/pull/8351/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vaGVsaXgvRXh0cmFJbnN0YW5jZUNvbmZpZy5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...t/core/plan/StreamingInstanceResponsePlanNode.java](https://codecov.io/gh/apache/pinot/pull/8351/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9wbGFuL1N0cmVhbWluZ0luc3RhbmNlUmVzcG9uc2VQbGFuTm9kZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ore/operator/streaming/StreamingResponseUtils.java](https://codecov.io/gh/apache/pinot/pull/8351/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9vcGVyYXRvci9zdHJlYW1pbmcvU3RyZWFtaW5nUmVzcG9uc2VVdGlscy5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ager/realtime/PeerSchemeSplitSegmentCommitter.java](https://codecov.io/gh/apache/pinot/pull/8351/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvcmVhbHRpbWUvUGVlclNjaGVtZVNwbGl0U2VnbWVudENvbW1pdHRlci5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...he/pinot/core/plan/StreamingSelectionPlanNode.java](https://codecov.io/gh/apache/pinot/pull/8351/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9wbGFuL1N0cmVhbWluZ1NlbGVjdGlvblBsYW5Ob2RlLmphdmE=) | `0.00% <0.00%> (-88.89%)` | :arrow_down: |
   | [...ator/streaming/StreamingSelectionOnlyOperator.java](https://codecov.io/gh/apache/pinot/pull/8351/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9vcGVyYXRvci9zdHJlYW1pbmcvU3RyZWFtaW5nU2VsZWN0aW9uT25seU9wZXJhdG9yLmphdmE=) | `0.00% <0.00%> (-87.81%)` | :arrow_down: |
   | ... and [1222 more](https://codecov.io/gh/apache/pinot/pull/8351/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/8351?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/8351?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [91c2ebb...3b56370](https://codecov.io/gh/apache/pinot/pull/8351?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   




[GitHub] [pinot] richardstartin commented on a change in pull request #8351: Add more aggregations that can be solved with dictionary

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #8351:
URL: https://github.com/apache/pinot/pull/8351#discussion_r826733493



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/plan/AggregationPlanNode.java
##########
@@ -58,9 +58,12 @@
 @SuppressWarnings("rawtypes")
 public class AggregationPlanNode implements PlanNode {
   private static final EnumSet<AggregationFunctionType> DICTIONARY_BASED_FUNCTIONS =
-      EnumSet.of(AggregationFunctionType.MIN, AggregationFunctionType.MAX, AggregationFunctionType.MINMAXRANGE,
-          AggregationFunctionType.DISTINCTCOUNT, AggregationFunctionType.SEGMENTPARTITIONEDDISTINCTCOUNT,
-          AggregationFunctionType.DISTINCTCOUNTSMARTHLL);
+      EnumSet.of(AggregationFunctionType.MIN, AggregationFunctionType.MINMV, AggregationFunctionType.MAX,
+          AggregationFunctionType.MAXMV, AggregationFunctionType.MINMAXRANGE, AggregationFunctionType.MINMAXRANGEMV,
+          AggregationFunctionType.DISTINCTCOUNT, AggregationFunctionType.DISTINCTCOUNTMV,
+          AggregationFunctionType.DISTINCTCOUNTHLL, AggregationFunctionType.DISTINCTCOUNTHLLMV,
+          AggregationFunctionType.DISTINCTCOUNTRAWHLL, AggregationFunctionType.DISTINCTCOUNTRAWHLLMV,
+          AggregationFunctionType.SEGMENTPARTITIONEDDISTINCTCOUNT, AggregationFunctionType.DISTINCTCOUNTSMARTHLL);

Review comment:
       I think that a static import would actually improve all enum-related code, so that it reads similarly to a switch statement.
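
To illustrate the readability point, here is a small sketch using the JDK's `java.time.DayOfWeek` as a stand-in for `AggregationFunctionType` (an assumption made so the example compiles on its own; the class name is hypothetical). Both sets are equal; only the call site differs.

```java
import static java.time.DayOfWeek.SATURDAY;
import static java.time.DayOfWeek.SUNDAY;

import java.time.DayOfWeek;
import java.util.EnumSet;

// DayOfWeek stands in for AggregationFunctionType; this is not Pinot code.
final class StaticImportSketch {
  // Qualified constants, as in the diff above.
  static final EnumSet<DayOfWeek> QUALIFIED = EnumSet.of(DayOfWeek.SATURDAY, DayOfWeek.SUNDAY);

  // Statically imported constants read like the case labels of a switch.
  static final EnumSet<DayOfWeek> STATIC_IMPORTED = EnumSet.of(SATURDAY, SUNDAY);

  public static void main(String[] args) {
    System.out.println(QUALIFIED.equals(STATIC_IMPORTED));
  }
}
```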






[GitHub] [pinot] Jackie-Jiang commented on a change in pull request #8351: Add more aggregations that can be solved with dictionary

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #8351:
URL: https://github.com/apache/pinot/pull/8351#discussion_r827265605



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/plan/AggregationPlanNode.java
##########
@@ -58,9 +58,12 @@
 @SuppressWarnings("rawtypes")
 public class AggregationPlanNode implements PlanNode {
   private static final EnumSet<AggregationFunctionType> DICTIONARY_BASED_FUNCTIONS =
-      EnumSet.of(AggregationFunctionType.MIN, AggregationFunctionType.MAX, AggregationFunctionType.MINMAXRANGE,
-          AggregationFunctionType.DISTINCTCOUNT, AggregationFunctionType.SEGMENTPARTITIONEDDISTINCTCOUNT,
-          AggregationFunctionType.DISTINCTCOUNTSMARTHLL);
+      EnumSet.of(AggregationFunctionType.MIN, AggregationFunctionType.MINMV, AggregationFunctionType.MAX,
+          AggregationFunctionType.MAXMV, AggregationFunctionType.MINMAXRANGE, AggregationFunctionType.MINMAXRANGEMV,
+          AggregationFunctionType.DISTINCTCOUNT, AggregationFunctionType.DISTINCTCOUNTMV,
+          AggregationFunctionType.DISTINCTCOUNTHLL, AggregationFunctionType.DISTINCTCOUNTHLLMV,
+          AggregationFunctionType.DISTINCTCOUNTRAWHLL, AggregationFunctionType.DISTINCTCOUNTRAWHLLMV,
+          AggregationFunctionType.SEGMENTPARTITIONEDDISTINCTCOUNT, AggregationFunctionType.DISTINCTCOUNTSMARTHLL);

Review comment:
       Reverted this change as it violates our checkstyle rules








[GitHub] [pinot] Jackie-Jiang commented on a change in pull request #8351: Add more aggregations that can be solved with dictionary

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #8351:
URL: https://github.com/apache/pinot/pull/8351#discussion_r827248348



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/plan/AggregationPlanNode.java
##########
@@ -58,9 +58,12 @@
 @SuppressWarnings("rawtypes")
 public class AggregationPlanNode implements PlanNode {
   private static final EnumSet<AggregationFunctionType> DICTIONARY_BASED_FUNCTIONS =
-      EnumSet.of(AggregationFunctionType.MIN, AggregationFunctionType.MAX, AggregationFunctionType.MINMAXRANGE,
-          AggregationFunctionType.DISTINCTCOUNT, AggregationFunctionType.SEGMENTPARTITIONEDDISTINCTCOUNT,
-          AggregationFunctionType.DISTINCTCOUNTSMARTHLL);
+      EnumSet.of(AggregationFunctionType.MIN, AggregationFunctionType.MINMV, AggregationFunctionType.MAX,
+          AggregationFunctionType.MAXMV, AggregationFunctionType.MINMAXRANGE, AggregationFunctionType.MINMAXRANGEMV,
+          AggregationFunctionType.DISTINCTCOUNT, AggregationFunctionType.DISTINCTCOUNTMV,
+          AggregationFunctionType.DISTINCTCOUNTHLL, AggregationFunctionType.DISTINCTCOUNTHLLMV,
+          AggregationFunctionType.DISTINCTCOUNTRAWHLL, AggregationFunctionType.DISTINCTCOUNTRAWHLLMV,
+          AggregationFunctionType.SEGMENTPARTITIONEDDISTINCTCOUNT, AggregationFunctionType.DISTINCTCOUNTSMARTHLL);

Review comment:
       Done






[GitHub] [pinot] Jackie-Jiang merged pull request #8351: Add more aggregations that can be solved with dictionary

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang merged pull request #8351:
URL: https://github.com/apache/pinot/pull/8351


   




[GitHub] [pinot] codecov-commenter edited a comment on pull request #8351: Add more aggregations that can be solved with dictionary

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #8351:
URL: https://github.com/apache/pinot/pull/8351#issuecomment-1067447164


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/8351?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#8351](https://codecov.io/gh/apache/pinot/pull/8351?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (3b56370) into [master](https://codecov.io/gh/apache/pinot/commit/91c2ebbf297c4bf3fecb5f98413e9f00e324e2dc?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (91c2ebb) will **increase** coverage by `43.26%`.
   > The diff coverage is `63.15%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/8351/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/8351?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #8351       +/-   ##
   =============================================
   + Coverage     27.62%   70.89%   +43.26%     
   - Complexity        0     4258     +4258     
   =============================================
     Files          1624     1636       +12     
     Lines         85450    85820      +370     
     Branches      12882    12924       +42     
   =============================================
   + Hits          23604    60839    +37235     
   + Misses        59631    20797    -38834     
   - Partials       2215     4184     +1969     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `28.90% <42.10%> (?)` | |
   | integration2 | `27.58% <42.10%> (-0.04%)` | :arrow_down: |
   | unittests1 | `66.95% <63.15%> (?)` | |
   | unittests2 | `14.17% <0.00%> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/8351?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ator/query/DictionaryBasedAggregationOperator.java](https://codecov.io/gh/apache/pinot/pull/8351/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9vcGVyYXRvci9xdWVyeS9EaWN0aW9uYXJ5QmFzZWRBZ2dyZWdhdGlvbk9wZXJhdG9yLmphdmE=) | `78.31% <53.33%> (+38.60%)` | :arrow_up: |
   | [...rg/apache/pinot/core/plan/AggregationPlanNode.java](https://codecov.io/gh/apache/pinot/pull/8351/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9wbGFuL0FnZ3JlZ2F0aW9uUGxhbk5vZGUuamF2YQ==) | `91.08% <100.00%> (+38.61%)` | :arrow_up: |
   | [.../function/DistinctCountHLLAggregationFunction.java](https://codecov.io/gh/apache/pinot/pull/8351/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9EaXN0aW5jdENvdW50SExMQWdncmVnYXRpb25GdW5jdGlvbi5qYXZh) | `43.96% <100.00%> (+26.48%)` | :arrow_up: |
   | [...nction/DistinctCountRawHLLAggregationFunction.java](https://codecov.io/gh/apache/pinot/pull/8351/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9EaXN0aW5jdENvdW50UmF3SExMQWdncmVnYXRpb25GdW5jdGlvbi5qYXZh) | `100.00% <100.00%> (+100.00%)` | :arrow_up: |
   | [...ller/helix/core/minion/TaskTypeMetricsUpdater.java](https://codecov.io/gh/apache/pinot/pull/8351/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9oZWxpeC9jb3JlL21pbmlvbi9UYXNrVHlwZU1ldHJpY3NVcGRhdGVyLmphdmE=) | `80.00% <0.00%> (-6.67%)` | :arrow_down: |
   | [...pache/pinot/core/query/utils/idset/EmptyIdSet.java](https://codecov.io/gh/apache/pinot/pull/8351/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS91dGlscy9pZHNldC9FbXB0eUlkU2V0LmphdmE=) | `25.00% <0.00%> (ø)` | |
   | [...anager/realtime/SegmentBuildTimeLeaseExtender.java](https://codecov.io/gh/apache/pinot/pull/8351/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvcmVhbHRpbWUvU2VnbWVudEJ1aWxkVGltZUxlYXNlRXh0ZW5kZXIuamF2YQ==) | `63.23% <0.00%> (ø)` | |
   | [...apache/pinot/ingestion/jobs/SegmentUriPushJob.java](https://codecov.io/gh/apache/pinot/pull/8351/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtcGx1Z2lucy9waW5vdC1iYXRjaC1pbmdlc3Rpb24vdjBfZGVwcmVjYXRlZC9waW5vdC1pbmdlc3Rpb24tY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9pbmdlc3Rpb24vam9icy9TZWdtZW50VXJpUHVzaEpvYi5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...rg/apache/pinot/ingestion/jobs/BaseSegmentJob.java](https://codecov.io/gh/apache/pinot/pull/8351/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtcGx1Z2lucy9waW5vdC1iYXRjaC1pbmdlc3Rpb24vdjBfZGVwcmVjYXRlZC9waW5vdC1pbmdlc3Rpb24tY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9pbmdlc3Rpb24vam9icy9CYXNlU2VnbWVudEpvYi5qYXZh) | `31.57% <0.00%> (ø)` | |
   | [...he/pinot/ingestion/utils/JobPreparationHelper.java](https://codecov.io/gh/apache/pinot/pull/8351/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtcGx1Z2lucy9waW5vdC1iYXRjaC1pbmdlc3Rpb24vdjBfZGVwcmVjYXRlZC9waW5vdC1pbmdlc3Rpb24tY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9pbmdlc3Rpb24vdXRpbHMvSm9iUHJlcGFyYXRpb25IZWxwZXIuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | ... and [1186 more](https://codecov.io/gh/apache/pinot/pull/8351/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/8351?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/8351?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [91c2ebb...3b56370](https://codecov.io/gh/apache/pinot/pull/8351?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   

