You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "vvivekiyer (via GitHub)" <gi...@apache.org> on 2023/11/02 08:26:02 UTC

[PR] groupBy Accuracy Reporting Improvements [pinot]

vvivekiyer opened a new pull request, #11932:
URL: https://github.com/apache/pinot/pull/11932

   This PR contains a couple of usability improvements for GroupBy queries
   
   **Change1: `numGroupsLimitReached`**
   * One of the uses of this metadata is to help users determine if their groupBy results could be partial or potentially inaccurate. 
   * This metadata is currently set if the number of groupBy keys collected matches the `num.groups.limit` irrespective of whether some groupBy keys have been ignored or not.
   * Added a fix for this in this PR
   
   **Change2: Better Visibility for User for potentially inaccurate GroupBy**
   * Due to segment trimming and server-level trimming, it's possible that the groupBy results are not perfectly accuracte. 
   * In such case, we should expose a metadata to inform the user about this potential inaccuracy. 
   * Added a metadata field called `isAccuracteGroupBy` which will be set to true only if no trimming/ignoring has taken place at the server or segment level.
   
   **Change3: User option to fail query if groupBy is potentially inaccurate**
   * The user should be able to potentially fail queries if they wish to perform perfectly accurate GroupBy. We can perhaps provide a queryOption for this.
   * I'm working on this part. Will update PR with these changes. 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] groupBy Accuracy Reporting Improvements [pinot]

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on PR #11932:
URL: https://github.com/apache/pinot/pull/11932#issuecomment-1816876827

   based on the QueryContext definition
   ```
     // Limit of number of groups stored in each segment
     private int _numGroupsLimit = InstancePlanMakerImplV2.DEFAULT_NUM_GROUPS_LIMIT;
     // Minimum number of groups to keep per segment when trimming groups for SQL GROUP BY
     private int _minSegmentGroupTrimSize = InstancePlanMakerImplV2.DEFAULT_MIN_SEGMENT_GROUP_TRIM_SIZE;
     // Minimum number of groups to keep across segments when trimming groups for SQL GROUP BY
     private int _minServerGroupTrimSize = InstancePlanMakerImplV2.DEFAULT_MIN_SERVER_GROUP_TRIM_SIZE;
     // Trim threshold to use for server combine for SQL GROUP BY
     private int _groupTrimThreshold = InstancePlanMakerImplV2.DEFAULT_GROUPBY_TRIM_THRESHOLD;
   ```
   
   looks like the differences of these configurations are basically controlling the maximum number of groups to keep in 
   1. segment level; 2. server level combine 
   
   however my argument: if any of these threshold were hit, then we have no accurate results yes?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] groupBy Accuracy Reporting Improvements [pinot]

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on code in PR #11932:
URL: https://github.com/apache/pinot/pull/11932#discussion_r1397700997


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/combine/GroupByCombineOperator.java:
##########
@@ -188,6 +186,14 @@ protected void processSegments() {
             mergedKeys++;
           }
         }
+
+        // Set groups limit reached flag.
+        if (resultsBlock.isNumGroupsLimitReached()) {

Review Comment:
   both the numsGroupsLimit config and the trim threshold on the lookupMap of the index table are describing the maximum number of keys allows in the group by operator. 
   
   i was wondering what's the reason to have the 2 separate in the first place? logically speaking they seems the same to me, can't we use just 1?



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/combine/GroupByCombineOperator.java:
##########
@@ -188,6 +186,14 @@ protected void processSegments() {
             mergedKeys++;
           }
         }
+
+        // Set groups limit reached flag.
+        if (resultsBlock.isNumGroupsLimitReached()) {

Review Comment:
   both the numsGroupsLimit config and the trim threshold on the lookupMap of the index table are describing the maximum number of keys allows in the group by operator. 
   
   i was wondering what's the reason to have the 2 separate in the first place? logically speaking they seems the same to me, can't we unify them?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] groupBy Accuracy Reporting Improvements [pinot]

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #11932:
URL: https://github.com/apache/pinot/pull/11932#issuecomment-1790348712

   ## [Codecov](https://app.codecov.io/gh/apache/pinot/pull/11932?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
   > Merging [#11932](https://app.codecov.io/gh/apache/pinot/pull/11932?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (0c697af) into [master](https://app.codecov.io/gh/apache/pinot/commit/fee11d6dc5678600c021a5993ead4b9c5102c76c?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (fee11d6) will **decrease** coverage by `61.42%`.
   > The diff coverage is `0.00%`.
   
   ```diff
   @@              Coverage Diff              @@
   ##             master   #11932       +/-   ##
   =============================================
   - Coverage     61.41%    0.00%   -61.42%     
   =============================================
     Files          2378     2302       -76     
     Lines        128865   125214     -3651     
     Branches      19927    19390      -537     
   =============================================
   - Hits          79143        0    -79143     
   - Misses        44001   125214    +81213     
   + Partials       5721        0     -5721     
   ```
   
   | [Flag](https://app.codecov.io/gh/apache/pinot/pull/11932/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [custom-integration1](https://app.codecov.io/gh/apache/pinot/pull/11932/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [integration](https://app.codecov.io/gh/apache/pinot/pull/11932/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `0.00% <0.00%> (-0.01%)` | :arrow_down: |
   | [integration1](https://app.codecov.io/gh/apache/pinot/pull/11932/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [integration2](https://app.codecov.io/gh/apache/pinot/pull/11932/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `0.00% <0.00%> (ø)` | |
   | [java-11](https://app.codecov.io/gh/apache/pinot/pull/11932/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `0.00% <0.00%> (-61.39%)` | :arrow_down: |
   | [java-21](https://app.codecov.io/gh/apache/pinot/pull/11932/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [skip-bytebuffers-false](https://app.codecov.io/gh/apache/pinot/pull/11932/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `0.00% <0.00%> (-61.41%)` | :arrow_down: |
   | [skip-bytebuffers-true](https://app.codecov.io/gh/apache/pinot/pull/11932/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [temurin](https://app.codecov.io/gh/apache/pinot/pull/11932/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `0.00% <0.00%> (-61.42%)` | :arrow_down: |
   | [unittests](https://app.codecov.io/gh/apache/pinot/pull/11932/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [unittests1](https://app.codecov.io/gh/apache/pinot/pull/11932/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [unittests2](https://app.codecov.io/gh/apache/pinot/pull/11932/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Files](https://app.codecov.io/gh/apache/pinot/pull/11932?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [...common/response/broker/BrokerResponseNativeV2.java](https://app.codecov.io/gh/apache/pinot/pull/11932?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vcmVzcG9uc2UvYnJva2VyL0Jyb2tlclJlc3BvbnNlTmF0aXZlVjIuamF2YQ==) | `0.00% <ø> (-33.34%)` | :arrow_down: |
   | [...ot/common/response/broker/BrokerResponseStats.java](https://app.codecov.io/gh/apache/pinot/pull/11932?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vcmVzcG9uc2UvYnJva2VyL0Jyb2tlclJlc3BvbnNlU3RhdHMuamF2YQ==) | `0.00% <ø> (-80.65%)` | :arrow_down: |
   | [...e/query/aggregation/groupby/GroupKeyGenerator.java](https://app.codecov.io/gh/apache/pinot/pull/11932?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9ncm91cGJ5L0dyb3VwS2V5R2VuZXJhdG9yLmphdmE=) | `0.00% <ø> (-100.00%)` | :arrow_down: |
   | [...ava/org/apache/pinot/spi/trace/RequestContext.java](https://app.codecov.io/gh/apache/pinot/pull/11932?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvdHJhY2UvUmVxdWVzdENvbnRleHQuamF2YQ==) | `0.00% <ø> (ø)` | |
   | [...n/java/org/apache/pinot/client/ExecutionStats.java](https://app.codecov.io/gh/apache/pinot/pull/11932?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0V4ZWN1dGlvblN0YXRzLmphdmE=) | `0.00% <0.00%> (-62.50%)` | :arrow_down: |
   | [...a/org/apache/pinot/common/datatable/DataTable.java](https://app.codecov.io/gh/apache/pinot/pull/11932?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vZGF0YXRhYmxlL0RhdGFUYWJsZS5qYXZh) | `0.00% <0.00%> (-92.07%)` | :arrow_down: |
   | [...a/org/apache/pinot/common/metrics/BrokerMeter.java](https://app.codecov.io/gh/apache/pinot/pull/11932?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWV0cmljcy9Ccm9rZXJNZXRlci5qYXZh) | `0.00% <0.00%> (-93.88%)` | :arrow_down: |
   | [...ot/core/query/reduce/ExecutionStatsAggregator.java](https://app.codecov.io/gh/apache/pinot/pull/11932?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9yZWR1Y2UvRXhlY3V0aW9uU3RhdHNBZ2dyZWdhdG9yLmphdmE=) | `0.00% <0.00%> (-87.78%)` | :arrow_down: |
   | [...e/operator/blocks/results/GroupByResultsBlock.java](https://app.codecov.io/gh/apache/pinot/pull/11932?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9vcGVyYXRvci9ibG9ja3MvcmVzdWx0cy9Hcm91cEJ5UmVzdWx0c0Jsb2NrLmphdmE=) | `0.00% <0.00%> (-82.93%)` | :arrow_down: |
   | [...t/core/operator/query/FilteredGroupByOperator.java](https://app.codecov.io/gh/apache/pinot/pull/11932?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9vcGVyYXRvci9xdWVyeS9GaWx0ZXJlZEdyb3VwQnlPcGVyYXRvci5qYXZh) | `0.00% <0.00%> (-70.59%)` | :arrow_down: |
   | ... and [13 more](https://app.codecov.io/gh/apache/pinot/pull/11932?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | |
   
   ... and [1967 files with indirect coverage changes](https://app.codecov.io/gh/apache/pinot/pull/11932/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] groupBy Accuracy Reporting Improvements [pinot]

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on code in PR #11932:
URL: https://github.com/apache/pinot/pull/11932#discussion_r1397700997


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/combine/GroupByCombineOperator.java:
##########
@@ -188,6 +186,14 @@ protected void processSegments() {
             mergedKeys++;
           }
         }
+
+        // Set groups limit reached flag.
+        if (resultsBlock.isNumGroupsLimitReached()) {

Review Comment:
   both the numsGroupsLimit config and the trim threshold on the lookupMap of the index table are describing the maximum number of keys allows in the group by operator. 
   
   i was wondering what's the reason to have the 2 separate in the first place? can't we use just 1?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] groupBy Accuracy Reporting Improvements [pinot]

Posted by "vvivekiyer (via GitHub)" <gi...@apache.org>.
vvivekiyer commented on PR #11932:
URL: https://github.com/apache/pinot/pull/11932#issuecomment-1790275774

   @siddharthteotia @Jackie-Jiang  please let me know if you have any thoughts/preferences for these 3 changes. 
   Meanwhile, I'll update the PR with **Change3** and some tests


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org