You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/05/27 01:07:54 UTC

[GitHub] [incubator-pinot] wuwenw opened a new pull request #6991: Introduce in-Segment Trim for GroupBy OrderBy Query

wuwenw opened a new pull request #6991:
URL: https://github.com/apache/incubator-pinot/pull/6991


   ## Description
   <!-- Add a description of your PR here.
   A good description should include pointers to an issue or design document, etc.
   -->
   One of the major bottlenecks for the current GroupBy OrderBy query on high cardinality columns is the merge phase. Essentially every segment brings a large number of intermediate results to a global concurrent map for further aggregation and merge, which takes up a lot of space and is very time-consuming. This PR introduces an optimization option that each segment trims its intermediate results to a given size. The size is configurable by the user and is guaranteed to be max(limit N * 5, 5000). It won't affect accuracy much but reduces the running time for high cardinality dataset. ~5 times faster for String data with 10M cardinality. This option is turned off by default to ensure backward compatibility. 
   ## Upgrade Notes
   Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)
   * [ ] Yes (Please label as **<code>backward-incompat</code>**, and complete the section below on Release Notes)
   
   Does this PR fix a zero-downtime upgrade introduced earlier?
   * [ ] Yes (Please label this as **<code>backward-incompat</code>**, and complete the section below on Release Notes)
   
   Does this PR otherwise need attention when creating release notes? Things to consider:
   - New configuration options
   - Deprecation of configurations
   - Signature changes to public methods/interfaces
   - New plugins added or old plugins removed
   * [x] Yes (Please label this PR as **<code>release-notes</code>** and complete the section on Release Notes)
   ## Release Notes
   <!-- If you have tagged this as either backward-incompat or release-notes,
   you MUST add text here that you would like to see appear in release notes of the
   next release. -->
   Optimized GroupBy OrderBy queries by introducing an in-segment trim option that can significantly reduce the size of intermediate results and speed up the execution.
   <!-- If you have a series of commits adding or enabling a feature, then
   add this section only in final commit that marks the feature completed.
   Refer to earlier release notes to see examples of text.
   -->
   ## Documentation
   <!-- If you have introduced a new feature or configuration, please add it to the documentation as well.
   See https://docs.pinot.apache.org/developers/developers-and-contributors/update-document
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] codecov-commenter commented on pull request #6991: Introduce in-Segment Trim for GroupBy OrderBy Query

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #6991:
URL: https://github.com/apache/incubator-pinot/pull/6991#issuecomment-849847415


   # [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#6991](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (1416bf7) into [master](https://codecov.io/gh/apache/incubator-pinot/commit/0185482d9da2ac299b4b15bcd2998165ccbbdf71?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (0185482) will **decrease** coverage by `7.65%`.
   > The diff coverage is `86.97%`.
   
   > :exclamation: Current head 1416bf7 differs from pull request most recent head 36b304d. Consider uploading reports for the commit 36b304d to get more accurate results
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/6991/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #6991      +/-   ##
   ============================================
   - Coverage     73.23%   65.58%   -7.66%     
     Complexity       12       12              
   ============================================
     Files          1439     1441       +2     
     Lines         71333    71549     +216     
     Branches      10334    10368      +34     
   ============================================
   - Hits          52243    46923    -5320     
   - Misses        15578    21234    +5656     
   + Partials       3512     3392     -120     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration | `?` | |
   | unittests | `65.58% <86.97%> (+0.23%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ces/PinotSegmentUploadDownloadRestletResource.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9hcGkvcmVzb3VyY2VzL1Bpbm90U2VnbWVudFVwbG9hZERvd25sb2FkUmVzdGxldFJlc291cmNlLmphdmE=) | `10.00% <0.00%> (-45.03%)` | :arrow_down: |
   | [...erator/transform/PassThroughTransformOperator.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9vcGVyYXRvci90cmFuc2Zvcm0vUGFzc1Rocm91Z2hUcmFuc2Zvcm1PcGVyYXRvci5qYXZh) | `85.71% <0.00%> (-14.29%)` | :arrow_down: |
   | [...t/server/api/resources/SegmentMetadataFetcher.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zZXJ2ZXIvYXBpL3Jlc291cmNlcy9TZWdtZW50TWV0YWRhdGFGZXRjaGVyLmphdmE=) | `45.00% <21.42%> (-24.05%)` | :arrow_down: |
   | [...ment/processing/collector/SortOrderComparator.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9zZWdtZW50L3Byb2Nlc3NpbmcvY29sbGVjdG9yL1NvcnRPcmRlckNvbXBhcmF0b3IuamF2YQ==) | `58.33% <58.33%> (ø)` | |
   | [...t/core/plan/AggregationGroupByOrderByPlanNode.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9wbGFuL0FnZ3JlZ2F0aW9uR3JvdXBCeU9yZGVyQnlQbGFuTm9kZS5qYXZh) | `45.00% <80.00%> (+2.89%)` | :arrow_up: |
   | [...ry/aggregation/groupby/DefaultGroupByExecutor.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9ncm91cGJ5L0RlZmF1bHRHcm91cEJ5RXhlY3V0b3IuamF2YQ==) | `98.27% <87.50%> (-1.73%)` | :arrow_down: |
   | [.../segment/processing/collector/ConcatCollector.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9zZWdtZW50L3Byb2Nlc3NpbmcvY29sbGVjdG9yL0NvbmNhdENvbGxlY3Rvci5qYXZh) | `88.88% <88.88%> (-0.86%)` | :arrow_down: |
   | [...segment/processing/serde/GenericRowSerializer.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9zZWdtZW50L3Byb2Nlc3Npbmcvc2VyZGUvR2VuZXJpY1Jvd1NlcmlhbGl6ZXIuamF2YQ==) | `93.22% <93.22%> (ø)` | |
   | [...perator/combine/GroupByOrderByCombineOperator.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9vcGVyYXRvci9jb21iaW5lL0dyb3VwQnlPcmRlckJ5Q29tYmluZU9wZXJhdG9yLmphdmE=) | `77.77% <94.11%> (+2.39%)` | :arrow_up: |
   | [...gment/processing/serde/GenericRowDeserializer.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9zZWdtZW50L3Byb2Nlc3Npbmcvc2VyZGUvR2VuZXJpY1Jvd0Rlc2VyaWFsaXplci5qYXZh) | `94.91% <94.91%> (ø)` | |
   | ... and [353 more](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [0185482...36b304d](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #6991: Introduce in-Segment Trim for GroupBy OrderBy Query

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #6991:
URL: https://github.com/apache/incubator-pinot/pull/6991#discussion_r646820519



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/plan/AggregationGroupByOrderByPlanNode.java
##########
@@ -58,6 +63,14 @@ public AggregationGroupByOrderByPlanNode(IndexSegment indexSegment, QueryContext
     List<ExpressionContext> groupByExpressions = queryContext.getGroupByExpressions();
     assert groupByExpressions != null;
     _groupByExpressions = groupByExpressions.toArray(new ExpressionContext[0]);
+    _queryContext = queryContext;
+
+    // Only trim if there is OrderBy and has a positive trim size
+    if (queryContext.getOrderByExpressions() != null && minSegmentTrimSize > 0) {

Review comment:
       Because we have to do the actual comparison in the operator (during query run time), so imo more clear if we group the comparison in one place




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] codecov-commenter edited a comment on pull request #6991: Introduce in-Segment Trim for GroupBy OrderBy Query

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #6991:
URL: https://github.com/apache/incubator-pinot/pull/6991#issuecomment-849847415


   # [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#6991](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (aa5d367) into [master](https://codecov.io/gh/apache/incubator-pinot/commit/0185482d9da2ac299b4b15bcd2998165ccbbdf71?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (0185482) will **decrease** coverage by `7.76%`.
   > The diff coverage is `64.74%`.
   
   > :exclamation: Current head aa5d367 differs from pull request most recent head 03beb3e. Consider uploading reports for the commit 03beb3e to get more accurate results
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/6991/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #6991      +/-   ##
   ============================================
   - Coverage     73.23%   65.47%   -7.77%     
     Complexity       12       12              
   ============================================
     Files          1439     1454      +15     
     Lines         71333    72091     +758     
     Branches      10334    10441     +107     
   ============================================
   - Hits          52243    47199    -5044     
   - Misses        15578    21488    +5910     
   + Partials       3512     3404     -108     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration | `?` | |
   | unittests | `65.47% <64.74%> (+0.12%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...roker/requesthandler/BaseBrokerRequestHandler.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcmVxdWVzdGhhbmRsZXIvQmFzZUJyb2tlclJlcXVlc3RIYW5kbGVyLmphdmE=) | `17.37% <0.00%> (-53.88%)` | :arrow_down: |
   | [...e/pinot/common/minion/MergeRollupTaskMetadata.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWluaW9uL01lcmdlUm9sbHVwVGFza01ldGFkYXRhLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [...e/pinot/common/minion/MinionTaskMetadataUtils.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWluaW9uL01pbmlvblRhc2tNZXRhZGF0YVV0aWxzLmphdmE=) | `0.00% <0.00%> (-75.00%)` | :arrow_down: |
   | [...mmon/restlet/resources/SegmentServerDebugInfo.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vcmVzdGxldC9yZXNvdXJjZXMvU2VnbWVudFNlcnZlckRlYnVnSW5mby5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...che/pinot/controller/api/debug/TableDebugInfo.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9hcGkvZGVidWcvVGFibGVEZWJ1Z0luZm8uamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [...ces/PinotSegmentUploadDownloadRestletResource.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9hcGkvcmVzb3VyY2VzL1Bpbm90U2VnbWVudFVwbG9hZERvd25sb2FkUmVzdGxldFJlc291cmNlLmphdmE=) | `10.00% <0.00%> (-45.03%)` | :arrow_down: |
   | [...t/controller/api/resources/TableDebugResource.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9hcGkvcmVzb3VyY2VzL1RhYmxlRGVidWdSZXNvdXJjZS5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...troller/helix/core/minion/ClusterInfoAccessor.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9oZWxpeC9jb3JlL21pbmlvbi9DbHVzdGVySW5mb0FjY2Vzc29yLmphdmE=) | `32.14% <0.00%> (-47.86%)` | :arrow_down: |
   | [...x/core/minion/generator/TaskGeneratorRegistry.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9oZWxpeC9jb3JlL21pbmlvbi9nZW5lcmF0b3IvVGFza0dlbmVyYXRvclJlZ2lzdHJ5LmphdmE=) | `76.00% <ø> (-4.00%)` | :arrow_down: |
   | [.../org/apache/pinot/core/common/MinionConstants.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9jb21tb24vTWluaW9uQ29uc3RhbnRzLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | ... and [430 more](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [0185482...03beb3e](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] codecov-commenter edited a comment on pull request #6991: Introduce in-Segment Trim for GroupBy OrderBy Query

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #6991:
URL: https://github.com/apache/incubator-pinot/pull/6991#issuecomment-849847415


   # [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#6991](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c2b1628) into [master](https://codecov.io/gh/apache/incubator-pinot/commit/0185482d9da2ac299b4b15bcd2998165ccbbdf71?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (0185482) will **decrease** coverage by `31.17%`.
   > The diff coverage is `42.56%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/6991/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #6991       +/-   ##
   =============================================
   - Coverage     73.23%   42.05%   -31.18%     
   + Complexity       12        7        -5     
   =============================================
     Files          1439     1454       +15     
     Lines         71333    72109      +776     
     Branches      10334    10440      +106     
   =============================================
   - Hits          52243    30329    -21914     
   - Misses        15578    39171    +23593     
   + Partials       3512     2609      -903     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration | `42.05% <42.56%> (+0.20%)` | :arrow_up: |
   | unittests | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...va/org/apache/pinot/common/minion/Granularity.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWluaW9uL0dyYW51bGFyaXR5LmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [...e/pinot/common/minion/MergeRollupTaskMetadata.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWluaW9uL01lcmdlUm9sbHVwVGFza01ldGFkYXRhLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [...e/pinot/common/minion/MinionTaskMetadataUtils.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWluaW9uL01pbmlvblRhc2tNZXRhZGF0YVV0aWxzLmphdmE=) | `50.00% <0.00%> (-25.00%)` | :arrow_down: |
   | [...not/common/restlet/resources/SegmentErrorInfo.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vcmVzdGxldC9yZXNvdXJjZXMvU2VnbWVudEVycm9ySW5mby5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...mmon/restlet/resources/SegmentServerDebugInfo.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vcmVzdGxldC9yZXNvdXJjZXMvU2VnbWVudFNlcnZlckRlYnVnSW5mby5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...che/pinot/controller/api/debug/TableDebugInfo.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9hcGkvZGVidWcvVGFibGVEZWJ1Z0luZm8uamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [...t/controller/api/resources/TableDebugResource.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9hcGkvcmVzb3VyY2VzL1RhYmxlRGVidWdSZXNvdXJjZS5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...troller/helix/core/minion/ClusterInfoAccessor.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9oZWxpeC9jb3JlL21pbmlvbi9DbHVzdGVySW5mb0FjY2Vzc29yLmphdmE=) | `57.14% <0.00%> (-22.86%)` | :arrow_down: |
   | [...x/core/minion/generator/TaskGeneratorRegistry.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9oZWxpeC9jb3JlL21pbmlvbi9nZW5lcmF0b3IvVGFza0dlbmVyYXRvclJlZ2lzdHJ5LmphdmE=) | `80.00% <ø> (ø)` | |
   | [.../org/apache/pinot/core/common/MinionConstants.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9jb21tb24vTWluaW9uQ29uc3RhbnRzLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | ... and [1006 more](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [0185482...c2b1628](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #6991: Introduce in-Segment Trim for GroupBy OrderBy Query

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #6991:
URL: https://github.com/apache/incubator-pinot/pull/6991#discussion_r648527466



##########
File path: pinot-core/src/test/java/org/apache/pinot/core/query/aggregation/groupby/GroupByInSegmentTrimTest.java
##########
@@ -0,0 +1,292 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.query.aggregation.groupby;
+
+import java.io.File;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.PriorityQueue;
+import java.util.Random;
+import org.apache.commons.io.FileUtils;
+import org.apache.pinot.core.data.table.IntermediateRecord;
+import org.apache.pinot.core.operator.blocks.IntermediateResultsBlock;
+import org.apache.pinot.core.operator.query.AggregationGroupByOrderByOperator;
+import org.apache.pinot.core.plan.AggregationGroupByOrderByPlanNode;
+import org.apache.pinot.core.query.request.context.QueryContext;
+import org.apache.pinot.core.query.request.context.utils.QueryContextConverterUtils;
+import org.apache.pinot.segment.local.indexsegment.immutable.ImmutableSegmentLoader;
+import org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl;
+import org.apache.pinot.segment.local.segment.readers.GenericRowRecordReader;
+import org.apache.pinot.segment.spi.IndexSegment;
+import org.apache.pinot.segment.spi.creator.SegmentGeneratorConfig;
+import org.apache.pinot.spi.config.table.TableType;
+import org.apache.pinot.spi.data.FieldSpec;
+import org.apache.pinot.spi.data.MetricFieldSpec;
+import org.apache.pinot.spi.data.Schema;
+import org.apache.pinot.spi.data.readers.GenericRow;
+import org.apache.pinot.spi.utils.Pair;
+import org.apache.pinot.spi.utils.ReadMode;
+import org.apache.pinot.spi.utils.builder.TableConfigBuilder;
+import org.testng.annotations.BeforeClass;
+import org.testng.annotations.DataProvider;
+import org.testng.annotations.Test;
+
+import static java.lang.Math.max;
+import static org.testng.Assert.assertEquals;
+
+
+/**
+ * Unit test for GroupBy Trim functionalities.
+ * - Builds a segment with random data.
+ * - Uses AggregationGroupByOrderByPlanNode class to construct an AggregationGroupByOrderByOperator
+ * - Perform aggregationGroupBy and OrderBy on the data
+ * - Also computes those results itself.
+ * - Asserts that the aggregation results returned by the class are the same as
+ *   returned by the local computations.
+ *
+ * Currently tests 'max' functions, and can be easily extended to
+ * test other conditions such as GroupBy without OrderBy
+ */
+public class GroupByInSegmentTrimTest {

Review comment:
       This test could be flaky, fix as discussed offline

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/query/AggregationGroupByOrderByOperator.java
##########
@@ -38,28 +44,34 @@
 @SuppressWarnings("rawtypes")
 public class AggregationGroupByOrderByOperator extends BaseOperator<IntermediateResultsBlock> {
   private static final String OPERATOR_NAME = "AggregationGroupByOrderByOperator";
+  private static final int TRIM_OFF = -1;

Review comment:
       Seems not used?

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/plan/maker/InstancePlanMakerImplV2.java
##########
@@ -52,39 +53,53 @@
 import org.slf4j.LoggerFactory;
 
 
+

Review comment:
       (nit) remove




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #6991: Introduce in-Segment Trim for GroupBy OrderBy Query

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #6991:
URL: https://github.com/apache/incubator-pinot/pull/6991#discussion_r640240892



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java
##########
@@ -117,6 +119,19 @@ public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions,
     _dataSchema = dataSchema;
   }
 
+  /**
+   * Constructor for aggregation group-by order-by result with {@link AggregationGroupByResult}.
+   */
+  public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions,
+                                  @Nullable AggregationGroupByResult aggregationGroupByResults,

Review comment:
       Follow https://docs.pinot.apache.org/developers/developers-and-contributors/code-setup to setup the code style in your IDE

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/plan/maker/InstancePlanMakerImplV2.java
##########
@@ -61,7 +61,11 @@
   public static final String MAX_INITIAL_RESULT_HOLDER_CAPACITY_KEY = "max.init.group.holder.capacity";
   public static final int DEFAULT_MAX_INITIAL_RESULT_HOLDER_CAPACITY = 10_000;
   public static final String NUM_GROUPS_LIMIT = "num.groups.limit";
+  public static final String SEGMENT_TRIM_SIZE = "num.segment.trim.limit";
+  public static final String SEGMENT_TRIM_OPT = "bool.segment.trim";

Review comment:
       ```suggestion
     public static final String ENABLE_SEGMENT_GROUP_TRIM = "enable.segment.group.trim";
   ```

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/groupby/DefaultGroupByExecutor.java
##########
@@ -144,4 +147,22 @@ protected void aggregate(TransformBlock transformBlock, int length, int function
   public AggregationGroupByResult getResult() {
     return new AggregationGroupByResult(_groupKeyGenerator, _aggregationFunctions, _groupByResultHolders);
   }
+
+  @Override
+  public Collection<TableResizer.fullIntermediateResult> trimGroupByResult(int threshold, TableResizer tableResizer) {
+    // Check if a trim is necessary
+    int keyNum = 0;
+    if (_hasMVGroupByExpression) {
+      keyNum = _mvGroupKeys.length;
+    } else {
+      keyNum = _svGroupKeys.length;
+    }
+    Iterator<GroupKeyGenerator.GroupKey> groupKeyIterator = _groupKeyGenerator.getGroupKeys();
+    if (keyNum > threshold && threshold != -1) {
+      return tableResizer.trimInSegmentResults(groupKeyIterator, _groupByResultHolders, threshold);

Review comment:
       threshold should be calculated based on the `limit` instead of hardcoded

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/table/TableResizer.java
##########
@@ -357,4 +372,88 @@ public Comparable extract(Record record) {
       }
     }
   }
+
+  public static class fullIntermediateResult extends IntermediateRecord {

Review comment:
       You can directly change the `IntermediateRecord` to include the record

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/plan/maker/InstancePlanMakerImplV2.java
##########
@@ -61,7 +61,11 @@
   public static final String MAX_INITIAL_RESULT_HOLDER_CAPACITY_KEY = "max.init.group.holder.capacity";
   public static final int DEFAULT_MAX_INITIAL_RESULT_HOLDER_CAPACITY = 10_000;
   public static final String NUM_GROUPS_LIMIT = "num.groups.limit";
+  public static final String SEGMENT_TRIM_SIZE = "num.segment.trim.limit";

Review comment:
       Let's not make this configurable in step 1. Use `max(limit * 5, 5000)` (same as inter segment trim). You can use `GroupByUtils` to get the trim size from the `limit`

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/table/TableResizer.java
##########
@@ -357,4 +372,88 @@ public Comparable extract(Record record) {
       }
     }
   }
+
+  public static class fullIntermediateResult extends IntermediateRecord {

Review comment:
       Class name should be capitalized




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] wuwenw commented on a change in pull request #6991: Introduce in-Segment Trim for GroupBy OrderBy Query

Posted by GitBox <gi...@apache.org>.
wuwenw commented on a change in pull request #6991:
URL: https://github.com/apache/incubator-pinot/pull/6991#discussion_r646787364



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/plan/AggregationGroupByOrderByPlanNode.java
##########
@@ -58,6 +63,14 @@ public AggregationGroupByOrderByPlanNode(IndexSegment indexSegment, QueryContext
     List<ExpressionContext> groupByExpressions = queryContext.getGroupByExpressions();
     assert groupByExpressions != null;
     _groupByExpressions = groupByExpressions.toArray(new ExpressionContext[0]);
+    _queryContext = queryContext;
+
+    // Only trim if there is OrderBy and has a positive trim size
+    if (queryContext.getOrderByExpressions() != null && minSegmentTrimSize > 0) {

Review comment:
       Any reason for this behavior rather than figuring out in the plan?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] wuwenw commented on a change in pull request #6991: Introduce in-Segment Trim for GroupBy OrderBy Query

Posted by GitBox <gi...@apache.org>.
wuwenw commented on a change in pull request #6991:
URL: https://github.com/apache/incubator-pinot/pull/6991#discussion_r641182090



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/plan/maker/InstancePlanMakerImplV2.java
##########
@@ -61,7 +61,11 @@
   public static final String MAX_INITIAL_RESULT_HOLDER_CAPACITY_KEY = "max.init.group.holder.capacity";
   public static final int DEFAULT_MAX_INITIAL_RESULT_HOLDER_CAPACITY = 10_000;
   public static final String NUM_GROUPS_LIMIT = "num.groups.limit";
+  public static final String SEGMENT_TRIM_SIZE = "num.segment.trim.limit";
+  public static final String SEGMENT_TRIM_OPT = "bool.segment.trim";

Review comment:
       Changed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] wuwenw commented on a change in pull request #6991: Introduce in-Segment Trim for GroupBy OrderBy Query

Posted by GitBox <gi...@apache.org>.
wuwenw commented on a change in pull request #6991:
URL: https://github.com/apache/incubator-pinot/pull/6991#discussion_r645271450



##########
File path: pinot-core/src/test/java/org/apache/pinot/queries/InterSegmentOrderByMultiValueQueriesTest.java
##########
@@ -41,6 +42,43 @@ public void testGroupByOrderByMVSQLResults(String query, List<Object[]> expected
             expectedResults, expectedResults.size(), expectedDataSchema);
   }
 
+  /**
+   * Tests the in-segment trim option for GroupBy OrderBy query
+   */
+  @Test(dataProvider = "orderByDataProvider")
+  public void testGroupByOrderByMVTrimOptLowLimitSQLResults(String query, List<Object[]> expectedResults,
+      long expectedNumEntriesScannedPostFilter, DataSchema expectedDataSchema) {
+
+    expectedResults = expectedResults.subList(0, expectedResults.size() / 2);

Review comment:
       This test aims to trigger the in-segment trim by providing a low limit number and a low trimSize, so the size of intermediateResult is greater than max(5*limit, trimSize). intermediateResult will be trimmed by the in-segment trim feature and then a limit will be applied (this is unavoidable). Still, we can test to see if everything works in the in-segment trim phase 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] wuwenw commented on a change in pull request #6991: Introduce in-Segment Trim for GroupBy OrderBy Query

Posted by GitBox <gi...@apache.org>.
wuwenw commented on a change in pull request #6991:
URL: https://github.com/apache/incubator-pinot/pull/6991#discussion_r641166342



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/groupby/DefaultGroupByExecutor.java
##########
@@ -144,4 +147,22 @@ protected void aggregate(TransformBlock transformBlock, int length, int function
   public AggregationGroupByResult getResult() {
     return new AggregationGroupByResult(_groupKeyGenerator, _aggregationFunctions, _groupByResultHolders);
   }
+
+  @Override
+  public Collection<TableResizer.fullIntermediateResult> trimGroupByResult(int threshold, TableResizer tableResizer) {
+    // Check if a trim is necessary
+    int keyNum = 0;
+    if (_hasMVGroupByExpression) {
+      keyNum = _mvGroupKeys.length;
+    } else {
+      keyNum = _svGroupKeys.length;
+    }
+    Iterator<GroupKeyGenerator.GroupKey> groupKeyIterator = _groupKeyGenerator.getGroupKeys();
+    if (keyNum > threshold && threshold != -1) {
+      return tableResizer.trimInSegmentResults(groupKeyIterator, _groupByResultHolders, threshold);

Review comment:
       Now the threshold is calculated in the instance plan maker
   
   > threshold should be calculated based on the `limit` instead of hardcoded
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #6991: Introduce in-Segment Trim for GroupBy OrderBy Query

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #6991:
URL: https://github.com/apache/incubator-pinot/pull/6991#discussion_r644984885



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/table/IntermediateRecord.java
##########
@@ -0,0 +1,44 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.data.table;
+
+/**
+ * Helper class to store a subset of Record fields
+ * IntermediateRecord is derived from a Record
+ * Some of the main properties of an IntermediateRecord are:
+ *
+ * 1. Key in IntermediateRecord is expected to be identical to the one in the Record
+ * 2. For values, IntermediateRecord should only have the columns needed for order by
+ * 3. Inside the values, the columns should be ordered by the order by sequence
+ * 4. For order by on aggregations, final results should extracted if the intermediate result is non-comparable

Review comment:
       ```suggestion
    * 4. For order by on aggregations, final results are extracted
   ```

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/groupby/DictionaryBasedGroupKeyGenerator.java
##########
@@ -93,6 +93,7 @@
 
   private final int _globalGroupIdUpperBound;
   private final RawKeyHolder _rawKeyHolder;
+  private int _keyNum;

Review comment:
       Remove

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/table/TableResizer.java
##########
@@ -203,34 +206,55 @@ private IntermediateRecord getIntermediateRecord(Key key, Record record) {
     }
     int numRecordsToRetain = Math.min(numRecords, trimToSize);
     // make PQ of sorted records to retain
-    PriorityQueue<IntermediateRecord> priorityQueue = convertToIntermediateRecordsPQ(recordsMap, numRecordsToRetain, _intermediateRecordComparator.reversed());
+    PriorityQueue<IntermediateRecord> priorityQueue =
+        convertToIntermediateRecordsPQ(recordsMap, numRecordsToRetain, _intermediateRecordComparator.reversed());
     Record[] sortedArray = new Record[numRecordsToRetain];
     while (!priorityQueue.isEmpty()) {
       IntermediateRecord intermediateRecord = priorityQueue.poll();
-      Record record = recordsMap.get(intermediateRecord._key);
-      sortedArray[--numRecordsToRetain] = record;
+      sortedArray[--numRecordsToRetain] = intermediateRecord._record;;
     }
     return Arrays.asList(sortedArray);
   }
 
   /**
-   * Helper class to store a subset of Record fields
-   * IntermediateRecord is derived from a Record
-   * Some of the main properties of an IntermediateRecord are:
-   *
-   * 1. Key in IntermediateRecord is expected to be identical to the one in the Record
-   * 2. For values, IntermediateRecord should only have the columns needed for order by
-   * 3. Inside the values, the columns should be ordered by the order by sequence
-   * 4. For order by on aggregations, final results should extracted if the intermediate result is non-comparable
+   * Trim the aggregation results using a priority queue and returns a the priority queue.
+   * This method is to be called from individual segment if the intermediate results need to be trimmed.
+   * The use case now is Multi-Segment GroupBy OrderBy query.
    */
-  private static class IntermediateRecord {
-    final Key _key;
-    final Comparable[] _values;
+  public PriorityQueue<IntermediateRecord> trimInSegmentResults(Iterator<GroupKeyGenerator.GroupKey> groupKeyIterator,
+      GroupByResultHolder[] _groupByResultHolders, int size) {
+    if (!groupKeyIterator.hasNext() || _groupByResultHolders.length == 0 || size == 0) {

Review comment:
       These checks should have already been performed on the caller side

##########
File path: pinot-core/src/test/java/org/apache/pinot/core/data/table/TableResizerTest.java
##########
@@ -65,6 +76,33 @@ public void setUp() {
         new Key(new Object[]{"c", 50, 4.0}),
         new Key(new Object[]{"c", 300, 5.0})
     );
+
+    // GroupKeys: d1: a0 ~ a14, d2: 10.0 ~ 24.0, d3: 1.0 ~ 15.0
+    _groupKeys = new LinkedList<>();
+    for (int i = 0; i < 15; ++i) {
+      GroupKeyGenerator.GroupKey groupKey = new GroupKeyGenerator.GroupKey();
+      groupKey._keys = new Object[]{"a" + i, 10.0 + i, 1.0 + i};
+      groupKey._groupId = i;
+
+      _groupKeys.add(groupKey);
+    }
+
+    // Aggregation results: sum(m1) = 10 % d3, max(m2) = 100 % d3,
+    //                      distinctcount(m3) = set(new int[]{j}), avg(m4) = 10 / (j + 1)
+    _groupByResultHolders = new GroupByResultHolder[NUM_RESULT_HOLDER];
+    _groupByResultHolders[0] = new DoubleGroupByResultHolder(_groupKeys.size(), _groupKeys.size(), 0.0);
+    _groupByResultHolders[1] = new DoubleGroupByResultHolder(_groupKeys.size(), _groupKeys.size(), 0.0);
+    _groupByResultHolders[2] = new ObjectGroupByResultHolder(_groupKeys.size(), _groupKeys.size());
+    _groupByResultHolders[3] = new ObjectGroupByResultHolder(_groupKeys.size(), _groupKeys.size());
+    for (int j = 0; j < _groupKeys.size(); ++j) {
+      _groupByResultHolders[0]
+          .setValueForKey(_groupKeys.get(j)._groupId, 10 % ((Double) _groupKeys.get(j)._keys[2]));
+      _groupByResultHolders[1]
+          .setValueForKey(_groupKeys.get(j)._groupId, 100 % ((Double) _groupKeys.get(j)._keys[2]));
+      _groupByResultHolders[2].setValueForKey(_groupKeys.get(j)._groupId, new IntOpenHashSet(new int[]{j}));
+      _groupByResultHolders[3].setValueForKey(_groupKeys.get(j)._groupId, new AvgPair(10, j + 1));
+    }
+
     //@formatter:on

Review comment:
       Move this line to line 79

##########
File path: pinot-core/src/test/java/org/apache/pinot/queries/InterSegmentOrderBySingleValueQueriesTest.java
##########
@@ -72,6 +73,46 @@ public void testGroupByOrderByPQLResponse(String query, List<String[]> expectedG
         expectedValues, true);
   }
 
+  /**
+   * Tests the in-segment trim option for GroupBy OrderBy query
+   */
+  @Test(dataProvider = "orderBySQLResultTableProvider")
+  public void testGroupByOrderByTrimOptSQLLowLimitResponse(String query, List<Object[]> expectedResults,
+      long expectedNumDocsScanned, long expectedNumEntriesScannedInFilter, long expectedNumEntriesScannedPostFilter,
+      long expectedNumTotalDocs, DataSchema expectedDataSchema) {
+    expectedResults = expectedResults.subList(0, expectedResults.size() / 2);

Review comment:
       Same as the comments in `InterSegmentOrderByMultiValueQueriesTest`

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/util/GroupByUtils.java
##########
@@ -36,6 +36,15 @@ public static int getTableCapacity(int limit) {
     return Math.max(limit * 5, NUM_RESULTS_LOWER_LIMIT);
   }
 
+  /**
+   * Returns the capacity of the table required by the given query.
+   * NOTE: It returns {@code max(limit * 5, resultLowerLimit)} where resultLowerLimit is configured by the user
+   *      (Default: 5000)
+   */
+  public static int getTableCapacity(int limit, int resultLowerLimit) {

Review comment:
       Also change `NUM_RESULTS_LOWER_LIMIT` to public `DEFAULT_MIN_NUM_GROUPS` which can be accessed by the plan maker

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/plan/maker/InstancePlanMakerImplV2.java
##########
@@ -52,39 +52,56 @@
 import org.slf4j.LoggerFactory;
 
 
+
 /**
  * The <code>InstancePlanMakerImplV2</code> class is the default implementation of {@link PlanMaker}.
  */
 public class InstancePlanMakerImplV2 implements PlanMaker {
-  private static final Logger LOGGER = LoggerFactory.getLogger(InstancePlanMakerImplV2.class);
-
   public static final String MAX_INITIAL_RESULT_HOLDER_CAPACITY_KEY = "max.init.group.holder.capacity";
   public static final int DEFAULT_MAX_INITIAL_RESULT_HOLDER_CAPACITY = 10_000;
   public static final String NUM_GROUPS_LIMIT = "num.groups.limit";
   public static final int DEFAULT_NUM_GROUPS_LIMIT = 100_000;
-
+  public static final String ENABLE_SEGMENT_GROUP_TRIM = "enable.segment.group.trim";
+  public static final boolean DEFAULT_ENABLE_SEGMENT_GROUP_TRIM = false;
+  public static final String SIZE_SEGMENT_GROUP_TRIM = "size.segment.group.trim";
+  public static final int DEFAULT_SEGMENT_TRIM_SIZE = -1;

Review comment:
       ```suggestion
     public static final String MIN_SEGMENT_GROUP_TRIM_SIZE = "min.segment.group.trim.size";
     public static final int DEFAULT_MIN_SEGMENT_GROUP_TRIM_SIZE = -1;
   ```

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/table/IntermediateRecord.java
##########
@@ -0,0 +1,44 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.data.table;
+
+/**
+ * Helper class to store a subset of Record fields
+ * IntermediateRecord is derived from a Record
+ * Some of the main properties of an IntermediateRecord are:
+ *
+ * 1. Key in IntermediateRecord is expected to be identical to the one in the Record
+ * 2. For values, IntermediateRecord should only have the columns needed for order by
+ * 3. Inside the values, the columns should be ordered by the order by sequence
+ * 4. For order by on aggregations, final results should extracted if the intermediate result is non-comparable
+ * 5. There is an optional field to store the original record. Each segment can keep the intermediate result in this
+ * form to prevent constructing IntermediateRecord again in the server.
+ */
+public class IntermediateRecord {
+  public final Key _key;
+  public final Comparable[] _values;
+  public final Record _record;
+
+  IntermediateRecord(Key key, Comparable[] values, Record record) {
+    _key = key;
+    _values = values;
+    _record = record;
+  }
+

Review comment:
       (nit) remove empty line

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/table/TableResizer.java
##########
@@ -203,34 +206,55 @@ private IntermediateRecord getIntermediateRecord(Key key, Record record) {
     }
     int numRecordsToRetain = Math.min(numRecords, trimToSize);
     // make PQ of sorted records to retain
-    PriorityQueue<IntermediateRecord> priorityQueue = convertToIntermediateRecordsPQ(recordsMap, numRecordsToRetain, _intermediateRecordComparator.reversed());
+    PriorityQueue<IntermediateRecord> priorityQueue =
+        convertToIntermediateRecordsPQ(recordsMap, numRecordsToRetain, _intermediateRecordComparator.reversed());
     Record[] sortedArray = new Record[numRecordsToRetain];
     while (!priorityQueue.isEmpty()) {
       IntermediateRecord intermediateRecord = priorityQueue.poll();
-      Record record = recordsMap.get(intermediateRecord._key);
-      sortedArray[--numRecordsToRetain] = record;
+      sortedArray[--numRecordsToRetain] = intermediateRecord._record;;
     }
     return Arrays.asList(sortedArray);
   }
 
   /**
-   * Helper class to store a subset of Record fields
-   * IntermediateRecord is derived from a Record
-   * Some of the main properties of an IntermediateRecord are:
-   *
-   * 1. Key in IntermediateRecord is expected to be identical to the one in the Record
-   * 2. For values, IntermediateRecord should only have the columns needed for order by
-   * 3. Inside the values, the columns should be ordered by the order by sequence
-   * 4. For order by on aggregations, final results should extracted if the intermediate result is non-comparable
+   * Trim the aggregation results using a priority queue and returns a the priority queue.
+   * This method is to be called from individual segment if the intermediate results need to be trimmed.
+   * The use case now is Multi-Segment GroupBy OrderBy query.

Review comment:
       ```suggestion
      * Trims the aggregation results using a priority queue and returns the priority queue.
      * This method is to be called from individual segment if the intermediate results need to be trimmed.
   ```

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/table/IntermediateRecord.java
##########
@@ -0,0 +1,44 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.data.table;
+
+/**
+ * Helper class to store a subset of Record fields
+ * IntermediateRecord is derived from a Record
+ * Some of the main properties of an IntermediateRecord are:
+ *
+ * 1. Key in IntermediateRecord is expected to be identical to the one in the Record
+ * 2. For values, IntermediateRecord should only have the columns needed for order by
+ * 3. Inside the values, the columns should be ordered by the order by sequence
+ * 4. For order by on aggregations, final results should extracted if the intermediate result is non-comparable
+ * 5. There is an optional field to store the original record. Each segment can keep the intermediate result in this

Review comment:
       The record is mandatory but not optional right?

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/plan/maker/InstancePlanMakerImplV2.java
##########
@@ -52,39 +52,56 @@
 import org.slf4j.LoggerFactory;
 
 
+
 /**
  * The <code>InstancePlanMakerImplV2</code> class is the default implementation of {@link PlanMaker}.
  */
 public class InstancePlanMakerImplV2 implements PlanMaker {
-  private static final Logger LOGGER = LoggerFactory.getLogger(InstancePlanMakerImplV2.class);
-
   public static final String MAX_INITIAL_RESULT_HOLDER_CAPACITY_KEY = "max.init.group.holder.capacity";
   public static final int DEFAULT_MAX_INITIAL_RESULT_HOLDER_CAPACITY = 10_000;
   public static final String NUM_GROUPS_LIMIT = "num.groups.limit";
   public static final int DEFAULT_NUM_GROUPS_LIMIT = 100_000;
-
+  public static final String ENABLE_SEGMENT_GROUP_TRIM = "enable.segment.group.trim";
+  public static final boolean DEFAULT_ENABLE_SEGMENT_GROUP_TRIM = false;
+  public static final String SIZE_SEGMENT_GROUP_TRIM = "size.segment.group.trim";
+  public static final int DEFAULT_SEGMENT_TRIM_SIZE = -1;
   // set as pinot.server.query.executor.groupby.trim.threshold
   public static final String GROUPBY_TRIM_THRESHOLD = "groupby.trim.threshold";
   public static final int DEFAULT_GROUPBY_TRIM_THRESHOLD = 1_000_000;
-
+  private static final Logger LOGGER = LoggerFactory.getLogger(InstancePlanMakerImplV2.class);
   private final int _maxInitialResultHolderCapacity;
   // Limit on number of groups stored for each segment, beyond which no new group will be created
   private final int _numGroupsLimit;
   // Used for SQL GROUP BY (server combine)
   private final int _groupByTrimThreshold;
+  private final boolean _enableSegmentGroupTrim;
+  private final int _minSegmentTrimSize;

Review comment:
       ```suggestion
     private final int _minSegmentGroupTrimSize;
   ```

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/groupby/DefaultGroupByExecutor.java
##########
@@ -144,4 +147,14 @@ protected void aggregate(TransformBlock transformBlock, int length, int function
   public AggregationGroupByResult getResult() {
     return new AggregationGroupByResult(_groupKeyGenerator, _aggregationFunctions, _groupByResultHolders);
   }
+
+  @Override
+  public int getResultNum() {

Review comment:
       ```suggestion
     public int getNumGroups() {
   ```

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java
##########
@@ -281,6 +294,14 @@ public void setNumGroupsLimitReached(boolean numGroupsLimitReached) {
     _numGroupsLimitReached = numGroupsLimitReached;
   }
 
+  /**
+   * Get the collection of intermediate records
+   */
+  @Nullable
+  public Collection<IntermediateRecord> getIntermediateResult() {

Review comment:
       ```suggestion
     public Collection<IntermediateRecord> getIntermediateRecords() {
   ```

##########
File path: pinot-core/src/test/java/org/apache/pinot/core/data/table/TableResizerTest.java
##########
@@ -65,6 +76,33 @@ public void setUp() {
         new Key(new Object[]{"c", 50, 4.0}),
         new Key(new Object[]{"c", 300, 5.0})
     );
+
+    // GroupKeys: d1: a0 ~ a14, d2: 10.0 ~ 24.0, d3: 1.0 ~ 15.0
+    _groupKeys = new LinkedList<>();

Review comment:
       Let's make `_groupKeys` and `_groupByResultHolders` consistent with `_records` and `_keys` (5 groups, values in holder the same as in records)

##########
File path: pinot-core/src/test/java/org/apache/pinot/core/data/table/TableResizerTest.java
##########
@@ -43,10 +49,15 @@
       new DataSchema(new String[]{"d1", "d2", "d3", "sum(m1)", "max(m2)", "distinctcount(m3)", "avg(m4)"},
           new DataSchema.ColumnDataType[]{DataSchema.ColumnDataType.STRING, DataSchema.ColumnDataType.INT, DataSchema.ColumnDataType.DOUBLE, DataSchema.ColumnDataType.DOUBLE, DataSchema.ColumnDataType.DOUBLE, DataSchema.ColumnDataType.OBJECT, DataSchema.ColumnDataType.OBJECT});
   private static final int TRIM_TO_SIZE = 3;
+  private static final int NUM_RESULT_HOLDER = 4;

Review comment:
       We should be able to use the same `TRIM_TO_SIZE` for inner segment trim

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/util/GroupByUtils.java
##########
@@ -36,6 +36,15 @@ public static int getTableCapacity(int limit) {
     return Math.max(limit * 5, NUM_RESULTS_LOWER_LIMIT);
   }
 
+  /**
+   * Returns the capacity of the table required by the given query.
+   * NOTE: It returns {@code max(limit * 5, resultLowerLimit)} where resultLowerLimit is configured by the user
+   *      (Default: 5000)
+   */
+  public static int getTableCapacity(int limit, int resultLowerLimit) {

Review comment:
       ```suggestion
     public static int getTableCapacity(int limit, int minNumGroups) {
   ```

##########
File path: pinot-core/src/test/java/org/apache/pinot/queries/InterSegmentOrderByMultiValueQueriesTest.java
##########
@@ -41,6 +42,43 @@ public void testGroupByOrderByMVSQLResults(String query, List<Object[]> expected
             expectedResults, expectedResults.size(), expectedDataSchema);
   }
 
+  /**
+   * Tests the in-segment trim option for GroupBy OrderBy query
+   */
+  @Test(dataProvider = "orderByDataProvider")
+  public void testGroupByOrderByMVTrimOptLowLimitSQLResults(String query, List<Object[]> expectedResults,
+      long expectedNumEntriesScannedPostFilter, DataSchema expectedDataSchema) {
+
+    expectedResults = expectedResults.subList(0, expectedResults.size() / 2);

Review comment:
       I don't see the value of this test. This test only changes the limit of the query

##########
File path: pinot-core/src/test/java/org/apache/pinot/queries/InterSegmentOrderByMultiValueQueriesTest.java
##########
@@ -41,6 +42,43 @@ public void testGroupByOrderByMVSQLResults(String query, List<Object[]> expected
             expectedResults, expectedResults.size(), expectedDataSchema);
   }
 
+  /**
+   * Tests the in-segment trim option for GroupBy OrderBy query
+   */
+  @Test(dataProvider = "orderByDataProvider")
+  public void testGroupByOrderByMVTrimOptLowLimitSQLResults(String query, List<Object[]> expectedResults,
+      long expectedNumEntriesScannedPostFilter, DataSchema expectedDataSchema) {
+
+    expectedResults = expectedResults.subList(0, expectedResults.size() / 2);
+
+    if (query.toUpperCase().contains("LIMIT")) {
+      String[] keyWords = query.split(" ");
+      keyWords[keyWords.length - 1] = String.valueOf(expectedResults.size());
+      query = String.join(" ", keyWords);
+    } else {
+      query += " LIMIT " + expectedResults.size();
+    }
+
+    InstancePlanMakerImplV2 planMaker = new InstancePlanMakerImplV2(expectedResults.size(), true);
+    BrokerResponseNative brokerResponse = getBrokerResponseForSqlQuery(query, planMaker);
+    QueriesTestUtils
+        .testInterSegmentResultTable(brokerResponse, 400000L, 0, expectedNumEntriesScannedPostFilter, 400000L,
+            expectedResults, expectedResults.size(), expectedDataSchema);
+  }
+
+  /**
+   * Tests the in-segment build option for GroupBy OrderBy query. (No trim)
+   */
+  @Test(dataProvider = "orderByDataProvider")
+  public void testGroupByOrderByMVTrimOptHighLimitSQLResults(String query, List<Object[]> expectedResults,

Review comment:
       ```suggestion
     public void testGroupByOrderByMVSegmentTrimSQLResults(String query, List<Object[]> expectedResults,
   ```

##########
File path: pinot-core/src/test/java/org/apache/pinot/queries/InterSegmentOrderByMultiValueQueriesTest.java
##########
@@ -41,6 +42,43 @@ public void testGroupByOrderByMVSQLResults(String query, List<Object[]> expected
             expectedResults, expectedResults.size(), expectedDataSchema);
   }
 
+  /**
+   * Tests the in-segment trim option for GroupBy OrderBy query
+   */
+  @Test(dataProvider = "orderByDataProvider")
+  public void testGroupByOrderByMVTrimOptLowLimitSQLResults(String query, List<Object[]> expectedResults,
+      long expectedNumEntriesScannedPostFilter, DataSchema expectedDataSchema) {
+
+    expectedResults = expectedResults.subList(0, expectedResults.size() / 2);
+
+    if (query.toUpperCase().contains("LIMIT")) {
+      String[] keyWords = query.split(" ");
+      keyWords[keyWords.length - 1] = String.valueOf(expectedResults.size());
+      query = String.join(" ", keyWords);
+    } else {
+      query += " LIMIT " + expectedResults.size();
+    }
+
+    InstancePlanMakerImplV2 planMaker = new InstancePlanMakerImplV2(expectedResults.size(), true);
+    BrokerResponseNative brokerResponse = getBrokerResponseForSqlQuery(query, planMaker);
+    QueriesTestUtils
+        .testInterSegmentResultTable(brokerResponse, 400000L, 0, expectedNumEntriesScannedPostFilter, 400000L,
+            expectedResults, expectedResults.size(), expectedDataSchema);
+  }
+
+  /**
+   * Tests the in-segment build option for GroupBy OrderBy query. (No trim)
+   */
+  @Test(dataProvider = "orderByDataProvider")
+  public void testGroupByOrderByMVTrimOptHighLimitSQLResults(String query, List<Object[]> expectedResults,
+      long expectedNumEntriesScannedPostFilter, DataSchema expectedDataSchema) {
+    InstancePlanMakerImplV2 planMaker = new InstancePlanMakerImplV2(expectedResults.size() + 1, true);

Review comment:
       Here we can simply put a very low `minSegmentGroupTrimSize` and expect the same result
   ```suggestion
       InstancePlanMakerImplV2 planMaker = new InstancePlanMakerImplV2(1, true);
   ```

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/plan/maker/InstancePlanMakerImplV2.java
##########
@@ -52,39 +52,56 @@
 import org.slf4j.LoggerFactory;
 
 
+
 /**
  * The <code>InstancePlanMakerImplV2</code> class is the default implementation of {@link PlanMaker}.
  */
 public class InstancePlanMakerImplV2 implements PlanMaker {
-  private static final Logger LOGGER = LoggerFactory.getLogger(InstancePlanMakerImplV2.class);
-
   public static final String MAX_INITIAL_RESULT_HOLDER_CAPACITY_KEY = "max.init.group.holder.capacity";
   public static final int DEFAULT_MAX_INITIAL_RESULT_HOLDER_CAPACITY = 10_000;
   public static final String NUM_GROUPS_LIMIT = "num.groups.limit";
   public static final int DEFAULT_NUM_GROUPS_LIMIT = 100_000;
-
+  public static final String ENABLE_SEGMENT_GROUP_TRIM = "enable.segment.group.trim";
+  public static final boolean DEFAULT_ENABLE_SEGMENT_GROUP_TRIM = false;
+  public static final String SIZE_SEGMENT_GROUP_TRIM = "size.segment.group.trim";
+  public static final int DEFAULT_SEGMENT_TRIM_SIZE = -1;
   // set as pinot.server.query.executor.groupby.trim.threshold
   public static final String GROUPBY_TRIM_THRESHOLD = "groupby.trim.threshold";
   public static final int DEFAULT_GROUPBY_TRIM_THRESHOLD = 1_000_000;
-
+  private static final Logger LOGGER = LoggerFactory.getLogger(InstancePlanMakerImplV2.class);

Review comment:
       Move this line back for better readability

##########
File path: pinot-core/src/test/java/org/apache/pinot/queries/InterSegmentOrderBySingleValueQueriesTest.java
##########
@@ -72,6 +73,46 @@ public void testGroupByOrderByPQLResponse(String query, List<String[]> expectedG
         expectedValues, true);
   }
 
+  /**
+   * Tests the in-segment trim option for GroupBy OrderBy query
+   */
+  @Test(dataProvider = "orderBySQLResultTableProvider")
+  public void testGroupByOrderByTrimOptSQLLowLimitResponse(String query, List<Object[]> expectedResults,
+      long expectedNumDocsScanned, long expectedNumEntriesScannedInFilter, long expectedNumEntriesScannedPostFilter,
+      long expectedNumTotalDocs, DataSchema expectedDataSchema) {
+    expectedResults = expectedResults.subList(0, expectedResults.size() / 2);
+
+    if (query.toUpperCase().contains("LIMIT")) {
+      String[] keyWords = query.split(" ");
+      keyWords[keyWords.length - 1] = String.valueOf(expectedResults.size());
+      query = String.join(" ", keyWords);
+    } else {
+      query += " LIMIT " + expectedResults.size();
+    }
+
+    InstancePlanMakerImplV2 planMaker = new InstancePlanMakerImplV2(expectedResults.size(), true);
+    BrokerResponseNative brokerResponse = getBrokerResponseForSqlQuery(query, planMaker);
+    QueriesTestUtils
+        .testInterSegmentResultTable(brokerResponse, expectedNumDocsScanned, expectedNumEntriesScannedInFilter,
+            expectedNumEntriesScannedPostFilter, expectedNumTotalDocs, expectedResults, expectedResults.size(),
+            expectedDataSchema);
+  }
+
+  /**
+   * Tests the in-segment build option for GroupBy OrderBy query. (No trim)
+   */
+  @Test(dataProvider = "orderBySQLResultTableProvider")
+  public void testGroupByOrderByTrimOptHighLimitSQLResponse(String query, List<Object[]> expectedResults,

Review comment:
       Same as the comments in `InterSegmentOrderByMultiValueQueriesTest`

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/plan/maker/InstancePlanMakerImplV2.java
##########
@@ -105,8 +121,61 @@ public InstancePlanMakerImplV2(QueryExecutorConfig queryExecutorConfig) {
     Preconditions.checkState(_maxInitialResultHolderCapacity <= _numGroupsLimit,
         "Invalid configuration: maxInitialResultHolderCapacity: %d must be smaller or equal to numGroupsLimit: %d",
         _maxInitialResultHolderCapacity, _numGroupsLimit);
-    LOGGER.info("Initializing plan maker with maxInitialResultHolderCapacity: {}, numGroupsLimit: {}",
-        _maxInitialResultHolderCapacity, _numGroupsLimit);
+    _enableSegmentGroupTrim =
+        queryExecutorConfig.getConfig().getProperty(ENABLE_SEGMENT_GROUP_TRIM, DEFAULT_ENABLE_SEGMENT_GROUP_TRIM);
+    LOGGER.info(
+        "Initializing plan maker with maxInitialResultHolderCapacity: {}, numGroupsLimit: {}, enableSegmentTrim: {}",
+        _maxInitialResultHolderCapacity, _numGroupsLimit, _enableSegmentGroupTrim);
+  }
+
+  /**
+   * Returns {@code true} if the given aggregation-only without filter QueryContext can be solved with segment metadata,
+   * {@code false} otherwise.
+   * <p>Aggregations supported: COUNT
+   */
+  @VisibleForTesting
+  static boolean isFitForMetadataBasedPlan(QueryContext queryContext) {

Review comment:
       Don't enable the `rearrange code` option during the reformat. It will introduce too much unnecessary change

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/query/AggregationGroupByOrderByOperator.java
##########
@@ -106,8 +116,14 @@ protected IntermediateResultsBlock getNextBlock() {
       groupByExecutor.process(transformBlock);
     }
 
-    // Build intermediate result block based on aggregation group-by result from the executor
-    return new IntermediateResultsBlock(_aggregationFunctions, groupByExecutor.getResult(), _dataSchema);
+    // Trim is off or no need to trim
+    if (_trimSize == TRIM_OFF || groupByExecutor.getResultNum() <= _trimSize) {

Review comment:
       ```suggestion
       if (_trimSize <= 0 || groupByExecutor.getNumGroups() <= _trimSize) {
   ```

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/plan/maker/InstancePlanMakerImplV2.java
##########
@@ -52,39 +52,56 @@
 import org.slf4j.LoggerFactory;
 
 
+
 /**
  * The <code>InstancePlanMakerImplV2</code> class is the default implementation of {@link PlanMaker}.
  */
 public class InstancePlanMakerImplV2 implements PlanMaker {
-  private static final Logger LOGGER = LoggerFactory.getLogger(InstancePlanMakerImplV2.class);
-
   public static final String MAX_INITIAL_RESULT_HOLDER_CAPACITY_KEY = "max.init.group.holder.capacity";
   public static final int DEFAULT_MAX_INITIAL_RESULT_HOLDER_CAPACITY = 10_000;
   public static final String NUM_GROUPS_LIMIT = "num.groups.limit";
   public static final int DEFAULT_NUM_GROUPS_LIMIT = 100_000;
-
+  public static final String ENABLE_SEGMENT_GROUP_TRIM = "enable.segment.group.trim";
+  public static final boolean DEFAULT_ENABLE_SEGMENT_GROUP_TRIM = false;
+  public static final String SIZE_SEGMENT_GROUP_TRIM = "size.segment.group.trim";
+  public static final int DEFAULT_SEGMENT_TRIM_SIZE = -1;
   // set as pinot.server.query.executor.groupby.trim.threshold
   public static final String GROUPBY_TRIM_THRESHOLD = "groupby.trim.threshold";
   public static final int DEFAULT_GROUPBY_TRIM_THRESHOLD = 1_000_000;
-
+  private static final Logger LOGGER = LoggerFactory.getLogger(InstancePlanMakerImplV2.class);
   private final int _maxInitialResultHolderCapacity;
   // Limit on number of groups stored for each segment, beyond which no new group will be created
   private final int _numGroupsLimit;
   // Used for SQL GROUP BY (server combine)
   private final int _groupByTrimThreshold;
+  private final boolean _enableSegmentGroupTrim;
+  private final int _minSegmentTrimSize;

Review comment:
       As discussed offline, we can remove `_enableSegmentGroupTrim` and only maintain the trim size here




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang merged pull request #6991: Introduce in-Segment Trim for GroupBy OrderBy Query

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang merged pull request #6991:
URL: https://github.com/apache/incubator-pinot/pull/6991


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #6991: Introduce in-Segment Trim for GroupBy OrderBy Query

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #6991:
URL: https://github.com/apache/incubator-pinot/pull/6991#discussion_r644984885



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/table/IntermediateRecord.java
##########
@@ -0,0 +1,44 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.data.table;
+
+/**
+ * Helper class to store a subset of Record fields
+ * IntermediateRecord is derived from a Record
+ * Some of the main properties of an IntermediateRecord are:
+ *
+ * 1. Key in IntermediateRecord is expected to be identical to the one in the Record
+ * 2. For values, IntermediateRecord should only have the columns needed for order by
+ * 3. Inside the values, the columns should be ordered by the order by sequence
+ * 4. For order by on aggregations, final results should extracted if the intermediate result is non-comparable

Review comment:
       ```suggestion
    * 4. For order by on aggregations, final results are extracted
   ```

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/groupby/DictionaryBasedGroupKeyGenerator.java
##########
@@ -93,6 +93,7 @@
 
   private final int _globalGroupIdUpperBound;
   private final RawKeyHolder _rawKeyHolder;
+  private int _keyNum;

Review comment:
       Remove

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/table/TableResizer.java
##########
@@ -203,34 +206,55 @@ private IntermediateRecord getIntermediateRecord(Key key, Record record) {
     }
     int numRecordsToRetain = Math.min(numRecords, trimToSize);
     // make PQ of sorted records to retain
-    PriorityQueue<IntermediateRecord> priorityQueue = convertToIntermediateRecordsPQ(recordsMap, numRecordsToRetain, _intermediateRecordComparator.reversed());
+    PriorityQueue<IntermediateRecord> priorityQueue =
+        convertToIntermediateRecordsPQ(recordsMap, numRecordsToRetain, _intermediateRecordComparator.reversed());
     Record[] sortedArray = new Record[numRecordsToRetain];
     while (!priorityQueue.isEmpty()) {
       IntermediateRecord intermediateRecord = priorityQueue.poll();
-      Record record = recordsMap.get(intermediateRecord._key);
-      sortedArray[--numRecordsToRetain] = record;
+      sortedArray[--numRecordsToRetain] = intermediateRecord._record;;
     }
     return Arrays.asList(sortedArray);
   }
 
   /**
-   * Helper class to store a subset of Record fields
-   * IntermediateRecord is derived from a Record
-   * Some of the main properties of an IntermediateRecord are:
-   *
-   * 1. Key in IntermediateRecord is expected to be identical to the one in the Record
-   * 2. For values, IntermediateRecord should only have the columns needed for order by
-   * 3. Inside the values, the columns should be ordered by the order by sequence
-   * 4. For order by on aggregations, final results should extracted if the intermediate result is non-comparable
+   * Trim the aggregation results using a priority queue and returns a the priority queue.
+   * This method is to be called from individual segment if the intermediate results need to be trimmed.
+   * The use case now is Multi-Segment GroupBy OrderBy query.
    */
-  private static class IntermediateRecord {
-    final Key _key;
-    final Comparable[] _values;
+  public PriorityQueue<IntermediateRecord> trimInSegmentResults(Iterator<GroupKeyGenerator.GroupKey> groupKeyIterator,
+      GroupByResultHolder[] _groupByResultHolders, int size) {
+    if (!groupKeyIterator.hasNext() || _groupByResultHolders.length == 0 || size == 0) {

Review comment:
       These checks should have already been performed on the caller side

##########
File path: pinot-core/src/test/java/org/apache/pinot/core/data/table/TableResizerTest.java
##########
@@ -65,6 +76,33 @@ public void setUp() {
         new Key(new Object[]{"c", 50, 4.0}),
         new Key(new Object[]{"c", 300, 5.0})
     );
+
+    // GroupKeys: d1: a0 ~ a14, d2: 10.0 ~ 24.0, d3: 1.0 ~ 15.0
+    _groupKeys = new LinkedList<>();
+    for (int i = 0; i < 15; ++i) {
+      GroupKeyGenerator.GroupKey groupKey = new GroupKeyGenerator.GroupKey();
+      groupKey._keys = new Object[]{"a" + i, 10.0 + i, 1.0 + i};
+      groupKey._groupId = i;
+
+      _groupKeys.add(groupKey);
+    }
+
+    // Aggregation results: sum(m1) = 10 % d3, max(m2) = 100 % d3,
+    //                      distinctcount(m3) = set(new int[]{j}), avg(m4) = 10 / (j + 1)
+    _groupByResultHolders = new GroupByResultHolder[NUM_RESULT_HOLDER];
+    _groupByResultHolders[0] = new DoubleGroupByResultHolder(_groupKeys.size(), _groupKeys.size(), 0.0);
+    _groupByResultHolders[1] = new DoubleGroupByResultHolder(_groupKeys.size(), _groupKeys.size(), 0.0);
+    _groupByResultHolders[2] = new ObjectGroupByResultHolder(_groupKeys.size(), _groupKeys.size());
+    _groupByResultHolders[3] = new ObjectGroupByResultHolder(_groupKeys.size(), _groupKeys.size());
+    for (int j = 0; j < _groupKeys.size(); ++j) {
+      _groupByResultHolders[0]
+          .setValueForKey(_groupKeys.get(j)._groupId, 10 % ((Double) _groupKeys.get(j)._keys[2]));
+      _groupByResultHolders[1]
+          .setValueForKey(_groupKeys.get(j)._groupId, 100 % ((Double) _groupKeys.get(j)._keys[2]));
+      _groupByResultHolders[2].setValueForKey(_groupKeys.get(j)._groupId, new IntOpenHashSet(new int[]{j}));
+      _groupByResultHolders[3].setValueForKey(_groupKeys.get(j)._groupId, new AvgPair(10, j + 1));
+    }
+
     //@formatter:on

Review comment:
       Move this line to line 79

##########
File path: pinot-core/src/test/java/org/apache/pinot/queries/InterSegmentOrderBySingleValueQueriesTest.java
##########
@@ -72,6 +73,46 @@ public void testGroupByOrderByPQLResponse(String query, List<String[]> expectedG
         expectedValues, true);
   }
 
+  /**
+   * Tests the in-segment trim option for GroupBy OrderBy query
+   */
+  @Test(dataProvider = "orderBySQLResultTableProvider")
+  public void testGroupByOrderByTrimOptSQLLowLimitResponse(String query, List<Object[]> expectedResults,
+      long expectedNumDocsScanned, long expectedNumEntriesScannedInFilter, long expectedNumEntriesScannedPostFilter,
+      long expectedNumTotalDocs, DataSchema expectedDataSchema) {
+    expectedResults = expectedResults.subList(0, expectedResults.size() / 2);

Review comment:
       Same as the comments in `InterSegmentOrderByMultiValueQueriesTest`

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/util/GroupByUtils.java
##########
@@ -36,6 +36,15 @@ public static int getTableCapacity(int limit) {
     return Math.max(limit * 5, NUM_RESULTS_LOWER_LIMIT);
   }
 
+  /**
+   * Returns the capacity of the table required by the given query.
+   * NOTE: It returns {@code max(limit * 5, resultLowerLimit)} where resultLowerLimit is configured by the user
+   *      (Default: 5000)
+   */
+  public static int getTableCapacity(int limit, int resultLowerLimit) {

Review comment:
       Also change `NUM_RESULTS_LOWER_LIMIT` to public `DEFAULT_MIN_NUM_GROUPS` which can be accessed by the plan maker

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/plan/maker/InstancePlanMakerImplV2.java
##########
@@ -52,39 +52,56 @@
 import org.slf4j.LoggerFactory;
 
 
+
 /**
  * The <code>InstancePlanMakerImplV2</code> class is the default implementation of {@link PlanMaker}.
  */
 public class InstancePlanMakerImplV2 implements PlanMaker {
-  private static final Logger LOGGER = LoggerFactory.getLogger(InstancePlanMakerImplV2.class);
-
   public static final String MAX_INITIAL_RESULT_HOLDER_CAPACITY_KEY = "max.init.group.holder.capacity";
   public static final int DEFAULT_MAX_INITIAL_RESULT_HOLDER_CAPACITY = 10_000;
   public static final String NUM_GROUPS_LIMIT = "num.groups.limit";
   public static final int DEFAULT_NUM_GROUPS_LIMIT = 100_000;
-
+  public static final String ENABLE_SEGMENT_GROUP_TRIM = "enable.segment.group.trim";
+  public static final boolean DEFAULT_ENABLE_SEGMENT_GROUP_TRIM = false;
+  public static final String SIZE_SEGMENT_GROUP_TRIM = "size.segment.group.trim";
+  public static final int DEFAULT_SEGMENT_TRIM_SIZE = -1;

Review comment:
       ```suggestion
     public static final String MIN_SEGMENT_GROUP_TRIM_SIZE = "min.segment.group.trim.size";
     public static final int DEFAULT_MIN_SEGMENT_GROUP_TRIM_SIZE = -1;
   ```

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/table/IntermediateRecord.java
##########
@@ -0,0 +1,44 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.data.table;
+
+/**
+ * Helper class to store a subset of Record fields
+ * IntermediateRecord is derived from a Record
+ * Some of the main properties of an IntermediateRecord are:
+ *
+ * 1. Key in IntermediateRecord is expected to be identical to the one in the Record
+ * 2. For values, IntermediateRecord should only have the columns needed for order by
+ * 3. Inside the values, the columns should be ordered by the order by sequence
+ * 4. For order by on aggregations, final results should extracted if the intermediate result is non-comparable
+ * 5. There is an optional field to store the original record. Each segment can keep the intermediate result in this
+ * form to prevent constructing IntermediateRecord again in the server.
+ */
+public class IntermediateRecord {
+  public final Key _key;
+  public final Comparable[] _values;
+  public final Record _record;
+
+  IntermediateRecord(Key key, Comparable[] values, Record record) {
+    _key = key;
+    _values = values;
+    _record = record;
+  }
+

Review comment:
       (nit) remove empty line

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/table/TableResizer.java
##########
@@ -203,34 +206,55 @@ private IntermediateRecord getIntermediateRecord(Key key, Record record) {
     }
     int numRecordsToRetain = Math.min(numRecords, trimToSize);
     // make PQ of sorted records to retain
-    PriorityQueue<IntermediateRecord> priorityQueue = convertToIntermediateRecordsPQ(recordsMap, numRecordsToRetain, _intermediateRecordComparator.reversed());
+    PriorityQueue<IntermediateRecord> priorityQueue =
+        convertToIntermediateRecordsPQ(recordsMap, numRecordsToRetain, _intermediateRecordComparator.reversed());
     Record[] sortedArray = new Record[numRecordsToRetain];
     while (!priorityQueue.isEmpty()) {
       IntermediateRecord intermediateRecord = priorityQueue.poll();
-      Record record = recordsMap.get(intermediateRecord._key);
-      sortedArray[--numRecordsToRetain] = record;
+      sortedArray[--numRecordsToRetain] = intermediateRecord._record;;
     }
     return Arrays.asList(sortedArray);
   }
 
   /**
-   * Helper class to store a subset of Record fields
-   * IntermediateRecord is derived from a Record
-   * Some of the main properties of an IntermediateRecord are:
-   *
-   * 1. Key in IntermediateRecord is expected to be identical to the one in the Record
-   * 2. For values, IntermediateRecord should only have the columns needed for order by
-   * 3. Inside the values, the columns should be ordered by the order by sequence
-   * 4. For order by on aggregations, final results should extracted if the intermediate result is non-comparable
+   * Trim the aggregation results using a priority queue and returns a the priority queue.
+   * This method is to be called from individual segment if the intermediate results need to be trimmed.
+   * The use case now is Multi-Segment GroupBy OrderBy query.

Review comment:
       ```suggestion
      * Trims the aggregation results using a priority queue and returns the priority queue.
      * This method is to be called from individual segment if the intermediate results need to be trimmed.
   ```

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/table/IntermediateRecord.java
##########
@@ -0,0 +1,44 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.data.table;
+
+/**
+ * Helper class to store a subset of Record fields
+ * IntermediateRecord is derived from a Record
+ * Some of the main properties of an IntermediateRecord are:
+ *
+ * 1. Key in IntermediateRecord is expected to be identical to the one in the Record
+ * 2. For values, IntermediateRecord should only have the columns needed for order by
+ * 3. Inside the values, the columns should be ordered by the order by sequence
+ * 4. For order by on aggregations, final results should extracted if the intermediate result is non-comparable
+ * 5. There is an optional field to store the original record. Each segment can keep the intermediate result in this

Review comment:
       The record is mandatory but not optional right?

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/plan/maker/InstancePlanMakerImplV2.java
##########
@@ -52,39 +52,56 @@
 import org.slf4j.LoggerFactory;
 
 
+
 /**
  * The <code>InstancePlanMakerImplV2</code> class is the default implementation of {@link PlanMaker}.
  */
 public class InstancePlanMakerImplV2 implements PlanMaker {
-  private static final Logger LOGGER = LoggerFactory.getLogger(InstancePlanMakerImplV2.class);
-
   public static final String MAX_INITIAL_RESULT_HOLDER_CAPACITY_KEY = "max.init.group.holder.capacity";
   public static final int DEFAULT_MAX_INITIAL_RESULT_HOLDER_CAPACITY = 10_000;
   public static final String NUM_GROUPS_LIMIT = "num.groups.limit";
   public static final int DEFAULT_NUM_GROUPS_LIMIT = 100_000;
-
+  public static final String ENABLE_SEGMENT_GROUP_TRIM = "enable.segment.group.trim";
+  public static final boolean DEFAULT_ENABLE_SEGMENT_GROUP_TRIM = false;
+  public static final String SIZE_SEGMENT_GROUP_TRIM = "size.segment.group.trim";
+  public static final int DEFAULT_SEGMENT_TRIM_SIZE = -1;
   // set as pinot.server.query.executor.groupby.trim.threshold
   public static final String GROUPBY_TRIM_THRESHOLD = "groupby.trim.threshold";
   public static final int DEFAULT_GROUPBY_TRIM_THRESHOLD = 1_000_000;
-
+  private static final Logger LOGGER = LoggerFactory.getLogger(InstancePlanMakerImplV2.class);
   private final int _maxInitialResultHolderCapacity;
   // Limit on number of groups stored for each segment, beyond which no new group will be created
   private final int _numGroupsLimit;
   // Used for SQL GROUP BY (server combine)
   private final int _groupByTrimThreshold;
+  private final boolean _enableSegmentGroupTrim;
+  private final int _minSegmentTrimSize;

Review comment:
       ```suggestion
     private final int _minSegmentGroupTrimSize;
   ```

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/groupby/DefaultGroupByExecutor.java
##########
@@ -144,4 +147,14 @@ protected void aggregate(TransformBlock transformBlock, int length, int function
   public AggregationGroupByResult getResult() {
     return new AggregationGroupByResult(_groupKeyGenerator, _aggregationFunctions, _groupByResultHolders);
   }
+
+  @Override
+  public int getResultNum() {

Review comment:
       ```suggestion
     public int getNumGroups() {
   ```

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java
##########
@@ -281,6 +294,14 @@ public void setNumGroupsLimitReached(boolean numGroupsLimitReached) {
     _numGroupsLimitReached = numGroupsLimitReached;
   }
 
+  /**
+   * Get the collection of intermediate records
+   */
+  @Nullable
+  public Collection<IntermediateRecord> getIntermediateResult() {

Review comment:
       ```suggestion
     public Collection<IntermediateRecord> getIntermediateRecords() {
   ```

##########
File path: pinot-core/src/test/java/org/apache/pinot/core/data/table/TableResizerTest.java
##########
@@ -65,6 +76,33 @@ public void setUp() {
         new Key(new Object[]{"c", 50, 4.0}),
         new Key(new Object[]{"c", 300, 5.0})
     );
+
+    // GroupKeys: d1: a0 ~ a14, d2: 10.0 ~ 24.0, d3: 1.0 ~ 15.0
+    _groupKeys = new LinkedList<>();

Review comment:
       Let's make `_groupKeys` and `_groupByResultHolders` consistent with `_records` and `_keys` (5 groups, values in holder the same as in records)

##########
File path: pinot-core/src/test/java/org/apache/pinot/core/data/table/TableResizerTest.java
##########
@@ -43,10 +49,15 @@
       new DataSchema(new String[]{"d1", "d2", "d3", "sum(m1)", "max(m2)", "distinctcount(m3)", "avg(m4)"},
           new DataSchema.ColumnDataType[]{DataSchema.ColumnDataType.STRING, DataSchema.ColumnDataType.INT, DataSchema.ColumnDataType.DOUBLE, DataSchema.ColumnDataType.DOUBLE, DataSchema.ColumnDataType.DOUBLE, DataSchema.ColumnDataType.OBJECT, DataSchema.ColumnDataType.OBJECT});
   private static final int TRIM_TO_SIZE = 3;
+  private static final int NUM_RESULT_HOLDER = 4;

Review comment:
       We should be able to use the same `TRIM_TO_SIZE` for inner segment trim

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/util/GroupByUtils.java
##########
@@ -36,6 +36,15 @@ public static int getTableCapacity(int limit) {
     return Math.max(limit * 5, NUM_RESULTS_LOWER_LIMIT);
   }
 
+  /**
+   * Returns the capacity of the table required by the given query.
+   * NOTE: It returns {@code max(limit * 5, resultLowerLimit)} where resultLowerLimit is configured by the user
+   *      (Default: 5000)
+   */
+  public static int getTableCapacity(int limit, int resultLowerLimit) {

Review comment:
       ```suggestion
     public static int getTableCapacity(int limit, int minNumGroups) {
   ```

##########
File path: pinot-core/src/test/java/org/apache/pinot/queries/InterSegmentOrderByMultiValueQueriesTest.java
##########
@@ -41,6 +42,43 @@ public void testGroupByOrderByMVSQLResults(String query, List<Object[]> expected
             expectedResults, expectedResults.size(), expectedDataSchema);
   }
 
+  /**
+   * Tests the in-segment trim option for GroupBy OrderBy query
+   */
+  @Test(dataProvider = "orderByDataProvider")
+  public void testGroupByOrderByMVTrimOptLowLimitSQLResults(String query, List<Object[]> expectedResults,
+      long expectedNumEntriesScannedPostFilter, DataSchema expectedDataSchema) {
+
+    expectedResults = expectedResults.subList(0, expectedResults.size() / 2);

Review comment:
       I don't see the value of this test. This test only changes the limit of the query

##########
File path: pinot-core/src/test/java/org/apache/pinot/queries/InterSegmentOrderByMultiValueQueriesTest.java
##########
@@ -41,6 +42,43 @@ public void testGroupByOrderByMVSQLResults(String query, List<Object[]> expected
             expectedResults, expectedResults.size(), expectedDataSchema);
   }
 
+  /**
+   * Tests the in-segment trim option for GroupBy OrderBy query
+   */
+  @Test(dataProvider = "orderByDataProvider")
+  public void testGroupByOrderByMVTrimOptLowLimitSQLResults(String query, List<Object[]> expectedResults,
+      long expectedNumEntriesScannedPostFilter, DataSchema expectedDataSchema) {
+
+    expectedResults = expectedResults.subList(0, expectedResults.size() / 2);
+
+    if (query.toUpperCase().contains("LIMIT")) {
+      String[] keyWords = query.split(" ");
+      keyWords[keyWords.length - 1] = String.valueOf(expectedResults.size());
+      query = String.join(" ", keyWords);
+    } else {
+      query += " LIMIT " + expectedResults.size();
+    }
+
+    InstancePlanMakerImplV2 planMaker = new InstancePlanMakerImplV2(expectedResults.size(), true);
+    BrokerResponseNative brokerResponse = getBrokerResponseForSqlQuery(query, planMaker);
+    QueriesTestUtils
+        .testInterSegmentResultTable(brokerResponse, 400000L, 0, expectedNumEntriesScannedPostFilter, 400000L,
+            expectedResults, expectedResults.size(), expectedDataSchema);
+  }
+
+  /**
+   * Tests the in-segment build option for GroupBy OrderBy query. (No trim)
+   */
+  @Test(dataProvider = "orderByDataProvider")
+  public void testGroupByOrderByMVTrimOptHighLimitSQLResults(String query, List<Object[]> expectedResults,

Review comment:
       ```suggestion
     public void testGroupByOrderByMVSegmentTrimSQLResults(String query, List<Object[]> expectedResults,
   ```

##########
File path: pinot-core/src/test/java/org/apache/pinot/queries/InterSegmentOrderByMultiValueQueriesTest.java
##########
@@ -41,6 +42,43 @@ public void testGroupByOrderByMVSQLResults(String query, List<Object[]> expected
             expectedResults, expectedResults.size(), expectedDataSchema);
   }
 
+  /**
+   * Tests the in-segment trim option for GroupBy OrderBy query
+   */
+  @Test(dataProvider = "orderByDataProvider")
+  public void testGroupByOrderByMVTrimOptLowLimitSQLResults(String query, List<Object[]> expectedResults,
+      long expectedNumEntriesScannedPostFilter, DataSchema expectedDataSchema) {
+
+    expectedResults = expectedResults.subList(0, expectedResults.size() / 2);
+
+    if (query.toUpperCase().contains("LIMIT")) {
+      String[] keyWords = query.split(" ");
+      keyWords[keyWords.length - 1] = String.valueOf(expectedResults.size());
+      query = String.join(" ", keyWords);
+    } else {
+      query += " LIMIT " + expectedResults.size();
+    }
+
+    InstancePlanMakerImplV2 planMaker = new InstancePlanMakerImplV2(expectedResults.size(), true);
+    BrokerResponseNative brokerResponse = getBrokerResponseForSqlQuery(query, planMaker);
+    QueriesTestUtils
+        .testInterSegmentResultTable(brokerResponse, 400000L, 0, expectedNumEntriesScannedPostFilter, 400000L,
+            expectedResults, expectedResults.size(), expectedDataSchema);
+  }
+
+  /**
+   * Tests the in-segment build option for GroupBy OrderBy query. (No trim)
+   */
+  @Test(dataProvider = "orderByDataProvider")
+  public void testGroupByOrderByMVTrimOptHighLimitSQLResults(String query, List<Object[]> expectedResults,
+      long expectedNumEntriesScannedPostFilter, DataSchema expectedDataSchema) {
+    InstancePlanMakerImplV2 planMaker = new InstancePlanMakerImplV2(expectedResults.size() + 1, true);

Review comment:
       Here we can simply put a very low `minSegmentGroupTrimSize` and expect the same result
   ```suggestion
       InstancePlanMakerImplV2 planMaker = new InstancePlanMakerImplV2(1, true);
   ```

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/plan/maker/InstancePlanMakerImplV2.java
##########
@@ -52,39 +52,56 @@
 import org.slf4j.LoggerFactory;
 
 
+
 /**
  * The <code>InstancePlanMakerImplV2</code> class is the default implementation of {@link PlanMaker}.
  */
 public class InstancePlanMakerImplV2 implements PlanMaker {
-  private static final Logger LOGGER = LoggerFactory.getLogger(InstancePlanMakerImplV2.class);
-
   public static final String MAX_INITIAL_RESULT_HOLDER_CAPACITY_KEY = "max.init.group.holder.capacity";
   public static final int DEFAULT_MAX_INITIAL_RESULT_HOLDER_CAPACITY = 10_000;
   public static final String NUM_GROUPS_LIMIT = "num.groups.limit";
   public static final int DEFAULT_NUM_GROUPS_LIMIT = 100_000;
-
+  public static final String ENABLE_SEGMENT_GROUP_TRIM = "enable.segment.group.trim";
+  public static final boolean DEFAULT_ENABLE_SEGMENT_GROUP_TRIM = false;
+  public static final String SIZE_SEGMENT_GROUP_TRIM = "size.segment.group.trim";
+  public static final int DEFAULT_SEGMENT_TRIM_SIZE = -1;
   // set as pinot.server.query.executor.groupby.trim.threshold
   public static final String GROUPBY_TRIM_THRESHOLD = "groupby.trim.threshold";
   public static final int DEFAULT_GROUPBY_TRIM_THRESHOLD = 1_000_000;
-
+  private static final Logger LOGGER = LoggerFactory.getLogger(InstancePlanMakerImplV2.class);

Review comment:
       Move this line back for better readability

##########
File path: pinot-core/src/test/java/org/apache/pinot/queries/InterSegmentOrderBySingleValueQueriesTest.java
##########
@@ -72,6 +73,46 @@ public void testGroupByOrderByPQLResponse(String query, List<String[]> expectedG
         expectedValues, true);
   }
 
+  /**
+   * Tests the in-segment trim option for GroupBy OrderBy query
+   */
+  @Test(dataProvider = "orderBySQLResultTableProvider")
+  public void testGroupByOrderByTrimOptSQLLowLimitResponse(String query, List<Object[]> expectedResults,
+      long expectedNumDocsScanned, long expectedNumEntriesScannedInFilter, long expectedNumEntriesScannedPostFilter,
+      long expectedNumTotalDocs, DataSchema expectedDataSchema) {
+    expectedResults = expectedResults.subList(0, expectedResults.size() / 2);
+
+    if (query.toUpperCase().contains("LIMIT")) {
+      String[] keyWords = query.split(" ");
+      keyWords[keyWords.length - 1] = String.valueOf(expectedResults.size());
+      query = String.join(" ", keyWords);
+    } else {
+      query += " LIMIT " + expectedResults.size();
+    }
+
+    InstancePlanMakerImplV2 planMaker = new InstancePlanMakerImplV2(expectedResults.size(), true);
+    BrokerResponseNative brokerResponse = getBrokerResponseForSqlQuery(query, planMaker);
+    QueriesTestUtils
+        .testInterSegmentResultTable(brokerResponse, expectedNumDocsScanned, expectedNumEntriesScannedInFilter,
+            expectedNumEntriesScannedPostFilter, expectedNumTotalDocs, expectedResults, expectedResults.size(),
+            expectedDataSchema);
+  }
+
+  /**
+   * Tests the in-segment build option for GroupBy OrderBy query. (No trim)
+   */
+  @Test(dataProvider = "orderBySQLResultTableProvider")
+  public void testGroupByOrderByTrimOptHighLimitSQLResponse(String query, List<Object[]> expectedResults,

Review comment:
       Same as the comments in `InterSegmentOrderByMultiValueQueriesTest`

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/plan/maker/InstancePlanMakerImplV2.java
##########
@@ -105,8 +121,61 @@ public InstancePlanMakerImplV2(QueryExecutorConfig queryExecutorConfig) {
     Preconditions.checkState(_maxInitialResultHolderCapacity <= _numGroupsLimit,
         "Invalid configuration: maxInitialResultHolderCapacity: %d must be smaller or equal to numGroupsLimit: %d",
         _maxInitialResultHolderCapacity, _numGroupsLimit);
-    LOGGER.info("Initializing plan maker with maxInitialResultHolderCapacity: {}, numGroupsLimit: {}",
-        _maxInitialResultHolderCapacity, _numGroupsLimit);
+    _enableSegmentGroupTrim =
+        queryExecutorConfig.getConfig().getProperty(ENABLE_SEGMENT_GROUP_TRIM, DEFAULT_ENABLE_SEGMENT_GROUP_TRIM);
+    LOGGER.info(
+        "Initializing plan maker with maxInitialResultHolderCapacity: {}, numGroupsLimit: {}, enableSegmentTrim: {}",
+        _maxInitialResultHolderCapacity, _numGroupsLimit, _enableSegmentGroupTrim);
+  }
+
+  /**
+   * Returns {@code true} if the given aggregation-only without filter QueryContext can be solved with segment metadata,
+   * {@code false} otherwise.
+   * <p>Aggregations supported: COUNT
+   */
+  @VisibleForTesting
+  static boolean isFitForMetadataBasedPlan(QueryContext queryContext) {

Review comment:
       Don't enable the `rearrange code` option during the reformat. It will introduce too much unnecessary change

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/query/AggregationGroupByOrderByOperator.java
##########
@@ -106,8 +116,14 @@ protected IntermediateResultsBlock getNextBlock() {
       groupByExecutor.process(transformBlock);
     }
 
-    // Build intermediate result block based on aggregation group-by result from the executor
-    return new IntermediateResultsBlock(_aggregationFunctions, groupByExecutor.getResult(), _dataSchema);
+    // Trim is off or no need to trim
+    if (_trimSize == TRIM_OFF || groupByExecutor.getResultNum() <= _trimSize) {

Review comment:
       ```suggestion
       if (_trimSize <= 0 || groupByExecutor.getNumGroups() <= _trimSize) {
   ```

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/plan/maker/InstancePlanMakerImplV2.java
##########
@@ -52,39 +52,56 @@
 import org.slf4j.LoggerFactory;
 
 
+
 /**
  * The <code>InstancePlanMakerImplV2</code> class is the default implementation of {@link PlanMaker}.
  */
 public class InstancePlanMakerImplV2 implements PlanMaker {
-  private static final Logger LOGGER = LoggerFactory.getLogger(InstancePlanMakerImplV2.class);
-
   public static final String MAX_INITIAL_RESULT_HOLDER_CAPACITY_KEY = "max.init.group.holder.capacity";
   public static final int DEFAULT_MAX_INITIAL_RESULT_HOLDER_CAPACITY = 10_000;
   public static final String NUM_GROUPS_LIMIT = "num.groups.limit";
   public static final int DEFAULT_NUM_GROUPS_LIMIT = 100_000;
-
+  public static final String ENABLE_SEGMENT_GROUP_TRIM = "enable.segment.group.trim";
+  public static final boolean DEFAULT_ENABLE_SEGMENT_GROUP_TRIM = false;
+  public static final String SIZE_SEGMENT_GROUP_TRIM = "size.segment.group.trim";
+  public static final int DEFAULT_SEGMENT_TRIM_SIZE = -1;
   // set as pinot.server.query.executor.groupby.trim.threshold
   public static final String GROUPBY_TRIM_THRESHOLD = "groupby.trim.threshold";
   public static final int DEFAULT_GROUPBY_TRIM_THRESHOLD = 1_000_000;
-
+  private static final Logger LOGGER = LoggerFactory.getLogger(InstancePlanMakerImplV2.class);
   private final int _maxInitialResultHolderCapacity;
   // Limit on number of groups stored for each segment, beyond which no new group will be created
   private final int _numGroupsLimit;
   // Used for SQL GROUP BY (server combine)
   private final int _groupByTrimThreshold;
+  private final boolean _enableSegmentGroupTrim;
+  private final int _minSegmentTrimSize;

Review comment:
       As discussed offline, we can remove `_enableSegmentGroupTrim` and only maintain the trim size here




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] codecov-commenter edited a comment on pull request #6991: Introduce in-Segment Trim for GroupBy OrderBy Query

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #6991:
URL: https://github.com/apache/incubator-pinot/pull/6991#issuecomment-849847415


   # [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#6991](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (818c0ea) into [master](https://codecov.io/gh/apache/incubator-pinot/commit/0185482d9da2ac299b4b15bcd2998165ccbbdf71?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (0185482) will **increase** coverage by `0.12%`.
   > The diff coverage is `66.56%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/6991/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #6991      +/-   ##
   ============================================
   + Coverage     73.23%   73.35%   +0.12%     
     Complexity       12       12              
   ============================================
     Files          1439     1454      +15     
     Lines         71333    72105     +772     
     Branches      10334    10441     +107     
   ============================================
   + Hits          52243    52895     +652     
   - Misses        15578    15678     +100     
   - Partials       3512     3532      +20     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration | `42.00% <42.52%> (+0.15%)` | :arrow_up: |
   | unittests | `65.44% <64.74%> (+0.09%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...e/pinot/common/minion/MergeRollupTaskMetadata.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWluaW9uL01lcmdlUm9sbHVwVGFza01ldGFkYXRhLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [...e/pinot/common/minion/MinionTaskMetadataUtils.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWluaW9uL01pbmlvblRhc2tNZXRhZGF0YVV0aWxzLmphdmE=) | `50.00% <0.00%> (-25.00%)` | :arrow_down: |
   | [...mmon/restlet/resources/SegmentServerDebugInfo.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vcmVzdGxldC9yZXNvdXJjZXMvU2VnbWVudFNlcnZlckRlYnVnSW5mby5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...che/pinot/controller/api/debug/TableDebugInfo.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9hcGkvZGVidWcvVGFibGVEZWJ1Z0luZm8uamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [...t/controller/api/resources/TableDebugResource.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9hcGkvcmVzb3VyY2VzL1RhYmxlRGVidWdSZXNvdXJjZS5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...troller/helix/core/minion/ClusterInfoAccessor.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9oZWxpeC9jb3JlL21pbmlvbi9DbHVzdGVySW5mb0FjY2Vzc29yLmphdmE=) | `71.42% <0.00%> (-8.58%)` | :arrow_down: |
   | [...x/core/minion/generator/TaskGeneratorRegistry.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9oZWxpeC9jb3JlL21pbmlvbi9nZW5lcmF0b3IvVGFza0dlbmVyYXRvclJlZ2lzdHJ5LmphdmE=) | `80.00% <ø> (ø)` | |
   | [.../org/apache/pinot/core/common/MinionConstants.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9jb21tb24vTWluaW9uQ29uc3RhbnRzLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [...ore/minion/rollup/MergeRollupSegmentConverter.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9taW5pb24vcm9sbHVwL01lcmdlUm9sbHVwU2VnbWVudENvbnZlcnRlci5qYXZh) | `89.28% <ø> (ø)` | |
   | [...erator/transform/PassThroughTransformOperator.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9vcGVyYXRvci90cmFuc2Zvcm0vUGFzc1Rocm91Z2hUcmFuc2Zvcm1PcGVyYXRvci5qYXZh) | `85.71% <0.00%> (-14.29%)` | :arrow_down: |
   | ... and [136 more](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [0185482...818c0ea](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6991: Introduce in-Segment Trim for GroupBy OrderBy Query

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6991:
URL: https://github.com/apache/incubator-pinot/pull/6991#discussion_r640251385



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/plan/maker/InstancePlanMakerImplV2.java
##########
@@ -61,7 +61,11 @@
   public static final String MAX_INITIAL_RESULT_HOLDER_CAPACITY_KEY = "max.init.group.holder.capacity";
   public static final int DEFAULT_MAX_INITIAL_RESULT_HOLDER_CAPACITY = 10_000;
   public static final String NUM_GROUPS_LIMIT = "num.groups.limit";
+  public static final String SEGMENT_TRIM_SIZE = "num.segment.trim.limit";

Review comment:
       Not making it configurable will change the default behavior? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] codecov-commenter edited a comment on pull request #6991: Introduce in-Segment Trim for GroupBy OrderBy Query

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #6991:
URL: https://github.com/apache/incubator-pinot/pull/6991#issuecomment-849847415


   # [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#6991](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (6a55708) into [master](https://codecov.io/gh/apache/incubator-pinot/commit/0185482d9da2ac299b4b15bcd2998165ccbbdf71?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (0185482) will **decrease** coverage by `7.66%`.
   > The diff coverage is `86.34%`.
   
   > :exclamation: Current head 6a55708 differs from pull request most recent head 3470a93. Consider uploading reports for the commit 3470a93 to get more accurate results
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/6991/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #6991      +/-   ##
   ============================================
   - Coverage     73.23%   65.57%   -7.67%     
     Complexity       12       12              
   ============================================
     Files          1439     1441       +2     
     Lines         71333    71559     +226     
     Branches      10334    10368      +34     
   ============================================
   - Hits          52243    46926    -5317     
   - Misses        15578    21244    +5666     
   + Partials       3512     3389     -123     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration | `?` | |
   | unittests | `65.57% <86.34%> (+0.22%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ces/PinotSegmentUploadDownloadRestletResource.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9hcGkvcmVzb3VyY2VzL1Bpbm90U2VnbWVudFVwbG9hZERvd25sb2FkUmVzdGxldFJlc291cmNlLmphdmE=) | `10.00% <0.00%> (-45.03%)` | :arrow_down: |
   | [...x/core/minion/generator/TaskGeneratorRegistry.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9oZWxpeC9jb3JlL21pbmlvbi9nZW5lcmF0b3IvVGFza0dlbmVyYXRvclJlZ2lzdHJ5LmphdmE=) | `76.00% <ø> (-4.00%)` | :arrow_down: |
   | [...erator/transform/PassThroughTransformOperator.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9vcGVyYXRvci90cmFuc2Zvcm0vUGFzc1Rocm91Z2hUcmFuc2Zvcm1PcGVyYXRvci5qYXZh) | `85.71% <0.00%> (-14.29%)` | :arrow_down: |
   | [...not/minion/event/EventObserverFactoryRegistry.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtbWluaW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9taW5pb24vZXZlbnQvRXZlbnRPYnNlcnZlckZhY3RvcnlSZWdpc3RyeS5qYXZh) | `0.00% <0.00%> (-56.53%)` | :arrow_down: |
   | [...ig/table/SegmentsValidationAndRetentionConfig.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvY29uZmlnL3RhYmxlL1NlZ21lbnRzVmFsaWRhdGlvbkFuZFJldGVudGlvbkNvbmZpZy5qYXZh) | `100.00% <ø> (ø)` | |
   | [...t/server/api/resources/SegmentMetadataFetcher.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zZXJ2ZXIvYXBpL3Jlc291cmNlcy9TZWdtZW50TWV0YWRhdGFGZXRjaGVyLmphdmE=) | `45.00% <21.42%> (-24.05%)` | :arrow_down: |
   | [...ment/processing/collector/SortOrderComparator.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9zZWdtZW50L3Byb2Nlc3NpbmcvY29sbGVjdG9yL1NvcnRPcmRlckNvbXBhcmF0b3IuamF2YQ==) | `58.33% <58.33%> (ø)` | |
   | [...t/core/plan/AggregationGroupByOrderByPlanNode.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9wbGFuL0FnZ3JlZ2F0aW9uR3JvdXBCeU9yZGVyQnlQbGFuTm9kZS5qYXZh) | `46.34% <83.33%> (+4.23%)` | :arrow_up: |
   | [...pinot/core/plan/maker/InstancePlanMakerImplV2.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9wbGFuL21ha2VyL0luc3RhbmNlUGxhbk1ha2VySW1wbFYyLmphdmE=) | `80.41% <83.72%> (-12.28%)` | :arrow_down: |
   | [...ry/aggregation/groupby/DefaultGroupByExecutor.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9ncm91cGJ5L0RlZmF1bHRHcm91cEJ5RXhlY3V0b3IuamF2YQ==) | `98.27% <87.50%> (-1.73%)` | :arrow_down: |
   | ... and [359 more](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [0185482...3470a93](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] wuwenw commented on a change in pull request #6991: Introduce in-Segment Trim for GroupBy OrderBy Query

Posted by GitBox <gi...@apache.org>.
wuwenw commented on a change in pull request #6991:
URL: https://github.com/apache/incubator-pinot/pull/6991#discussion_r645266054



##########
File path: pinot-core/src/test/java/org/apache/pinot/core/data/table/TableResizerTest.java
##########
@@ -65,6 +76,33 @@ public void setUp() {
         new Key(new Object[]{"c", 50, 4.0}),
         new Key(new Object[]{"c", 300, 5.0})
     );
+
+    // GroupKeys: d1: a0 ~ a14, d2: 10.0 ~ 24.0, d3: 1.0 ~ 15.0
+    _groupKeys = new LinkedList<>();

Review comment:
       Changed _groupKeys and _groupResultholders to align with the old tests

##########
File path: pinot-core/src/test/java/org/apache/pinot/queries/InterSegmentOrderByMultiValueQueriesTest.java
##########
@@ -41,6 +42,43 @@ public void testGroupByOrderByMVSQLResults(String query, List<Object[]> expected
             expectedResults, expectedResults.size(), expectedDataSchema);
   }
 
+  /**
+   * Tests the in-segment trim option for GroupBy OrderBy query
+   */
+  @Test(dataProvider = "orderByDataProvider")
+  public void testGroupByOrderByMVTrimOptLowLimitSQLResults(String query, List<Object[]> expectedResults,
+      long expectedNumEntriesScannedPostFilter, DataSchema expectedDataSchema) {
+
+    expectedResults = expectedResults.subList(0, expectedResults.size() / 2);

Review comment:
       This test aims to trigger the in-segment trim by providing a low limit number and a low trimSize, so the size of intermediateResult is greater than max(5*limit, trimSize). intermediateResult will be trimmed by the in-segment trim feature and then a limit will be applied (this is unavoidable). Still, we can test to see if everything works in the in-segment trim phase 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] wuwenw commented on a change in pull request #6991: Introduce in-Segment Trim for GroupBy OrderBy Query

Posted by GitBox <gi...@apache.org>.
wuwenw commented on a change in pull request #6991:
URL: https://github.com/apache/incubator-pinot/pull/6991#discussion_r646822602



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/plan/AggregationGroupByOrderByPlanNode.java
##########
@@ -58,6 +63,14 @@ public AggregationGroupByOrderByPlanNode(IndexSegment indexSegment, QueryContext
     List<ExpressionContext> groupByExpressions = queryContext.getGroupByExpressions();
     assert groupByExpressions != null;
     _groupByExpressions = groupByExpressions.toArray(new ExpressionContext[0]);
+    _queryContext = queryContext;
+
+    // Only trim if there is OrderBy and has a positive trim size
+    if (queryContext.getOrderByExpressions() != null && minSegmentTrimSize > 0) {

Review comment:
       I see




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] wuwenw commented on a change in pull request #6991: Introduce in-Segment Trim for GroupBy OrderBy Query

Posted by GitBox <gi...@apache.org>.
wuwenw commented on a change in pull request #6991:
URL: https://github.com/apache/incubator-pinot/pull/6991#discussion_r645266054



##########
File path: pinot-core/src/test/java/org/apache/pinot/core/data/table/TableResizerTest.java
##########
@@ -65,6 +76,33 @@ public void setUp() {
         new Key(new Object[]{"c", 50, 4.0}),
         new Key(new Object[]{"c", 300, 5.0})
     );
+
+    // GroupKeys: d1: a0 ~ a14, d2: 10.0 ~ 24.0, d3: 1.0 ~ 15.0
+    _groupKeys = new LinkedList<>();

Review comment:
       Changed _groupKeys and _groupResultholders to align with the old tests




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #6991: Introduce in-Segment Trim for GroupBy OrderBy Query

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #6991:
URL: https://github.com/apache/incubator-pinot/pull/6991#discussion_r645876332



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/query/AggregationGroupByOrderByOperator.java
##########
@@ -106,8 +116,14 @@ protected IntermediateResultsBlock getNextBlock() {
       groupByExecutor.process(transformBlock);
     }
 
-    // Build intermediate result block based on aggregation group-by result from the executor
-    return new IntermediateResultsBlock(_aggregationFunctions, groupByExecutor.getResult(), _dataSchema);
+    // Trim is off or no need to trim
+    if (_trimSize <= 0 || groupByExecutor.getNumGroups() <= _trimSize) {
+      // Build intermediate result block based on aggregation group-by result from the executor
+      return new IntermediateResultsBlock(_aggregationFunctions, groupByExecutor.getResult(), _dataSchema);
+    }
+    TableResizer tableResizer = new TableResizer(_dataSchema, _queryContext);
+    Collection<IntermediateRecord> intermediate = groupByExecutor.trimGroupByResult(_trimSize, tableResizer);

Review comment:
       (nit)
   ```suggestion
       Collection<IntermediateRecord> intermediateRecords = groupByExecutor.trimGroupByResult(_trimSize, tableResizer);
   ```

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/plan/AggregationGroupByOrderByPlanNode.java
##########
@@ -58,6 +63,14 @@ public AggregationGroupByOrderByPlanNode(IndexSegment indexSegment, QueryContext
     List<ExpressionContext> groupByExpressions = queryContext.getGroupByExpressions();
     assert groupByExpressions != null;
     _groupByExpressions = groupByExpressions.toArray(new ExpressionContext[0]);
+    _queryContext = queryContext;
+
+    // Only trim if there is OrderBy and has a positive trim size
+    if (queryContext.getOrderByExpressions() != null && minSegmentTrimSize > 0) {

Review comment:
       I feel we might want to pass `minSegmentTrimSize` to the operator, and the operator can decide whether to trim it or not

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/query/AggregationGroupByOrderByOperator.java
##########
@@ -38,28 +42,34 @@
 @SuppressWarnings("rawtypes")
 public class AggregationGroupByOrderByOperator extends BaseOperator<IntermediateResultsBlock> {
   private static final String OPERATOR_NAME = "AggregationGroupByOrderByOperator";
+  private static final int TRIM_OFF = -1;

Review comment:
       Remove

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/plan/maker/InstancePlanMakerImplV2.java
##########
@@ -187,6 +216,7 @@ public PlanNode makeStreamingSegmentPlanNode(IndexSegment indexSegment, QueryCon
     }
   }
 
+

Review comment:
       (nit) remove




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #6991: Introduce in-Segment Trim for GroupBy OrderBy Query

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #6991:
URL: https://github.com/apache/incubator-pinot/pull/6991#discussion_r640285940



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/plan/maker/InstancePlanMakerImplV2.java
##########
@@ -61,7 +61,11 @@
   public static final String MAX_INITIAL_RESULT_HOLDER_CAPACITY_KEY = "max.init.group.holder.capacity";
   public static final int DEFAULT_MAX_INITIAL_RESULT_HOLDER_CAPACITY = 10_000;
   public static final String NUM_GROUPS_LIMIT = "num.groups.limit";
+  public static final String SEGMENT_TRIM_SIZE = "num.segment.trim.limit";

Review comment:
       We want to configure whether to enable it, but not the threshold in step 1. If it is off, then it is the same as the current behavior




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] wuwenw commented on a change in pull request #6991: Introduce in-Segment Trim for GroupBy OrderBy Query

Posted by GitBox <gi...@apache.org>.
wuwenw commented on a change in pull request #6991:
URL: https://github.com/apache/incubator-pinot/pull/6991#discussion_r644096037



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/plan/maker/InstancePlanMakerImplV2.java
##########
@@ -105,8 +121,61 @@ public InstancePlanMakerImplV2(QueryExecutorConfig queryExecutorConfig) {
     Preconditions.checkState(_maxInitialResultHolderCapacity <= _numGroupsLimit,
         "Invalid configuration: maxInitialResultHolderCapacity: %d must be smaller or equal to numGroupsLimit: %d",
         _maxInitialResultHolderCapacity, _numGroupsLimit);
-    LOGGER.info("Initializing plan maker with maxInitialResultHolderCapacity: {}, numGroupsLimit: {}",
-        _maxInitialResultHolderCapacity, _numGroupsLimit);
+    _enableSegmentGroupTrim =
+        queryExecutorConfig.getConfig().getProperty(ENABLE_SEGMENT_GROUP_TRIM, DEFAULT_ENABLE_SEGMENT_GROUP_TRIM);
+    LOGGER.info(
+        "Initializing plan maker with maxInitialResultHolderCapacity: {}, numGroupsLimit: {}, enableSegmentTrim: {}",
+        _maxInitialResultHolderCapacity, _numGroupsLimit, _enableSegmentGroupTrim);
+  }
+
+  /**
+   * Returns {@code true} if the given aggregation-only without filter QueryContext can be solved with segment metadata,
+   * {@code false} otherwise.
+   * <p>Aggregations supported: COUNT
+   */
+  @VisibleForTesting
+  static boolean isFitForMetadataBasedPlan(QueryContext queryContext) {

Review comment:
       These 2 methods are moved by the format-tools




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] codecov-commenter edited a comment on pull request #6991: Introduce in-Segment Trim for GroupBy OrderBy Query

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #6991:
URL: https://github.com/apache/incubator-pinot/pull/6991#issuecomment-849847415


   # [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#6991](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (818c0ea) into [master](https://codecov.io/gh/apache/incubator-pinot/commit/0185482d9da2ac299b4b15bcd2998165ccbbdf71?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (0185482) will **decrease** coverage by `7.78%`.
   > The diff coverage is `64.74%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/6991/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #6991      +/-   ##
   ============================================
   - Coverage     73.23%   65.44%   -7.79%     
     Complexity       12       12              
   ============================================
     Files          1439     1454      +15     
     Lines         71333    72105     +772     
     Branches      10334    10441     +107     
   ============================================
   - Hits          52243    47192    -5051     
   - Misses        15578    21508    +5930     
   + Partials       3512     3405     -107     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration | `?` | |
   | unittests | `65.44% <64.74%> (+0.09%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...roker/requesthandler/BaseBrokerRequestHandler.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcmVxdWVzdGhhbmRsZXIvQmFzZUJyb2tlclJlcXVlc3RIYW5kbGVyLmphdmE=) | `17.37% <0.00%> (-53.88%)` | :arrow_down: |
   | [...e/pinot/common/minion/MergeRollupTaskMetadata.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWluaW9uL01lcmdlUm9sbHVwVGFza01ldGFkYXRhLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [...e/pinot/common/minion/MinionTaskMetadataUtils.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWluaW9uL01pbmlvblRhc2tNZXRhZGF0YVV0aWxzLmphdmE=) | `0.00% <0.00%> (-75.00%)` | :arrow_down: |
   | [...mmon/restlet/resources/SegmentServerDebugInfo.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vcmVzdGxldC9yZXNvdXJjZXMvU2VnbWVudFNlcnZlckRlYnVnSW5mby5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...che/pinot/controller/api/debug/TableDebugInfo.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9hcGkvZGVidWcvVGFibGVEZWJ1Z0luZm8uamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [...ces/PinotSegmentUploadDownloadRestletResource.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9hcGkvcmVzb3VyY2VzL1Bpbm90U2VnbWVudFVwbG9hZERvd25sb2FkUmVzdGxldFJlc291cmNlLmphdmE=) | `10.00% <0.00%> (-45.03%)` | :arrow_down: |
   | [...t/controller/api/resources/TableDebugResource.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9hcGkvcmVzb3VyY2VzL1RhYmxlRGVidWdSZXNvdXJjZS5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...troller/helix/core/minion/ClusterInfoAccessor.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9oZWxpeC9jb3JlL21pbmlvbi9DbHVzdGVySW5mb0FjY2Vzc29yLmphdmE=) | `32.14% <0.00%> (-47.86%)` | :arrow_down: |
   | [...x/core/minion/generator/TaskGeneratorRegistry.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9oZWxpeC9jb3JlL21pbmlvbi9nZW5lcmF0b3IvVGFza0dlbmVyYXRvclJlZ2lzdHJ5LmphdmE=) | `76.00% <ø> (-4.00%)` | :arrow_down: |
   | [.../org/apache/pinot/core/common/MinionConstants.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9jb21tb24vTWluaW9uQ29uc3RhbnRzLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | ... and [429 more](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [0185482...818c0ea](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] codecov-commenter edited a comment on pull request #6991: Introduce in-Segment Trim for GroupBy OrderBy Query

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #6991:
URL: https://github.com/apache/incubator-pinot/pull/6991#issuecomment-849847415


   # [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#6991](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (03beb3e) into [master](https://codecov.io/gh/apache/incubator-pinot/commit/0185482d9da2ac299b4b15bcd2998165ccbbdf71?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (0185482) will **decrease** coverage by `7.81%`.
   > The diff coverage is `64.74%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/6991/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #6991      +/-   ##
   ============================================
   - Coverage     73.23%   65.42%   -7.82%     
     Complexity       12       12              
   ============================================
     Files          1439     1454      +15     
     Lines         71333    72091     +758     
     Branches      10334    10441     +107     
   ============================================
   - Hits          52243    47162    -5081     
   - Misses        15578    21517    +5939     
   + Partials       3512     3412     -100     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration | `?` | |
   | unittests | `65.42% <64.74%> (+0.07%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...roker/requesthandler/BaseBrokerRequestHandler.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcmVxdWVzdGhhbmRsZXIvQmFzZUJyb2tlclJlcXVlc3RIYW5kbGVyLmphdmE=) | `17.37% <0.00%> (-53.88%)` | :arrow_down: |
   | [...e/pinot/common/minion/MergeRollupTaskMetadata.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWluaW9uL01lcmdlUm9sbHVwVGFza01ldGFkYXRhLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [...e/pinot/common/minion/MinionTaskMetadataUtils.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWluaW9uL01pbmlvblRhc2tNZXRhZGF0YVV0aWxzLmphdmE=) | `0.00% <0.00%> (-75.00%)` | :arrow_down: |
   | [...mmon/restlet/resources/SegmentServerDebugInfo.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vcmVzdGxldC9yZXNvdXJjZXMvU2VnbWVudFNlcnZlckRlYnVnSW5mby5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...che/pinot/controller/api/debug/TableDebugInfo.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9hcGkvZGVidWcvVGFibGVEZWJ1Z0luZm8uamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [...ces/PinotSegmentUploadDownloadRestletResource.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9hcGkvcmVzb3VyY2VzL1Bpbm90U2VnbWVudFVwbG9hZERvd25sb2FkUmVzdGxldFJlc291cmNlLmphdmE=) | `10.00% <0.00%> (-45.03%)` | :arrow_down: |
   | [...t/controller/api/resources/TableDebugResource.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9hcGkvcmVzb3VyY2VzL1RhYmxlRGVidWdSZXNvdXJjZS5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...troller/helix/core/minion/ClusterInfoAccessor.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9oZWxpeC9jb3JlL21pbmlvbi9DbHVzdGVySW5mb0FjY2Vzc29yLmphdmE=) | `32.14% <0.00%> (-47.86%)` | :arrow_down: |
   | [...x/core/minion/generator/TaskGeneratorRegistry.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9oZWxpeC9jb3JlL21pbmlvbi9nZW5lcmF0b3IvVGFza0dlbmVyYXRvclJlZ2lzdHJ5LmphdmE=) | `76.00% <ø> (-4.00%)` | :arrow_down: |
   | [.../org/apache/pinot/core/common/MinionConstants.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9jb21tb24vTWluaW9uQ29uc3RhbnRzLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | ... and [435 more](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [0185482...03beb3e](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] wuwenw commented on a change in pull request #6991: Introduce in-Segment Trim for GroupBy OrderBy Query

Posted by GitBox <gi...@apache.org>.
wuwenw commented on a change in pull request #6991:
URL: https://github.com/apache/incubator-pinot/pull/6991#discussion_r641181681



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/table/TableResizer.java
##########
@@ -357,4 +372,88 @@ public Comparable extract(Record record) {
       }
     }
   }
+
+  public static class fullIntermediateResult extends IntermediateRecord {

Review comment:
       Now directly extend IntermediateRecord
   
   > You can directly change the `IntermediateRecord` to include the record
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] wuwenw commented on a change in pull request #6991: Introduce in-Segment Trim for GroupBy OrderBy Query

Posted by GitBox <gi...@apache.org>.
wuwenw commented on a change in pull request #6991:
URL: https://github.com/apache/incubator-pinot/pull/6991#discussion_r641181215



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java
##########
@@ -117,6 +119,19 @@ public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions,
     _dataSchema = dataSchema;
   }
 
+  /**
+   * Constructor for aggregation group-by order-by result with {@link AggregationGroupByResult}.
+   */
+  public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions,
+                                  @Nullable AggregationGroupByResult aggregationGroupByResults,

Review comment:
       Followed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] codecov-commenter edited a comment on pull request #6991: Introduce in-Segment Trim for GroupBy OrderBy Query

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #6991:
URL: https://github.com/apache/incubator-pinot/pull/6991#issuecomment-849847415


   # [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#6991](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (1416bf7) into [master](https://codecov.io/gh/apache/incubator-pinot/commit/0185482d9da2ac299b4b15bcd2998165ccbbdf71?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (0185482) will **increase** coverage by `0.14%`.
   > The diff coverage is `87.85%`.
   
   > :exclamation: Current head 1416bf7 differs from pull request most recent head 36b304d. Consider uploading reports for the commit 36b304d to get more accurate results
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/6991/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #6991      +/-   ##
   ============================================
   + Coverage     73.23%   73.38%   +0.14%     
     Complexity       12       12              
   ============================================
     Files          1439     1441       +2     
     Lines         71333    71549     +216     
     Branches      10334    10368      +34     
   ============================================
   + Hits          52243    52508     +265     
   + Misses        15578    15513      -65     
   - Partials       3512     3528      +16     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration | `42.02% <63.79%> (+0.17%)` | :arrow_up: |
   | unittests | `65.58% <86.97%> (+0.23%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...erator/transform/PassThroughTransformOperator.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9vcGVyYXRvci90cmFuc2Zvcm0vUGFzc1Rocm91Z2hUcmFuc2Zvcm1PcGVyYXRvci5qYXZh) | `85.71% <0.00%> (-14.29%)` | :arrow_down: |
   | [...t/server/api/resources/SegmentMetadataFetcher.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zZXJ2ZXIvYXBpL3Jlc291cmNlcy9TZWdtZW50TWV0YWRhdGFGZXRjaGVyLmphdmE=) | `45.00% <21.42%> (-24.05%)` | :arrow_down: |
   | [...ment/processing/collector/SortOrderComparator.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9zZWdtZW50L3Byb2Nlc3NpbmcvY29sbGVjdG9yL1NvcnRPcmRlckNvbXBhcmF0b3IuamF2YQ==) | `58.33% <58.33%> (ø)` | |
   | [...t/core/plan/AggregationGroupByOrderByPlanNode.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9wbGFuL0FnZ3JlZ2F0aW9uR3JvdXBCeU9yZGVyQnlQbGFuTm9kZS5qYXZh) | `45.00% <80.00%> (+2.89%)` | :arrow_up: |
   | [...ry/aggregation/groupby/DefaultGroupByExecutor.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9ncm91cGJ5L0RlZmF1bHRHcm91cEJ5RXhlY3V0b3IuamF2YQ==) | `98.27% <87.50%> (-1.73%)` | :arrow_down: |
   | [.../segment/processing/collector/ConcatCollector.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9zZWdtZW50L3Byb2Nlc3NpbmcvY29sbGVjdG9yL0NvbmNhdENvbGxlY3Rvci5qYXZh) | `88.88% <88.88%> (-0.86%)` | :arrow_down: |
   | [...segment/processing/serde/GenericRowSerializer.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9zZWdtZW50L3Byb2Nlc3Npbmcvc2VyZGUvR2VuZXJpY1Jvd1NlcmlhbGl6ZXIuamF2YQ==) | `93.22% <93.22%> (ø)` | |
   | [...perator/combine/GroupByOrderByCombineOperator.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9vcGVyYXRvci9jb21iaW5lL0dyb3VwQnlPcmRlckJ5Q29tYmluZU9wZXJhdG9yLmphdmE=) | `77.77% <94.11%> (+2.39%)` | :arrow_up: |
   | [...gment/processing/serde/GenericRowDeserializer.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9zZWdtZW50L3Byb2Nlc3Npbmcvc2VyZGUvR2VuZXJpY1Jvd0Rlc2VyaWFsaXplci5qYXZh) | `94.91% <94.91%> (ø)` | |
   | [...org/apache/pinot/core/data/table/TableResizer.java](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL3RhYmxlL1RhYmxlUmVzaXplci5qYXZh) | `93.95% <96.96%> (+1.70%)` | :arrow_up: |
   | ... and [35 more](https://codecov.io/gh/apache/incubator-pinot/pull/6991/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [0185482...36b304d](https://codecov.io/gh/apache/incubator-pinot/pull/6991?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #6991: Introduce in-Segment Trim for GroupBy OrderBy Query

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #6991:
URL: https://github.com/apache/incubator-pinot/pull/6991#discussion_r641733538



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/groupby/DefaultGroupByExecutor.java
##########
@@ -144,4 +147,22 @@ protected void aggregate(TransformBlock transformBlock, int length, int function
   public AggregationGroupByResult getResult() {
     return new AggregationGroupByResult(_groupKeyGenerator, _aggregationFunctions, _groupByResultHolders);
   }
+
+  @Override
+  public Collection<TableResizer.IntermediateRecord> trimGroupByResult(boolean enableSegmentGroupTrim,
+      int threshold, TableResizer tableResizer) {
+    // Check if a trim is necessary
+    int keyNum = 0;
+    if (_hasMVGroupByExpression) {
+      keyNum = _mvGroupKeys.length;
+    } else {
+      keyNum = _svGroupKeys.length;
+    }

Review comment:
       This part is incorrect. `_svGroupKeys` and `_mvGroupKeys` are just buffers and the length is always 10000
   You might want to add a API in `GroupKeyGenerator` to return the number of keys

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/table/TableResizer.java
##########
@@ -249,6 +327,44 @@ private IntermediateRecord getIntermediateRecord(Key key, Record record) {
     Comparable extract(Record record);
   }
 
+  /**
+   * Helper class to store a subset of Record fields
+   * IntermediateRecord is derived from a Record
+   * Some of the main properties of an IntermediateRecord are:
+   *
+   * 1. Key in IntermediateRecord is expected to be identical to the one in the Record
+   * 2. For values, IntermediateRecord should only have the columns needed for order by
+   * 3. Inside the values, the columns should be ordered by the order by sequence
+   * 4. For order by on aggregations, final results should extracted if the intermediate result is non-comparable
+   * 5. There is an optional field to store the original record. Each segment can keep the intermediate result in this
+   * form to prevent constructing IntermediateRecord again in the server.
+   */
+  public static class IntermediateRecord {

Review comment:
       Make this a separate class since it is shared by multiple classes

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java
##########
@@ -57,6 +58,7 @@
   private AggregationGroupByResult _aggregationGroupByResult;
   private List<Map<String, Object>> _combinedAggregationGroupByResult;
   private List<ProcessingException> _processingExceptions;
+  private Collection<TableResizer.IntermediateRecord> _intermediateCollection;

Review comment:
       ```suggestion
     private Collection<TableResizer.IntermediateRecord> _intermediateRecords;
   ```

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/query/AggregationGroupByOrderByOperator.java
##########
@@ -43,23 +46,29 @@
   private final ExpressionContext[] _groupByExpressions;
   private final int _maxInitialResultHolderCapacity;
   private final int _numGroupsLimit;
+  private final int _inSegmentResultLimit;

Review comment:
       `_inSegmentResultLimit` and `_enableSegmentGroupTrim` can be combined into `_trimSize`, and use negative value to denote that trim is off

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/table/TableResizer.java
##########
@@ -49,6 +53,7 @@
   private final Map<ExpressionContext, Integer> _groupByExpressionIndexMap;
   private final AggregationFunction[] _aggregationFunctions;
   private final Map<FunctionContext, Integer> _aggregationFunctionIndexMap;
+  private final boolean _hasOrderBy;

Review comment:
       We should only use `TableResizer` when the query has order-by. Please check it on the caller side before creating the `TableResizer`

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/table/TableResizer.java
##########
@@ -214,23 +230,85 @@ private IntermediateRecord getIntermediateRecord(Key key, Record record) {
   }
 
   /**
-   * Helper class to store a subset of Record fields
-   * IntermediateRecord is derived from a Record
-   * Some of the main properties of an IntermediateRecord are:
-   *
-   * 1. Key in IntermediateRecord is expected to be identical to the one in the Record
-   * 2. For values, IntermediateRecord should only have the columns needed for order by
-   * 3. Inside the values, the columns should be ordered by the order by sequence
-   * 4. For order by on aggregations, final results should extracted if the intermediate result is non-comparable
+   * Helper class to make an IntermediateRecord with the record.
    */
-  private static class IntermediateRecord {
-    final Key _key;
-    final Comparable[] _values;
+  private IntermediateRecord getInSegmentIntermediateRecord(Key key, Record record) {

Review comment:
       Remove this method and directly modify `getIntermediateRecord()`. The cached record can be used to simplify line 184: `recordsMap.get(recordToRetain._key)` -> `recordToRetain._record`

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/table/TableResizer.java
##########
@@ -214,23 +230,85 @@ private IntermediateRecord getIntermediateRecord(Key key, Record record) {
   }
 
   /**
-   * Helper class to store a subset of Record fields
-   * IntermediateRecord is derived from a Record
-   * Some of the main properties of an IntermediateRecord are:
-   *
-   * 1. Key in IntermediateRecord is expected to be identical to the one in the Record
-   * 2. For values, IntermediateRecord should only have the columns needed for order by
-   * 3. Inside the values, the columns should be ordered by the order by sequence
-   * 4. For order by on aggregations, final results should extracted if the intermediate result is non-comparable
+   * Helper class to make an IntermediateRecord with the record.
    */
-  private static class IntermediateRecord {
-    final Key _key;
-    final Comparable[] _values;
+  private IntermediateRecord getInSegmentIntermediateRecord(Key key, Record record) {
+    Comparable[] intermediateRecordValues = new Comparable[_numOrderByExpressions];
+    for (int i = 0; i < _numOrderByExpressions; i++) {
+      intermediateRecordValues[i] = _orderByValueExtractors[i].extract(record);
+    }
+    return new IntermediateRecord(key, intermediateRecordValues, record);
+  }
 
-    IntermediateRecord(Key key, Comparable[] values) {
-      _key = key;
-      _values = values;
+  /**
+   * Trim the aggregation results using a priority queue and returns a the priority queue.
+   * This method is to be called from individual segment if the intermediate results need to be trimmed.
+   * The use case now is Multi-Segment GroupBy OrderBy query.
+   */
+  public PriorityQueue<IntermediateRecord> trimInSegmentResults(Iterator<GroupKeyGenerator.GroupKey> groupKeyIterator,
+      GroupByResultHolder[] _groupByResultHolders, int size) {
+    if (!groupKeyIterator.hasNext() || _groupByResultHolders.length == 0 || size == 0) {
+      return new PriorityQueue<>();
+    }
+    int numAggregationFunctions = _aggregationFunctions.length;
+    int numColumns = numAggregationFunctions + _numGroupByExpressions;
+
+    // Get comparator
+    Comparator<IntermediateRecord> comparator = _intermediateRecordComparator.reversed();
+    PriorityQueue<IntermediateRecord> priorityQueue = new PriorityQueue<>(size, comparator);
+    while (groupKeyIterator.hasNext()) {
+      // Iterate over keys
+      GroupKeyGenerator.GroupKey groupKey = groupKeyIterator.next();
+      Object[] keys = groupKey._keys;
+      Object[] values = Arrays.copyOf(keys, numColumns);
+      int groupId = groupKey._groupId;
+      for (int i = 0; i < numAggregationFunctions; i++) {
+        values[_numGroupByExpressions + i] =
+            _aggregationFunctions[i].extractGroupByResult(_groupByResultHolders[i], groupId);
+      }
+      // {key, intermediate_record, record}
+      IntermediateRecord intermediateRecord = getInSegmentIntermediateRecord(new Key(keys), new Record(values));
+      if (priorityQueue.size() < size) {
+        priorityQueue.offer(intermediateRecord);
+      } else {
+        IntermediateRecord peek = priorityQueue.peek();
+        if (comparator.compare(peek, intermediateRecord) < 0) {
+          priorityQueue.poll();
+          priorityQueue.offer(intermediateRecord);
+        }
+      }
     }
+    return priorityQueue;
+  }
+
+  /**
+   * Build a list of intermediate record and return the list.
+   * This method is to be called from individual segment if the intermediate results doesn't need to be trimmed.
+   * The use case now is Multi-Segment GroupBy OrderBy query.
+   */
+  public List<IntermediateRecord> buildInSegmentResults(Iterator<GroupKeyGenerator.GroupKey> groupKeyIterator,

Review comment:
       When trim is off, we should avoid creating the list because it could lead to memory issue

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/groupby/DefaultGroupByExecutor.java
##########
@@ -144,4 +147,22 @@ protected void aggregate(TransformBlock transformBlock, int length, int function
   public AggregationGroupByResult getResult() {
     return new AggregationGroupByResult(_groupKeyGenerator, _aggregationFunctions, _groupByResultHolders);
   }
+
+  @Override
+  public Collection<TableResizer.IntermediateRecord> trimGroupByResult(boolean enableSegmentGroupTrim,
+      int threshold, TableResizer tableResizer) {

Review comment:
       ```suggestion
         int trimSize, TableResizer tableResizer) {
   ```

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java
##########
@@ -117,6 +119,19 @@ public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions,
     _dataSchema = dataSchema;
   }
 
+  /**
+   * Constructor for aggregation group-by order-by result with {@link AggregationGroupByResult} and
+   * with a collection of intermediate records.
+   */
+  public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions,
+      @Nullable AggregationGroupByResult aggregationGroupByResults,

Review comment:
       Do we need to pass `aggregationGroupByResults` here? Also the intermediate records should never be null

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/groupby/DefaultGroupByExecutor.java
##########
@@ -144,4 +147,22 @@ protected void aggregate(TransformBlock transformBlock, int length, int function
   public AggregationGroupByResult getResult() {
     return new AggregationGroupByResult(_groupKeyGenerator, _aggregationFunctions, _groupByResultHolders);
   }
+
+  @Override
+  public Collection<TableResizer.IntermediateRecord> trimGroupByResult(boolean enableSegmentGroupTrim,

Review comment:
       Remove `enableSegmentGroupTrim`. When this method is invoked, it should always trim the result

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/plan/maker/InstancePlanMakerImplV2.java
##########
@@ -51,40 +51,56 @@
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
+import static org.apache.pinot.core.util.GroupByUtils.getTableCapacity;
+
 
 /**
  * The <code>InstancePlanMakerImplV2</code> class is the default implementation of {@link PlanMaker}.
  */
 public class InstancePlanMakerImplV2 implements PlanMaker {
-  private static final Logger LOGGER = LoggerFactory.getLogger(InstancePlanMakerImplV2.class);
-
   public static final String MAX_INITIAL_RESULT_HOLDER_CAPACITY_KEY = "max.init.group.holder.capacity";
   public static final int DEFAULT_MAX_INITIAL_RESULT_HOLDER_CAPACITY = 10_000;
   public static final String NUM_GROUPS_LIMIT = "num.groups.limit";
+  public static final String ENABLE_SEGMENT_GROUP_TRIM = "enable.segment.group.trim";
   public static final int DEFAULT_NUM_GROUPS_LIMIT = 100_000;
-
+  public static final boolean DEFAULT_ENABLE_SEGMENT_GROUP_TRIM = false;
   // set as pinot.server.query.executor.groupby.trim.threshold
   public static final String GROUPBY_TRIM_THRESHOLD = "groupby.trim.threshold";
   public static final int DEFAULT_GROUPBY_TRIM_THRESHOLD = 1_000_000;
-
+  private static final Logger LOGGER = LoggerFactory.getLogger(InstancePlanMakerImplV2.class);
   private final int _maxInitialResultHolderCapacity;
   // Limit on number of groups stored for each segment, beyond which no new group will be created
   private final int _numGroupsLimit;
   // Used for SQL GROUP BY (server combine)
   private final int _groupByTrimThreshold;
+  private final boolean _enableSegmentGroupTrim;
+
+  @VisibleForTesting
+  private int _inSegmentTrimLimit = -1;

Review comment:
       Since we need it for testing purpose, let's directly make it configurable as `_minSegmentTrimSize` and default to 5000

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java
##########
@@ -281,6 +296,16 @@ public void setNumGroupsLimitReached(boolean numGroupsLimitReached) {
     _numGroupsLimitReached = numGroupsLimitReached;
   }
 
+  /**
+   * Get an iterator for the intermediate record collection. Should only be called if _intermediateCollection is present
+   */
+  public Iterator<TableResizer.IntermediateRecord> getIntermediateResultIterator() {

Review comment:
       Directly return the collection instead of the iterator, and annotate it with nullable

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/query/AggregationGroupByOrderByOperator.java
##########
@@ -106,8 +116,13 @@ protected IntermediateResultsBlock getNextBlock() {
       groupByExecutor.process(transformBlock);
     }
 
+    if (!_tableResizer.getOrderByStatus()) {
+      return new IntermediateResultsBlock(_aggregationFunctions, groupByExecutor.getResult(), _dataSchema);
+    }
+    Collection<TableResizer.IntermediateRecord> intermediate =
+        groupByExecutor.trimGroupByResult(_enableSegmentGroupTrim, _inSegmentResultLimit, _tableResizer);
     // Build intermediate result block based on aggregation group-by result from the executor
-    return new IntermediateResultsBlock(_aggregationFunctions, groupByExecutor.getResult(), _dataSchema);
+    return new IntermediateResultsBlock(_aggregationFunctions, groupByExecutor.getResult(), intermediate, _dataSchema);

Review comment:
       Fall back to the old behavior when there is no segment trim. Keeping a list can increase the memory overhead

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/query/AggregationGroupByOrderByOperator.java
##########
@@ -85,6 +94,7 @@ public AggregationGroupByOrderByOperator(AggregationFunction[] aggregationFuncti
     }
 
     _dataSchema = new DataSchema(columnNames, columnDataTypes);
+    _tableResizer = new TableResizer(_dataSchema, queryContext);

Review comment:
       Don't create table resizer here, create it in `getNextBlock()` only when needed

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/plan/maker/InstancePlanMakerImplV2.java
##########
@@ -137,8 +206,15 @@ public PlanNode makeSegmentPlanNode(IndexSegment indexSegment, QueryContext quer
         QueryOptions queryOptions = new QueryOptions(queryContext.getQueryOptions());
         // new Combine operator only when GROUP_BY_MODE explicitly set to SQL
         if (queryOptions.isGroupByModeSQL()) {
+          // Calculate trim limit = max(limit * 5, 5000)

Review comment:
       Don't calculate it here, move it into the `AggregationGroupByOrderByPlanNode`

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/plan/maker/InstancePlanMakerImplV2.java
##########
@@ -105,8 +121,61 @@ public InstancePlanMakerImplV2(QueryExecutorConfig queryExecutorConfig) {
     Preconditions.checkState(_maxInitialResultHolderCapacity <= _numGroupsLimit,
         "Invalid configuration: maxInitialResultHolderCapacity: %d must be smaller or equal to numGroupsLimit: %d",
         _maxInitialResultHolderCapacity, _numGroupsLimit);
-    LOGGER.info("Initializing plan maker with maxInitialResultHolderCapacity: {}, numGroupsLimit: {}",
-        _maxInitialResultHolderCapacity, _numGroupsLimit);
+    _enableSegmentGroupTrim =
+        queryExecutorConfig.getConfig().getProperty(ENABLE_SEGMENT_GROUP_TRIM, DEFAULT_ENABLE_SEGMENT_GROUP_TRIM);
+    LOGGER.info(
+        "Initializing plan maker with maxInitialResultHolderCapacity: {}, numGroupsLimit: {}, enableSegmentTrim: {}",
+        _maxInitialResultHolderCapacity, _numGroupsLimit, _enableSegmentGroupTrim);
+  }
+
+  /**
+   * Returns {@code true} if the given aggregation-only without filter QueryContext can be solved with segment metadata,
+   * {@code false} otherwise.
+   * <p>Aggregations supported: COUNT
+   */
+  @VisibleForTesting
+  static boolean isFitForMetadataBasedPlan(QueryContext queryContext) {

Review comment:
       Why moving these 2 methods? Seems unrelated

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/plan/AggregationGroupByOrderByPlanNode.java
##########
@@ -43,21 +43,27 @@
   private final IndexSegment _indexSegment;
   private final int _maxInitialResultHolderCapacity;
   private final int _numGroupsLimit;
+  private final boolean _enableSegmentGroupTrim;
+  private final int _inSegmentTrimLimit;

Review comment:
       Change it to `_minSegmentTrimSize` and make it default to 5000, then we can pass it to `GroupByUtils` to calculate the actual trim size




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] wuwenw commented on a change in pull request #6991: Introduce in-Segment Trim for GroupBy OrderBy Query

Posted by GitBox <gi...@apache.org>.
wuwenw commented on a change in pull request #6991:
URL: https://github.com/apache/incubator-pinot/pull/6991#discussion_r644097433



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/groupby/DefaultGroupByExecutor.java
##########
@@ -144,4 +147,22 @@ protected void aggregate(TransformBlock transformBlock, int length, int function
   public AggregationGroupByResult getResult() {
     return new AggregationGroupByResult(_groupKeyGenerator, _aggregationFunctions, _groupByResultHolders);
   }
+
+  @Override
+  public Collection<TableResizer.IntermediateRecord> trimGroupByResult(boolean enableSegmentGroupTrim,

Review comment:
       > Remove `enableSegmentGroupTrim`. When this method is invoked, it should always trim the result
   
   I believe a simple option to turn on/off is good for users




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org