You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/07/15 22:22:23 UTC

[GitHub] [incubator-pinot] mqliang opened a new pull request #7167: Bump up data table version to V4 to optimize space usage.

mqliang opened a new pull request #7167:
URL: https://github.com/apache/incubator-pinot/pull/7167


   ## Description
   <!-- Add a description of your PR here.
   A good description should include pointers to an issue or design document, etc.
   -->
   
   ref: https://github.com/apache/incubator-pinot/issues/6726
   
   This PR bumps up data table version to V4 to address TODOs in DataTableBuilder:
   
    * Fix float size: V2/V3 allocate 8 bytes to store float, while only use 4 bytes. V4 allocate and use 4 bytes.
    * V2/V3 use one dictionary for each columns to map String->Integer, while V4 use a common dictionary for all columns.
    * V4 store bytes as variable size data instead of String.
   
   In terms of the TODO of "Given a data schema, write all values one by one instead of using rowId and colId to position (save time)." Based on the benchmark of https://github.com/apache/incubator-pinot/issues/6720, it's over-engineering thus not  needed. 
   
   ## Upgrade Notes
   Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)
   * [ ] Yes (Please label as **<code>backward-incompat</code>**, and complete the section below on Release Notes)
   
   Does this PR fix a zero-downtime upgrade introduced earlier?
   * [ ] Yes (Please label this as **<code>backward-incompat</code>**, and complete the section below on Release Notes)
   
   Does this PR otherwise need attention when creating release notes? Things to consider:
   - New configuration options
   - Deprecation of configurations
   - Signature changes to public methods/interfaces
   - New plugins added or old plugins removed
   * [ ] Yes (Please label this PR as **<code>release-notes</code>** and complete the section on Release Notes)
   ## Release Notes
   <!-- If you have tagged this as either backward-incompat or release-notes,
   you MUST add text here that you would like to see appear in release notes of the
   next release. -->
   
   <!-- If you have a series of commits adding or enabling a feature, then
   add this section only in final commit that marks the feature completed.
   Refer to earlier release notes to see examples of text.
   -->
   ## Documentation
   <!-- If you have introduced a new feature or configuration, please add it to the documentation as well.
   See https://docs.pinot.apache.org/developers/developers-and-contributors/update-document
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mayankshriv commented on pull request #7167: Bump up data table version to V4 to optimize space usage.

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on pull request #7167:
URL: https://github.com/apache/incubator-pinot/pull/7167#issuecomment-881150849


   Adding @yupeng9 and @elonazoulay for visibility.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mqliang closed pull request #7167: Bump up data table version to V4 to optimize space usage.

Posted by GitBox <gi...@apache.org>.
mqliang closed pull request #7167:
URL: https://github.com/apache/incubator-pinot/pull/7167


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] codecov-commenter commented on pull request #7167: Bump up data table version to V4 to optimize space usage.

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #7167:
URL: https://github.com/apache/incubator-pinot/pull/7167#issuecomment-881103381


   # [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/7167?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7167](https://codecov.io/gh/apache/incubator-pinot/pull/7167?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (82e1dda) into [master](https://codecov.io/gh/apache/incubator-pinot/commit/6d0ab91de409fec8ddbf816c2348c8b95a37e393?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (6d0ab91) will **decrease** coverage by `31.77%`.
   > The diff coverage is `80.78%`.
   
   > :exclamation: Current head 82e1dda differs from pull request most recent head 5d1b603. Consider uploading reports for the commit 5d1b603 to get more accurate results
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/7167/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/incubator-pinot/pull/7167?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #7167       +/-   ##
   =============================================
   - Coverage     73.49%   41.72%   -31.78%     
   + Complexity       92        7       -85     
   =============================================
     Files          1500     1501        +1     
     Lines         73758    73998      +240     
     Branches      10637    10667       +30     
   =============================================
   - Hits          54209    30872    -23337     
   - Misses        16005    40511    +24506     
   + Partials       3544     2615      -929     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration | `41.72% <80.78%> (-0.18%)` | :arrow_down: |
   | unittests | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/7167?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [.../java/org/apache/pinot/common/utils/DataTable.java](https://codecov.io/gh/apache/incubator-pinot/pull/7167/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvRGF0YVRhYmxlLmphdmE=) | `88.88% <ø> (ø)` | |
   | [...e/pinot/core/common/datatable/DataTableImplV2.java](https://codecov.io/gh/apache/incubator-pinot/pull/7167/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9jb21tb24vZGF0YXRhYmxlL0RhdGFUYWJsZUltcGxWMi5qYXZh) | `0.00% <0.00%> (-77.78%)` | :arrow_down: |
   | [...e/pinot/core/common/datatable/DataTableImplV3.java](https://codecov.io/gh/apache/incubator-pinot/pull/7167/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9jb21tb24vZGF0YXRhYmxlL0RhdGFUYWJsZUltcGxWMy5qYXZh) | `0.00% <0.00%> (-96.03%)` | :arrow_down: |
   | [...he/pinot/core/common/datatable/DataTableUtils.java](https://codecov.io/gh/apache/incubator-pinot/pull/7167/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9jb21tb24vZGF0YXRhYmxlL0RhdGFUYWJsZVV0aWxzLmphdmE=) | `77.77% <0.00%> (-13.68%)` | :arrow_down: |
   | [...va/org/apache/pinot/spi/utils/CommonConstants.java](https://codecov.io/gh/apache/incubator-pinot/pull/7167/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvdXRpbHMvQ29tbW9uQ29uc3RhbnRzLmphdmE=) | `0.00% <ø> (-35.00%)` | :arrow_down: |
   | [.../pinot/core/common/datatable/DataTableBuilder.java](https://codecov.io/gh/apache/incubator-pinot/pull/7167/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9jb21tb24vZGF0YXRhYmxlL0RhdGFUYWJsZUJ1aWxkZXIuamF2YQ==) | `61.11% <35.48%> (-24.24%)` | :arrow_down: |
   | [...e/pinot/core/common/datatable/DataTableImplV4.java](https://codecov.io/gh/apache/incubator-pinot/pull/7167/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9jb21tb24vZGF0YXRhYmxlL0RhdGFUYWJsZUltcGxWNC5qYXZh) | `96.29% <96.29%> (ø)` | |
   | [...che/pinot/core/common/datatable/BaseDataTable.java](https://codecov.io/gh/apache/incubator-pinot/pull/7167/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9jb21tb24vZGF0YXRhYmxlL0Jhc2VEYXRhVGFibGUuamF2YQ==) | `56.29% <100.00%> (-25.39%)` | :arrow_down: |
   | [.../pinot/core/common/datatable/DataTableFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/7167/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9jb21tb24vZGF0YXRhYmxlL0RhdGFUYWJsZUZhY3RvcnkuamF2YQ==) | `28.57% <100.00%> (-38.10%)` | :arrow_down: |
   | [...c/main/java/org/apache/pinot/common/tier/Tier.java](https://codecov.io/gh/apache/incubator-pinot/pull/7167/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdGllci9UaWVyLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | ... and [955 more](https://codecov.io/gh/apache/incubator-pinot/pull/7167/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/7167?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/7167?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [6d0ab91...5d1b603](https://codecov.io/gh/apache/incubator-pinot/pull/7167?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mqliang commented on pull request #7167: Bump up data table version to V4 to optimize space usage.

Posted by GitBox <gi...@apache.org>.
mqliang commented on pull request #7167:
URL: https://github.com/apache/incubator-pinot/pull/7167#issuecomment-882711007


   test failed due to flaky issue. close and re-open to trigger a re-run


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mqliang commented on pull request #7167: Bump up data table version to V4 to optimize space usage.

Posted by GitBox <gi...@apache.org>.
mqliang commented on pull request #7167:
URL: https://github.com/apache/incubator-pinot/pull/7167#issuecomment-881080649


   > since presto connector will access pinot server with segment query and wrap up DataTable by itself. Is it possible to make the query itself contain the version of which DataTable protocol to use?
   So the Pinot server can work with v3 broker/v4 broker or v2 presto.
   
   @xiangfu0 
   We will eventually stop supporting V2 and V3, so it's better to update codebase of presto connector to follow up the most recent pinot. But I do think it a good idea to to allow client specify DataTable version, this will make client happy: client only need make minimal change after pinot is upgraded( by specifying a old DataTable version), this will give client developer more time to upgrade their code. Let's pull @mcvsubbu @siddharthteotia to chime in.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mqliang commented on pull request #7167: Bump up data table version to V4 to optimize space usage.

Posted by GitBox <gi...@apache.org>.
mqliang commented on pull request #7167:
URL: https://github.com/apache/incubator-pinot/pull/7167#issuecomment-881078182


   test failed due to flaky issue. close and re-open to trigger a re-run


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mqliang commented on pull request #7167: Bump up data table version to V4 to optimize space usage.

Posted by GitBox <gi...@apache.org>.
mqliang commented on pull request #7167:
URL: https://github.com/apache/incubator-pinot/pull/7167#issuecomment-882711007


   test failed due to flaky issue. close and re-open to trigger a re-run


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #7167: Bump up data table version to V4 to optimize space usage.

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #7167:
URL: https://github.com/apache/incubator-pinot/pull/7167#discussion_r673474531



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/BaseDataTable.java
##########
@@ -44,21 +44,24 @@
   protected DataSchema _dataSchema;
   protected int[] _columnOffsets;
   protected int _rowSizeInBytes;
+  // _dictionaryMap is only used in V2/V3, where each column use one dictionary to map String->Integer
   protected Map<String, Map<Integer, String>> _dictionaryMap;
+  // _dictionary is only used in V4, where all columns use a common dictionary to map String->Integer
+  protected Map<Integer, String> _dictionary;

Review comment:
       We can further optimize this into a `List`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mcvsubbu commented on pull request #7167: Bump up data table version to V4 to optimize space usage.

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on pull request #7167:
URL: https://github.com/apache/incubator-pinot/pull/7167#issuecomment-881570987


   FWIW. I was going to suggest that we deprecate V2 in this commit (and remove support for it in 0.9+)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mqliang commented on pull request #7167: Bump up data table version to V4 to optimize space usage.

Posted by GitBox <gi...@apache.org>.
mqliang commented on pull request #7167:
URL: https://github.com/apache/incubator-pinot/pull/7167#issuecomment-882711007


   test failed due to flaky issue. close and re-open to trigger a re-run


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mqliang closed pull request #7167: Bump up data table version to V4 to optimize space usage.

Posted by GitBox <gi...@apache.org>.
mqliang closed pull request #7167:
URL: https://github.com/apache/incubator-pinot/pull/7167


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mayankshriv edited a comment on pull request #7167: Bump up data table version to V4 to optimize space usage.

Posted by GitBox <gi...@apache.org>.
mayankshriv edited a comment on pull request #7167:
URL: https://github.com/apache/incubator-pinot/pull/7167#issuecomment-881150485


   > > since presto connector will access pinot server with segment query and wrap up DataTable by itself. Is it possible to make the query itself contain the version of which DataTable protocol to use?
   > > So the Pinot server can work with v3 broker/v4 broker or v2 presto.
   > 
   > @xiangfu0 @mayankshriv
   > We will eventually stop supporting V2 and V3, so it's better to update codebase of presto connector to follow up the most recent pinot. But I do think it a good idea to to allow client specify DataTable version, this will make client happy: client only need make minimal change after pinot is upgraded( by specifying an old DataTable version), this will give client developer more time to upgrade their code. Let's pull @mcvsubbu @siddharthteotia to chime in.
   
   @mqliang We do not own the Presto code. On top of that, there are two of them PrestoDB and Trino. And is being used in production at Uber and City Storage Systems. Also, we are still dealing with the v3 change that was recently made.
   
   FWIW, I think either DataTable should not have been exposed outside of Pinot, or it should have been treated as an external interface that should not be changed in backward incompatible way. But given that we are in the situation, we should probably acknowledge that and avoid making further backward incompatible changes until we resolve the situation across the communities.
   
   Thoughts?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] dharakk commented on pull request #7167: Bump up data table version to V4 to optimize space usage.

Posted by GitBox <gi...@apache.org>.
dharakk commented on pull request #7167:
URL: https://github.com/apache/incubator-pinot/pull/7167#issuecomment-881596283


   > > > since presto connector will access pinot server with segment query and wrap up DataTable by itself. Is it possible to make the query itself contain the version of which DataTable protocol to use?
   > > > So the Pinot server can work with v3 broker/v4 broker or v2 presto.
   > > 
   > > 
   > > @xiangfu0 @mayankshriv
   > > We will eventually stop supporting V2 and V3, so it's better to update codebase of presto connector to follow up the most recent pinot. But I do think it a good idea to to allow client specify DataTable version, this will make client happy: client only need make minimal change after pinot is upgraded( by specifying an old DataTable version), this will give client developer more time to upgrade their code. Let's pull @mcvsubbu @siddharthteotia to chime in.
   > 
   > @mqliang We do not own the Presto code. On top of that, there are two of them PrestoDB and Trino. And is being used in production at Uber and City Storage Systems. Also, we are still dealing with the v3 change that was recently made.
   > 
   > FWIW, I think either DataTable should not have been exposed outside of Pinot, or it should have been treated as an external interface that should not be changed in backward incompatible way. But given that we are in the situation, we should probably acknowledge that and avoid making further backward incompatible changes until we resolve the situation across the communities.
   > 
   > Thoughts?
   
   Hi +1 to this! Uber has use cases using the direct segment query path via Presto and backward incompatible change will be troublesome. Would be great to have the protocol to specify the version first and then the datatable upgrade in the following releases.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mayankshriv commented on pull request #7167: Bump up data table version to V4 to optimize space usage.

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on pull request #7167:
URL: https://github.com/apache/incubator-pinot/pull/7167#issuecomment-881150485


   > > since presto connector will access pinot server with segment query and wrap up DataTable by itself. Is it possible to make the query itself contain the version of which DataTable protocol to use?
   > > So the Pinot server can work with v3 broker/v4 broker or v2 presto.
   > 
   > @xiangfu0 @mayankshriv
   > We will eventually stop supporting V2 and V3, so it's better to update codebase of presto connector to follow up the most recent pinot. But I do think it a good idea to to allow client specify DataTable version, this will make client happy: client only need make minimal change after pinot is upgraded( by specifying an old DataTable version), this will give client developer more time to upgrade their code. Let's pull @mcvsubbu @siddharthteotia to chime in.
   
   @mqliang We do not own the Presto code. On top of that, there are two of them PrestoDB and Trino. And is being used in production at Uber and City Storage Systems. Also, we are still dealing with the v3 change that was recently made.
   
   FWIW, I thikn either DataTable should not have been exposed outside of Pinot, or it should have been treated as an external interface that should not be changed in backward incompatible way. But given that we are in the situation, we should probably acknowledge that and avoid making further backward incompatible changes until we resolve the situation across the communities.
   
   Thoughts?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mcvsubbu commented on pull request #7167: Bump up data table version to V4 to optimize space usage.

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on pull request #7167:
URL: https://github.com/apache/incubator-pinot/pull/7167#issuecomment-881570533


   > > > since presto connector will access pinot server with segment query and wrap up DataTable by itself. Is it possible to make the query itself contain the version of which DataTable protocol to use?
   > > > So the Pinot server can work with v3 broker/v4 broker or v2 presto.
   > > 
   > > 
   > > @xiangfu0 @mayankshriv
   > > We will eventually stop supporting V2 and V3, so it's better to update codebase of presto connector to follow up the most recent pinot. But I do think it a good idea to to allow client specify DataTable version, this will make client happy: client only need make minimal change after pinot is upgraded( by specifying an old DataTable version), this will give client developer more time to upgrade their code. Let's pull @mcvsubbu @siddharthteotia to chime in.
   > 
   > @mqliang We do not own the Presto code. On top of that, there are two of them PrestoDB and Trino. And is being used in production at Uber and City Storage Systems. Also, we are still dealing with the v3 change that was recently made.
   > 
   > FWIW, I think either DataTable should not have been exposed outside of Pinot, or it should have been treated as an external interface that should not be changed in backward incompatible way. But given that we are in the situation, we should probably acknowledge that and avoid making further backward incompatible changes until we resolve the situation across the communities.
   > 
   > Thoughts?
   
   Totally agree that we should not have exposed DataTable to begin with.  I believe we do need this optimization, so adding something in the protocol (or, maybe adding a capability handshake in the beginning?) is workable.  Note that adding this in the beginning will mean adding a config to the broker to go ahead and use v2,v3,v4 in its requests. Since we upgrade brokers first, this will also mean  for the server to look (for each request) which version it needs to return with, and exception if it does not support that version (for a robust protocol design). It is also possible for the server to indicate its support in zookeeper under its instance config, and the broker can cache it on a per server basis along with the routing table. Either handshake or zk seems to be a workable solution, probably better than adding something in every query path.
   
   Can we also consider adding an spi interface for datatable, and have the trino/presto community make a one-time move to use that instead? Over time, we can support the concept of a session, and retain such things at session level rather than every query.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mqliang closed pull request #7167: Bump up data table version to V4 to optimize space usage.

Posted by GitBox <gi...@apache.org>.
mqliang closed pull request #7167:
URL: https://github.com/apache/incubator-pinot/pull/7167


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] xiangfu0 commented on pull request #7167: Bump up data table version to V4 to optimize space usage.

Posted by GitBox <gi...@apache.org>.
xiangfu0 commented on pull request #7167:
URL: https://github.com/apache/incubator-pinot/pull/7167#issuecomment-881153648


   > > since presto connector will access pinot server with segment query and wrap up DataTable by itself. Is it possible to make the query itself contain the version of which DataTable protocol to use?
   > > So the Pinot server can work with v3 broker/v4 broker or v2 presto.
   > 
   > @mqliang I have the same concern as @xiangfu0. We already ran into this issue when you did the v2 to v3 upgrade. Even though this might be thought of as internal Pinot data structure, there are connectors that use this data table, making it an externally exposed data structure, and we need to be careful with incrementing version numbers in backward incompatible way.
   
   Agreed. I think the current blocker is that both v3/v4 are not part of the 0.7.1 release, so there is no way for external dependencies to upgrade. Ideally, this feature should be checked in for 1 or 2 releases then we can make the switch.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mqliang closed pull request #7167: Bump up data table version to V4 to optimize space usage.

Posted by GitBox <gi...@apache.org>.
mqliang closed pull request #7167:
URL: https://github.com/apache/incubator-pinot/pull/7167


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mqliang commented on a change in pull request #7167: Bump up data table version to V4 to optimize space usage.

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #7167:
URL: https://github.com/apache/incubator-pinot/pull/7167#discussion_r673478751



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/BaseDataTable.java
##########
@@ -44,21 +44,24 @@
   protected DataSchema _dataSchema;
   protected int[] _columnOffsets;
   protected int _rowSizeInBytes;
+  // _dictionaryMap is only used in V2/V3, where each column use one dictionary to map String->Integer
   protected Map<String, Map<Integer, String>> _dictionaryMap;
+  // _dictionary is only used in V4, where all columns use a common dictionary to map String->Integer
+  protected Map<Integer, String> _dictionary;

Review comment:
       That's a good idea, using the index of list as an implicit dicId.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mayankshriv commented on pull request #7167: Bump up data table version to V4 to optimize space usage.

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on pull request #7167:
URL: https://github.com/apache/incubator-pinot/pull/7167#issuecomment-881080524


   > since presto connector will access pinot server with segment query and wrap up DataTable by itself. Is it possible to make the query itself contain the version of which DataTable protocol to use?
   > So the Pinot server can work with v3 broker/v4 broker or v2 presto.
   
   @mqliang I have the same concern as @xiangfu0. We already ran into this issue when you did the v2 to v3 upgrade. Even though this might be thought of as internal Pinot data structure, there are connectors that use this data table, making it an externally exposed data structure, and we need to be careful with incrementing version numbers in backward incompatible way.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] xiangfu0 commented on pull request #7167: Bump up data table version to V4 to optimize space usage.

Posted by GitBox <gi...@apache.org>.
xiangfu0 commented on pull request #7167:
URL: https://github.com/apache/incubator-pinot/pull/7167#issuecomment-881052161


   since presto connector will access pinot server with segment query and wrap up DataTable by itself. Is it possible to make the query itself contain the version of which DataTable protocol to use? 
   So the Pinot server can work with v3 broker/v4 broker or v2 presto.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mqliang edited a comment on pull request #7167: Bump up data table version to V4 to optimize space usage.

Posted by GitBox <gi...@apache.org>.
mqliang edited a comment on pull request #7167:
URL: https://github.com/apache/incubator-pinot/pull/7167#issuecomment-881080649


   > since presto connector will access pinot server with segment query and wrap up DataTable by itself. Is it possible to make the query itself contain the version of which DataTable protocol to use?
   So the Pinot server can work with v3 broker/v4 broker or v2 presto.
   
   @xiangfu0 @mayankshriv
   We will eventually stop supporting V2 and V3, so it's better to update codebase of presto connector to follow up the most recent pinot. But I do think it a good idea to to allow client specify DataTable version, this will make client happy: client only need make minimal change after pinot is upgraded( by specifying a old DataTable version), this will give client developer more time to upgrade their code. Let's pull @mcvsubbu @siddharthteotia to chime in.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mqliang edited a comment on pull request #7167: Bump up data table version to V4 to optimize space usage.

Posted by GitBox <gi...@apache.org>.
mqliang edited a comment on pull request #7167:
URL: https://github.com/apache/incubator-pinot/pull/7167#issuecomment-881080649


   > since presto connector will access pinot server with segment query and wrap up DataTable by itself. Is it possible to make the query itself contain the version of which DataTable protocol to use?
   So the Pinot server can work with v3 broker/v4 broker or v2 presto.
   
   @xiangfu0 @mayankshriv
   We will eventually stop supporting V2 and V3, so it's better to update codebase of presto connector to follow up the most recent pinot. But I do think it a good idea to to allow client specify DataTable version, this will make client happy: client only need make minimal change after pinot is upgraded( by specifying an old DataTable version), this will give client developer more time to upgrade their code. Let's pull @mcvsubbu @siddharthteotia to chime in.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org