You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "swaminathanmanish (via GitHub)" <gi...@apache.org> on 2023/12/06 00:58:16 UTC

[PR] (WIP) Fixing partitioning function for byte array [pinot]

swaminathanmanish opened a new pull request, #12102:
URL: https://github.com/apache/pinot/pull/12102

   ##Description
   Our partitioning functions invokes the toString() on an Object prior to computing the partition. For native byte array, toString() translates to an address. Hence two byte arrays that have the same content, will differ, which is incorrect. [This PR ](https://github.com/apache/pinot/pull/10110) addresses issue for the realtime path. Attempting to move the logic to a common/base method to make this change available across all partitioning functions and modes (realtime/offline)
   
    
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] (WIP) Fixing partitioning function for byte array [pinot]

Posted by "snleee (via GitHub)" <gi...@apache.org>.
snleee commented on code in PR #12102:
URL: https://github.com/apache/pinot/pull/12102#discussion_r1418058748


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/PartitionFunction.java:
##########
@@ -32,6 +33,18 @@
  */
 public interface PartitionFunction extends Serializable {
 
+  /**
+   * Translates value to the correct instance type before invoking partitioning function
+   * @param value
+   * @return
+   */
+  default int getValueToPartition(Object value) {

Review Comment:
   getPartitionFromValue?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] (WIP) Fixing partitioning function for byte array [pinot]

Posted by "swaminathanmanish (via GitHub)" <gi...@apache.org>.
swaminathanmanish commented on PR #12102:
URL: https://github.com/apache/pinot/pull/12102#issuecomment-1843679278

   @snleee - Please take a look


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Fixing partitioning function for byte array [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on PR #12102:
URL: https://github.com/apache/pinot/pull/12102#issuecomment-1846239961

   See #12115 for more general handling


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Fixing partitioning function for byte array [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #12102:
URL: https://github.com/apache/pinot/pull/12102#discussion_r1419443916


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/PartitionFunction.java:
##########
@@ -32,6 +33,18 @@
  */
 public interface PartitionFunction extends Serializable {
 
+  /**
+   * Translates value to the correct instance type before invoking partitioning function
+   * @param value
+   * @return
+   */
+  default int getValueToPartition(Object value) {

Review Comment:
   Don't add this new method. `PartitionFunction` is not pluggable, so let's fix all implementations instead



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Fixing partitioning function for byte array [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on PR #12102:
URL: https://github.com/apache/pinot/pull/12102#issuecomment-1845905109

   Checking the code, we should always pass string value to `getPartition()`, or it will break in partition pruning


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Fixing partitioning function for byte array [pinot]

Posted by "swaminathanmanish (via GitHub)" <gi...@apache.org>.
swaminathanmanish commented on PR #12102:
URL: https://github.com/apache/pinot/pull/12102#issuecomment-1845963047

   > Checking the code, we should always pass string value to `getPartition()`, or it will break in partition pruning
   
   Thanks for taking a look. So the suggestion is to handle the byte array logic in all the partitioning functions individually, right?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] (WIP) Fixing partitioning function for byte array [pinot]

Posted by "snleee (via GitHub)" <gi...@apache.org>.
snleee commented on PR #12102:
URL: https://github.com/apache/pinot/pull/12102#issuecomment-1843825004

   @Jackie-Jiang Can you take a look into this? 
   
   According to @mayankshriv and the previous PR https://github.com/apache/pinot/pull/10110 , it mentions that `In the offline flow, we already convert byte[] type to ByteArray for the MurmurPartition function to handle it correctly.`.
   
   We were not able to find this part. Can you confirm on this?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] (WIP) Fixing partitioning function for byte array [pinot]

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #12102:
URL: https://github.com/apache/pinot/pull/12102#issuecomment-1841932350

   ## [Codecov](https://app.codecov.io/gh/apache/pinot/pull/12102?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
   Attention: `2 lines` in your changes are missing coverage. Please review.
   > Comparison is base [(`d7c76b9`)](https://app.codecov.io/gh/apache/pinot/commit/d7c76b9fbb5a659da0da6276c6d9833ba1894e0d?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) 61.62% compared to head [(`101a1f2`)](https://app.codecov.io/gh/apache/pinot/pull/12102?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) 46.77%.
   > Report is 16 commits behind head on master.
   
   | [Files](https://app.codecov.io/gh/apache/pinot/pull/12102?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Patch % | Lines |
   |---|---|---|
   | [...local/indexsegment/mutable/MutableSegmentImpl.java](https://app.codecov.io/gh/apache/pinot/pull/12102?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvbXV0YWJsZS9NdXRhYmxlU2VnbWVudEltcGwuamF2YQ==) | 0.00% | [1 Missing :warning: ](https://app.codecov.io/gh/apache/pinot/pull/12102?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) |
   | [.../impl/stats/AbstractColumnStatisticsCollector.java](https://app.codecov.io/gh/apache/pinot/pull/12102?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9zdGF0cy9BYnN0cmFjdENvbHVtblN0YXRpc3RpY3NDb2xsZWN0b3IuamF2YQ==) | 0.00% | [1 Missing :warning: ](https://app.codecov.io/gh/apache/pinot/pull/12102?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) |
   
   <details><summary>Additional details and impacted files</summary>
   
   
   ```diff
   @@              Coverage Diff              @@
   ##             master   #12102       +/-   ##
   =============================================
   - Coverage     61.62%   46.77%   -14.86%     
   + Complexity     1152      943      -209     
   =============================================
     Files          2389     1793      -596     
     Lines        129824    94507    -35317     
     Branches      20083    15268     -4815     
   =============================================
   - Hits          80009    44202    -35807     
   - Misses        43989    47161     +3172     
   + Partials       5826     3144     -2682     
   ```
   
   | [Flag](https://app.codecov.io/gh/apache/pinot/pull/12102/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [custom-integration1](https://app.codecov.io/gh/apache/pinot/pull/12102/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [integration](https://app.codecov.io/gh/apache/pinot/pull/12102/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [integration1](https://app.codecov.io/gh/apache/pinot/pull/12102/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [integration2](https://app.codecov.io/gh/apache/pinot/pull/12102/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [java-11](https://app.codecov.io/gh/apache/pinot/pull/12102/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [java-21](https://app.codecov.io/gh/apache/pinot/pull/12102/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.77% <75.00%> (-14.73%)` | :arrow_down: |
   | [skip-bytebuffers-false](https://app.codecov.io/gh/apache/pinot/pull/12102/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.77% <75.00%> (-14.85%)` | :arrow_down: |
   | [skip-bytebuffers-true](https://app.codecov.io/gh/apache/pinot/pull/12102/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [temurin](https://app.codecov.io/gh/apache/pinot/pull/12102/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.77% <75.00%> (-14.86%)` | :arrow_down: |
   | [unittests](https://app.codecov.io/gh/apache/pinot/pull/12102/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.77% <75.00%> (-14.86%)` | :arrow_down: |
   | [unittests1](https://app.codecov.io/gh/apache/pinot/pull/12102/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.77% <75.00%> (-0.14%)` | :arrow_down: |
   | [unittests2](https://app.codecov.io/gh/apache/pinot/pull/12102/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   
   </details>
   
   [:umbrella: View full report in Codecov by Sentry](https://app.codecov.io/gh/apache/pinot/pull/12102?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).   
   :loudspeaker: Have feedback on the report? [Share it here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org