You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/06/17 22:02:05 UTC

[GitHub] [pinot] somandal opened a new pull request, #8917: Add support for querying raw multi-value columns for offline and fixed width realtime segments

somandal opened a new pull request, #8917:
URL: https://github.com/apache/pinot/pull/8917

   This PR adds support for querying multi-value columns in RAW format (without dictionary). Support to create segments with multi-value columns with dictionary disabled already existed. On trying to create a multi-value RAW column and querying it, we ran into this issue: https://github.com/apache/pinot/issues/8875
   
   This PR adds MV RAW support for the following types of columns:
   
   - Offline segments with MV columns of types INT, LONG, FLOAT, DOUBLE, and STRING
   - Mutable (realtime) segments with MV columns of INT, LONG, FLOAT, and DOUBLE
   
   Support for variable length MV RAW columns for Mutable segments will be added in the future as it requires a mutable variable length writer.
   
   Test cases for various types of queries have been added to ensure that MV RAW columns can be supported.
   
   cc @siddharthteotia  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] somandal commented on pull request #8917: Add support for querying noDict MV columns for offline (all data types) and realtime (fixed width) segments

Posted by GitBox <gi...@apache.org>.
somandal commented on PR #8917:
URL: https://github.com/apache/pinot/pull/8917#issuecomment-1163194506

   > This PR adds support for querying multi-value columns in RAW format (without dictionary). Support to create segments with multi-value columns with dictionary disabled already existed. On trying to create a multi-value RAW column and querying it, we ran into this issue: #8875
   > 
   > This PR adds MV RAW support for the following types of columns:
   > 
   > * Offline segments with MV columns of types INT, LONG, FLOAT, DOUBLE, and STRING
   > * Mutable (realtime) segments with MV columns of INT, LONG, FLOAT, and DOUBLE
   > 
   > TODO - Support for variable length MV RAW columns for Mutable segments will be added in the future as it requires a mutable variable length writer.
   > 
   > Test cases for various types of queries have been added to ensure that MV RAW columns can be supported.
   > 
   > cc @siddharthteotia
   
   Thanks for taking a look and for the suggestion to break this down. I've got the first part out for review in this PR: https://github.com/apache/pinot/pull/8953
   
   
   > This is great, thanks for adding all the tests!
   > 
   > This PR is a little bit too large, and contains changes to lots of modules. I'd suggest breaking them into smaller ones so that each PR only focus on one thing. That can make both review and future tracking easier. We can break the PR into the following smaller scopes:
   > 
   > * Change `getValueType` to `getStoredType`
   > * New APIs to ForwardIndexReader and the implementations
   > * Filter support / group by support etc.
   
   Thanks for the suggestion @Jackie-Jiang! I've opened a new PR for the second one on your list "New APIs to ForwardIndexReader and the implementations" in this PR: https://github.com/apache/pinot/pull/8953. Can you take a look when you get a chance? (I've also tried to address your comments on that PR directly)
   
   I'll be working on the `getStoredType` after the other two.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] siddharthteotia commented on pull request #8917: Add support for querying noDict MV columns for offline (all data types) and realtime (fixed width) segments

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on PR #8917:
URL: https://github.com/apache/pinot/pull/8917#issuecomment-1160769103

   hi @Jackie-Jiang  / @richardstartin - I had iterated upon the review internally on the branch with @somandal  before the PR was sent out. It will be good if one of you can also help review. Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] somandal commented on a diff in pull request #8917: Add support for querying noDict MV columns for offline (all data types) and realtime (fixed width) segments

Posted by GitBox <gi...@apache.org>.
somandal commented on code in PR #8917:
URL: https://github.com/apache/pinot/pull/8917#discussion_r903232409


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/reader/ForwardIndexReader.java:
##########
@@ -410,9 +411,318 @@ default byte[] getBytes(int docId, T context) {
 
   /**
    * MULTI-VALUE COLUMN RAW INDEX APIs
-   * TODO: Not supported yet
    */
 
+  /**
+   * Fills the values
+   * @param docIds Array containing the document ids to read
+   * @param length Number of values to read
+   * @param maxNumValuesPerMVEntry maximum number of values per MV entry
+   * @param values Values to fill
+   * @param context Reader context
+   */
+  default void readValuesMV(int[] docIds, int length, int maxNumValuesPerMVEntry, int[][] values, T context) {
+    switch (getStoredType()) {
+      case INT:
+        int[] intValueBuffer = new int[maxNumValuesPerMVEntry];
+        for (int i = 0; i < length; i++) {
+          int numValues = getIntMV(docIds[i], intValueBuffer, context);

Review Comment:
   Just to clarify, for the types where we need to typecast or return the values[] of a different stored type, I will still need to copy the data into the values buffer. Is this copying still okay?
   
   e.g.:
   
   existing:
   ```
           for (int i = 0; i < length; i++) {
             int numValues = getLongMV(docIds[i], longValueBuffer, context);
             values[i] = new int[numValues];
             for (int j = 0; j < numValues; j++) {
               values[i][j] = (int) longValueBuffer[j];
             }
           }
   ```
   
   new code (with API change):
   ```
           for (int i = 0; i < length; i++) {
             long[] longValueBuffer = getLongMV(docIds[i], context);
             values[i] = new int[longValueBuffer.length];
             for (int j = 0; j < longValueBuffer.length; j++) {
               values[i][j] = (int) longValueBuffer[j];
             }
           }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #8917: Add support for querying noDict MV columns for offline (all data types) and realtime (fixed width) segments

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on code in PR #8917:
URL: https://github.com/apache/pinot/pull/8917#discussion_r904639365


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/reader/ForwardIndexReader.java:
##########
@@ -410,9 +411,318 @@ default byte[] getBytes(int docId, T context) {
 
   /**
    * MULTI-VALUE COLUMN RAW INDEX APIs
-   * TODO: Not supported yet
    */
 
+  /**
+   * Fills the values
+   * @param docIds Array containing the document ids to read
+   * @param length Number of values to read
+   * @param maxNumValuesPerMVEntry maximum number of values per MV entry
+   * @param values Values to fill
+   * @param context Reader context
+   */
+  default void readValuesMV(int[] docIds, int length, int maxNumValuesPerMVEntry, int[][] values, T context) {
+    switch (getStoredType()) {
+      case INT:
+        int[] intValueBuffer = new int[maxNumValuesPerMVEntry];
+        for (int i = 0; i < length; i++) {
+          int numValues = getIntMV(docIds[i], intValueBuffer, context);

Review Comment:
   Yes. If the buffer can be reused, we should still create a buffer to reduce the garbage, and that's why we want to have 2 versions of the API, one with buffer and one without buffer



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] siddharthteotia commented on pull request #8917: Add support for querying noDict MV columns for offline (all data types) and realtime (fixed width) segments

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on PR #8917:
URL: https://github.com/apache/pinot/pull/8917#issuecomment-1162534423

   > This is great, thanks for adding all the tests!
   > 
   > This PR is a little bit too large, and contains changes to lots of modules. I'd suggest breaking them into smaller ones so that each PR only focus on one thing. That can make both review and future tracking easier. We can break the PR into the following smaller scopes:
   > 
   > * Change `getValueType` to `getStoredType`
   > * New APIs to ForwardIndexReader and the implementations
   > * Filter support / group by support etc.
   
   Sounds good. When we were iterating upon it internally, it was more of what all is needed to get this done correctly throughout the code base with all testing so there was some back n forth. Now that we agree with the overall work, I am ok with the above suggestion to breakdown.  cc @somandal 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] siddharthteotia commented on pull request #8917: Add support for querying noDict MV columns for offline (all data types) and realtime (fixed width) segments

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on PR #8917:
URL: https://github.com/apache/pinot/pull/8917#issuecomment-1178330385

   Closing this PR since all the split PRs with tests have been merged in https://github.com/apache/pinot/pull/8953 and https://github.com/apache/pinot/pull/8993 respectively to provide query support for MV raw columns issue https://github.com/apache/pinot/issues/8875
   
   The only remaining TODO is to add the support for mutable segment for var width since it needs to write a new writer. 
   
   Part 3 https://github.com/apache/pinot/pull/9004 which was also done as part of this is not necessarily related to MV raw query support. It was a side-effect / cleanup change we wanted to make as part of this work. Based on the changes in 9004 PR, it has more implications and changes and discussion is needed from `ForwardIndexReader` interface perspective regardless of SV or MV imo.
   
   cc @somandal @Jackie-Jiang @walterddr 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] somandal commented on a diff in pull request #8917: Add support for querying noDict MV columns for offline (all data types) and realtime (fixed width) segments

Posted by GitBox <gi...@apache.org>.
somandal commented on code in PR #8917:
URL: https://github.com/apache/pinot/pull/8917#discussion_r903232409


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/reader/ForwardIndexReader.java:
##########
@@ -410,9 +411,318 @@ default byte[] getBytes(int docId, T context) {
 
   /**
    * MULTI-VALUE COLUMN RAW INDEX APIs
-   * TODO: Not supported yet
    */
 
+  /**
+   * Fills the values
+   * @param docIds Array containing the document ids to read
+   * @param length Number of values to read
+   * @param maxNumValuesPerMVEntry maximum number of values per MV entry
+   * @param values Values to fill
+   * @param context Reader context
+   */
+  default void readValuesMV(int[] docIds, int length, int maxNumValuesPerMVEntry, int[][] values, T context) {
+    switch (getStoredType()) {
+      case INT:
+        int[] intValueBuffer = new int[maxNumValuesPerMVEntry];
+        for (int i = 0; i < length; i++) {
+          int numValues = getIntMV(docIds[i], intValueBuffer, context);

Review Comment:
   Just to clarify, for the types where we need to typecast or return the values[] of a different stored type, I will still need to copy the data into the values buffer. Is this copying still okay?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] siddharthteotia closed pull request #8917: Add support for querying noDict MV columns for offline (all data types) and realtime (fixed width) segments

Posted by GitBox <gi...@apache.org>.
siddharthteotia closed pull request #8917: Add support for querying noDict MV columns for offline (all data types) and realtime (fixed width) segments
URL: https://github.com/apache/pinot/pull/8917


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter commented on pull request #8917: Add support for querying noDict MV columns for offline (all data types) and realtime (fixed width) segments

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #8917:
URL: https://github.com/apache/pinot/pull/8917#issuecomment-1159286187

   # [Codecov](https://codecov.io/gh/apache/pinot/pull/8917?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#8917](https://codecov.io/gh/apache/pinot/pull/8917?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (ddebed9) into [master](https://codecov.io/gh/apache/pinot/commit/7024a6d5e3be3a33becc0ffa3319b3f1102d884c?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (7024a6d) will **decrease** coverage by `6.37%`.
   > The diff coverage is `44.42%`.
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #8917      +/-   ##
   ============================================
   - Coverage     69.72%   63.35%   -6.38%     
   + Complexity     4856     4844      -12     
   ============================================
     Files          1809     1762      -47     
     Lines         94331    92667    -1664     
     Branches      14069    13923     -146     
   ============================================
   - Hits          65777    58713    -7064     
   - Misses        23974    29776    +5802     
   + Partials       4580     4178     -402     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests1 | `66.81% <44.42%> (+0.42%)` | :arrow_up: |
   | unittests2 | `14.81% <0.00%> (-0.06%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/8917?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...re/common/evaluators/DefaultJsonPathEvaluator.java](https://codecov.io/gh/apache/pinot/pull/8917/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9jb21tb24vZXZhbHVhdG9ycy9EZWZhdWx0SnNvblBhdGhFdmFsdWF0b3IuamF2YQ==) | `29.53% <0.00%> (-2.69%)` | :arrow_down: |
   | [...e/operator/dociditerators/SVScanDocIdIterator.java](https://codecov.io/gh/apache/pinot/pull/8917/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9vcGVyYXRvci9kb2NpZGl0ZXJhdG9ycy9TVlNjYW5Eb2NJZEl0ZXJhdG9yLmphdmE=) | `75.51% <0.00%> (-13.27%)` | :arrow_down: |
   | [...ocal/indexsegment/mutable/IntermediateSegment.java](https://codecov.io/gh/apache/pinot/pull/8917/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvbXV0YWJsZS9JbnRlcm1lZGlhdGVTZWdtZW50LmphdmE=) | `76.51% <ø> (ø)` | |
   | [...e/impl/forward/FixedByteSVMutableForwardIndex.java](https://codecov.io/gh/apache/pinot/pull/8917/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9yZWFsdGltZS9pbXBsL2ZvcndhcmQvRml4ZWRCeXRlU1ZNdXRhYmxlRm9yd2FyZEluZGV4LmphdmE=) | `93.87% <ø> (ø)` | |
   | [...ime/impl/forward/VarByteSVMutableForwardIndex.java](https://codecov.io/gh/apache/pinot/pull/8917/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9yZWFsdGltZS9pbXBsL2ZvcndhcmQvVmFyQnl0ZVNWTXV0YWJsZUZvcndhcmRJbmRleC5qYXZh) | `66.66% <ø> (ø)` | |
   | [...readers/constant/ConstantMVForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/8917/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L3JlYWRlcnMvY29uc3RhbnQvQ29uc3RhbnRNVkZvcndhcmRJbmRleFJlYWRlci5qYXZh) | `0.00% <ø> (ø)` | |
   | [.../readers/forward/FixedBitMVForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/8917/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L3JlYWRlcnMvZm9yd2FyZC9GaXhlZEJpdE1WRm9yd2FyZEluZGV4UmVhZGVyLmphdmE=) | `100.00% <ø> (+2.17%)` | :arrow_up: |
   | [.../readers/forward/FixedBitSVForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/8917/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L3JlYWRlcnMvZm9yd2FyZC9GaXhlZEJpdFNWRm9yd2FyZEluZGV4UmVhZGVyLmphdmE=) | `0.00% <ø> (ø)` | |
   | [...eaders/forward/FixedBitSVForwardIndexReaderV2.java](https://codecov.io/gh/apache/pinot/pull/8917/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L3JlYWRlcnMvZm9yd2FyZC9GaXhlZEJpdFNWRm9yd2FyZEluZGV4UmVhZGVyVjIuamF2YQ==) | `100.00% <ø> (ø)` | |
   | [...rs/forward/VarByteChunkSVForwardIndexReaderV4.java](https://codecov.io/gh/apache/pinot/pull/8917/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L3JlYWRlcnMvZm9yd2FyZC9WYXJCeXRlQ2h1bmtTVkZvcndhcmRJbmRleFJlYWRlclY0LmphdmE=) | `91.07% <ø> (ø)` | |
   | ... and [454 more](https://codecov.io/gh/apache/pinot/pull/8917/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/8917?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/8917?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [7024a6d...ddebed9](https://codecov.io/gh/apache/pinot/pull/8917?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] somandal commented on a diff in pull request #8917: Add support for querying noDict MV columns for offline (all data types) and realtime (fixed width) segments

Posted by GitBox <gi...@apache.org>.
somandal commented on code in PR #8917:
URL: https://github.com/apache/pinot/pull/8917#discussion_r903232409


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/reader/ForwardIndexReader.java:
##########
@@ -410,9 +411,318 @@ default byte[] getBytes(int docId, T context) {
 
   /**
    * MULTI-VALUE COLUMN RAW INDEX APIs
-   * TODO: Not supported yet
    */
 
+  /**
+   * Fills the values
+   * @param docIds Array containing the document ids to read
+   * @param length Number of values to read
+   * @param maxNumValuesPerMVEntry maximum number of values per MV entry
+   * @param values Values to fill
+   * @param context Reader context
+   */
+  default void readValuesMV(int[] docIds, int length, int maxNumValuesPerMVEntry, int[][] values, T context) {
+    switch (getStoredType()) {
+      case INT:
+        int[] intValueBuffer = new int[maxNumValuesPerMVEntry];
+        for (int i = 0; i < length; i++) {
+          int numValues = getIntMV(docIds[i], intValueBuffer, context);

Review Comment:
   Just to clarify, for the types where we need to typecast or return the values[] of a different stored type, I will still need to copy the data into the values buffer. Is this copying still okay?
   
   e.g.:
   ```
           for (int i = 0; i < length; i++) {
             int numValues = getLongMV(docIds[i], longValueBuffer, context);
             values[i] = new int[numValues];
             for (int j = 0; j < numValues; j++) {
               values[i][j] = (int) longValueBuffer[j];
             }
           }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] somandal commented on a diff in pull request #8917: Add support for querying noDict MV columns for offline (all data types) and realtime (fixed width) segments

Posted by GitBox <gi...@apache.org>.
somandal commented on code in PR #8917:
URL: https://github.com/apache/pinot/pull/8917#discussion_r903232409


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/reader/ForwardIndexReader.java:
##########
@@ -410,9 +411,318 @@ default byte[] getBytes(int docId, T context) {
 
   /**
    * MULTI-VALUE COLUMN RAW INDEX APIs
-   * TODO: Not supported yet
    */
 
+  /**
+   * Fills the values
+   * @param docIds Array containing the document ids to read
+   * @param length Number of values to read
+   * @param maxNumValuesPerMVEntry maximum number of values per MV entry
+   * @param values Values to fill
+   * @param context Reader context
+   */
+  default void readValuesMV(int[] docIds, int length, int maxNumValuesPerMVEntry, int[][] values, T context) {
+    switch (getStoredType()) {
+      case INT:
+        int[] intValueBuffer = new int[maxNumValuesPerMVEntry];
+        for (int i = 0; i < length; i++) {
+          int numValues = getIntMV(docIds[i], intValueBuffer, context);

Review Comment:
   Just to clarify, for the types where we need to typecast or return the values[] of a different stored type, I will still need to copy the data into the values buffer. Is this copying still okay?
   
   e.g.:
   
   existing:
   ```
           for (int i = 0; i < length; i++) {
             int numValues = getLongMV(docIds[i], longValueBuffer, context);
             values[i] = new int[numValues];
             for (int j = 0; j < numValues; j++) {
               values[i][j] = (int) longValueBuffer[j];
             }
           }
   ```
   
   new code (with API change):
   ```
           for (int i = 0; i < length; i++) {
             long[] longValueBuffer = getLongMV(docIds[i], context);
             for (int j = 0; j < longValueBuffer.length; j++) {
               values[i][j] = (int) longValueBuffer[j];
             }
           }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] siddharthteotia commented on pull request #8917: Add support for querying noDict MV columns for offline (all data types) and realtime (fixed width) segments

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on PR #8917:
URL: https://github.com/apache/pinot/pull/8917#issuecomment-1169574316

   Part 1 https://github.com/apache/pinot/pull/8953 split from this PR has been merged. Rest of the work to follow-up


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] siddharthteotia commented on pull request #8917: Add support for querying noDict MV columns for offline (all data types) and realtime (fixed width) segments

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on PR #8917:
URL: https://github.com/apache/pinot/pull/8917#issuecomment-1170868977

   Part 2 split from this PR https://github.com/apache/pinot/pull/8993


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #8917: Add support for querying noDict MV columns for offline (all data types) and realtime (fixed width) segments

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on code in PR #8917:
URL: https://github.com/apache/pinot/pull/8917#discussion_r903149184


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/reader/ForwardIndexReader.java:
##########
@@ -410,9 +411,318 @@ default byte[] getBytes(int docId, T context) {
 
   /**
    * MULTI-VALUE COLUMN RAW INDEX APIs
-   * TODO: Not supported yet
    */
 
+  /**
+   * Fills the values
+   * @param docIds Array containing the document ids to read
+   * @param length Number of values to read
+   * @param maxNumValuesPerMVEntry maximum number of values per MV entry
+   * @param values Values to fill
+   * @param context Reader context
+   */
+  default void readValuesMV(int[] docIds, int length, int maxNumValuesPerMVEntry, int[][] values, T context) {
+    switch (getStoredType()) {
+      case INT:
+        int[] intValueBuffer = new int[maxNumValuesPerMVEntry];
+        for (int i = 0; i < length; i++) {
+          int numValues = getIntMV(docIds[i], intValueBuffer, context);

Review Comment:
   For MV read, let's add APIs to directly read the values without passing in a buffer, e.g. `int[] getIntMV(int docId, T context)`. This API will be very useful to prevent unnecessary copying of the array, or when the `maxNumValuesPerMVEntry` is not available



##########
pinot-core/src/main/java/org/apache/pinot/core/common/DataFetcher.java:
##########
@@ -713,8 +730,51 @@ void readStringValuesMV(TransformEvaluator evaluator, int[] docIds, int length,
 
     public void readNumValuesMV(int[] docIds, int length, int[] numValuesBuffer) {
       Tracing.activeRecording().setInputDataType(_dataType, _singleValue);
-      for (int i = 0; i < length; i++) {
-        numValuesBuffer[i] = _reader.getDictIdMV(docIds[i], _reusableMVDictIds, getReaderContext());
+      if (_dictionary != null) {
+        for (int i = 0; i < length; i++) {
+          numValuesBuffer[i] = _reader.getDictIdMV(docIds[i], _reusableMVDictIds, getReaderContext());
+        }
+      } else {
+        switch (_reader.getStoredType()) {
+          case INT:
+            int[] intValueBuffer = new int[_maxNumValuesPerMVEntry];
+            for (int i = 0; i < length; i++) {
+              numValuesBuffer[i] = _reader.getIntMV(docIds[i], intValueBuffer, getReaderContext());

Review Comment:
   This is adding lots of overhead because we don't really need to read the values. Let's add an API `int getNumValuesMV(int docId, T context)` to the `ForwardIndexReader` which simply returns the values in the MV entry without reading any content



##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/reader/ForwardIndexReader.java:
##########
@@ -47,7 +48,7 @@
    * Returns the data type of the values in the forward index. Returns {@link DataType#INT} for dictionary-encoded
    * forward index.
    */
-  DataType getValueType();
+  DataType getStoredType();

Review Comment:
   Can we put this change as a separate PR?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] somandal commented on a diff in pull request #8917: Add support for querying noDict MV columns for offline (all data types) and realtime (fixed width) segments

Posted by GitBox <gi...@apache.org>.
somandal commented on code in PR #8917:
URL: https://github.com/apache/pinot/pull/8917#discussion_r904938926


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/reader/ForwardIndexReader.java:
##########
@@ -410,9 +411,318 @@ default byte[] getBytes(int docId, T context) {
 
   /**
    * MULTI-VALUE COLUMN RAW INDEX APIs
-   * TODO: Not supported yet
    */
 
+  /**
+   * Fills the values
+   * @param docIds Array containing the document ids to read
+   * @param length Number of values to read
+   * @param maxNumValuesPerMVEntry maximum number of values per MV entry
+   * @param values Values to fill
+   * @param context Reader context
+   */
+  default void readValuesMV(int[] docIds, int length, int maxNumValuesPerMVEntry, int[][] values, T context) {
+    switch (getStoredType()) {
+      case INT:
+        int[] intValueBuffer = new int[maxNumValuesPerMVEntry];
+        for (int i = 0; i < length; i++) {
+          int numValues = getIntMV(docIds[i], intValueBuffer, context);

Review Comment:
   Got it. I've addressed this and your other comment as part of the new PR I opened: https://github.com/apache/pinot/pull/8953
   Let me know what you think! appreciate the feedback!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org