You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "wirybeaver (via GitHub)" <gi...@apache.org> on 2024/02/22 05:05:51 UTC

[PR] Add the support to extract json array elements from json index for the transform function jsonExtractIndex [pinot]

wirybeaver opened a new pull request, #12466:
URL: https://github.com/apache/pinot/pull/12466

   <a href="https://imgur.com/FEzpKz8"><img src="https://i.imgur.com/FEzpKz8.png" title="source: imgur.com" /></a>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add the support to extract json array elements from json index for the transform function jsonExtractIndex [pinot]

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #12466:
URL: https://github.com/apache/pinot/pull/12466#discussion_r1517017215


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/json/ImmutableJsonIndexReader.java:
##########
@@ -309,12 +242,21 @@ private int getDocId(int flattenedDocId) {
   @Override
   public Map<String, RoaringBitmap> getMatchingDocsMap(String key) {
     Map<String, RoaringBitmap> matchingDocsMap = new HashMap<>();
+    Pair<String, MutableRoaringBitmap> result = processArrayIndex(key);
+    key = result.getLeft();
+    MutableRoaringBitmap arrayIndexFlattenDocIds = result.getRight();
+    if (arrayIndexFlattenDocIds != null && arrayIndexFlattenDocIds.isEmpty()) {
+      return matchingDocsMap;
+    }
     int[] dictIds = getDictIdRangeForKey(key);
-
     for (int dictId = dictIds[0]; dictId < dictIds[1]; dictId++) {
       // get docIds from posting list, convert these to the actual docIds
       ImmutableRoaringBitmap flattenedDocIds = _invertedIndex.getDocIds(dictId);
-      PeekableIntIterator it = flattenedDocIds.getIntIterator();
+      PeekableIntIterator it = arrayIndexFlattenDocIds == null ? flattenedDocIds.getIntIterator()
+              : intersect(arrayIndexFlattenDocIds.clone(), flattenedDocIds);

Review Comment:
   Thanks, RoaringBitmap.and works for MutableJsonIndex. But doesn't work for ImmutableJsonIndexReader;
   The RoaringBitmap.and require the input type is RoaringBitmap();
   MutableRoaringBitmap and ImmutableRoaringBitamp are not the subclass of RoaringBitmap.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add the support to extract json array elements from json index for the transform function jsonExtractIndex [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on PR #12466:
URL: https://github.com/apache/pinot/pull/12466#issuecomment-1989304693

   @saurabhd336 Is this PR trying to solve the same problem as yours (#12532)?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add the support to extract json array elements from json index for the transform function jsonExtractIndex [pinot]

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #12466:
URL: https://github.com/apache/pinot/pull/12466#issuecomment-1958741734

   ## [Codecov](https://app.codecov.io/gh/apache/pinot/pull/12466?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
   Attention: `92 lines` in your changes are missing coverage. Please review.
   > Comparison is base [(`1d490c1`)](https://app.codecov.io/gh/apache/pinot/commit/1d490c1ac3268103a16d77ddfa70f8f8602f9e96?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) 61.77% compared to head [(`d61fbbe`)](https://app.codecov.io/gh/apache/pinot/pull/12466?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) 0.00%.
   > Report is 13 commits behind head on master.
   
   | [Files](https://app.codecov.io/gh/apache/pinot/pull/12466?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Patch % | Lines |
   |---|---|---|
   | [...t/index/readers/json/ImmutableJsonIndexReader.java](https://app.codecov.io/gh/apache/pinot/pull/12466?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L3JlYWRlcnMvanNvbi9JbW11dGFibGVKc29uSW5kZXhSZWFkZXIuamF2YQ==) | 0.00% | [57 Missing :warning: ](https://app.codecov.io/gh/apache/pinot/pull/12466?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) |
   | [...local/realtime/impl/json/MutableJsonIndexImpl.java](https://app.codecov.io/gh/apache/pinot/pull/12466?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9yZWFsdGltZS9pbXBsL2pzb24vTXV0YWJsZUpzb25JbmRleEltcGwuamF2YQ==) | 0.00% | [35 Missing :warning: ](https://app.codecov.io/gh/apache/pinot/pull/12466?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) |
   
   <details><summary>Additional details and impacted files</summary>
   
   
   ```diff
   @@              Coverage Diff              @@
   ##             master   #12466       +/-   ##
   =============================================
   - Coverage     61.77%    0.00%   -61.78%     
   =============================================
     Files          2436     2360       -76     
     Lines        133130   129470     -3660     
     Branches      20623    20080      -543     
   =============================================
   - Hits          82241        0    -82241     
   - Misses        44850   129470    +84620     
   + Partials       6039        0     -6039     
   ```
   
   | [Flag](https://app.codecov.io/gh/apache/pinot/pull/12466/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [custom-integration1](https://app.codecov.io/gh/apache/pinot/pull/12466/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [integration](https://app.codecov.io/gh/apache/pinot/pull/12466/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `0.00% <0.00%> (-0.01%)` | :arrow_down: |
   | [integration1](https://app.codecov.io/gh/apache/pinot/pull/12466/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [integration2](https://app.codecov.io/gh/apache/pinot/pull/12466/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `0.00% <0.00%> (ø)` | |
   | [java-11](https://app.codecov.io/gh/apache/pinot/pull/12466/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [java-21](https://app.codecov.io/gh/apache/pinot/pull/12466/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `0.00% <0.00%> (-61.66%)` | :arrow_down: |
   | [skip-bytebuffers-false](https://app.codecov.io/gh/apache/pinot/pull/12466/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `0.00% <0.00%> (-61.74%)` | :arrow_down: |
   | [skip-bytebuffers-true](https://app.codecov.io/gh/apache/pinot/pull/12466/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [temurin](https://app.codecov.io/gh/apache/pinot/pull/12466/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `0.00% <0.00%> (-61.78%)` | :arrow_down: |
   | [unittests](https://app.codecov.io/gh/apache/pinot/pull/12466/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [unittests1](https://app.codecov.io/gh/apache/pinot/pull/12466/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [unittests2](https://app.codecov.io/gh/apache/pinot/pull/12466/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   
   </details>
   
   [:umbrella: View full report in Codecov by Sentry](https://app.codecov.io/gh/apache/pinot/pull/12466?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).   
   :loudspeaker: Have feedback on the report? [Share it here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add the support to extract json array elements from json index for the transform function jsonExtractIndex [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on PR #12466:
URL: https://github.com/apache/pinot/pull/12466#issuecomment-1970189921

   @itschrispeck Can you please help take a look?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add the support to extract json array elements from json index for the transform function jsonExtractIndex [pinot]

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #12466:
URL: https://github.com/apache/pinot/pull/12466#discussion_r1517017215


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/json/ImmutableJsonIndexReader.java:
##########
@@ -309,12 +242,21 @@ private int getDocId(int flattenedDocId) {
   @Override
   public Map<String, RoaringBitmap> getMatchingDocsMap(String key) {
     Map<String, RoaringBitmap> matchingDocsMap = new HashMap<>();
+    Pair<String, MutableRoaringBitmap> result = processArrayIndex(key);
+    key = result.getLeft();
+    MutableRoaringBitmap arrayIndexFlattenDocIds = result.getRight();
+    if (arrayIndexFlattenDocIds != null && arrayIndexFlattenDocIds.isEmpty()) {
+      return matchingDocsMap;
+    }
     int[] dictIds = getDictIdRangeForKey(key);
-
     for (int dictId = dictIds[0]; dictId < dictIds[1]; dictId++) {
       // get docIds from posting list, convert these to the actual docIds
       ImmutableRoaringBitmap flattenedDocIds = _invertedIndex.getDocIds(dictId);
-      PeekableIntIterator it = flattenedDocIds.getIntIterator();
+      PeekableIntIterator it = arrayIndexFlattenDocIds == null ? flattenedDocIds.getIntIterator()
+              : intersect(arrayIndexFlattenDocIds.clone(), flattenedDocIds);

Review Comment:
   RoaringBitmap.and require the input type is RoaringBitmap();
   MutableRoaringBitmap and ImmutableRoaringBitamp are not the subclass of RoaringBitmap.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add the support to extract json array elements from json index for the transform function jsonExtractIndex [pinot]

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #12466:
URL: https://github.com/apache/pinot/pull/12466#discussion_r1502008766


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/json/ImmutableJsonIndexReader.java:
##########
@@ -160,94 +161,26 @@ private MutableRoaringBitmap getMatchingFlattenedDocIds(Predicate predicate) {
         "Left-hand side of the predicate must be an identifier, got: %s (%s). Put double quotes around the identifier"
             + " if needed.", lhs, lhs.getType());
     String key = lhs.getIdentifier();
-
-    MutableRoaringBitmap matchingDocIds = null;
+    // Support 2 formats:
+    // - JSONPath format (e.g. "$.a[1].b"='abc', "$[0]"=1, "$"='abc')
+    // - Legacy format (e.g. "a[1].b"='abc')
     if (_version == BaseJsonIndexCreator.VERSION_2) {
-      // Support 2 formats:
-      // - JSONPath format (e.g. "$.a[1].b"='abc', "$[0]"=1, "$"='abc')
-      // - Legacy format (e.g. "a[1].b"='abc')
-      if (key.charAt(0) == '$') {
+      if (key.startsWith("$")) {
         key = key.substring(1);
       } else {
         key = JsonUtils.KEY_SEPARATOR + key;
       }
-
-      // Process the array index within the key if exists

Review Comment:
   Keep the key preprocessing but move the generation of matchingDocIds over array index json path to the static function processArrayIndex.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add the support to extract json array elements from json index for the transform function jsonExtractIndex [pinot]

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #12466:
URL: https://github.com/apache/pinot/pull/12466#discussion_r1502008766


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/json/ImmutableJsonIndexReader.java:
##########
@@ -160,94 +161,26 @@ private MutableRoaringBitmap getMatchingFlattenedDocIds(Predicate predicate) {
         "Left-hand side of the predicate must be an identifier, got: %s (%s). Put double quotes around the identifier"
             + " if needed.", lhs, lhs.getType());
     String key = lhs.getIdentifier();
-
-    MutableRoaringBitmap matchingDocIds = null;
+    // Support 2 formats:
+    // - JSONPath format (e.g. "$.a[1].b"='abc', "$[0]"=1, "$"='abc')
+    // - Legacy format (e.g. "a[1].b"='abc')
     if (_version == BaseJsonIndexCreator.VERSION_2) {
-      // Support 2 formats:
-      // - JSONPath format (e.g. "$.a[1].b"='abc', "$[0]"=1, "$"='abc')
-      // - Legacy format (e.g. "a[1].b"='abc')
-      if (key.charAt(0) == '$') {
+      if (key.startsWith("$")) {
         key = key.substring(1);
       } else {
         key = JsonUtils.KEY_SEPARATOR + key;
       }
-
-      // Process the array index within the key if exists

Review Comment:
   Note: Keep the key preprocessing but move the generation of matchingDocIds over array index json path to the static function processArrayIndex.



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/json/MutableJsonIndexImpl.java:
##########
@@ -209,41 +210,11 @@ private RoaringBitmap getMatchingFlattenedDocIds(Predicate predicate) {
     } else {
       key = JsonUtils.KEY_SEPARATOR + key;
     }
-

Review Comment:
   Note: Move the processing of array index to the static function processArrayIndex for code reusing



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add the support to extract json array elements from json index for the transform function jsonExtractIndex [pinot]

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on PR #12466:
URL: https://github.com/apache/pinot/pull/12466#issuecomment-1986422383

   The integration test fails on a mutlistage query irrelevant to json. the temeurin-11 succeeds.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add the support to extract json array elements from json index for the transform function jsonExtractIndex [pinot]

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #12466:
URL: https://github.com/apache/pinot/pull/12466#discussion_r1498659095


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/json/MutableJsonIndexImpl.java:
##########
@@ -209,41 +210,11 @@ private RoaringBitmap getMatchingFlattenedDocIds(Predicate predicate) {
     } else {
       key = JsonUtils.KEY_SEPARATOR + key;
     }
-

Review Comment:
   Move the processing of array index to the static function processArrayIndex for code reusing



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add the support to extract json array elements from json index for the transform function jsonExtractIndex [pinot]

Posted by "itschrispeck (via GitHub)" <gi...@apache.org>.
itschrispeck commented on code in PR #12466:
URL: https://github.com/apache/pinot/pull/12466#discussion_r1508495654


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/JsonExtractIndexTransformFunction.java:
##########
@@ -90,9 +90,13 @@ public void init(List<TransformFunction> arguments, Map<String, ColumnContext> c
     }
     String resultsType = ((LiteralTransformFunction) thirdArgument).getStringLiteral().toUpperCase();
     boolean isSingleValue = !resultsType.endsWith("_ARRAY");
+    // TODO: will support mv; the underlying jsonIndexReader.getMatchingDocsMap already works for the json path [*]
     if (!isSingleValue) {
       throw new IllegalArgumentException("jsonExtractIndex only supports single value type");
     }
+    if (isSingleValue && inputJsonPath.contains("[*]")) {
+      throw new IllegalArgumentException("json path should not use [*] if the return type is single value");

Review Comment:
   nit: something like `[*] syntax is unsupported as json_extract_index does not support returning array types` is clearer to a user 



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/json/ImmutableJsonIndexReader.java:
##########
@@ -370,6 +312,90 @@ private int[] getDictIdRangeForKey(String key) {
     return new int[]{minDictId, maxDictId};
   }
 
+  // If key doesn't contain the array index, return <original key, null bitmap>
+  // Elif the key, i.e. the json path provided by user doesn't match any data, return <null, empty bitmap>
+  // Else, return the json path that is generated by replacing array index with . on the original key
+  // and the associated flattenDocId bitmap

Review Comment:
   nit: javadoc formatting



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/json/MutableJsonIndexImpl.java:
##########
@@ -301,12 +272,22 @@ public Map<String, RoaringBitmap> getMatchingDocsMap(String key) {
     Map<String, RoaringBitmap> matchingDocsMap = new HashMap<>();
     _readLock.lock();
     try {
+      Pair<String, RoaringBitmap> result = processArrayIndex(key);
+      key = result.getLeft();
+      RoaringBitmap arrayIndexFlattenDocIds = result.getRight();
+      if (arrayIndexFlattenDocIds != null && arrayIndexFlattenDocIds.isEmpty()) {
+        return matchingDocsMap;
+      }
       for (Map.Entry<String, RoaringBitmap> entry : _postingListMap.entrySet()) {
         if (!entry.getKey().startsWith(key + BaseJsonIndexCreator.KEY_VALUE_SEPARATOR)) {
           continue;
         }
-        MutableRoaringBitmap flattenedDocIds = entry.getValue().toMutableRoaringBitmap();
-        PeekableIntIterator it = flattenedDocIds.getIntIterator();
+        RoaringBitmap kvPairFlattenedDocIds = entry.getValue();
+        PeekableIntIterator it = arrayIndexFlattenDocIds == null ? kvPairFlattenedDocIds.getIntIterator()
+                : intersect(arrayIndexFlattenDocIds.clone(), kvPairFlattenedDocIds).getIntIterator();

Review Comment:
   `RoaringBitmap.and(arrayIndexFlattenDocIds, flattenedDocIds).getIntIterator()`



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/json/MutableJsonIndexImpl.java:
##########
@@ -342,6 +323,51 @@ public String[] getValuesForKeyAndDocs(int[] docIds, Map<String, RoaringBitmap>
     return values;
   }
 
+  private Pair<String, RoaringBitmap> processArrayIndex(String key) {

Review Comment:
   `processArrayIndex` -> `getKeyAndFlattenedDocIds`?
   
   Can you add a comment here about input/output (already done in the immutable reader)



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/json/ImmutableJsonIndexReader.java:
##########
@@ -309,12 +242,21 @@ private int getDocId(int flattenedDocId) {
   @Override
   public Map<String, RoaringBitmap> getMatchingDocsMap(String key) {
     Map<String, RoaringBitmap> matchingDocsMap = new HashMap<>();
+    Pair<String, MutableRoaringBitmap> result = processArrayIndex(key);
+    key = result.getLeft();
+    MutableRoaringBitmap arrayIndexFlattenDocIds = result.getRight();
+    if (arrayIndexFlattenDocIds != null && arrayIndexFlattenDocIds.isEmpty()) {
+      return matchingDocsMap;
+    }
     int[] dictIds = getDictIdRangeForKey(key);
-
     for (int dictId = dictIds[0]; dictId < dictIds[1]; dictId++) {
       // get docIds from posting list, convert these to the actual docIds
       ImmutableRoaringBitmap flattenedDocIds = _invertedIndex.getDocIds(dictId);
-      PeekableIntIterator it = flattenedDocIds.getIntIterator();
+      PeekableIntIterator it = arrayIndexFlattenDocIds == null ? flattenedDocIds.getIntIterator()
+              : intersect(arrayIndexFlattenDocIds.clone(), flattenedDocIds);

Review Comment:
   `RoaringBitmap.and(arrayIndexFlattenDocIds, flattenedDocIds).getIntIterator()`



##########
pinot-core/src/test/java/org/apache/pinot/core/operator/transform/function/BaseTransformFunctionTest.java:
##########
@@ -158,6 +159,11 @@ public void setUp()
               + "\"stringVal\":\"%s\"}", RANDOM.nextInt(), RANDOM.nextLong(), RANDOM.nextFloat(), RANDOM.nextDouble(),
           BigDecimal.valueOf(RANDOM.nextDouble()).multiply(BigDecimal.valueOf(RANDOM.nextInt())),
           df.format(RANDOM.nextInt() * RANDOM.nextDouble()));
+      _jsonArrayValues[i] = String.format(
+          "{\"intVals\":[%s,%s], \"longVals\":[%s,%s], \"floatVals\":[%s,%s], \"doubleVals\":[%s,%s], "
+          + "\"bigDecimalVals\":[%s,%s], \"stringVals\":[\"%s\",\"%s\"]}",
+          0, 1, 0L, 1L, 0.0f, 1.0f, 0.0, 1.0, BigDecimal.valueOf(0.0), BigDecimal.valueOf(1.0),
+          df.format(0.0), df.format(1.0));

Review Comment:
   I think it's clearer to combine this w/ `_jsonSVValues`, the column type is still SV (not MV). 
   
   The duplicated test could be removed and that existing data provider can be extended to cover all cases



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Extract json individual array elements from json index for the transform function jsonExtractIndex [pinot]

Posted by "itschrispeck (via GitHub)" <gi...@apache.org>.
itschrispeck commented on code in PR #12466:
URL: https://github.com/apache/pinot/pull/12466#discussion_r1520649238


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/json/ImmutableJsonIndexReader.java:
##########
@@ -309,12 +242,21 @@ private int getDocId(int flattenedDocId) {
   @Override
   public Map<String, RoaringBitmap> getMatchingDocsMap(String key) {
     Map<String, RoaringBitmap> matchingDocsMap = new HashMap<>();
+    Pair<String, MutableRoaringBitmap> result = processArrayIndex(key);
+    key = result.getLeft();
+    MutableRoaringBitmap arrayIndexFlattenDocIds = result.getRight();
+    if (arrayIndexFlattenDocIds != null && arrayIndexFlattenDocIds.isEmpty()) {
+      return matchingDocsMap;
+    }
     int[] dictIds = getDictIdRangeForKey(key);
-
     for (int dictId = dictIds[0]; dictId < dictIds[1]; dictId++) {
       // get docIds from posting list, convert these to the actual docIds
       ImmutableRoaringBitmap flattenedDocIds = _invertedIndex.getDocIds(dictId);
-      PeekableIntIterator it = flattenedDocIds.getIntIterator();
+      PeekableIntIterator it = arrayIndexFlattenDocIds == null ? flattenedDocIds.getIntIterator()
+              : intersect(arrayIndexFlattenDocIds.clone(), flattenedDocIds);

Review Comment:
   How about `MutableRoaringBitmap.and(arrayIndexFlattenDocIds, flattenedDocIds)`? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add the support to extract json array elements from json index for the transform function jsonExtractIndex [pinot]

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #12466:
URL: https://github.com/apache/pinot/pull/12466#discussion_r1509270121


##########
pinot-core/src/test/java/org/apache/pinot/core/operator/transform/function/BaseTransformFunctionTest.java:
##########
@@ -158,6 +159,11 @@ public void setUp()
               + "\"stringVal\":\"%s\"}", RANDOM.nextInt(), RANDOM.nextLong(), RANDOM.nextFloat(), RANDOM.nextDouble(),
           BigDecimal.valueOf(RANDOM.nextDouble()).multiply(BigDecimal.valueOf(RANDOM.nextInt())),
           df.format(RANDOM.nextInt() * RANDOM.nextDouble()));
+      _jsonArrayValues[i] = String.format(
+          "{\"intVals\":[%s,%s], \"longVals\":[%s,%s], \"floatVals\":[%s,%s], \"doubleVals\":[%s,%s], "
+          + "\"bigDecimalVals\":[%s,%s], \"stringVals\":[\"%s\",\"%s\"]}",
+          0, 1, 0L, 1L, 0.0f, 1.0f, 0.0, 1.0, BigDecimal.valueOf(0.0), BigDecimal.valueOf(1.0),
+          df.format(0.0), df.format(1.0));

Review Comment:
   will do



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add the support to extract json array elements from json index for the transform function jsonExtractIndex [pinot]

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #12466:
URL: https://github.com/apache/pinot/pull/12466#discussion_r1509270121


##########
pinot-core/src/test/java/org/apache/pinot/core/operator/transform/function/BaseTransformFunctionTest.java:
##########
@@ -158,6 +159,11 @@ public void setUp()
               + "\"stringVal\":\"%s\"}", RANDOM.nextInt(), RANDOM.nextLong(), RANDOM.nextFloat(), RANDOM.nextDouble(),
           BigDecimal.valueOf(RANDOM.nextDouble()).multiply(BigDecimal.valueOf(RANDOM.nextInt())),
           df.format(RANDOM.nextInt() * RANDOM.nextDouble()));
+      _jsonArrayValues[i] = String.format(
+          "{\"intVals\":[%s,%s], \"longVals\":[%s,%s], \"floatVals\":[%s,%s], \"doubleVals\":[%s,%s], "
+          + "\"bigDecimalVals\":[%s,%s], \"stringVals\":[\"%s\",\"%s\"]}",
+          0, 1, 0L, 1L, 0.0f, 1.0f, 0.0, 1.0, BigDecimal.valueOf(0.0), BigDecimal.valueOf(1.0),
+          df.format(0.0), df.format(1.0));

Review Comment:
   will do. Initially, I wanted to use it for MV testing in the future



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add the support to extract json array elements from json index for the transform function jsonExtractIndex [pinot]

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #12466:
URL: https://github.com/apache/pinot/pull/12466#discussion_r1517017215


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/json/ImmutableJsonIndexReader.java:
##########
@@ -309,12 +242,21 @@ private int getDocId(int flattenedDocId) {
   @Override
   public Map<String, RoaringBitmap> getMatchingDocsMap(String key) {
     Map<String, RoaringBitmap> matchingDocsMap = new HashMap<>();
+    Pair<String, MutableRoaringBitmap> result = processArrayIndex(key);
+    key = result.getLeft();
+    MutableRoaringBitmap arrayIndexFlattenDocIds = result.getRight();
+    if (arrayIndexFlattenDocIds != null && arrayIndexFlattenDocIds.isEmpty()) {
+      return matchingDocsMap;
+    }
     int[] dictIds = getDictIdRangeForKey(key);
-
     for (int dictId = dictIds[0]; dictId < dictIds[1]; dictId++) {
       // get docIds from posting list, convert these to the actual docIds
       ImmutableRoaringBitmap flattenedDocIds = _invertedIndex.getDocIds(dictId);
-      PeekableIntIterator it = flattenedDocIds.getIntIterator();
+      PeekableIntIterator it = arrayIndexFlattenDocIds == null ? flattenedDocIds.getIntIterator()
+              : intersect(arrayIndexFlattenDocIds.clone(), flattenedDocIds);

Review Comment:
   Thanks, RoaringBitmap.and works for MutableJsonIndex. But doesn't work for ImmutableJsonIndexReader;
   The RoaringBitmap.and() require the input type is RoaringBitmap;
   MutableRoaringBitmap (arrayIndexFlattenDocIds) and ImmutableRoaringBitamp (flattenedDocIds) are not the subclass of RoaringBitmap.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add the support to extract json array elements from json index for the transform function jsonExtractIndex [pinot]

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.
wirybeaver commented on code in PR #12466:
URL: https://github.com/apache/pinot/pull/12466#discussion_r1517017215


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/json/ImmutableJsonIndexReader.java:
##########
@@ -309,12 +242,21 @@ private int getDocId(int flattenedDocId) {
   @Override
   public Map<String, RoaringBitmap> getMatchingDocsMap(String key) {
     Map<String, RoaringBitmap> matchingDocsMap = new HashMap<>();
+    Pair<String, MutableRoaringBitmap> result = processArrayIndex(key);
+    key = result.getLeft();
+    MutableRoaringBitmap arrayIndexFlattenDocIds = result.getRight();
+    if (arrayIndexFlattenDocIds != null && arrayIndexFlattenDocIds.isEmpty()) {
+      return matchingDocsMap;
+    }
     int[] dictIds = getDictIdRangeForKey(key);
-
     for (int dictId = dictIds[0]; dictId < dictIds[1]; dictId++) {
       // get docIds from posting list, convert these to the actual docIds
       ImmutableRoaringBitmap flattenedDocIds = _invertedIndex.getDocIds(dictId);
-      PeekableIntIterator it = flattenedDocIds.getIntIterator();
+      PeekableIntIterator it = arrayIndexFlattenDocIds == null ? flattenedDocIds.getIntIterator()
+              : intersect(arrayIndexFlattenDocIds.clone(), flattenedDocIds);

Review Comment:
   Thanks, RoaringBitmap.and works for MutableJsonIndex. But doesn't work for ImmutableJsonIndexReader;
   The RoaringBitmap.and() require the input type is RoaringBitmap;
   MutableRoaringBitmap and ImmutableRoaringBitamp are not the subclass of RoaringBitmap.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Extract json individual array elements from json index for the transform function jsonExtractIndex [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang merged PR #12466:
URL: https://github.com/apache/pinot/pull/12466


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org