You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "xiangfu0 (via GitHub)" <gi...@apache.org> on 2023/07/16 02:54:17 UTC

[GitHub] [pinot] xiangfu0 opened a new pull request, #11117: [multistage] bridge v2 query engine for leaf stage v1 group by multi-value column

xiangfu0 opened a new pull request, #11117:
URL: https://github.com/apache/pinot/pull/11117

   Try to bridge v2 query engine for leaf stage v1 group by multi-value column.
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] walterddr commented on a diff in pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on code in PR #11117:
URL: https://github.com/apache/pinot/pull/11117#discussion_r1271453437


##########
pinot-common/src/main/java/org/apache/pinot/common/function/TransformFunctionType.java:
##########
@@ -90,6 +90,11 @@ public enum TransformFunctionType {
   // date type conversion functions
   CAST("cast"),
 
+  // object type
+  ARRAY_TO_MV("arrayToMV",
+      ReturnTypes.cascade(opBinding -> positionalComponentReturnType(opBinding, 0), SqlTypeTransforms.FORCE_NULLABLE),

Review Comment:
   let's first create a component return type registry on @ScalarFunction so we dont have to modify the transform function side. 



##########
pinot-common/src/main/java/org/apache/pinot/common/function/TransformFunctionType.java:
##########
@@ -90,6 +90,11 @@ public enum TransformFunctionType {
   // date type conversion functions
   CAST("cast"),
 
+  // object type
+  ARRAY_TO_MV("arrayToMV",

Review Comment:
   yeah we need to figure out a proper way b/c out put of a select is an array, but the table config / schema will still call this as MV column as convension. both have some confusion, but as long as the document is proper we should be good. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] kishoreg commented on pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column

Posted by "kishoreg (via GitHub)" <gi...@apache.org>.
kishoreg commented on PR #11117:
URL: https://github.com/apache/pinot/pull/11117#issuecomment-1636990670

   it feels like the other way around right? multivalue_to_array?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #11117:
URL: https://github.com/apache/pinot/pull/11117#discussion_r1274111424


##########
pinot-common/src/main/java/org/apache/pinot/common/function/FunctionRegistry.java:
##########
@@ -216,5 +216,11 @@ public static Object clpDecode(String logtypeFieldName, String dictVarsFieldName
         String defaultValue) {
       throw new UnsupportedOperationException("Placeholder scalar function, should not reach here");
     }
+
+    @ScalarFunction(names = {"arrayToMV", "array_to_mv", "arrayToMultiValue", "array_to_multi_value"},
+        isPlaceholder = true)
+    public static String arrayToMultiValue(Object multiValue) {

Review Comment:
   (minor)
   ```suggestion
       public static String arrayToMV(Object array) {
   ```



##########
pinot-query-planner/src/main/java/org/apache/pinot/query/parser/CalciteRexExpressionParser.java:
##########
@@ -199,6 +203,12 @@ private static Expression compileFunctionExpression(RexExpression.FunctionCall r
         return compileOrExpression(rexCall, pinotQuery);
       case OTHER_FUNCTION:
         functionName = rexCall.getFunctionName();
+        // Special handle for leaf stage multi-value columns, as the default behavior for filter and group by is not
+        // sql standard, so need to use `array_to_mv` to convert the array to v1 multi-value column for behavior
+        // consistency meanwhile not violating the sql standard.
+        if (ARRAY_TO_MV_ALIAS.contains(functionName)) {

Review Comment:
   Is function name always upper case?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] walterddr commented on a diff in pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on code in PR #11117:
URL: https://github.com/apache/pinot/pull/11117#discussion_r1270810293


##########
pinot-common/src/main/java/org/apache/pinot/common/function/TransformFunctionType.java:
##########
@@ -90,6 +90,11 @@ public enum TransformFunctionType {
   // date type conversion functions
   CAST("cast"),
 
+  // object type
+  ARRAY_TO_MV("arrayToMV",

Review Comment:
   IMO this is confusing name. the data type is already MV running an ARRAY_TO_MV is a bit weird. 
   should we named it `USE_AS_MV`? and we can say that MV columns are by default `USE_AS_ARRAY`



##########
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/MultiStageEngineIntegrationTest.java:
##########
@@ -485,6 +486,25 @@ public void testLiteralOnlyFunc()
     assertEquals(results.get(10).asText(), "hello!");
   }
 
+  @Test
+  public void testMultiValueColumnGroupBy()
+      throws Exception {
+    String pinotQuery = "SELECT count(*), arrayToMV(RandomAirports) FROM mytable "
+        + "GROUP BY arrayToMV(RandomAirports)";
+    JsonNode jsonNode = postQuery(pinotQuery);
+    Assert.assertEquals(jsonNode.get("resultTable").get("rows").size(), 154);
+  }
+
+  @Test
+  public void testMultiValueColumnGroupByOrderBy()
+      throws Exception {
+    String pinotQuery = "SELECT count(*), arrayToMV(RandomAirports) FROM mytable "

Review Comment:
   will it work if I run 
   ```
   SELECT count(*), arrayToMV(RandomAirports) 
   FROM mytable 
   WHERE Dest IN (SELECT Dest FROM myTable GROUP BY Dest HAVING count(*) > 10)
   ```
   (later when we implemented the scalar function wrapper)



##########
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/MultiStageEngineIntegrationTest.java:
##########
@@ -485,6 +486,25 @@ public void testLiteralOnlyFunc()
     assertEquals(results.get(10).asText(), "hello!");
   }
 
+  @Test
+  public void testMultiValueColumnGroupBy()
+      throws Exception {
+    String pinotQuery = "SELECT count(*), arrayToMV(RandomAirports) FROM mytable "

Review Comment:
   BaseIntegrationTestSets also have (1) hard-coded MV tests queries; and (2) withMultiValue flag in generated queries
   should we turn them on?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] xiangfu0 commented on a diff in pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column

Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on code in PR #11117:
URL: https://github.com/apache/pinot/pull/11117#discussion_r1271057600


##########
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/MultiStageEngineIntegrationTest.java:
##########
@@ -485,6 +486,25 @@ public void testLiteralOnlyFunc()
     assertEquals(results.get(10).asText(), "hello!");
   }
 
+  @Test
+  public void testMultiValueColumnGroupBy()
+      throws Exception {
+    String pinotQuery = "SELECT count(*), arrayToMV(RandomAirports) FROM mytable "

Review Comment:
   v1/v2 behavior are different, we can turn on them.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] xiangfu0 commented on a diff in pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column

Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on code in PR #11117:
URL: https://github.com/apache/pinot/pull/11117#discussion_r1271117940


##########
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/MultiStageEngineIntegrationTest.java:
##########
@@ -485,6 +486,25 @@ public void testLiteralOnlyFunc()
     assertEquals(results.get(10).asText(), "hello!");
   }
 
+  @Test
+  public void testMultiValueColumnGroupBy()
+      throws Exception {
+    String pinotQuery = "SELECT count(*), arrayToMV(RandomAirports) FROM mytable "

Review Comment:
   done



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/ArrayToMultiValueTransformFunction.java:
##########
@@ -0,0 +1,153 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.operator.transform.function;
+
+import java.math.BigDecimal;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.pinot.core.operator.ColumnContext;
+import org.apache.pinot.core.operator.blocks.ValueBlock;
+import org.apache.pinot.core.operator.transform.TransformResultMetadata;
+import org.apache.pinot.segment.spi.index.reader.Dictionary;
+import org.roaringbitmap.RoaringBitmap;
+
+
+/**
+ * <code>ArrayToMultiValueTransformFunction</code> is the bridge for leaf stage Group By Multi-Value column.
+ */
+public class ArrayToMultiValueTransformFunction implements TransformFunction {

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] xiangfu0 commented on a diff in pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column

Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on code in PR #11117:
URL: https://github.com/apache/pinot/pull/11117#discussion_r1270413027


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/ArrayToMultiValueTransformFunction.java:
##########
@@ -0,0 +1,153 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.operator.transform.function;
+
+import java.math.BigDecimal;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.pinot.core.operator.ColumnContext;
+import org.apache.pinot.core.operator.blocks.ValueBlock;
+import org.apache.pinot.core.operator.transform.TransformResultMetadata;
+import org.apache.pinot.segment.spi.index.reader.Dictionary;
+import org.roaringbitmap.RoaringBitmap;
+
+
+/**
+ * <code>ArrayToMultiValueTransformFunction</code> is the bridge for leaf stage Group By Multi-Value column.
+ */
+public class ArrayToMultiValueTransformFunction implements TransformFunction {

Review Comment:
   In v1 leaf, this is just the transparent layer from underlying function 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] xiangfu0 commented on a diff in pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column

Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on code in PR #11117:
URL: https://github.com/apache/pinot/pull/11117#discussion_r1271015899


##########
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/MultiStageEngineIntegrationTest.java:
##########
@@ -485,6 +486,25 @@ public void testLiteralOnlyFunc()
     assertEquals(results.get(10).asText(), "hello!");
   }
 
+  @Test
+  public void testMultiValueColumnGroupBy()
+      throws Exception {
+    String pinotQuery = "SELECT count(*), arrayToMV(RandomAirports) FROM mytable "
+        + "GROUP BY arrayToMV(RandomAirports)";
+    JsonNode jsonNode = postQuery(pinotQuery);
+    Assert.assertEquals(jsonNode.get("resultTable").get("rows").size(), 154);
+  }
+
+  @Test
+  public void testMultiValueColumnGroupByOrderBy()
+      throws Exception {
+    String pinotQuery = "SELECT count(*), arrayToMV(RandomAirports) FROM mytable "

Review Comment:
   This only works at leaf stage not intermediate stage.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] xiangfu0 commented on pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column

Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on PR #11117:
URL: https://github.com/apache/pinot/pull/11117#issuecomment-1654504252

   Part of this https://github.com/apache/pinot/issues/10658


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] xiangfu0 commented on a diff in pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column

Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on code in PR #11117:
URL: https://github.com/apache/pinot/pull/11117#discussion_r1272525489


##########
pinot-common/src/main/java/org/apache/pinot/common/function/TransformFunctionType.java:
##########
@@ -90,6 +90,11 @@ public enum TransformFunctionType {
   // date type conversion functions
   CAST("cast"),
 
+  // object type
+  ARRAY_TO_MV("arrayToMV",
+      ReturnTypes.cascade(opBinding -> positionalComponentReturnType(opBinding, 0), SqlTypeTransforms.FORCE_NULLABLE),

Review Comment:
   I will have a different pr for this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] xiangfu0 commented on a diff in pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column

Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on code in PR #11117:
URL: https://github.com/apache/pinot/pull/11117#discussion_r1271118559


##########
pinot-common/src/main/java/org/apache/pinot/common/function/TransformFunctionType.java:
##########
@@ -90,6 +90,11 @@ public enum TransformFunctionType {
   // date type conversion functions
   CAST("cast"),
 
+  // object type
+  ARRAY_TO_MV("arrayToMV",

Review Comment:
   `USE_AS_MV` might be confusing, since we repurposed `MV` as `ARRAY` in v2, `ARRAY_TO_MV` might be more explicit.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] xiangfu0 commented on a diff in pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column

Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on code in PR #11117:
URL: https://github.com/apache/pinot/pull/11117#discussion_r1271057600


##########
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/MultiStageEngineIntegrationTest.java:
##########
@@ -485,6 +486,25 @@ public void testLiteralOnlyFunc()
     assertEquals(results.get(10).asText(), "hello!");
   }
 
+  @Test
+  public void testMultiValueColumnGroupBy()
+      throws Exception {
+    String pinotQuery = "SELECT count(*), arrayToMV(RandomAirports) FROM mytable "

Review Comment:
   v1/v2 behavior are different, we can turn on them, but sql syntax will be different.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter commented on pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #11117:
URL: https://github.com/apache/pinot/pull/11117#issuecomment-1636974385

   ## [Codecov](https://app.codecov.io/gh/apache/pinot/pull/11117?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
   > Merging [#11117](https://app.codecov.io/gh/apache/pinot/pull/11117?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (35693aa) into [master](https://app.codecov.io/gh/apache/pinot/commit/c32318360e46a4030bfbff9d11f57b7fa0564397?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (c323183) will **increase** coverage by `0.00%`.
   > The diff coverage is `0.00%`.
   
   ```diff
   @@            Coverage Diff            @@
   ##           master   #11117     +/-   ##
   =========================================
     Coverage    0.11%    0.11%             
   =========================================
     Files        2203     2149     -54     
     Lines      118196   115707   -2489     
     Branches    17887    17581    -306     
   =========================================
     Hits          137      137             
   + Misses     118039   115550   -2489     
     Partials       20       20             
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1temurin11 | `?` | |
   | integration1temurin17 | `?` | |
   | integration1temurin20 | `?` | |
   | integration2temurin11 | `?` | |
   | integration2temurin17 | `?` | |
   | integration2temurin20 | `?` | |
   | unittests1temurin11 | `?` | |
   | unittests1temurin17 | `?` | |
   | unittests1temurin20 | `?` | |
   | unittests2temurin11 | `0.11% <0.00%> (-0.01%)` | :arrow_down: |
   | unittests2temurin17 | `?` | |
   | unittests2temurin20 | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://app.codecov.io/gh/apache/pinot/pull/11117?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [...e/pinot/common/function/TransformFunctionType.java](https://app.codecov.io/gh/apache/pinot/pull/11117?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vZnVuY3Rpb24vVHJhbnNmb3JtRnVuY3Rpb25UeXBlLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [...m/function/ArrayToMultiValueTransformFunction.java](https://app.codecov.io/gh/apache/pinot/pull/11117?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9vcGVyYXRvci90cmFuc2Zvcm0vZnVuY3Rpb24vQXJyYXlUb011bHRpVmFsdWVUcmFuc2Zvcm1GdW5jdGlvbi5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...r/transform/function/TransformFunctionFactory.java](https://app.codecov.io/gh/apache/pinot/pull/11117?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9vcGVyYXRvci90cmFuc2Zvcm0vZnVuY3Rpb24vVHJhbnNmb3JtRnVuY3Rpb25GYWN0b3J5LmphdmE=) | `0.00% <0.00%> (ø)` | |
   
   ... and [57 files with indirect coverage changes](https://app.codecov.io/gh/apache/pinot/pull/11117/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] xiangfu0 commented on pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column

Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on PR #11117:
URL: https://github.com/apache/pinot/pull/11117#issuecomment-1646299286

   Test failures requiring fix by https://github.com/apache/pinot/pull/11151


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] walterddr commented on a diff in pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on code in PR #11117:
URL: https://github.com/apache/pinot/pull/11117#discussion_r1274152585


##########
pinot-query-planner/src/main/java/org/apache/pinot/query/parser/CalciteRexExpressionParser.java:
##########
@@ -199,6 +203,12 @@ private static Expression compileFunctionExpression(RexExpression.FunctionCall r
         return compileOrExpression(rexCall, pinotQuery);
       case OTHER_FUNCTION:
         functionName = rexCall.getFunctionName();
+        // Special handle for leaf stage multi-value columns, as the default behavior for filter and group by is not
+        // sql standard, so need to use `array_to_mv` to convert the array to v1 multi-value column for behavior
+        // consistency meanwhile not violating the sql standard.
+        if (ARRAY_TO_MV_ALIAS.contains(functionName)) {

Review Comment:
   Let's canonicalized and just do `ARRAY_TO_MV`



##########
pinot-query-planner/src/main/java/org/apache/pinot/query/parser/CalciteRexExpressionParser.java:
##########
@@ -50,6 +52,8 @@
 public class CalciteRexExpressionParser {
   private static final Logger LOGGER = LoggerFactory.getLogger(CalciteRexExpressionParser.class);
   private static final Map<String, String> CANONICAL_NAME_TO_SPECIAL_KEY_MAP;
+  private static final Set<String> ARRAY_TO_MV_ALIAS =

Review Comment:
   Do we need this many variance? Once we have unnest this should be removed right?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] xiangfu0 merged pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column

Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 merged PR #11117:
URL: https://github.com/apache/pinot/pull/11117


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #11117:
URL: https://github.com/apache/pinot/pull/11117#discussion_r1270353500


##########
pinot-common/src/main/java/org/apache/pinot/common/function/FunctionRegistry.java:
##########
@@ -216,5 +216,10 @@ public static Object clpDecode(String logtypeFieldName, String dictVarsFieldName
         String defaultValue) {
       throw new UnsupportedOperationException("Placeholder scalar function, should not reach here");
     }
+
+    @ScalarFunction(names = {"arrayToMultiValue", "array_to_multi_value"}, isPlaceholder = true)
+    public static String arrayToMultiValue(Object multiValue) {

Review Comment:
   (minor) Personally prefer `arrayToMV` to be consistent with other MV functions such as `sumMV`



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/ArrayToMultiValueTransformFunction.java:
##########
@@ -0,0 +1,153 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.operator.transform.function;
+
+import java.math.BigDecimal;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.pinot.core.operator.ColumnContext;
+import org.apache.pinot.core.operator.blocks.ValueBlock;
+import org.apache.pinot.core.operator.transform.TransformResultMetadata;
+import org.apache.pinot.segment.spi.index.reader.Dictionary;
+import org.roaringbitmap.RoaringBitmap;
+
+
+/**
+ * <code>ArrayToMultiValueTransformFunction</code> is the bridge for leaf stage Group By Multi-Value column.
+ */
+public class ArrayToMultiValueTransformFunction implements TransformFunction {

Review Comment:
   Is this used in v1 or v2? Is it possible to just remove this function on the leaf?
   With the current approach, filter will be very inefficient even if it works



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] xiangfu0 commented on a diff in pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column

Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on code in PR #11117:
URL: https://github.com/apache/pinot/pull/11117#discussion_r1271004206


##########
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/MultiStageEngineIntegrationTest.java:
##########
@@ -485,6 +486,25 @@ public void testLiteralOnlyFunc()
     assertEquals(results.get(10).asText(), "hello!");
   }
 
+  @Test
+  public void testMultiValueColumnGroupBy()
+      throws Exception {
+    String pinotQuery = "SELECT count(*), arrayToMV(RandomAirports) FROM mytable "
+        + "GROUP BY arrayToMV(RandomAirports)";
+    JsonNode jsonNode = postQuery(pinotQuery);
+    Assert.assertEquals(jsonNode.get("resultTable").get("rows").size(), 154);
+  }
+
+  @Test
+  public void testMultiValueColumnGroupByOrderBy()
+      throws Exception {
+    String pinotQuery = "SELECT count(*), arrayToMV(RandomAirports) FROM mytable "

Review Comment:
   You need a `group by arrayToMV(RandomAirports)` ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] walterddr commented on a diff in pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on code in PR #11117:
URL: https://github.com/apache/pinot/pull/11117#discussion_r1271461855


##########
pinot-common/src/main/java/org/apache/pinot/common/function/TransformFunctionType.java:
##########
@@ -90,6 +90,11 @@ public enum TransformFunctionType {
   // date type conversion functions
   CAST("cast"),
 
+  // object type
+  ARRAY_TO_MV("arrayToMV",
+      ReturnTypes.cascade(opBinding -> positionalComponentReturnType(opBinding, 0), SqlTypeTransforms.FORCE_NULLABLE),

Review Comment:
   actually ignored previous comment, did a bit of research and it seems like registering TransformFunctionType without an actual impl is better than having to parse scalar function annotation, which doesn't really allow anything other than primitives



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #11117:
URL: https://github.com/apache/pinot/pull/11117#discussion_r1270424282


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/ArrayToMultiValueTransformFunction.java:
##########
@@ -0,0 +1,153 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.operator.transform.function;
+
+import java.math.BigDecimal;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.pinot.core.operator.ColumnContext;
+import org.apache.pinot.core.operator.blocks.ValueBlock;
+import org.apache.pinot.core.operator.transform.TransformResultMetadata;
+import org.apache.pinot.segment.spi.index.reader.Dictionary;
+import org.roaringbitmap.RoaringBitmap;
+
+
+/**
+ * <code>ArrayToMultiValueTransformFunction</code> is the bridge for leaf stage Group By Multi-Value column.
+ */
+public class ArrayToMultiValueTransformFunction implements TransformFunction {

Review Comment:
   Understood, but with transform function no index can be used in filter



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] xiangfu0 commented on a diff in pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column

Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on code in PR #11117:
URL: https://github.com/apache/pinot/pull/11117#discussion_r1270420414


##########
pinot-common/src/main/java/org/apache/pinot/common/function/FunctionRegistry.java:
##########
@@ -216,5 +216,10 @@ public static Object clpDecode(String logtypeFieldName, String dictVarsFieldName
         String defaultValue) {
       throw new UnsupportedOperationException("Placeholder scalar function, should not reach here");
     }
+
+    @ScalarFunction(names = {"arrayToMultiValue", "array_to_multi_value"}, isPlaceholder = true)
+    public static String arrayToMultiValue(Object multiValue) {

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] xiangfu0 commented on pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column

Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on PR #11117:
URL: https://github.com/apache/pinot/pull/11117#issuecomment-1646168883

   > the grand question is whether we continue to support MV as a data type or collapse into ARRAY. if the answer to the above question is NO, this PR is not needed. if the answer is YES. there are several concerns. i thought about it a bit last night. the problem is 2 fold.
   > 
   > 1. how do we support it conforming with ANSI syntax
   > 2. how do we make sure v1 perform the same as before.
   > 
   > for Q1: the answer is probably MULTI_SET
   > 
   > * it is a SET with no ordering but can contain duplicates, suitable for our use case for GROUP-BY and FILTER, which they are considered as expanded/unnested during eval.
   > * but not every use cases is considered as MULTI_SET, for example selection will consider MV as an ARRAY
   >   therefore,
   > 
   > 1. an explicit `USE_AS_MULTISET(mv)` is desired to bridge the syntatic gap
   > 2. explicit `USE_AS_ARRAY(mv)` is the default (considering select * there's no reason to ask user explicitly put this in
   > 
   > for Q2: the problem is these USE_AS_*** methods are considered transform, which shouldn't be the case there's a simply way to solve this --> in CalciteRexExpressionUtils, we can simply ignored the functionCall and directly return the operand as a reference in V1 b/c V1 is SqlNode and bares no type info, this means directly putting the operand input reference without the USE_AS_*** type conversion will work naturally with the V1 context.
   > 
   > so in short my proposal is
   > 
   > 1. have `USE_AS_ARRAY` and `USE_AS_MULTISET` as scalarFunction wrappers (unimplemented) to bridge the gap of the syntactic problem on calcite
   > 2. in PhysicalPlanner explicitly exclude these syntatitic functions and directly drop to the operand;
   > 3. in the mid term we will have MV, MULTI_SET, ARRAY as 3 "co-existing" types, but in V1 there's only MV; in V2 MV is considered by default as ARRAY, unless `USE_AS_MULTISET` is used.
   > 4. in the long term, once we fully found all alternatives in standard SQL syntax for MV, we can safely consider MV as ARRAY. and only do MULTISET when necessary (i can only see it being useful in `MV > 5` situation, where there's no array equivalent, only multiset equivalent)
   > 
   > WDYT? Please let me know
   
   MV column in V2 is modeled as ARRAY by default.
   So in terms of supporting v1 format, we need to:
   1. use `ARRAY_TO_MV` or `USE_AS_MV` as the bridge to ensure Calcite understand the type consistency/conversion.
   2. (TODO) Implement MV GROUP BY in v2 intermediate stage to complete the story.
   3. (TODO) Implement ARRAY GROUP BY later on


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] walterddr commented on a diff in pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on code in PR #11117:
URL: https://github.com/apache/pinot/pull/11117#discussion_r1271169485


##########
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/MultiStageEngineIntegrationTest.java:
##########
@@ -485,6 +486,25 @@ public void testLiteralOnlyFunc()
     assertEquals(results.get(10).asText(), "hello!");
   }
 
+  @Test
+  public void testMultiValueColumnGroupBy()
+      throws Exception {
+    String pinotQuery = "SELECT count(*), arrayToMV(RandomAirports) FROM mytable "
+        + "GROUP BY arrayToMV(RandomAirports)";
+    JsonNode jsonNode = postQuery(pinotQuery);
+    Assert.assertEquals(jsonNode.get("resultTable").get("rows").size(), 154);
+  }
+
+  @Test
+  public void testMultiValueColumnGroupByOrderBy()
+      throws Exception {
+    String pinotQuery = "SELECT count(*), arrayToMV(RandomAirports) FROM mytable "

Review Comment:
   Ok sounds good



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] xiangfu0 commented on pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column

Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on PR #11117:
URL: https://github.com/apache/pinot/pull/11117#issuecomment-1648941009

   rebase to master 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] xiangfu0 commented on a diff in pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column

Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on code in PR #11117:
URL: https://github.com/apache/pinot/pull/11117#discussion_r1274368960


##########
pinot-query-planner/src/main/java/org/apache/pinot/query/parser/CalciteRexExpressionParser.java:
##########
@@ -50,6 +52,8 @@
 public class CalciteRexExpressionParser {
   private static final Logger LOGGER = LoggerFactory.getLogger(CalciteRexExpressionParser.class);
   private static final Map<String, String> CANONICAL_NAME_TO_SPECIAL_KEY_MAP;
+  private static final Set<String> ARRAY_TO_MV_ALIAS =

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org