You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "xiangfu0 (via GitHub)" <gi...@apache.org> on 2023/07/16 02:54:17 UTC
[GitHub] [pinot] xiangfu0 opened a new pull request, #11117: [multistage] bridge v2 query engine for leaf stage v1 group by multi-value column
xiangfu0 opened a new pull request, #11117:
URL: https://github.com/apache/pinot/pull/11117
Try to bridge v2 query engine for leaf stage v1 group by multi-value column.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] walterddr commented on a diff in pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column
Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on code in PR #11117:
URL: https://github.com/apache/pinot/pull/11117#discussion_r1271453437
##########
pinot-common/src/main/java/org/apache/pinot/common/function/TransformFunctionType.java:
##########
@@ -90,6 +90,11 @@ public enum TransformFunctionType {
// date type conversion functions
CAST("cast"),
+ // object type
+ ARRAY_TO_MV("arrayToMV",
+ ReturnTypes.cascade(opBinding -> positionalComponentReturnType(opBinding, 0), SqlTypeTransforms.FORCE_NULLABLE),
Review Comment:
let's first create a component return type registry on @ScalarFunction so we dont have to modify the transform function side.
##########
pinot-common/src/main/java/org/apache/pinot/common/function/TransformFunctionType.java:
##########
@@ -90,6 +90,11 @@ public enum TransformFunctionType {
// date type conversion functions
CAST("cast"),
+ // object type
+ ARRAY_TO_MV("arrayToMV",
Review Comment:
yeah we need to figure out a proper way b/c out put of a select is an array, but the table config / schema will still call this as MV column as convension. both have some confusion, but as long as the document is proper we should be good.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] kishoreg commented on pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column
Posted by "kishoreg (via GitHub)" <gi...@apache.org>.
kishoreg commented on PR #11117:
URL: https://github.com/apache/pinot/pull/11117#issuecomment-1636990670
it feels like the other way around right? multivalue_to_array?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column
Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #11117:
URL: https://github.com/apache/pinot/pull/11117#discussion_r1274111424
##########
pinot-common/src/main/java/org/apache/pinot/common/function/FunctionRegistry.java:
##########
@@ -216,5 +216,11 @@ public static Object clpDecode(String logtypeFieldName, String dictVarsFieldName
String defaultValue) {
throw new UnsupportedOperationException("Placeholder scalar function, should not reach here");
}
+
+ @ScalarFunction(names = {"arrayToMV", "array_to_mv", "arrayToMultiValue", "array_to_multi_value"},
+ isPlaceholder = true)
+ public static String arrayToMultiValue(Object multiValue) {
Review Comment:
(minor)
```suggestion
public static String arrayToMV(Object array) {
```
##########
pinot-query-planner/src/main/java/org/apache/pinot/query/parser/CalciteRexExpressionParser.java:
##########
@@ -199,6 +203,12 @@ private static Expression compileFunctionExpression(RexExpression.FunctionCall r
return compileOrExpression(rexCall, pinotQuery);
case OTHER_FUNCTION:
functionName = rexCall.getFunctionName();
+ // Special handle for leaf stage multi-value columns, as the default behavior for filter and group by is not
+ // sql standard, so need to use `array_to_mv` to convert the array to v1 multi-value column for behavior
+ // consistency meanwhile not violating the sql standard.
+ if (ARRAY_TO_MV_ALIAS.contains(functionName)) {
Review Comment:
Is function name always upper case?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] walterddr commented on a diff in pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column
Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on code in PR #11117:
URL: https://github.com/apache/pinot/pull/11117#discussion_r1270810293
##########
pinot-common/src/main/java/org/apache/pinot/common/function/TransformFunctionType.java:
##########
@@ -90,6 +90,11 @@ public enum TransformFunctionType {
// date type conversion functions
CAST("cast"),
+ // object type
+ ARRAY_TO_MV("arrayToMV",
Review Comment:
IMO this is confusing name. the data type is already MV running an ARRAY_TO_MV is a bit weird.
should we named it `USE_AS_MV`? and we can say that MV columns are by default `USE_AS_ARRAY`
##########
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/MultiStageEngineIntegrationTest.java:
##########
@@ -485,6 +486,25 @@ public void testLiteralOnlyFunc()
assertEquals(results.get(10).asText(), "hello!");
}
+ @Test
+ public void testMultiValueColumnGroupBy()
+ throws Exception {
+ String pinotQuery = "SELECT count(*), arrayToMV(RandomAirports) FROM mytable "
+ + "GROUP BY arrayToMV(RandomAirports)";
+ JsonNode jsonNode = postQuery(pinotQuery);
+ Assert.assertEquals(jsonNode.get("resultTable").get("rows").size(), 154);
+ }
+
+ @Test
+ public void testMultiValueColumnGroupByOrderBy()
+ throws Exception {
+ String pinotQuery = "SELECT count(*), arrayToMV(RandomAirports) FROM mytable "
Review Comment:
will it work if I run
```
SELECT count(*), arrayToMV(RandomAirports)
FROM mytable
WHERE Dest IN (SELECT Dest FROM myTable GROUP BY Dest HAVING count(*) > 10)
```
(later when we implemented the scalar function wrapper)
##########
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/MultiStageEngineIntegrationTest.java:
##########
@@ -485,6 +486,25 @@ public void testLiteralOnlyFunc()
assertEquals(results.get(10).asText(), "hello!");
}
+ @Test
+ public void testMultiValueColumnGroupBy()
+ throws Exception {
+ String pinotQuery = "SELECT count(*), arrayToMV(RandomAirports) FROM mytable "
Review Comment:
BaseIntegrationTestSets also have (1) hard-coded MV tests queries; and (2) withMultiValue flag in generated queries
should we turn them on?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] xiangfu0 commented on a diff in pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column
Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on code in PR #11117:
URL: https://github.com/apache/pinot/pull/11117#discussion_r1271057600
##########
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/MultiStageEngineIntegrationTest.java:
##########
@@ -485,6 +486,25 @@ public void testLiteralOnlyFunc()
assertEquals(results.get(10).asText(), "hello!");
}
+ @Test
+ public void testMultiValueColumnGroupBy()
+ throws Exception {
+ String pinotQuery = "SELECT count(*), arrayToMV(RandomAirports) FROM mytable "
Review Comment:
v1/v2 behavior are different, we can turn on them.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] xiangfu0 commented on a diff in pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column
Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on code in PR #11117:
URL: https://github.com/apache/pinot/pull/11117#discussion_r1271117940
##########
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/MultiStageEngineIntegrationTest.java:
##########
@@ -485,6 +486,25 @@ public void testLiteralOnlyFunc()
assertEquals(results.get(10).asText(), "hello!");
}
+ @Test
+ public void testMultiValueColumnGroupBy()
+ throws Exception {
+ String pinotQuery = "SELECT count(*), arrayToMV(RandomAirports) FROM mytable "
Review Comment:
done
##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/ArrayToMultiValueTransformFunction.java:
##########
@@ -0,0 +1,153 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.operator.transform.function;
+
+import java.math.BigDecimal;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.pinot.core.operator.ColumnContext;
+import org.apache.pinot.core.operator.blocks.ValueBlock;
+import org.apache.pinot.core.operator.transform.TransformResultMetadata;
+import org.apache.pinot.segment.spi.index.reader.Dictionary;
+import org.roaringbitmap.RoaringBitmap;
+
+
+/**
+ * <code>ArrayToMultiValueTransformFunction</code> is the bridge for leaf stage Group By Multi-Value column.
+ */
+public class ArrayToMultiValueTransformFunction implements TransformFunction {
Review Comment:
done
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] xiangfu0 commented on a diff in pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column
Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on code in PR #11117:
URL: https://github.com/apache/pinot/pull/11117#discussion_r1270413027
##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/ArrayToMultiValueTransformFunction.java:
##########
@@ -0,0 +1,153 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.operator.transform.function;
+
+import java.math.BigDecimal;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.pinot.core.operator.ColumnContext;
+import org.apache.pinot.core.operator.blocks.ValueBlock;
+import org.apache.pinot.core.operator.transform.TransformResultMetadata;
+import org.apache.pinot.segment.spi.index.reader.Dictionary;
+import org.roaringbitmap.RoaringBitmap;
+
+
+/**
+ * <code>ArrayToMultiValueTransformFunction</code> is the bridge for leaf stage Group By Multi-Value column.
+ */
+public class ArrayToMultiValueTransformFunction implements TransformFunction {
Review Comment:
In v1 leaf, this is just the transparent layer from underlying function
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] xiangfu0 commented on a diff in pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column
Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on code in PR #11117:
URL: https://github.com/apache/pinot/pull/11117#discussion_r1271015899
##########
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/MultiStageEngineIntegrationTest.java:
##########
@@ -485,6 +486,25 @@ public void testLiteralOnlyFunc()
assertEquals(results.get(10).asText(), "hello!");
}
+ @Test
+ public void testMultiValueColumnGroupBy()
+ throws Exception {
+ String pinotQuery = "SELECT count(*), arrayToMV(RandomAirports) FROM mytable "
+ + "GROUP BY arrayToMV(RandomAirports)";
+ JsonNode jsonNode = postQuery(pinotQuery);
+ Assert.assertEquals(jsonNode.get("resultTable").get("rows").size(), 154);
+ }
+
+ @Test
+ public void testMultiValueColumnGroupByOrderBy()
+ throws Exception {
+ String pinotQuery = "SELECT count(*), arrayToMV(RandomAirports) FROM mytable "
Review Comment:
This only works at leaf stage not intermediate stage.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] xiangfu0 commented on pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column
Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on PR #11117:
URL: https://github.com/apache/pinot/pull/11117#issuecomment-1654504252
Part of this https://github.com/apache/pinot/issues/10658
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] xiangfu0 commented on a diff in pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column
Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on code in PR #11117:
URL: https://github.com/apache/pinot/pull/11117#discussion_r1272525489
##########
pinot-common/src/main/java/org/apache/pinot/common/function/TransformFunctionType.java:
##########
@@ -90,6 +90,11 @@ public enum TransformFunctionType {
// date type conversion functions
CAST("cast"),
+ // object type
+ ARRAY_TO_MV("arrayToMV",
+ ReturnTypes.cascade(opBinding -> positionalComponentReturnType(opBinding, 0), SqlTypeTransforms.FORCE_NULLABLE),
Review Comment:
I will have a different pr for this.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] xiangfu0 commented on a diff in pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column
Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on code in PR #11117:
URL: https://github.com/apache/pinot/pull/11117#discussion_r1271118559
##########
pinot-common/src/main/java/org/apache/pinot/common/function/TransformFunctionType.java:
##########
@@ -90,6 +90,11 @@ public enum TransformFunctionType {
// date type conversion functions
CAST("cast"),
+ // object type
+ ARRAY_TO_MV("arrayToMV",
Review Comment:
`USE_AS_MV` might be confusing, since we repurposed `MV` as `ARRAY` in v2, `ARRAY_TO_MV` might be more explicit.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] xiangfu0 commented on a diff in pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column
Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on code in PR #11117:
URL: https://github.com/apache/pinot/pull/11117#discussion_r1271057600
##########
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/MultiStageEngineIntegrationTest.java:
##########
@@ -485,6 +486,25 @@ public void testLiteralOnlyFunc()
assertEquals(results.get(10).asText(), "hello!");
}
+ @Test
+ public void testMultiValueColumnGroupBy()
+ throws Exception {
+ String pinotQuery = "SELECT count(*), arrayToMV(RandomAirports) FROM mytable "
Review Comment:
v1/v2 behavior are different, we can turn on them, but sql syntax will be different.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] codecov-commenter commented on pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column
Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #11117:
URL: https://github.com/apache/pinot/pull/11117#issuecomment-1636974385
## [Codecov](https://app.codecov.io/gh/apache/pinot/pull/11117?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
> Merging [#11117](https://app.codecov.io/gh/apache/pinot/pull/11117?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (35693aa) into [master](https://app.codecov.io/gh/apache/pinot/commit/c32318360e46a4030bfbff9d11f57b7fa0564397?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (c323183) will **increase** coverage by `0.00%`.
> The diff coverage is `0.00%`.
```diff
@@ Coverage Diff @@
## master #11117 +/- ##
=========================================
Coverage 0.11% 0.11%
=========================================
Files 2203 2149 -54
Lines 118196 115707 -2489
Branches 17887 17581 -306
=========================================
Hits 137 137
+ Misses 118039 115550 -2489
Partials 20 20
```
| Flag | Coverage Δ | |
|---|---|---|
| integration1temurin11 | `?` | |
| integration1temurin17 | `?` | |
| integration1temurin20 | `?` | |
| integration2temurin11 | `?` | |
| integration2temurin17 | `?` | |
| integration2temurin20 | `?` | |
| unittests1temurin11 | `?` | |
| unittests1temurin17 | `?` | |
| unittests1temurin20 | `?` | |
| unittests2temurin11 | `0.11% <0.00%> (-0.01%)` | :arrow_down: |
| unittests2temurin17 | `?` | |
| unittests2temurin20 | `?` | |
Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://app.codecov.io/gh/apache/pinot/pull/11117?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
|---|---|---|
| [...e/pinot/common/function/TransformFunctionType.java](https://app.codecov.io/gh/apache/pinot/pull/11117?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vZnVuY3Rpb24vVHJhbnNmb3JtRnVuY3Rpb25UeXBlLmphdmE=) | `0.00% <0.00%> (ø)` | |
| [...m/function/ArrayToMultiValueTransformFunction.java](https://app.codecov.io/gh/apache/pinot/pull/11117?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9vcGVyYXRvci90cmFuc2Zvcm0vZnVuY3Rpb24vQXJyYXlUb011bHRpVmFsdWVUcmFuc2Zvcm1GdW5jdGlvbi5qYXZh) | `0.00% <0.00%> (ø)` | |
| [...r/transform/function/TransformFunctionFactory.java](https://app.codecov.io/gh/apache/pinot/pull/11117?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9vcGVyYXRvci90cmFuc2Zvcm0vZnVuY3Rpb24vVHJhbnNmb3JtRnVuY3Rpb25GYWN0b3J5LmphdmE=) | `0.00% <0.00%> (ø)` | |
... and [57 files with indirect coverage changes](https://app.codecov.io/gh/apache/pinot/pull/11117/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
:mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] xiangfu0 commented on pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column
Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on PR #11117:
URL: https://github.com/apache/pinot/pull/11117#issuecomment-1646299286
Test failures requiring fix by https://github.com/apache/pinot/pull/11151
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] walterddr commented on a diff in pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column
Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on code in PR #11117:
URL: https://github.com/apache/pinot/pull/11117#discussion_r1274152585
##########
pinot-query-planner/src/main/java/org/apache/pinot/query/parser/CalciteRexExpressionParser.java:
##########
@@ -199,6 +203,12 @@ private static Expression compileFunctionExpression(RexExpression.FunctionCall r
return compileOrExpression(rexCall, pinotQuery);
case OTHER_FUNCTION:
functionName = rexCall.getFunctionName();
+ // Special handle for leaf stage multi-value columns, as the default behavior for filter and group by is not
+ // sql standard, so need to use `array_to_mv` to convert the array to v1 multi-value column for behavior
+ // consistency meanwhile not violating the sql standard.
+ if (ARRAY_TO_MV_ALIAS.contains(functionName)) {
Review Comment:
Let's canonicalized and just do `ARRAY_TO_MV`
##########
pinot-query-planner/src/main/java/org/apache/pinot/query/parser/CalciteRexExpressionParser.java:
##########
@@ -50,6 +52,8 @@
public class CalciteRexExpressionParser {
private static final Logger LOGGER = LoggerFactory.getLogger(CalciteRexExpressionParser.class);
private static final Map<String, String> CANONICAL_NAME_TO_SPECIAL_KEY_MAP;
+ private static final Set<String> ARRAY_TO_MV_ALIAS =
Review Comment:
Do we need this many variance? Once we have unnest this should be removed right?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] xiangfu0 merged pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column
Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 merged PR #11117:
URL: https://github.com/apache/pinot/pull/11117
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column
Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #11117:
URL: https://github.com/apache/pinot/pull/11117#discussion_r1270353500
##########
pinot-common/src/main/java/org/apache/pinot/common/function/FunctionRegistry.java:
##########
@@ -216,5 +216,10 @@ public static Object clpDecode(String logtypeFieldName, String dictVarsFieldName
String defaultValue) {
throw new UnsupportedOperationException("Placeholder scalar function, should not reach here");
}
+
+ @ScalarFunction(names = {"arrayToMultiValue", "array_to_multi_value"}, isPlaceholder = true)
+ public static String arrayToMultiValue(Object multiValue) {
Review Comment:
(minor) Personally prefer `arrayToMV` to be consistent with other MV functions such as `sumMV`
##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/ArrayToMultiValueTransformFunction.java:
##########
@@ -0,0 +1,153 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.operator.transform.function;
+
+import java.math.BigDecimal;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.pinot.core.operator.ColumnContext;
+import org.apache.pinot.core.operator.blocks.ValueBlock;
+import org.apache.pinot.core.operator.transform.TransformResultMetadata;
+import org.apache.pinot.segment.spi.index.reader.Dictionary;
+import org.roaringbitmap.RoaringBitmap;
+
+
+/**
+ * <code>ArrayToMultiValueTransformFunction</code> is the bridge for leaf stage Group By Multi-Value column.
+ */
+public class ArrayToMultiValueTransformFunction implements TransformFunction {
Review Comment:
Is this used in v1 or v2? Is it possible to just remove this function on the leaf?
With the current approach, filter will be very inefficient even if it works
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] xiangfu0 commented on a diff in pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column
Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on code in PR #11117:
URL: https://github.com/apache/pinot/pull/11117#discussion_r1271004206
##########
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/MultiStageEngineIntegrationTest.java:
##########
@@ -485,6 +486,25 @@ public void testLiteralOnlyFunc()
assertEquals(results.get(10).asText(), "hello!");
}
+ @Test
+ public void testMultiValueColumnGroupBy()
+ throws Exception {
+ String pinotQuery = "SELECT count(*), arrayToMV(RandomAirports) FROM mytable "
+ + "GROUP BY arrayToMV(RandomAirports)";
+ JsonNode jsonNode = postQuery(pinotQuery);
+ Assert.assertEquals(jsonNode.get("resultTable").get("rows").size(), 154);
+ }
+
+ @Test
+ public void testMultiValueColumnGroupByOrderBy()
+ throws Exception {
+ String pinotQuery = "SELECT count(*), arrayToMV(RandomAirports) FROM mytable "
Review Comment:
You need a `group by arrayToMV(RandomAirports)` ?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] walterddr commented on a diff in pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column
Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on code in PR #11117:
URL: https://github.com/apache/pinot/pull/11117#discussion_r1271461855
##########
pinot-common/src/main/java/org/apache/pinot/common/function/TransformFunctionType.java:
##########
@@ -90,6 +90,11 @@ public enum TransformFunctionType {
// date type conversion functions
CAST("cast"),
+ // object type
+ ARRAY_TO_MV("arrayToMV",
+ ReturnTypes.cascade(opBinding -> positionalComponentReturnType(opBinding, 0), SqlTypeTransforms.FORCE_NULLABLE),
Review Comment:
actually ignored previous comment, did a bit of research and it seems like registering TransformFunctionType without an actual impl is better than having to parse scalar function annotation, which doesn't really allow anything other than primitives
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column
Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #11117:
URL: https://github.com/apache/pinot/pull/11117#discussion_r1270424282
##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/ArrayToMultiValueTransformFunction.java:
##########
@@ -0,0 +1,153 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.operator.transform.function;
+
+import java.math.BigDecimal;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.pinot.core.operator.ColumnContext;
+import org.apache.pinot.core.operator.blocks.ValueBlock;
+import org.apache.pinot.core.operator.transform.TransformResultMetadata;
+import org.apache.pinot.segment.spi.index.reader.Dictionary;
+import org.roaringbitmap.RoaringBitmap;
+
+
+/**
+ * <code>ArrayToMultiValueTransformFunction</code> is the bridge for leaf stage Group By Multi-Value column.
+ */
+public class ArrayToMultiValueTransformFunction implements TransformFunction {
Review Comment:
Understood, but with transform function no index can be used in filter
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] xiangfu0 commented on a diff in pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column
Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on code in PR #11117:
URL: https://github.com/apache/pinot/pull/11117#discussion_r1270420414
##########
pinot-common/src/main/java/org/apache/pinot/common/function/FunctionRegistry.java:
##########
@@ -216,5 +216,10 @@ public static Object clpDecode(String logtypeFieldName, String dictVarsFieldName
String defaultValue) {
throw new UnsupportedOperationException("Placeholder scalar function, should not reach here");
}
+
+ @ScalarFunction(names = {"arrayToMultiValue", "array_to_multi_value"}, isPlaceholder = true)
+ public static String arrayToMultiValue(Object multiValue) {
Review Comment:
done
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] xiangfu0 commented on pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column
Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on PR #11117:
URL: https://github.com/apache/pinot/pull/11117#issuecomment-1646168883
> the grand question is whether we continue to support MV as a data type or collapse into ARRAY. if the answer to the above question is NO, this PR is not needed. if the answer is YES. there are several concerns. i thought about it a bit last night. the problem is 2 fold.
>
> 1. how do we support it conforming with ANSI syntax
> 2. how do we make sure v1 perform the same as before.
>
> for Q1: the answer is probably MULTI_SET
>
> * it is a SET with no ordering but can contain duplicates, suitable for our use case for GROUP-BY and FILTER, which they are considered as expanded/unnested during eval.
> * but not every use cases is considered as MULTI_SET, for example selection will consider MV as an ARRAY
> therefore,
>
> 1. an explicit `USE_AS_MULTISET(mv)` is desired to bridge the syntatic gap
> 2. explicit `USE_AS_ARRAY(mv)` is the default (considering select * there's no reason to ask user explicitly put this in
>
> for Q2: the problem is these USE_AS_*** methods are considered transform, which shouldn't be the case there's a simply way to solve this --> in CalciteRexExpressionUtils, we can simply ignored the functionCall and directly return the operand as a reference in V1 b/c V1 is SqlNode and bares no type info, this means directly putting the operand input reference without the USE_AS_*** type conversion will work naturally with the V1 context.
>
> so in short my proposal is
>
> 1. have `USE_AS_ARRAY` and `USE_AS_MULTISET` as scalarFunction wrappers (unimplemented) to bridge the gap of the syntactic problem on calcite
> 2. in PhysicalPlanner explicitly exclude these syntatitic functions and directly drop to the operand;
> 3. in the mid term we will have MV, MULTI_SET, ARRAY as 3 "co-existing" types, but in V1 there's only MV; in V2 MV is considered by default as ARRAY, unless `USE_AS_MULTISET` is used.
> 4. in the long term, once we fully found all alternatives in standard SQL syntax for MV, we can safely consider MV as ARRAY. and only do MULTISET when necessary (i can only see it being useful in `MV > 5` situation, where there's no array equivalent, only multiset equivalent)
>
> WDYT? Please let me know
MV column in V2 is modeled as ARRAY by default.
So in terms of supporting v1 format, we need to:
1. use `ARRAY_TO_MV` or `USE_AS_MV` as the bridge to ensure Calcite understand the type consistency/conversion.
2. (TODO) Implement MV GROUP BY in v2 intermediate stage to complete the story.
3. (TODO) Implement ARRAY GROUP BY later on
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] walterddr commented on a diff in pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column
Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on code in PR #11117:
URL: https://github.com/apache/pinot/pull/11117#discussion_r1271169485
##########
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/MultiStageEngineIntegrationTest.java:
##########
@@ -485,6 +486,25 @@ public void testLiteralOnlyFunc()
assertEquals(results.get(10).asText(), "hello!");
}
+ @Test
+ public void testMultiValueColumnGroupBy()
+ throws Exception {
+ String pinotQuery = "SELECT count(*), arrayToMV(RandomAirports) FROM mytable "
+ + "GROUP BY arrayToMV(RandomAirports)";
+ JsonNode jsonNode = postQuery(pinotQuery);
+ Assert.assertEquals(jsonNode.get("resultTable").get("rows").size(), 154);
+ }
+
+ @Test
+ public void testMultiValueColumnGroupByOrderBy()
+ throws Exception {
+ String pinotQuery = "SELECT count(*), arrayToMV(RandomAirports) FROM mytable "
Review Comment:
Ok sounds good
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] xiangfu0 commented on pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column
Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on PR #11117:
URL: https://github.com/apache/pinot/pull/11117#issuecomment-1648941009
rebase to master
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] xiangfu0 commented on a diff in pull request #11117: [multistage] bridge v2 query engine for leaf stage v1 multi-value column
Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on code in PR #11117:
URL: https://github.com/apache/pinot/pull/11117#discussion_r1274368960
##########
pinot-query-planner/src/main/java/org/apache/pinot/query/parser/CalciteRexExpressionParser.java:
##########
@@ -50,6 +52,8 @@
public class CalciteRexExpressionParser {
private static final Logger LOGGER = LoggerFactory.getLogger(CalciteRexExpressionParser.class);
private static final Map<String, String> CANONICAL_NAME_TO_SPECIAL_KEY_MAP;
+ private static final Set<String> ARRAY_TO_MV_ALIAS =
Review Comment:
done
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org