You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "gortiz (via GitHub)" <gi...@apache.org> on 2024/02/12 15:46:49 UTC

[PR] Fix the incorrect cast from String to Bytes in V2 [pinot]

gortiz opened a new pull request, #12401:
URL: https://github.com/apache/pinot/pull/12401

   Tags: bugfix
   
   ## The problem
   
   In V2, expressions like `cast('stringLiteral' as BINARY)` were incorrectly executed. This also includes common patterns like `bytesColumn = 'some hex literal'`.
   
   For example, using table `starbucksStores` from `GenericQuickstart`, the following query failed in V2 while worked in V1:
   ```sql
   select location_st_point from starbucksStores where location_st_point = '80c062bf21bba70a9b404e969de0503750' limit 10
   ```
   
   ### Low level description
   
   The exception we got was:
   
   ```
   Caused by: java.lang.AssertionError: value 00ba86d4ebda4d5f8da9ba2cf3053208 does not match type class org.apache.calcite.avatica.util.ByteString
    at org.apache.calcite.linq4j.tree.ConstantExpression.<init>(ConstantExpression.java:51)
    at org.apache.calcite.linq4j.tree.Expressions.constant(Expressions.java:576)
    at org.apache.calcite.linq4j.tree.OptimizeShuttle.visit(OptimizeShuttle.java:291)
    at org.apache.calcite.linq4j.tree.UnaryExpression.accept(UnaryExpression.java:39))
   ```
   
   When the query is received, the expected algorithm is:
   1. Calcite parsed the SQL
   2. Calcite transformed the SQL to Relational Algrebra (`SqlNode` to `RelRoot`). In order to do so, an implicit call from String (`VARCHAR`) to Bytes (`VARBINARY`) is created.
   3. We call `sqlToRelConverter.trimUnusedFields` to simplify the query before optimize.
   4. Optimize. Here the CAST should be executed calling `hexToBinary`.
   
   What actually happened is that in 3. (`sqlToRelConverter.trimUnusedFields`) the expression `CAST(stringLiteral as VARBINARY)` was simplified by Calcite. There is a known bug in Calcite that fails in this kind of casts. There is a patch to solve if (see https://github.com/apache/calcite/pull/3635) but at the time of writing this is not what we want. Specifically, the patch plans to apply some custom semantic (compatible with MySQL and Postgres) on that conversion. That semantic is valid, but it is not what we do in V1. Therefore even if the bug is fixed in Calcite it won't behave as V1 users would expect and as a result they could find unexpected results when migrating queries from V1 to V2.
   
   ## The fix
   
   In order to fix the problem, a couple of new transformation rules has been introduced. These rules look for calls to `cast(somethingOfTypeString as VARBINARY)` and transform that to `hexToBytes(somethingOfTypeString)`. These rules have to be applied before `sqlToRelConverter.trimUnusedFields` is called. Given that method needs to be called before the current optimizers are used, we need to add a new optimizer for these rules.
   
   ### Details
   
   Right now these rules are only applied on filter, projection and joins. We may need to add them to other `Rel`s in the future.
   
   cc @walterddr @Jackie-Jiang @xiangfu0 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Fix the incorrect cast from String to Bytes in V2 [pinot]

Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on code in PR #12401:
URL: https://github.com/apache/pinot/pull/12401#discussion_r1494328544


##########
pinot-query-planner/src/main/java/org/apache/pinot/query/QueryEnvironment.java:
##########
@@ -346,6 +372,19 @@ private HintStrategyTable getHintStrategyTable() {
     return PinotHintStrategyTable.PINOT_HINT_STRATEGY_TABLE;
   }
 
+  private static HepProgram getRewriteProgram() {
+    HepProgramBuilder hepProgramBuilder = new HepProgramBuilder();
+    // Set the match order as DEPTH_FIRST. The default is arbitrary which works the same as DEPTH_FIRST, but it's
+    // best to be explicit.
+    hepProgramBuilder.addMatchOrder(HepMatchOrder.DEPTH_FIRST);
+
+    hepProgramBuilder.addRuleInstance(StringToByteCastRule.OnFilter.INSTANCE);

Review Comment:
   Is there any missing Logical nodes?
   
   ```
   LogicalAggregate
   LogicalCalc
   LogicalCorrelate
   LogicalExchange
   LogicalFilter
   LogicalIntersect
   LogicalJoin
   LogicalMatch
   LogicalMinus
   LogicalProject
   LogicalRepeatUnion
   LogicalSnapshot
   LogicalSort
   LogicalSortExchange
   LogicalTableFunctionScan
   LogicalTableModify
   LogicalTableScan
   LogicalTableSpool
   LogicalUnion
   LogicalValues
   LogicalWindow
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Fix the incorrect cast from String to Bytes in V2 [pinot]

Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on code in PR #12401:
URL: https://github.com/apache/pinot/pull/12401#discussion_r1494328544


##########
pinot-query-planner/src/main/java/org/apache/pinot/query/QueryEnvironment.java:
##########
@@ -346,6 +372,19 @@ private HintStrategyTable getHintStrategyTable() {
     return PinotHintStrategyTable.PINOT_HINT_STRATEGY_TABLE;
   }
 
+  private static HepProgram getRewriteProgram() {
+    HepProgramBuilder hepProgramBuilder = new HepProgramBuilder();
+    // Set the match order as DEPTH_FIRST. The default is arbitrary which works the same as DEPTH_FIRST, but it's
+    // best to be explicit.
+    hepProgramBuilder.addMatchOrder(HepMatchOrder.DEPTH_FIRST);
+
+    hepProgramBuilder.addRuleInstance(StringToByteCastRule.OnFilter.INSTANCE);

Review Comment:
   Is there any missing Logical nodes? E.g. `LogicalSort`
   
   ```
   LogicalAggregate
   LogicalCalc
   LogicalCorrelate
   LogicalExchange
   LogicalFilter
   LogicalIntersect
   LogicalJoin
   LogicalMatch
   LogicalMinus
   LogicalProject
   LogicalRepeatUnion
   LogicalSnapshot
   LogicalSort
   LogicalSortExchange
   LogicalTableFunctionScan
   LogicalTableModify
   LogicalTableScan
   LogicalTableSpool
   LogicalUnion
   LogicalValues
   LogicalWindow
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Fix the incorrect cast from String to Bytes in V2 [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on PR #12401:
URL: https://github.com/apache/pinot/pull/12401#issuecomment-1954666322

   We don't need to follow single-stage engine convention here. In multi-stage engine, we try to follow the standard SQL behavior, where encoding should be provided when converting binary to string. Alternatively, the input value can be directly declared as bytes, e.g. `X'80c062bf21bba70a9b404e969de0503750` (note the leading `X` which indicates the value is hex encoded bytes).
   Here are the PostgreSQL behavior for binary for reference: https://www.postgresql.org/docs/current/datatype-binary.html, https://www.postgresql.org/docs/current/functions-binarystring.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Fix the incorrect cast from String to Bytes in V2 [pinot]

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on PR #12401:
URL: https://github.com/apache/pinot/pull/12401#issuecomment-2000614678

   As a safer implementation we decided to forbid this casting. See https://github.com/apache/pinot/issues/12457


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Fix the incorrect cast from String to Bytes in V2 [pinot]

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz closed pull request #12401: Fix the incorrect cast from String to Bytes in V2
URL: https://github.com/apache/pinot/pull/12401


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org