You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2021/02/04 22:05:52 UTC

[GitHub] [beam] apilloud opened a new pull request #13898: [WIP][BEAM-10925] Make BeamCalcRel safe for ZetaSQL

apilloud opened a new pull request #13898:
URL: https://github.com/apache/beam/pull/13898


   This adds hooks to enable testing BeamCalcRel/BeamZetaSQLCalcRel in isolation and starts to fix some issues in BeamCalcRel.
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
    - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2
   --- | --- | --- | --- | --- | --- | ---
   Go | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) | ---
   Java | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_VR_Dataflow_V2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_VR_Dataflow_V2/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam
 .apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://ci-beam.a
 pache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/)
   Python | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/)<br>[![Build Status](https://ci-beam
 .apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/) | ---
   XLang | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Dataflow/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/) | ---
   
   Pre-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   --- |Java | Python | Go | Website | Whitespace | Typescript
   --- | --- | --- | --- | --- | --- | ---
   Non-portable | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocker_Cron/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocker_Cron/lastCompletedBuild/) <br>[![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocs_Cron/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocs_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/be
 am_PreCommit_Go_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Whitespace_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Whitespace_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Typescript_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Typescript_Cron/lastCompletedBuild/)
   Portable | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/) | --- | --- | --- | ---
   
   See [.test-infra/jenkins/README](https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md) for trigger phrase, status and link of all Jenkins jobs.
   
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] amaliujia edited a comment on pull request #13898: [BEAM-11747] Make BeamCalcRel safe for ZetaSQL

Posted by GitBox <gi...@apache.org>.
amaliujia edited a comment on pull request #13898:
URL: https://github.com/apache/beam/pull/13898#issuecomment-776378220


   I am still not sure why we should reject Int64 type for UDF.
   
   Assuming a UDF is defined as `package com.xxx.client; class name; public Long udf(Long a)`, the code gen will generate code for UDF like `Long n = com.xxx.client.name.udf((Long) value)`.
   
   In this case, the value is from Row thus is compatible with ZetaSQL, the return value is still Java object which is also be compatible with ZetaSQL.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] apilloud merged pull request #13898: [BEAM-11747] Make BeamCalcRel safe for ZetaSQL

Posted by GitBox <gi...@apache.org>.
apilloud merged pull request #13898:
URL: https://github.com/apache/beam/pull/13898


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] apilloud commented on pull request #13898: [BEAM-11747] Make BeamCalcRel safe for ZetaSQL

Posted by GitBox <gi...@apache.org>.
apilloud commented on pull request #13898:
URL: https://github.com/apache/beam/pull/13898#issuecomment-780896213


   This PR is currently addressing the second point. There is currently data corruption introduced by UDFs. If you think it will take a significant time to address this I would suggest we roll back #13841 until you figure this out.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] amaliujia edited a comment on pull request #13898: [BEAM-11747] Make BeamCalcRel safe for ZetaSQL

Posted by GitBox <gi...@apache.org>.
amaliujia edited a comment on pull request #13898:
URL: https://github.com/apache/beam/pull/13898#issuecomment-776378220


   I am still not sure why we should reject Int64 type for UDF.
   
   Assuming a UDF is defined as `package com.xxx.client; class name; public Long udf(Long a)`, the code gen will generate code for UDF like `Long n = com.xxx.client.name.udf((Long) value)`.
   
   In this case, the value is from Row thus is compatible with ZetaSQL, the return value is still Java object which is also be compatible with ZetaSQL, so there isn't a corrupted case.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] apilloud commented on a change in pull request #13898: [BEAM-11747] Make BeamCalcRel safe for ZetaSQL

Posted by GitBox <gi...@apache.org>.
apilloud commented on a change in pull request #13898:
URL: https://github.com/apache/beam/pull/13898#discussion_r577298112



##########
File path: sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLQueryPlanner.java
##########
@@ -129,20 +149,61 @@ static boolean hasOnlyJavaUdfInProjects(RelOptRuleCall x) {
         for (RexNode rexNode : logicalCalc.getProgram().getExprList()) {
           if (rexNode instanceof RexCall) {
             RexCall call = (RexCall) rexNode;
-            if (call.getOperator() instanceof SqlUserDefinedFunction) {
+            final SqlOperator operator = call.getOperator();
+
+            CallImplementor implementor = RexImpTable.INSTANCE.get(operator);
+            if (implementor == null) {
+              // Reject methods with no implimentation
+              return false;

Review comment:
       Done.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] amaliujia commented on pull request #13898: [BEAM-11747] Make BeamCalcRel safe for ZetaSQL

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #13898:
URL: https://github.com/apache/beam/pull/13898#issuecomment-780189510


   
   > The generated code prints the literal to a string and looks like this:
   > `c.output(org.apache.beam.sdk.values.Row.withSchema(outputSchema).attachValues(java.util.Arrays.asList(new org.apache.beam.sdk.extensions. sql.provider.UdfTestProvider.IncrementFn().increment(5L))));`
   > 
   
   Based on this generated code, for a Java UDF, it's input won't lose any precision?
   
   Is there an example that the code generation generates code that casts input which causes precision lose? If not there is probably no data corruption for Java UDF. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] apilloud commented on pull request #13898: [BEAM-11747] Make BeamCalcRel safe for ZetaSQL

Posted by GitBox <gi...@apache.org>.
apilloud commented on pull request #13898:
URL: https://github.com/apache/beam/pull/13898#issuecomment-781484943


   The constraints added here do not limit BeamCalc to only UDFs, we would need significantly more constraints for that. They do ensure the only BeamCalc specific compliance failure is `unimplemented`. I'm happy to review any changes that loosen the constraints and hope some day there are none.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] apilloud commented on pull request #13898: [BEAM-11747] Make BeamCalcRel safe for ZetaSQL

Posted by GitBox <gi...@apache.org>.
apilloud commented on pull request #13898:
URL: https://github.com/apache/beam/pull/13898#issuecomment-776922290


   That isn't entirely true. Literals are stored as a BigDecimal (which is what was disabled): https://github.com/apache/beam/blob/9bbd5bd514efdbd68ac903ec416882b3cccceb2b/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlCalciteTranslationUtils.java#L211
   
   The generated code prints the literal to a string and looks like this:
   `c.output(org.apache.beam.sdk.values.Row.withSchema(outputSchema).attachValues(java.util.Arrays.asList(new org.apache.beam.sdk.extensions.
   sql.provider.UdfTestProvider.IncrementFn().increment(5L))));`
   
   In earlier testing I was seeing overflow errors from gRPC, but that appears to actually be caused by DOUBLE literals (which are also stored as BigDecimal). I've added DECIMAL to the list of allowed types for literals and reverted the test changes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] apilloud commented on pull request #13898: [BEAM-11747] Make BeamCalcRel safe for ZetaSQL

Posted by GitBox <gi...@apache.org>.
apilloud commented on pull request #13898:
URL: https://github.com/apache/beam/pull/13898#issuecomment-780063274


   This is now rebased past #13934.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] apilloud commented on pull request #13898: [BEAM-11747] Make BeamCalcRel safe for ZetaSQL

Posted by GitBox <gi...@apache.org>.
apilloud commented on pull request #13898:
URL: https://github.com/apache/beam/pull/13898#issuecomment-776084963


   R: @ibzib 
   cc: @amaliujia 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] apilloud commented on a change in pull request #13898: [BEAM-11747] Make BeamCalcRel safe for ZetaSQL

Posted by GitBox <gi...@apache.org>.
apilloud commented on a change in pull request #13898:
URL: https://github.com/apache/beam/pull/13898#discussion_r577298008



##########
File path: sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlJavaUdfTest.java
##########
@@ -222,13 +224,18 @@ public void testBinaryJavaUdf() {
     String sql =
         String.format(
             "CREATE FUNCTION matches(str STRING, regStr STRING) RETURNS BOOLEAN LANGUAGE java OPTIONS (path='%s'); "
-                + "SELECT matches(\"a\", \"a\"), 'apple'='beta'",
+                + "SELECT matches('a', 'a'), matches('apple', 'beta')",

Review comment:
       I agree, this doesn't need to change for this PR so I reverted my changes.

##########
File path: sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLQueryPlanner.java
##########
@@ -129,20 +149,61 @@ static boolean hasOnlyJavaUdfInProjects(RelOptRuleCall x) {
         for (RexNode rexNode : logicalCalc.getProgram().getExprList()) {
           if (rexNode instanceof RexCall) {
             RexCall call = (RexCall) rexNode;
-            if (call.getOperator() instanceof SqlUserDefinedFunction) {
+            final SqlOperator operator = call.getOperator();
+
+            CallImplementor implementor = RexImpTable.INSTANCE.get(operator);
+            if (implementor == null) {
+              // Reject methods with no implimentation

Review comment:
       Done.

##########
File path: sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLQueryPlanner.java
##########
@@ -81,6 +89,13 @@
   "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402)
 })
 public class ZetaSQLQueryPlanner implements QueryPlanner {
+  public enum Calc {

Review comment:
       I like your suggestion for now, simpler is better.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] ibzib commented on a change in pull request #13898: [BEAM-11747] Make BeamCalcRel safe for ZetaSQL

Posted by GitBox <gi...@apache.org>.
ibzib commented on a change in pull request #13898:
URL: https://github.com/apache/beam/pull/13898#discussion_r577276158



##########
File path: sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLQueryPlanner.java
##########
@@ -81,6 +89,13 @@
   "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402)
 })
 public class ZetaSQLQueryPlanner implements QueryPlanner {
+  public enum Calc {

Review comment:
       > I was trying not to add the ability to select arbitrary rules to the public API, but I'll do that instead.
   
   Good point. I'll leave it up to you, I'm fine with either way.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] apilloud commented on a change in pull request #13898: [BEAM-11747] Make BeamCalcRel safe for ZetaSQL

Posted by GitBox <gi...@apache.org>.
apilloud commented on a change in pull request #13898:
URL: https://github.com/apache/beam/pull/13898#discussion_r577271985



##########
File path: sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlJavaUdfTest.java
##########
@@ -222,13 +224,18 @@ public void testBinaryJavaUdf() {
     String sql =
         String.format(
             "CREATE FUNCTION matches(str STRING, regStr STRING) RETURNS BOOLEAN LANGUAGE java OPTIONS (path='%s'); "
-                + "SELECT matches(\"a\", \"a\"), 'apple'='beta'",
+                + "SELECT matches('a', 'a'), matches('apple', 'beta')",

Review comment:
       The function is `testBinaryJavaUdf`, so I assume it is trying to test binary UDFs. I fixed what I see as a typo and reverted the changes disabling the test. I'm happy to revert this.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] ibzib commented on a change in pull request #13898: [BEAM-11747] Make BeamCalcRel safe for ZetaSQL

Posted by GitBox <gi...@apache.org>.
ibzib commented on a change in pull request #13898:
URL: https://github.com/apache/beam/pull/13898#discussion_r577273339



##########
File path: sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLQueryPlanner.java
##########
@@ -81,6 +89,13 @@
   "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402)
 })
 public class ZetaSQLQueryPlanner implements QueryPlanner {
+  public enum Calc {

Review comment:
       On second thought, why do we have the enum at all? Can we not pass the rules directly?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] apilloud commented on a change in pull request #13898: [BEAM-11747] Make BeamCalcRel safe for ZetaSQL

Posted by GitBox <gi...@apache.org>.
apilloud commented on a change in pull request #13898:
URL: https://github.com/apache/beam/pull/13898#discussion_r577274667



##########
File path: sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLQueryPlanner.java
##########
@@ -81,6 +89,13 @@
   "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402)
 })
 public class ZetaSQLQueryPlanner implements QueryPlanner {
+  public enum Calc {

Review comment:
       I was trying not to add the ability to select arbitrary rules to the public API, but I'll do that instead.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] amaliujia commented on pull request #13898: [BEAM-11747] Make BeamCalcRel safe for ZetaSQL

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #13898:
URL: https://github.com/apache/beam/pull/13898#issuecomment-776173240


   To conclude a bit:
   
   This PR rejects unsupported types, other stuff, etc. in BeamZetaSQL UDF, while #13934 rejects mixed usage of built-in operator and UDF.
   
   So these two PRs do not conflict each other.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] apilloud closed pull request #13898: [BEAM-11747] Make BeamCalcRel safe for ZetaSQL

Posted by GitBox <gi...@apache.org>.
apilloud closed pull request #13898:
URL: https://github.com/apache/beam/pull/13898


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] ibzib commented on a change in pull request #13898: [BEAM-11747] Make BeamCalcRel safe for ZetaSQL

Posted by GitBox <gi...@apache.org>.
ibzib commented on a change in pull request #13898:
URL: https://github.com/apache/beam/pull/13898#discussion_r577146350



##########
File path: sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLQueryPlanner.java
##########
@@ -81,6 +89,13 @@
   "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402)
 })
 public class ZetaSQLQueryPlanner implements QueryPlanner {
+  public enum Calc {

Review comment:
       Nit: can we rename this to avoid confusion with the actual [Calc](https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/core/Calc.java)?

##########
File path: sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLQueryPlanner.java
##########
@@ -129,20 +149,61 @@ static boolean hasOnlyJavaUdfInProjects(RelOptRuleCall x) {
         for (RexNode rexNode : logicalCalc.getProgram().getExprList()) {
           if (rexNode instanceof RexCall) {
             RexCall call = (RexCall) rexNode;
-            if (call.getOperator() instanceof SqlUserDefinedFunction) {
+            final SqlOperator operator = call.getOperator();
+
+            CallImplementor implementor = RexImpTable.INSTANCE.get(operator);
+            if (implementor == null) {
+              // Reject methods with no implimentation

Review comment:
       ```suggestion
                 // Reject methods with no implementation
   ```
   

##########
File path: sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlJavaUdfTest.java
##########
@@ -222,13 +224,18 @@ public void testBinaryJavaUdf() {
     String sql =
         String.format(
             "CREATE FUNCTION matches(str STRING, regStr STRING) RETURNS BOOLEAN LANGUAGE java OPTIONS (path='%s'); "
-                + "SELECT matches(\"a\", \"a\"), 'apple'='beta'",
+                + "SELECT matches('a', 'a'), matches('apple', 'beta')",

Review comment:
       - Would the old version of this query succeed now?
   - I assume it is okay to remove the old query because we are getting the same coverage with internal compliance tests?

##########
File path: sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLQueryPlanner.java
##########
@@ -129,20 +149,61 @@ static boolean hasOnlyJavaUdfInProjects(RelOptRuleCall x) {
         for (RexNode rexNode : logicalCalc.getProgram().getExprList()) {
           if (rexNode instanceof RexCall) {
             RexCall call = (RexCall) rexNode;
-            if (call.getOperator() instanceof SqlUserDefinedFunction) {
+            final SqlOperator operator = call.getOperator();
+
+            CallImplementor implementor = RexImpTable.INSTANCE.get(operator);
+            if (implementor == null) {
+              // Reject methods with no implimentation
+              return false;

Review comment:
       We'll need to update the javadoc comment for this method to reflect these changes.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] ibzib commented on a change in pull request #13898: [BEAM-11747] Make BeamCalcRel safe for ZetaSQL

Posted by GitBox <gi...@apache.org>.
ibzib commented on a change in pull request #13898:
URL: https://github.com/apache/beam/pull/13898#discussion_r577275001



##########
File path: sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlJavaUdfTest.java
##########
@@ -222,13 +224,18 @@ public void testBinaryJavaUdf() {
     String sql =
         String.format(
             "CREATE FUNCTION matches(str STRING, regStr STRING) RETURNS BOOLEAN LANGUAGE java OPTIONS (path='%s'); "
-                + "SELECT matches(\"a\", \"a\"), 'apple'='beta'",
+                + "SELECT matches('a', 'a'), matches('apple', 'beta')",

Review comment:
       Currently, this tests what happens when we mix UDFs and builtin operators. We should probably keep this coverage (maybe change `matches` to `helloWorld` for simplicity) and make `testBinaryJavaUdf` a separate test.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] apilloud edited a comment on pull request #13898: [BEAM-11747] Make BeamCalcRel safe for ZetaSQL

Posted by GitBox <gi...@apache.org>.
apilloud edited a comment on pull request #13898:
URL: https://github.com/apache/beam/pull/13898#issuecomment-780063274


   This is now rebased past #13912.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] apilloud commented on pull request #13898: [BEAM-11747] Make BeamCalcRel safe for ZetaSQL

Posted by GitBox <gi...@apache.org>.
apilloud commented on pull request #13898:
URL: https://github.com/apache/beam/pull/13898#issuecomment-780902965


   That works for me. I moved the first commit into it's own PR: #14009. Will one of you follow up and disable BeamJavaUdfCalcRule?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] github-actions[bot] commented on pull request #13898: [BEAM-11747] Make BeamCalcRel safe for ZetaSQL

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13898:
URL: https://github.com/apache/beam/pull/13898#issuecomment-780927211






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] ibzib commented on pull request #13898: [BEAM-11747] Make BeamCalcRel safe for ZetaSQL

Posted by GitBox <gi...@apache.org>.
ibzib commented on pull request #13898:
URL: https://github.com/apache/beam/pull/13898#issuecomment-780898941


   > This PR is currently addressing the second point. There is currently data corruption introduced by UDFs. If you think it will take a significant time to address this I would suggest we roll back #13841 until you figure this out.
   
   Can we merge just the first commit from this PR, but remove `BeamJavaUdfCalcRule` from the planner by default?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] apilloud commented on pull request #13898: [BEAM-11747] Make BeamCalcRel safe for ZetaSQL

Posted by GitBox <gi...@apache.org>.
apilloud commented on pull request #13898:
URL: https://github.com/apache/beam/pull/13898#issuecomment-780769137


   @amaliujia One example is `SELECT CAST (Double(179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368) AS NUMERIC) AS ColA` It generates this code:
   ```
   {
     final java.math.BigDecimal v = new java.math.BigDecimal(
       "179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368");
     c.output(org.apache.beam.sdk.values.Row.withSchema(outputSchema).attachValues(java.util.Arrays.asList(v)));
   }
   ```
   This fixes at least 9 additional data corruption bugs (mostly around timestamps) and doesn't break any tests in Beam. I can provide you the link to the internal run of this, but if you want to debate further I would suggest we roll back first.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] ibzib commented on pull request #13898: [BEAM-11747] Make BeamCalcRel safe for ZetaSQL

Posted by GitBox <gi...@apache.org>.
ibzib commented on pull request #13898:
URL: https://github.com/apache/beam/pull/13898#issuecomment-780903703


   > That works for me. I moved the first commit into it's own PR: #14009. Will one of you follow up and disable BeamJavaUdfCalcRule?
   
   SG, thanks.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] amaliujia commented on pull request #13898: [BEAM-11747] Make BeamCalcRel safe for ZetaSQL

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #13898:
URL: https://github.com/apache/beam/pull/13898#issuecomment-780956621


   Ok, with another thought, I think why not we merge this PR for now? 
   
   We can start from the most strict constraints, then gradually loose the constraints.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] apilloud commented on a change in pull request #13898: [BEAM-11747] Make BeamCalcRel safe for ZetaSQL

Posted by GitBox <gi...@apache.org>.
apilloud commented on a change in pull request #13898:
URL: https://github.com/apache/beam/pull/13898#discussion_r577268952



##########
File path: sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLQueryPlanner.java
##########
@@ -81,6 +89,13 @@
   "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402)
 })
 public class ZetaSQLQueryPlanner implements QueryPlanner {
+  public enum Calc {

Review comment:
       How about `CalcRuleEnum`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] amaliujia commented on pull request #13898: [BEAM-11747] Make BeamCalcRel safe for ZetaSQL

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #13898:
URL: https://github.com/apache/beam/pull/13898#issuecomment-776378220


   I am still not sure why we should reject Int64 type for UDF.
   
   Assuming a UDF is defined as `Long udf(Long a)`, the code gen will generate code for UDF like `Long n = com.xxx.client.udf((Long) value)`.
   
   In this case, the value is from Row thus is compatible with ZetaSQL, the return value is still Java object which is also be compatible with ZetaSQL.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] amaliujia commented on pull request #13898: [BEAM-11747] Make BeamCalcRel safe for ZetaSQL

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #13898:
URL: https://github.com/apache/beam/pull/13898#issuecomment-780880231


   My suggestion is:
   1. if this PR's goal is to verify whether BeamCalRel is safe to run ZetaSQL operators, can you create a separate rule that allows ZetaSQL operators run in BeamCalRel(and reject those wrong cases)? 
   2. If this PR's goal is to help improve Java UDF, please hold this PR now. Kyle and I are discussing the next steps to improve current UDF implementation. This PR is a great reference to give us directions on where to check the implementation. We will need time to digest this PR and consider whether Calc splitting should go first.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] amaliujia commented on pull request #13898: [BEAM-11747] Make BeamCalcRel safe for ZetaSQL

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #13898:
URL: https://github.com/apache/beam/pull/13898#issuecomment-780950443


   Calcite does more for built-in operators than UDF. Using CAST as an example:  https://github.com/apache/calcite/blob/d815dc1209c3cc0728941e9fc81fce7d9e44c79b/core/src/main/java/org/apache/calcite/adapter/enumerable/RexToLixTranslator.java#L248 .
   
   For UDF, fortunately, Calcite does less. Basically Calcite will either generate the string representation of a value (i.e. Literal) or make a Row field access (i.e. column), and then make a reflection call on the UDF implementation. So for UDF, I think we only need to verify 1) whether Calcite does anything that make the value lose precision 2) whether Calcite generates a correct string representation for value or make correct field access to Row. The verification work will be much less compared to checking the built-in operators.
   
   I agree this PR could be an upper bound for the limitation of Java UDF, what I want to see is whether we can do better.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] amaliujia edited a comment on pull request #13898: [BEAM-11747] Make BeamCalcRel safe for ZetaSQL

Posted by GitBox <gi...@apache.org>.
amaliujia edited a comment on pull request #13898:
URL: https://github.com/apache/beam/pull/13898#issuecomment-780880231


   My suggestion is:
   1. if this PR's goal is to verify whether BeamCalRel is safe to run ZetaSQL operators, can you create a separate rule that allows ZetaSQL operators run in BeamCalRel(and reject those wrong cases)? 
   2. If this PR's goal is to help improve Java UDF, please hold this PR now. Kyle and I are discussing the next steps to improve current UDF implementation. This PR is a great reference to give us directions on where to check the implementation. We will need time to digest this PR and consider whether Calc splitting should go first. We will definitely consider this PR during our improving process. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org