You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/03/25 05:40:14 UTC

[GitHub] [iceberg] pan3793 opened a new issue #2382: Lack test on runtime jar

pan3793 opened a new issue #2382:
URL: https://github.com/apache/iceberg/issues/2382


   #2331 relocated the antlr4 classes in spark3 runtime module jar, but it cause several issues.
   
   For example,
   ```
   spark.sql("ALTER TABLE t2 WRITE ORDERED BY birthday ASC, height DESC NULL FIRST, weight ASC NULL LAST")
   ```
   will cause
   ```
   Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.parser.UpperCaseCharStream.<init>(Lorg/apache/iceberg/shaded/org/antlr/v4/runtime/CodePointCharStream;)V
   	at org.apache.spark.sql.catalyst.parser.extensions.IcebergSparkSqlExtensionsParser.parse(IcebergSparkSqlExtensionsParser.scala:124)
   	at org.apache.spark.sql.catalyst.parser.extensions.IcebergSparkSqlExtensionsParser.parsePlan(IcebergSparkSqlExtensionsParser.scala:105)
   	at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:605)
   	at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
   	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:605)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
   	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:602)
   	at com.hiscat.iceberg.spark.QuickStart$.main(QuickStart.scala:70)
   	at com.hiscat.iceberg.spark.QuickStart.main(QuickStart.scala)
   ```
   
   This because spark `UpperCaseCharStream` refer the original antlr4 `CharStream`, but Iceberg invoke the with relocated one.
   
   And I think the root cause here is we just build a runtime jar with relocation, but has none test on it.
   
   Temporarily, I think we should revert #2331, because it broken the runtime jar. 
   But we do need relocated antrl4, because Spark 3.1 use a different version Antlr4 which will conflict with iceberg.
   cc @aokolnychyi 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] flyrain commented on issue #2382: Lack test on runtime jar

Posted by GitBox <gi...@apache.org>.
flyrain commented on issue #2382:
URL: https://github.com/apache/iceberg/issues/2382#issuecomment-813741310


   Hit the same issue. Hi @pan3793, are we going to revert #2331 anytime soon?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pan3793 commented on issue #2382: Lack test on runtime jar

Posted by GitBox <gi...@apache.org>.
pan3793 commented on issue #2382:
URL: https://github.com/apache/iceberg/issues/2382#issuecomment-813763662


   @flyrain Are you planning to write IT for runtime jar modules? I thought that some Committer or PMC will revert the bad commit. Should I raise a revert PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pan3793 edited a comment on issue #2382: Lack test on runtime jar

Posted by GitBox <gi...@apache.org>.
pan3793 edited a comment on issue #2382:
URL: https://github.com/apache/iceberg/issues/2382#issuecomment-813765958


   > I think we should copy class UpperCaseCharStream from Spark to Iceberg rather than reverting #2331.
   
   It's not such a simple thing. `UpperCaseCharStream` is one of the issues I hit, and after a fast glance, I found another one. It like a Whac-a-Mole game.
   
   So the safe solution I think is build a IT for the runtime jar modules rather than do manual test again and again.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pan3793 commented on issue #2382: Lack test on runtime jar

Posted by GitBox <gi...@apache.org>.
pan3793 commented on issue #2382:
URL: https://github.com/apache/iceberg/issues/2382#issuecomment-809011235


   cc @RussellSpitzer


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] flyrain commented on issue #2382: Lack test on runtime jar

Posted by GitBox <gi...@apache.org>.
flyrain commented on issue #2382:
URL: https://github.com/apache/iceberg/issues/2382#issuecomment-813763731


   I think we should copy class `UpperCaseCharStream` from Spark to Iceberg rather than reverting #2331. In that way, we can still shade antlr4 runtime, which isolates antlr4 dependency for Iceberg. Using the same version of antlr4 from Spark may cause problem in the future.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on issue #2382: Lack test on runtime jar

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on issue #2382:
URL: https://github.com/apache/iceberg/issues/2382#issuecomment-814485591


   If it is a matter of copying a few classes to make shading work, I think we should do that. If not, let's revert the shading PR. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] flyrain removed a comment on issue #2382: Lack test on runtime jar

Posted by GitBox <gi...@apache.org>.
flyrain removed a comment on issue #2382:
URL: https://github.com/apache/iceberg/issues/2382#issuecomment-813742651


   Or use the unshaded version of `CharStreams` in the following code if we keep shading Antlr4
   ```
       val lexer = new IcebergSqlExtensionsLexer(new UpperCaseCharStream(CharStreams.fromString(command)))
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] flyrain commented on issue #2382: Lack test on runtime jar

Posted by GitBox <gi...@apache.org>.
flyrain commented on issue #2382:
URL: https://github.com/apache/iceberg/issues/2382#issuecomment-813742651


   Or use the unshaded version of `CharStreams` in the following code if we keep shading Antlr4
   ```
       val lexer = new IcebergSqlExtensionsLexer(new UpperCaseCharStream(CharStreams.fromString(command)))
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] flyrain edited a comment on issue #2382: Lack test on runtime jar

Posted by GitBox <gi...@apache.org>.
flyrain edited a comment on issue #2382:
URL: https://github.com/apache/iceberg/issues/2382#issuecomment-813763731


   I think we should copy class `UpperCaseCharStream` from Spark to Iceberg rather than reverting #2331. In that way, we can still shade antlr4 runtime, which isolates antlr4 dependency for Iceberg. Using the same version of antlr4 from Spark may cause problem in the future. cc @pan3793, @aokolnychyi , @RussellSpitzer 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on issue #2382: Lack test on runtime jar

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on issue #2382:
URL: https://github.com/apache/iceberg/issues/2382#issuecomment-814485296


   I took a look as well. The only place we interact with antlr4 from Spark is in `IcebergSparkSqlExtensionsParser#parse`.
   
   Our `IcebergSqlExtensionsPostProcessor` is actually a copy of `PostProcessor` in Spark. We thought that is sufficient but it looks like not if we are shading. If I am correct and that's the only place we interact with antlr4 from Spark, the number of classes we should copy should be limited. I guess `UpperCaseCharStream`, `ParseErrorListener`, `ParseException`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pan3793 edited a comment on issue #2382: Lack test on runtime jar

Posted by GitBox <gi...@apache.org>.
pan3793 edited a comment on issue #2382:
URL: https://github.com/apache/iceberg/issues/2382#issuecomment-814554216


   Agree with @aokolnychyi analysis, we should redefine `UpperCaseCharStream`, `ParseErrorListener`, `ParseException` in Iceberg. And if we done that, the shading should work, but it's better to do it after #2428 get in.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] openinx commented on issue #2382: Lack test on runtime jar

Posted by GitBox <gi...@apache.org>.
openinx commented on issue #2382:
URL: https://github.com/apache/iceberg/issues/2382#issuecomment-809019427


   We may need to provide several integration tests to validate that the spark-runtime jar & flink-runtime jar. Those tests will set up  a spark/flink mini cluster, and then load generated the spark/flink runtime jar, and finally run the sql script to do the validation.  In this way,  we could avoid the broken case when merging patches.  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pan3793 closed issue #2382: Lack test on runtime jar

Posted by GitBox <gi...@apache.org>.
pan3793 closed issue #2382:
URL: https://github.com/apache/iceberg/issues/2382


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on issue #2382: Lack test on runtime jar

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on issue #2382:
URL: https://github.com/apache/iceberg/issues/2382#issuecomment-809081325


   For Spark we could just run in local mode, no need to setup a full cluster. The classpath issues would be the same.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pan3793 commented on issue #2382: Lack test on runtime jar

Posted by GitBox <gi...@apache.org>.
pan3793 commented on issue #2382:
URL: https://github.com/apache/iceberg/issues/2382#issuecomment-813765958


   >> I think we should copy class UpperCaseCharStream from Spark to Iceberg rather than reverting #2331.
   
   It's not such a simple thing. `UpperCaseCharStream` is one of the issues I hit, and after a fast glance, I found another one. It like a Whac-a-Mole game.
   
   So the safe solution I think is build a IT for the runtime jar modules rather than do manual test again and again.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pan3793 edited a comment on issue #2382: Lack test on runtime jar

Posted by GitBox <gi...@apache.org>.
pan3793 edited a comment on issue #2382:
URL: https://github.com/apache/iceberg/issues/2382#issuecomment-814554216


   Agree with @aokolnychyi analysis, we should redefine `UpperCaseCharStream`, `ParseErrorListener`, `ParseException` in Iceberg. And if we done that, the shading should work, it's better to do it after #2428 get in.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pan3793 commented on issue #2382: Lack test on runtime jar

Posted by GitBox <gi...@apache.org>.
pan3793 commented on issue #2382:
URL: https://github.com/apache/iceberg/issues/2382#issuecomment-814554216


   Agreed with @aokolnychyi analysis, we should redefine `UpperCaseCharStream`, `ParseErrorListener`, `ParseException` in Iceberg. And if we done that, the shading should work, it's better to do it after #2428 get in.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi edited a comment on issue #2382: Lack test on runtime jar

Posted by GitBox <gi...@apache.org>.
aokolnychyi edited a comment on issue #2382:
URL: https://github.com/apache/iceberg/issues/2382#issuecomment-814485296


   I took a look as well. The only place we interact with antlr4 from Spark is in `IcebergSparkSqlExtensionsParser#parse`.
   
   Our `IcebergSqlExtensionsPostProcessor` is actually a copy of `PostProcessor` in Spark. We thought that is sufficient but it looks like not if we are shading. If I am correct and that's the only place we interact with antlr4 from Spark, the number of classes we have to copy should be limited. I guess `UpperCaseCharStream`, `ParseErrorListener`, `ParseException`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pan3793 closed issue #2382: Lack test on runtime jar

Posted by GitBox <gi...@apache.org>.
pan3793 closed issue #2382:
URL: https://github.com/apache/iceberg/issues/2382


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on issue #2382: Lack test on runtime jar

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on issue #2382:
URL: https://github.com/apache/iceberg/issues/2382#issuecomment-813783170


   So is the issue here that because we extend Spark's Parser we end up shading Spark's internal references to Antlr Runtime code which causes our issues. 
   
   If this is the case I think we should drop the runtime reference altogether, although I really dislike it when library code depends on a transitive of another library.
   
   Another option is to mark the Antlr-runtime version as "Strictly" so that we at least fail quickly if Spark changes the version from underneath us.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pan3793 commented on issue #2382: Lack test on runtime jar

Posted by GitBox <gi...@apache.org>.
pan3793 commented on issue #2382:
URL: https://github.com/apache/iceberg/issues/2382#issuecomment-813799091


   > If this is the case I think we should drop the runtime reference altogether, although I really dislike it when library code depends on a transitive of another library.
   
   agree with that. but we need IT(or something else, maybe gradle has such plugins) to enforce it rather than manual check.
   
   > Another option is to mark the Antlr-runtime version as "Strictly" so that we at least fail quickly if Spark changes the version from underneath us.
   
   Spark 3.1.1 upgraded antlr4


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] flyrain commented on issue #2382: Lack test on runtime jar

Posted by GitBox <gi...@apache.org>.
flyrain commented on issue #2382:
URL: https://github.com/apache/iceberg/issues/2382#issuecomment-814268563


   Yeah, by looking at more code, reverting it is more practical. I like the fail-fast approach, but checking the antlr-runtime version is not feasible since we want a unified iceberg runtime version for all Spark 3.x, and Spark3.1.1 has changed antlr4 as @pan3793 mentioned.
   
   Yes, integration test or something similar is necessary. This issue should be considered as a blocker since every customized iceberg sql will fail, but obviously it didn't get much attention.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org