Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/07/10 23:07:46 UTC

[GitHub] [beam] TheNeuralBit commented on pull request #12202: [BEAM-10407,10408] Schema Capable IO Table Provider Wrappers

TheNeuralBit commented on pull request #12202:
URL: https://github.com/apache/beam/pull/12202#issuecomment-656932229


   I looked into the test failure. I found that if I change the dependency configuration from `provided` to `compile` here it fixes the test:
   https://github.com/apache/beam/blob/65297802aaaddda66b3fda4bafb15640f0fc3530/sdks/java/extensions/sql/build.gradle#L61
   
   From the stacktrace:
   ```
   java.util.ServiceConfigurationError: org.apache.beam.sdk.extensions.sql.meta.provider.TableProvider: Provider org.apache.beam.sdk.extensions.sql.meta.provider.parquet.ParquetTableProvider could not be instantiated
   	at java.util.ServiceLoader.fail(ServiceLoader.java:232)
   	at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
   	at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
   	at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
   	at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
   	at org.apache.beam.sdk.extensions.sql.impl.BeamCalciteSchemaFactory$AllProviders.getTableProvider(BeamCalciteSchemaFactory.java:86)
   ...
   Caused by: java.lang.NoClassDefFoundError: org/apache/beam/sdk/io/parquet/ParquetSchemaCapableIOProvider
   	at org.apache.beam.sdk.extensions.sql.meta.provider.parquet.ParquetTableProvider.<init>(ParquetTableProvider.java:47)
   ```
   
   You can see the error is occurring when we try to instantiate a class from the parquet package at runtime, because the class can't be found. It looks like this may have been a problem before your PR, but it didn't come up because we just weren't exercising code that called the parquet package.
   
   TBH I don't have a great handle on the difference between these dependency configurations. My understanding of `compile` vs. `provided` is that `compile` will include the compiled Java in the artifact, but `provided` assumes that it will be provided by some other jar on the classpath (useful SO answer: https://stackoverflow.com/questions/30731084/provided-dependency-in-gradle). So it seems that the parquet package is there when we compile, but nothing is adding it to the classpath when we run JdbcJarTest.
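   
   To make the compile-time vs. run-time distinction concrete, here is a minimal sketch (not code from this PR) of a check for whether a class is visible on the runtime classpath. `ClasspathCheck` and `isOnClasspath` are hypothetical names I made up for illustration; the parquet class name is the one from the stacktrace above.
   
   ```java
   final class ClasspathCheck {
     private ClasspathCheck() {}
   
     /** Returns true if the named class can be loaded at runtime; the class is not initialized. */
     static boolean isOnClasspath(String className) {
       try {
         Class.forName(className, false, ClasspathCheck.class.getClassLoader());
         return true;
       } catch (ClassNotFoundException | LinkageError e) {
         // The class, or something it links against, is missing at runtime.
         return false;
       }
     }
   
     public static void main(String[] args) {
       // A JDK class is always present.
       System.out.println(isOnClasspath("java.lang.String"));
       // Present only if some jar on the runtime classpath actually ships it.
       System.out.println(
           isOnClasspath("org.apache.beam.sdk.io.parquet.ParquetSchemaCapableIOProvider"));
     }
   }
   ```
   
   If my reading of `provided` is right, the second check would pass on the compile classpath but fail in a run like JdbcJarTest where nothing adds the parquet jar.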
   
   I'm not sure why these IO dependencies are `provided` in the first place. I think the intention may be that users can include just the IOs they intend to use, but this seems problematic when BeamCalciteSchemaFactory is loading every TableProvider implementation: https://github.com/apache/beam/blob/65297802aaaddda66b3fda4bafb15640f0fc3530/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamCalciteSchemaFactory.java#L86
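   
   For illustration only, here is a sketch of why one unloadable provider breaks the whole lookup, and what tolerating it could look like. As the stacktrace shows, `ServiceLoader` instantiates providers lazily while iterating and throws `ServiceConfigurationError` from `next()` when one fails. `TolerantLoading`, `DemoProvider`, and `loadAvailable` are hypothetical names; this is not what BeamCalciteSchemaFactory does, and I'm not claiming it should. Note that the `ServiceLoader` docs describe continuing after an error as best effort only.
   
   ```java
   import java.util.ArrayList;
   import java.util.Iterator;
   import java.util.List;
   import java.util.ServiceConfigurationError;
   import java.util.ServiceLoader;
   
   final class TolerantLoading {
     private TolerantLoading() {}
   
     /** Hypothetical stand-in for a service interface such as TableProvider. */
     public interface DemoProvider {
       String name();
     }
   
     /**
      * Loads every provider that can be instantiated. Providers that fail
      * (for example because a dependency is missing at runtime) are skipped
      * and their error messages are appended to {@code skipped}.
      */
     static <T> List<T> loadAvailable(Class<T> service, List<String> skipped) {
       List<T> loaded = new ArrayList<>();
       Iterator<T> it = ServiceLoader.load(service).iterator();
       while (true) {
         try {
           if (!it.hasNext()) {
             return loaded;
           }
           loaded.add(it.next());
         } catch (ServiceConfigurationError e) {
           // Record the failure and move on to the next provider.
           skipped.add(String.valueOf(e.getMessage()));
         }
       }
     }
   
     public static void main(String[] args) {
       List<String> skipped = new ArrayList<>();
       List<DemoProvider> providers = loadAvailable(DemoProvider.class, skipped);
       System.out.println("loaded=" + providers.size() + " skipped=" + skipped.size());
     }
   }
   ```
   
   Without the try/catch, the first failing provider aborts the loop, which is what we see at BeamCalciteSchemaFactory.java:86 in the stacktrace.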
   
   My suggestion would be to just make parquet a `compile` dependency, like we've already done for mongo (cc @lukecwik and @kennknowles in case they think this is a bad idea).

