Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/09/29 00:12:09 UTC

[GitHub] [incubator-pinot] jackjlli opened a new pull request #6070: Add Hadoop related dependencies in pinot-tool module

jackjlli opened a new pull request #6070:
URL: https://github.com/apache/incubator-pinot/pull/6070


   ## Description
   This PR adds Hadoop-related dependencies (**runtime** scope only) to the pinot-tool module.
   Some Pinot commands need these Hadoop jars at runtime, for example `CreateSegment` when reading ORC files.
   
   In the pinot-orc pom file, the Hadoop dependencies are declared with `provided` scope. Dependencies in `provided` scope are not transitive, so they are not pulled in at runtime.
   https://www.baeldung.com/maven-dependency-scopes#:~:text=Transitive%20dependencies%2C%20on%20the%20other,%3A%20mvn%20dependency%3Atree%20command
   
   Current exception:
   ```
   java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FileSystem
   	at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:1.8.0_172]
   	at java.util.concurrent.FutureTask.get(FutureTask.java:192) ~[?:1.8.0_172]
   	at org.apache.pinot.tools.admin.command.CreateSegmentCommand.execute(CreateSegmentCommand.java:264) ~[classes/:?]
   	at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:152) [classes/:?]
   	at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:164) [classes/:?]
   Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FileSystem
   	at java.lang.Class.getDeclaredConstructors0(Native Method) ~[?:1.8.0_172]
   	at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671) ~[?:1.8.0_172]
   	at java.lang.Class.getConstructor0(Class.java:3075) ~[?:1.8.0_172]
   	at java.lang.Class.getConstructor(Class.java:1825) ~[?:1.8.0_172]
   	at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:270) ~[classes/:?]
   	at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:239) ~[classes/:?]
   	at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:220) ~[classes/:?]
   	at org.apache.pinot.spi.data.readers.RecordReaderFactory.getRecordReaderByClass(RecordReaderFactory.java:132) ~[classes/:?]
   	at org.apache.pinot.spi.data.readers.RecordReaderFactory.getRecordReader(RecordReaderFactory.java:156) ~[classes/:?]
   	at org.apache.pinot.spi.data.readers.RecordReaderFactory.getRecordReader(RecordReaderFactory.java:143) ~[classes/:?]
   	at org.apache.pinot.core.segment.creator.impl.SegmentIndexCreationDriverImpl.getRecordReader(SegmentIndexCreationDriverImpl.java:129) ~[classes/:?]
   	at org.apache.pinot.core.segment.creator.impl.SegmentIndexCreationDriverImpl.init(SegmentIndexCreationDriverImpl.java:96) ~[classes/:?]
   	at org.apache.pinot.tools.admin.command.CreateSegmentCommand.lambda$execute$0(CreateSegmentCommand.java:238) ~[classes/:?]
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_172]
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_172]
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_172]
   	at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_172]
   Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FileSystem
   	at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[?:1.8.0_172]
   	at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[?:1.8.0_172]
   	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349) ~[?:1.8.0_172]
   	at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[?:1.8.0_172]
   	at java.lang.Class.getDeclaredConstructors0(Native Method) ~[?:1.8.0_172]
   	at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671) ~[?:1.8.0_172]
   	at java.lang.Class.getConstructor0(Class.java:3075) ~[?:1.8.0_172]
   	at java.lang.Class.getConstructor(Class.java:1825) ~[?:1.8.0_172]
   	at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:270) ~[classes/:?]
   	at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:239) ~[classes/:?]
   	at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:220) ~[classes/:?]
   	at org.apache.pinot.spi.data.readers.RecordReaderFactory.getRecordReaderByClass(RecordReaderFactory.java:132) ~[classes/:?]
   	at org.apache.pinot.spi.data.readers.RecordReaderFactory.getRecordReader(RecordReaderFactory.java:156) ~[classes/:?]
   	at org.apache.pinot.spi.data.readers.RecordReaderFactory.getRecordReader(RecordReaderFactory.java:143) ~[classes/:?]
   	at org.apache.pinot.core.segment.creator.impl.SegmentIndexCreationDriverImpl.getRecordReader(SegmentIndexCreationDriverImpl.java:129) ~[classes/:?]
   	at org.apache.pinot.core.segment.creator.impl.SegmentIndexCreationDriverImpl.init(SegmentIndexCreationDriverImpl.java:96) ~[classes/:?]
   	at org.apache.pinot.tools.admin.command.CreateSegmentCommand.lambda$execute$0(CreateSegmentCommand.java:238) ~[classes/:?]
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_172]
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_172]
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_172]
   	at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_172]
   ```
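   
   A minimal sketch of how the missing class can be confirmed before running a command. `ClasspathProbe` and `isLoadable` are hypothetical names used only for illustration and are not part of Pinot; the sketch relies solely on the standard `Class.forName` call.
   
   ```java
   // Hypothetical helper, not part of Pinot: reports whether a class can be
   // resolved by the class loader that loaded this helper.
   final class ClasspathProbe {
     private ClasspathProbe() {}
   
     static boolean isLoadable(String className) {
       try {
         // initialize=false: locate and load the class without running static initializers.
         Class.forName(className, false, ClasspathProbe.class.getClassLoader());
         return true;
       } catch (ClassNotFoundException | LinkageError e) {
         // LinkageError includes NoClassDefFoundError thrown while resolving the class.
         return false;
       }
     }
   
     public static void main(String[] args) {
       String hadoopFs = "org.apache.hadoop.fs.FileSystem";
       System.out.println(hadoopFs + " loadable: " + isLoadable(hadoopFs));
     }
   }
   ```
   
   Running it with the same classpath as the failing command should show whether `org.apache.hadoop.fs.FileSystem` is reachable.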


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] jackjlli merged pull request #6070: Add Hadoop related dependencies in pinot-tool module

Posted by GitBox <gi...@apache.org>.
jackjlli merged pull request #6070:
URL: https://github.com/apache/incubator-pinot/pull/6070


   




[GitHub] [incubator-pinot] kishoreg commented on pull request #6070: Add Hadoop related dependencies in pinot-tool module

Posted by GitBox <gi...@apache.org>.
kishoreg commented on pull request #6070:
URL: https://github.com/apache/incubator-pinot/pull/6070#issuecomment-700901988


   Why was this approved?




[GitHub] [incubator-pinot] jackjlli commented on pull request #6070: Add Hadoop related dependencies in pinot-tool module

Posted by GitBox <gi...@apache.org>.
jackjlli commented on pull request #6070:
URL: https://github.com/apache/incubator-pinot/pull/6070#issuecomment-701903297


   Discussed with @mayankshriv. The issue before this PR is that the pinot-orc and pinot-parquet modules need Hadoop libraries, but the Hadoop dependencies are in `provided` scope, so these Hadoop jars are not included in the classpath. Thus, we encounter the `NoClassDefFoundError` shown in the description of this PR above. `PluginManager` doesn't help here, because it requires the jars that contain the needed classes to be on the classpath in the first place.
   
   There are two ways to solve this issue:
   1. Configure the pom so that these Hadoop jars appear on the classpath, as this PR does. A variant is to change the scope of the Hadoop dependencies to `compile` in the pom files of pinot-orc and pinot-parquet.
   2. Have users manually add those Hadoop jars to the classpath when running Pinot commands.
   
   The 1st approach seems better. With the 2nd one, users have to add these jars to the classpath one by one, and it may take several attempts to find every missing jar. Plus, we have to tell users to do that.
   
   As to the 1st approach, we can revert the change in this PR and push the Hadoop dependencies down to the pinot-orc and pinot-parquet modules, so that if we stop supporting some formats in the future, the Hadoop dependencies can be removed altogether. I've tested this change in the HadoopSegmentCreationJob and it works well. Here is the new PR: https://github.com/apache/incubator-pinot/pull/6088.
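   
   For the 2nd approach, a rough sketch of how a user could check which Hadoop jars are already on the classpath. `ClasspathEntries` and `matching` are hypothetical names for illustration only, not part of Pinot, and the substring `hadoop` is an assumption about how the jar files are named.
   
   ```java
   import java.io.File;
   import java.util.ArrayList;
   import java.util.List;
   import java.util.regex.Pattern;
   
   // Hypothetical helper, not part of Pinot: lists classpath entries whose
   // file name contains a given substring.
   final class ClasspathEntries {
     private ClasspathEntries() {}
   
     static List<String> matching(String classpath, String needle) {
       List<String> hits = new ArrayList<>();
       // Entries are separated by the platform path separator (":" or ";").
       for (String entry : classpath.split(Pattern.quote(File.pathSeparator))) {
         // Compare against the file name only, so directory names do not match.
         if (new File(entry).getName().contains(needle)) {
           hits.add(entry);
         }
       }
       return hits;
     }
   
     public static void main(String[] args) {
       // Assumption: Hadoop jar file names contain "hadoop".
       matching(System.getProperty("java.class.path"), "hadoop")
           .forEach(System.out::println);
     }
   }
   ```
   
   An empty result means none of the jars on the classpath match, which is consistent with the `NoClassDefFoundError` above.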




[GitHub] [incubator-pinot] jackjlli commented on pull request #6070: Add Hadoop related dependencies in pinot-tool module

Posted by GitBox <gi...@apache.org>.
jackjlli commented on pull request #6070:
URL: https://github.com/apache/incubator-pinot/pull/6070#issuecomment-700930974


   @mayankshriv the Hadoop jars are needed for Parquet as well.




[GitHub] [incubator-pinot] jackjlli commented on pull request #6070: Add Hadoop related dependencies in pinot-tool module

Posted by GitBox <gi...@apache.org>.
jackjlli commented on pull request #6070:
URL: https://github.com/apache/incubator-pinot/pull/6070#issuecomment-700932259


   @kishoreg we do use `PluginManager` to instantiate the specific record reader. It's just that the required Hadoop jars are not pulled in, because they are transitive dependencies with `provided` scope in the repo.




[GitHub] [incubator-pinot] kishoreg commented on pull request #6070: Add Hadoop related dependencies in pinot-tool module

Posted by GitBox <gi...@apache.org>.
kishoreg commented on pull request #6070:
URL: https://github.com/apache/incubator-pinot/pull/6070#issuecomment-700902220


   We should be using plugins for this, right?

