Posted to dev@gobblin.apache.org by "Abhishek Tiwari (JIRA)" <ji...@apache.org> on 2017/08/07 19:40:01 UTC

[jira] [Updated] (GOBBLIN-67) gobblin-mapreduce.sh pulling in insufficient runtime dependencies in g0.9.0 for Kafka->HDFS ingestion

     [ https://issues.apache.org/jira/browse/GOBBLIN-67?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Tiwari updated GOBBLIN-67:
-----------------------------------
    Sprint: Apache Gobblin 170724, Apache Gobblin 170807  (was: Apache Gobblin 170724)

> gobblin-mapreduce.sh pulling in insufficient runtime dependencies in g0.9.0 for Kafka->HDFS ingestion
> -----------------------------------------------------------------------------------------------------
>
>                 Key: GOBBLIN-67
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-67
>             Project: Apache Gobblin
>          Issue Type: Bug
>            Reporter: Michał Woś
>            Assignee: Raul A
>              Labels: Bug:Generic, Framework:Build
>
> https://github.com/linkedin/gobblin/blob/master/gobblin-docs/case-studies/Kafka-HDFS-Ingestion.md
> There is:
> source.class=gobblin.source.extractor.extract.kafka.KafkaSimpleSource
> extract.namespace=gobblin.extract.kafka
> and we get:
> Exception in thread main java.lang.ClassNotFoundException: gobblin.source.extractor.extract.kafka.KafkaSimpleSource
> as it was moved in commit 130afec71: Moving Kafka dependencies out into version specific modules (#1417)
> and it seems that the gobblin-modules module, where it was moved, does not get built at all (?)
>  
> *Github Url* : https://github.com/linkedin/gobblin/issues/1483 
> *Github Reporter* : [~wosiu] 
> *Github Assignee* : [~shirshanka] 
> *Github Created At* : 2016-12-22T00:11:10Z 
> *Github Updated At* : 2017-02-14T23:36:32Z 
> h3. Comments 
> ----
> *chavdar* wrote on 2016-12-22T04:38:29Z : Hi Michal,
> Are you running gobblin-example, gobblin-distribution or a different
> packaging?
> Thanks.
>  
>  
> *Github Url* : https://github.com/linkedin/gobblin/issues/1483#issuecomment-268716836 
> ----
> *panagiotious* wrote on 2016-12-24T23:18:30Z : I think I am facing the same problem:
> ```
> $ bin/gobblin-mapreduce.sh --workdir gobblin-dirs/work --conf ~/gobblin/config/kafka1.pull
> [...Hadoop gossip...]
> Exception in thread main java.lang.ClassNotFoundException: gobblin.source.extractor.extract.kafka.KafkaSimpleSource
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>         at java.lang.Class.forName0(Native Method)
>         at java.lang.Class.forName(Class.java:195)
>         at gobblin.runtime.JobContext.createSource(JobContext.java:216)
>         at gobblin.runtime.JobContext.<init>(JobContext.java:148)
>         at gobblin.runtime.AbstractJobLauncher.<init>(AbstractJobLauncher.java:140)
>         at gobblin.runtime.mapreduce.MRJobLauncher.<init>(MRJobLauncher.java:130)
>         at gobblin.runtime.mapreduce.CliMRJobLauncher.<init>(CliMRJobLauncher.java:54)
>         at gobblin.runtime.mapreduce.CliMRJobLauncher.main(CliMRJobLauncher.java:106)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> ``` 
>  
> *Github Url* : https://github.com/linkedin/gobblin/issues/1483#issuecomment-269103855 
> ----
> [~shirshanka] wrote on 2016-12-25T00:53:21Z : @panagiotious : How are you pulling in the gobblin jars? 
> You need the gobblin-kafka-08 jar on your classpath.
> gobblin-core is supposed to pull this in transitively. 
> https://mvnrepository.com/artifact/com.linkedin.gobblin/gobblin-kafka-08/0.9.0
>   
>  
> *Github Url* : https://github.com/linkedin/gobblin/issues/1483#issuecomment-269106010 
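One way to check whether a class such as `KafkaSimpleSource` is actually present in a distribution is to scan the jars in `lib/` for the corresponding class file. The following is a minimal sketch, not part of Gobblin: the function names `class_to_entry` and `find_class_in_jars` are hypothetical, and it assumes `unzip` is installed.

```shell
# Hypothetical helpers (not part of Gobblin): locate the jar that holds a class.

# Convert a fully qualified class name to its path inside a jar.
class_to_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Print every jar in directory $1 whose listing mentions class $2.
# Assumes `unzip` is available on the PATH.
find_class_in_jars() {
  entry=$(class_to_entry "$2")
  for jar in "$1"/*.jar; do
    [ -e "$jar" ] || continue   # the directory contains no jars
    if unzip -l "$jar" 2>/dev/null | grep -qF "$entry"; then
      printf '%s\n' "$jar"
    fi
  done
}
```

For example, `find_class_in_jars gobblin-dist/lib gobblin.source.extractor.extract.kafka.KafkaSimpleSource` would list the jars that contain the class, or print nothing if no jar in that directory provides it.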
> ----
> *panagiotious* wrote on 2016-12-25T03:32:17Z : After extracting the tarball created by the `./gradlew clean assemble` command (`build` fails on the `metastore` tests for some reason that I still cannot debug, but no such dependency is referenced in the documentation), I can see `gobblin-dist/lib/gobblin-kafka-08-0.9.0-24-gec7d3a2.jar` and `gobblin-dist/lib/gobblin-kafka-common-0.9.0-24-gec7d3a2.jar` in the `lib/` directory. I assume that directory is added to the classpath when I submit the job; otherwise it would not be able to find any jar.
> Weirdly enough, if I copy `lib/gobblin-core-0.8.0.jar` that contains the class `KafkaSimpleSource`  from the distribution of v0.8.0 (the downloaded tarball file) to my `lib` directory, the aforementioned error (missing class) does not appear, but the job hangs and is not submitted to my cluster.
>  
>  
> *Github Url* : https://github.com/linkedin/gobblin/issues/1483#issuecomment-269108794 
> ----
> [~shirshanka] wrote on 2016-12-25T07:31:30Z : Is Kafka ingestion working for you in standalone mode? (http://gobblin.readthedocs.io/en/latest/case-studies/Kafka-HDFS-Ingestion/#standalone) 
>  
> *Github Url* : https://github.com/linkedin/gobblin/issues/1483#issuecomment-269113024 
> ----
> *panagiotious* wrote on 2016-12-25T07:50:25Z : Yes, perfectly! I have tried it both with the Wikipedia example and our Kafka topics.
> I just realized that the `assemble` command has been giving me an incomplete distribution. I have been using Java 7, which is deployed by Cloudera Manager to our edge nodes and which, it turns out, is not supported by Gobblin 0.8.0; it makes the `metastore` tests fail, and (probably) some jars silently fail to be built and included in the final tarball.
> I have successfully assembled Gobblin with Java 8 on a different host, but it will not run on our edge nodes. The closest I got with copying jars from version 0.8.0 to the manually assembled version 0.9.0 was getting the job submitted and then it failing with `Error: java.lang.ClassNotFoundException: gobblin.metrics.MetricContext at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at [...]`.
> At this point I should note that with Java 7 a `build` fails at the `metastore` tests, whereas with Java 8 it seems to get past them but fails where the tests make strong assumptions, such as ZooKeeper running on the host; the host has `iptables` rules that block any unexpected communication.
> I do not understand why `assemble` would yield a different distribution than `build`, but I am pretty confident that that's the case when tests are failing. The command I've been using has been `./gradlew clean assemble -PuseHadoop2 -PhadoopVersion=2.6.0-cdh5.8.3 --stacktrace --info` in all my builds. My config file is the MR example with Kafka (with changes in the hosts of course).
> I will keep trying, although if there is no Java 7 compatibility, I am not confident I can get this up and running with CDH on our cluster. 
>  
> *Github Url* : https://github.com/linkedin/gobblin/issues/1483#issuecomment-269113431 
> ----
> [~wosiu] wrote on 2016-12-25T12:43:55Z : Fixed for me. #1490 works :)
> FYI, the way I build:
> ```
> ./gradlew -PhadoopVersion=2.6.0-cdh5.4.3 -q clean assemble
> ```
> and run:
> ```
> LPATH=/home/michalw/gobblin-dist/lib
> BUILD=0.9.0-27-gf497089
> ./gobblin-dist/bin/gobblin-mapreduce.sh \
>     --jt yarnrm \
>     --fs hdfs://logs-hdfs-nameserivce \
>     --conf gobblin-job-config/gobblin-mr-ingestion.properties \
>     --logdir gobblin-logs \
>     --workdir /home/michalw/gobblin-work \
>     --jars $LPATH/guava-retrying-2.0.0.jar,$LPATH/kafka-avro-serializer-2.0.1.jar,$LPATH/kafka-json-serializer-2.0.1.jar,$LPATH/hadoop-common-2.6.0-cdh5.4.3.jar,$LPATH/gobblin-metrics-base-$BUILD.jar,$LPATH/gobblin-metrics-$BUILD.jar,$LPATH/gobblin-core-base-$BUILD.jar,$LPATH/gobblin-kafka-08-$BUILD.jar,$LPATH/gobblin-kafka-common-$BUILD.jar
> ```
> Unfortunately I need to specify all those jars by hand. But it works :)
> @panagiotious let me know if I may close this one. 
>  
> *Github Url* : https://github.com/linkedin/gobblin/issues/1483#issuecomment-269121168 
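The hand-written `--jars` value above can be assembled from a list of file names instead of being typed as one long string. This is only a sketch under that assumption; `join_jars` is a hypothetical helper, not something shipped with Gobblin.

```shell
# Hypothetical helper: prefix each jar file name with directory $1 and
# join the results with commas, the format the --jars flag is given above.
join_jars() {
  dir=$1
  shift
  out=
  for name in "$@"; do
    # Add a comma only when the list already has an entry.
    out="${out:+$out,}$dir/$name"
  done
  printf '%s\n' "$out"
}
```

With `LPATH` and `BUILD` set as in the comment above, the flag could then be written as `--jars "$(join_jars "$LPATH" guava-retrying-2.0.0.jar "gobblin-kafka-08-$BUILD.jar" "gobblin-kafka-common-$BUILD.jar")"`, with the remaining jars added as further arguments.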
> ----
> *panagiotious* wrote on 2016-12-25T23:52:26Z : Yes, this looks like it works. I assumed that the `lib/` directory would already have been added to the jar dependencies, but I guess we need to specify the jars explicitly.
> Thank you! 
>  
> *Github Url* : https://github.com/linkedin/gobblin/issues/1483#issuecomment-269142272 
> ----
> [~shirshanka] wrote on 2016-12-28T08:04:11Z : Looks like gobblin-mapreduce.sh is selectively pulling in specific jars for its runtime deps from /lib. 
> https://github.com/linkedin/gobblin/blob/master/bin/gobblin-mapreduce.sh#L130
> That's what broke your jobs. We'll figure out the best maintainable solution long term and update the script. Let's keep this issue open.  
>  
> *Github Url* : https://github.com/linkedin/gobblin/issues/1483#issuecomment-269441365 
> ----
> [~wosiu] wrote on 2016-12-28T09:02:32Z : OK, so just for reference, I created this some time ago:
> https://github.com/linkedin/gobblin/issues/1466
>  
>  
> *Github Url* : https://github.com/linkedin/gobblin/issues/1483#issuecomment-269447676 
> ----
> [~wosiu] wrote on 2017-01-05T03:53:17Z : In the meantime: is it possible to build Gobblin so that all jars are named without the version infix?
> I mean, instead of e.g.:
> gobblin-yarn-0.9.0-28-gcb609f2.jar
> we would have:
> gobblin-yarn.jar ?
> I'm aware I can change it after build, but maybe there is already some knob in your gradle config? 
>  
> *Github Url* : https://github.com/linkedin/gobblin/issues/1483#issuecomment-270557996 
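The thread does not say whether the Gradle build offers such a setting. As a post-build workaround, the renaming mentioned above could look roughly like the sketch below. The helpers `strip_version` and `copy_unversioned` are hypothetical, and the sketch assumes every Gobblin jar ends in `-<build>.jar`, as in the file names quoted in this thread.

```shell
# Hypothetical helpers (not part of Gobblin's build): drop the version infix.

# Print jar file name $1 with the "-<build>" suffix $2 removed, if present.
strip_version() {
  base=${1%.jar}
  stem=${base%"-$2"}   # unchanged when the name carries no such suffix
  printf '%s.jar\n' "$stem"
}

# Copy every "*-<build>.jar" from directory $1 into directory $3 under its
# unversioned name; $2 is the build string.
copy_unversioned() {
  mkdir -p "$3"
  for jar in "$1"/*-"$2".jar; do
    [ -e "$jar" ] || continue   # no matching jars
    cp -p "$jar" "$3/$(strip_version "$(basename "$jar")" "$2")"
  done
}
```

For instance, `strip_version gobblin-yarn-0.9.0-28-gcb609f2.jar 0.9.0-28-gcb609f2` prints `gobblin-yarn.jar`. Copying rather than renaming in place leaves the original distribution untouched.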
> ----
> *anshuGithubData* wrote on 2017-02-01T13:05:23Z : Hello everyone, 
> I am also facing similar issues.
> As wosiu mentioned, I have tried to follow the steps below.
> (1)
> When I run assemble with the following command, **./gradlew -PhadoopVersion=2.6.0-cdh5.9.0 -q clean assemble**, it fails with the following error:
> Task failed with an exception.
> -----------
> * What went wrong:
> Execution failed for task ':gobblin-api:javadoc'.
> > Javadoc generation failed. Generated Javadoc options file (useful for troubleshooting): '/home/cdhuser/gobblin/build/gobblin-api/tmp/javadoc/javadoc.options'
> * Try:
> Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output.
> ==============================================================================
> Task failed with an exception.
> -----------
> * What went wrong:
> Execution failed for task ':gobblin-rest-service:gobblin-rest-client:javadoc'.
> > Javadoc generation failed. Generated Javadoc options file (useful for troubleshooting): '/home/cdhuser/gobblin/build/gobblin-rest-client/tmp/javadoc/javadoc.options'
> (2)
> So I tried **./gradlew -x javadoc -PhadoopVersion=2.6.0-cdh5.9.0 -q clean assemble --stacktrace** instead, and I believe it went through; I get the messages below.
> 2 warnings
> Note: /home/cdhuser/gobblin/gobblin-modules/gobblin-couchbase/src/main/java/gobblin/couchbase/writer/CouchbaseWriter.java uses or overrides a deprecated API.
> Note: Recompile with -Xlint:deprecation for details.
> Note: /home/cdhuser/gobblin/gobblin-modules/gobblin-couchbase/src/main/java/gobblin/couchbase/writer/CouchbaseWriter.java uses unchecked or unsafe operations.
> Note: Recompile with -Xlint:unchecked for details.
> /home/cdhuser/gobblin/gobblin-runtime/src/main/java/gobblin/runtime/mapreduce/GobblinWorkUnitsInputFormat.java:124: warning: Generating equals/hashCode implementation but without a call to superclass, even though this class does not extend java.lang.Object. If this is intentional, add '@EqualsAndHashCode(callSuper=false)' to your type.
>   @EqualsAndHashCode
>   ^
> Note: Some input files use or override a deprecated API.
> Note: Recompile with -Xlint:deprecation for details.
> Note: /home/cdhuser/gobblin/gobblin-compaction/src/main/java/gobblin/compaction/mapreduce/avro/AvroKeyDedupReducer.java uses unchecked or unsafe operations.
> Note: Recompile with -Xlint:unchecked for details.
> Note: Some input files use or override a deprecated API.
> Note: Recompile with -Xlint:deprecation for details.
> Note: Some input files use or override a deprecated API.
> Note: Recompile with -Xlint:deprecation for details.
> Note: Some input files use unchecked or unsafe operations.
> Note: Recompile with -Xlint:unchecked for details.
> 1 warning
> Note: Some input files use unchecked or unsafe operations.
> Note: Recompile with -Xlint:unchecked for details.
> Note: Some input files use or override a deprecated API.
> Note: Recompile with -Xlint:deprecation for details.
> Note: /home/cdhuser/gobblin/gobblin-cluster/src/main/java/gobblin/cluster/GobblinHelixTaskStateTracker.java uses or overrides a deprecated API.
> Note: Recompile with -Xlint:deprecation for details.
> Note: /home/cdhuser/gobblin/gobblin-cluster/src/main/java/gobblin/cluster/GobblinHelixJob.java uses unchecked or unsafe operations.
> Note: Recompile with -Xlint:unchecked for details.
> Note: /home/cdhuser/gobblin/gobblin-aws/src/main/java/gobblin/aws/GobblinAWSClusterLauncher.java uses or overrides a deprecated API.
> Note: Recompile with -Xlint:deprecation for details.
> Note: /home/cdhuser/gobblin/gobblin-modules/gobblin-helix/src/main/java/gobblin/runtime/ZkDatasetStateStore.java uses unchecked or unsafe operations.
> Note: Recompile with -Xlint:unchecked for details.
> Note: /home/cdhuser/gobblin/gobblin-modules/gobblin-azkaban/src/main/java/gobblin/azkaban/AzkabanIntegrationTestLauncher.java uses unchecked or unsafe operations.
> Note: Recompile with -Xlint:unchecked for details.
> (3)
> Then I execute the following command:
> LPATH=/home/cdhuser/gobblin/build/gobblin-distribution/distributions/gobblin-dist/lib
> BUILD=0.9.0-120-g75ebc38
> ./bin/gobblin-mapreduce.sh \
>     --jt http://(IP where RM is running):8032 \
>     --conf confGobblinKafkaTestJobs/kafkatohdfs.pull \
>     --jars $LPATH/guava-retrying-2.0.0.jar,$LPATH/kafka-avro-serializer-2.0.1.jar,$LPATH/kafka-json-serializer-2.0.1.jar,$LPATH/hadoop-common-2.6.0-cdh5.9.0.jar,$LPATH/gobblin-metrics-base-$BUILD.jar,$LPATH/gobblin-metrics-$BUILD.jar,$LPATH/gobblin-core-base-$BUILD.jar,$LPATH/gobblin-kafka-08-$BUILD.jar,$LPATH/gobblin-kafka-common-$BUILD.jar
> It is failing with this error **Error: java.lang.ClassNotFoundException: org.reflections.Reflections**. 
> So I added $LPATH/javassist-3.18.2-GA.jar, but I get the same error.
> Any help would be really appreciated!
> Thanks, 
> Anshu
>  
>  
> *Github Url* : https://github.com/linkedin/gobblin/issues/1483#issuecomment-276652455 
> ----
> *anshuGithubData* wrote on 2017-02-01T13:44:23Z : @wosiu @panagiotious @shirshanka If you could please have a look at the issue I am facing (described above) and help me out, that would be great!
> Thanks
> Anshu 
>  
> *Github Url* : https://github.com/linkedin/gobblin/issues/1483#issuecomment-276660177 
> ----
> [~wosiu] wrote on 2017-02-14T23:34:41Z : @anshuGithubData
> Yes, after upgrading to the current master (commit: a1b9bf579fb6) I got the same error:
> `It is failing with this error Error: java.lang.ClassNotFoundException: org.reflections.Reflections.`
> Appending the following to the --jars flag did the trick for me:
> $LPATH/reflections-0.9.10.jar,$LPATH/javassist-3.18.2-GA.jar,$LPATH/opencsv-3.8.jar
> Also, it seems that the authors advise using gobblin.sh instead of gobblin-mapreduce.sh (?), although I still use gobblin-mapreduce.sh as it works fine for me.
>  
>  
> *Github Url* : https://github.com/linkedin/gobblin/issues/1483#issuecomment-279871219
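Because the `--jars` list is maintained by hand and depends on version strings such as `$BUILD`, a quick sanity check before submitting the job can catch entries that do not exist on disk. The sketch below is an illustration only; `missing_jars` is a hypothetical helper, and it checks file existence, not whether the listed jars cover every class the job needs.

```shell
# Hypothetical helper: print each entry of the comma-separated jar list $1
# that does not exist as a regular file.
missing_jars() {
  printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r jar; do
    [ -n "$jar" ] || continue          # skip empty entries
    [ -f "$jar" ] || printf '%s\n' "$jar"
  done
}
```

Running `missing_jars "$LPATH/reflections-0.9.10.jar,$LPATH/javassist-3.18.2-GA.jar,$LPATH/opencsv-3.8.jar"` prints nothing when all three files are present under `$LPATH`.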


