Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/04/08 18:03:54 UTC
[GitHub] [incubator-hudi] hikiyoung opened a new issue #1499: [SUPPORT]
DeltaStreamer - NoClassDefFoundError for HiveDriver
URL: https://github.com/apache/incubator-hudi/issues/1499
**Describe the problem you faced**
Running DeltaStreamer with `--enable-hive-sync` fails with `NoSuchMethodError: org.apache.hadoop.hive.ql.metadata.Hive.get(Lorg/apache/hudi/org/apache/hadoop_hive/conf/HiveConf;)Lorg/apache/hadoop/hive/ql/metadata/Hive;`.
Do I need to change something in the default build so that this method can be resolved?
**To Reproduce**
Steps to reproduce the behavior:
1. Properties file
```
include=base.properties
hoodie.datasource.write.recordkey.field=ORDERNUMBER
hoodie.datasource.write.partitionpath.field=PARTITIONPATH
hoodie.datasource.hive_sync.assume_date_partitioning=false
hoodie.deltastreamer.schemaprovider.source.schema.file=file:///home/hadoop/hudi/config/orders_hudi_schema.avro
hoodie.deltastreamer.schemaprovider.target.schema.file=file:///home/hadoop/hudi/config/orders_hudi_schema.avro
hoodie.deltastreamer.source.kafka.topic=orders_hudi_v1
bootstrap.servers=kafka-broker-1:9092
auto.offset.reset=smallest
hoodie.datasource.hive_sync.database=hudi
hoodie.datasource.hive_sync.table=orders_hudi_cow
hoodie.datasource.hive_sync.jdbcurl=jdbc:hive2://localhost:10000
hoodie.datasource.hive_sync.username=hive
hoodie.datasource.hive_sync.password=hive
hoodie.datasource.hive_sync.partition_fields=PARTITIONPATH
ionValueExtractor
```
2. Launch script with HoodieDeltaStreamer
```
TARGET_DATABASE="hudi"
TARGET_TABLE="orders_hudi"
HUDI_UTILITIES_BUNDLE="file:///usr/lib/hudi/hudi-utilities-bundle.jar"
TARGET_BASE_PATH="s3://data-store/$TARGET_DATABASE/$TARGET_TABLE"
PROPS="file:///home/hadoop/hudi/config/kafka-source.properties"
CHECKPOINT_BASE_PATH="s3://data-store/checkpoint/$TARGET_DATABASE/$TARGET_TABLE"
spark-submit \
--conf 'spark.jars=/usr/lib/hudi/hudi-hadoop-mr-bundle.jar,/usr/lib/hudi/hudi-hive-bundle.jar,/usr/lib/hudi/hudi-presto-bundle.jar,/usr/lib/hudi/hudi-spark-bundle.jar,/usr/lib/hudi/hudi-timeline-server-bundle.jar' \
--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
--master yarn \
--deploy-mode client \
--jars /usr/lib/spark/jars/httpclient-4.5.9.jar,/usr/lib/hudi/hudi-spark-bundle.jar,/usr/lib/spark/external/lib/spark-avro.jar \
--class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE \
--storage-type MERGE_ON_READ \
--source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
--target-base-path $TARGET_BASE_PATH \
--target-table "$TARGET_DATABASE.$TARGET_TABLE" \
--source-ordering-field UPDATEDATE \
--enable-hive-sync \
--continuous \
--props $PROPS \
--schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider
```
**Expected behavior**
The table is synced to Hive without errors.
**Environment Description**
EMR 2.59.0
* Hudi version : 0.5.0-inc
* Spark version : 2.4.4
* Hive version : 2.3.6
* Hadoop version : 2.8.5
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no
**Stacktrace**
```
20/04/08 17:54:22 INFO YarnScheduler: Removed TaskSet 39.0, whose tasks have all completed, from pool
20/04/08 17:54:22 INFO DAGScheduler: ResultStage 39 (collect at HoodieRealtimeTableCompactor.java:200) finished in 3.432 s
20/04/08 17:54:22 INFO DAGScheduler: Job 13 finished: collect at HoodieRealtimeTableCompactor.java:200, took 3.436397 s
20/04/08 17:54:22 INFO MultipartUploadOutputStream: close closed:false s3://data-store/hudi/orders_hudi_cow/.hoodie/.aux/20200408175418.compaction.requested
20/04/08 17:54:22 INFO MultipartUploadOutputStream: close closed:false s3://data-store/hudi/orders_hudi_cow/.hoodie/.aux/20200408175418.compaction.requested
20/04/08 17:54:22 INFO MultipartUploadOutputStream: close closed:false s3://data-store/hudi/orders_hudi_cow/.hoodie/20200408175418.compaction.requested
20/04/08 17:54:22 INFO MultipartUploadOutputStream: close closed:false s3://data-store/hudi/orders_hudi_cow/.hoodie/20200408175418.compaction.requested
20/04/08 17:54:22 INFO S3NativeFileSystem: Opening 's3://data-store/hudi/orders_hudi_cow/.hoodie/hoodie.properties' for reading
20/04/08 17:49:45 INFO Utils: Supplied authorities: localhost:10000
20/04/08 17:49:45 INFO Utils: Resolved authority: localhost:10000
20/04/08 17:49:45 INFO HiveConnection: Will try to open client transport with JDBC Uri: jdbc:hive2://localhost:10000
20/04/08 17:49:46 WARN AbstractDeltaStreamerService: Gracefully shutting down compactor
20/04/08 17:50:27 ERROR AbstractDeltaStreamerService: Service shutdown with error
java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.metadata.Hive.get(Lorg/apache/hudi/org/apache/hadoop_hive/conf/HiveConf;)Lorg/apache/hadoop/hive/ql/metadata/Hive;
at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
at org.apache.hudi.utilities.deltastreamer.AbstractDeltaStreamerService.waitForShutdown(AbstractDeltaStreamerService.java:70)
at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:116)
at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:292)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:853)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:928)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:937)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.metadata.Hive.get(Lorg/apache/hudi/org/apache/hadoop_hive/conf/HiveConf;)Lorg/apache/hadoop/hive/ql/metadata/Hive;
at org.apache.hudi.hive.HoodieHiveClient.<init>(HoodieHiveClient.java:111)
at org.apache.hudi.hive.HiveSyncTool.<init>(HiveSyncTool.java:60)
at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncHive(DeltaSync.java:440)
at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:382)
at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:226)
at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:390)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
20/04/08 17:50:27 ERROR AbstractDeltaStreamerService: Monitor noticed one or more threads failed. Requesting graceful shutdown of other threads
java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.metadata.Hive.get(Lorg/apache/hudi/org/apache/hadoop_hive/conf/HiveConf;)Lorg/apache/hadoop/hive/ql/metadata/Hive;
at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
at org.apache.hudi.utilities.deltastreamer.AbstractDeltaStreamerService.lambda$monitorThreads$0(AbstractDeltaStreamerService.java:134)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.metadata.Hive.get(Lorg/apache/hudi/org/apache/hadoop_hive/conf/HiveConf;)Lorg/apache/hadoop/hive/ql/metadata/Hive;
at org.apache.hudi.hive.HoodieHiveClient.<init>(HoodieHiveClient.java:111)
at org.apache.hudi.hive.HiveSyncTool.<init>(HiveSyncTool.java:60)
at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncHive(DeltaSync.java:440)
at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:382)
at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:226)
at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:390)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
... 3 more
20/04/08 17:50:27 INFO Javalin: Stopping Javalin ...
20/04/08 17:50:27 INFO SparkUI: Stopped Spark web UI at http://xxx.yyy.compute.internal:4040
20/04/08 17:50:27 INFO Javalin: Javalin has stopped
20/04/08 17:50:27 INFO YarnClientSchedulerBackend: Interrupting monitor thread
20/04/08 17:50:27 INFO YarnClientSchedulerBackend: Shutting down all executors
20/04/08 17:50:27 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
20/04/08 17:50:27 INFO SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
services=List(),
started=false)
20/04/08 17:50:27 INFO YarnClientSchedulerBackend: Stopped
20/04/08 17:50:27 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/04/08 17:50:27 INFO MemoryStore: MemoryStore cleared
20/04/08 17:50:27 INFO BlockManager: BlockManager stopped
20/04/08 17:50:27 INFO BlockManagerMaster: BlockManagerMaster stopped
20/04/08 17:50:27 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/04/08 17:50:27 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.metadata.Hive.get(Lorg/apache/hudi/org/apache/hadoop_hive/conf/HiveConf;)Lorg/apache/hadoop/hive/ql/metadata/Hive;
at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
at org.apache.hudi.utilities.deltastreamer.AbstractDeltaStreamerService.waitForShutdown(AbstractDeltaStreamerService.java:70)
at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:116)
at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:292)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:853)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:928)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:937)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.metadata.Hive.get(Lorg/apache/hudi/org/apache/hadoop_hive/conf/HiveConf;)Lorg/apache/hadoop/hive/ql/metadata/Hive;
at org.apache.hudi.hive.HoodieHiveClient.<init>(HoodieHiveClient.java:111)
at org.apache.hudi.hive.HiveSyncTool.<init>(HiveSyncTool.java:60)
at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncHive(DeltaSync.java:440)
at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:382)
at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:226)
at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:390)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
```
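A note on reading the error above: the text in parentheses after `Hive.get` is a JVM method descriptor, and it names the argument type the caller was compiled against. The snippet below is a minimal sketch, assuming a POSIX shell with `grep -o` available; `shaded_classes` is a hypothetical helper name, not part of Hudi. It extracts any class that has been relocated under `org.apache.hudi.org.` from such a message.

```shell
#!/bin/sh
# Hypothetical helper: list classes relocated under org.apache.hudi.org.
# that appear in a JVM method descriptor (as printed by NoSuchMethodError).
shaded_classes() {
  printf '%s\n' "$1" \
    | grep -o 'Lorg/apache/hudi/org/[^;]*;' \
    | sed -e 's/^L//' -e 's/;$//' \
    | tr '/' '.'
}

# Error text copied from the stack trace in this issue.
err='org.apache.hadoop.hive.ql.metadata.Hive.get(Lorg/apache/hudi/org/apache/hadoop_hive/conf/HiveConf;)Lorg/apache/hadoop/hive/ql/metadata/Hive;'

shaded_classes "$err"
# prints: org.apache.hudi.org.apache.hadoop_hive.conf.HiveConf
```

The argument type is the relocated `HiveConf`, while `Hive` itself keeps its original package. That suggests the bundle calls `Hive.get` with a shaded `HiveConf`, but the `Hive` class found on the cluster classpath only offers `get` for the unshaded `org.apache.hadoop.hive.conf.HiveConf`.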
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-hudi] lamber-ken commented on issue #1499: [SUPPORT]
DeltaStreamer - NoClassDefFoundError for HiveDriver
Posted by GitBox <gi...@apache.org>.
URL: https://github.com/apache/incubator-hudi/issues/1499#issuecomment-611627711
> It works with the way #1. Thank you so much @lamber-ken .
You're welcome : )
[GitHub] [incubator-hudi] hikiyoung commented on issue #1499: [SUPPORT]
DeltaStreamer - NoClassDefFoundError for HiveDriver
Posted by GitBox <gi...@apache.org>.
URL: https://github.com/apache/incubator-hudi/issues/1499#issuecomment-611117191
@lamber-ken
Thanks for the quick response. Let me try.
[GitHub] [incubator-hudi] lamber-ken commented on issue #1499: [SUPPORT]
DeltaStreamer - NoClassDefFoundError for HiveDriver
Posted by GitBox <gi...@apache.org>.
URL: https://github.com/apache/incubator-hudi/issues/1499#issuecomment-611114936
Hi @hikiyoung, there are two ways to solve your problem:
1. Stay on Hudi 0.5.0 and rebuild it without the Hive relocations
```
1. Check out the release-0.5.0 branch
https://github.com/apache/incubator-hudi/tree/release-0.5.0
2. Remove these relocations from the hudi-spark-bundle pom.xml
https://github.com/apache/incubator-hudi/blob/release-0.5.0/packaging/hudi-spark-bundle/pom.xml
<relocation>
<pattern>org.apache.hive.jdbc.</pattern>
<shadedPattern>org.apache.hudi.org.apache.hive.jdbc.</shadedPattern>
</relocation>
<relocation>
<pattern>org.apache.hadoop.hive.metastore.</pattern>
<shadedPattern>org.apache.hudi.org.apache.hadoop_hive.metastore.</shadedPattern>
</relocation>
<relocation>
<pattern>org.apache.hive.common.</pattern>
<shadedPattern>org.apache.hudi.org.apache.hive.common.</shadedPattern>
</relocation>
<relocation>
<pattern>org.apache.hadoop.hive.common.</pattern>
<shadedPattern>org.apache.hudi.org.apache.hadoop_hive.common.</shadedPattern>
</relocation>
<relocation>
<pattern>org.apache.hadoop.hive.conf.</pattern>
<shadedPattern>org.apache.hudi.org.apache.hadoop_hive.conf.</shadedPattern>
</relocation>
<relocation>
<pattern>org.apache.hive.service.</pattern>
<shadedPattern>org.apache.hudi.org.apache.hive.service.</shadedPattern>
</relocation>
<relocation>
<pattern>org.apache.hadoop.hive.service.</pattern>
<shadedPattern>org.apache.hudi.org.apache.hadoop_hive.service.</shadedPattern>
</relocation>
3. Rebuild the Hudi package
```
2. Upgrade Hudi to version 0.5.2
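For the first option, a quick check before rebuilding can confirm that the Hive relocations are really gone from the pom. This is a minimal sketch, assuming `grep -E` is available; `count_hive_relocations` is a hypothetical helper name, and its pattern only covers the `org.apache.hive.` and `org.apache.hadoop.hive.` relocations listed above.

```shell
#!/bin/sh
# Hypothetical helper: count the lines in a pom.xml that still relocate
# Hive packages into org.apache.hudi.org.apache.(hive|hadoop_hive).
count_hive_relocations() {
  # grep exits non-zero when the count is 0, so "|| true" keeps this
  # usable under "set -e" while still printing the count.
  grep -Ec '<shadedPattern>org\.apache\.hudi\.org\.apache\.(hive|hadoop_hive)\.' "$1" || true
}

# Usage: pass the path to the pom as the first argument.
if [ -n "${1:-}" ]; then
  count_hive_relocations "$1"
fi
```

Pointed at `packaging/hudi-spark-bundle/pom.xml` in the checked-out branch, it should print `0` once the seven relocations are removed. The rebuild is then presumably the usual Maven build from the repository root (for example `mvn clean package -DskipTests`); that command is an assumption here, since it is not spelled out in this thread.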
[GitHub] [incubator-hudi] lamber-ken closed issue #1499: [SUPPORT]
DeltaStreamer - NoClassDefFoundError for HiveDriver
Posted by GitBox <gi...@apache.org>.
URL: https://github.com/apache/incubator-hudi/issues/1499
[GitHub] [incubator-hudi] hikiyoung commented on issue #1499: [SUPPORT]
DeltaStreamer - NoClassDefFoundError for HiveDriver
Posted by GitBox <gi...@apache.org>.
URL: https://github.com/apache/incubator-hudi/issues/1499#issuecomment-611452422
It works with the way #1. Thank you so much @lamber-ken .