Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/04/08 18:03:54 UTC

[GitHub] [incubator-hudi] hikiyoung opened a new issue #1499: [SUPPORT] DeltaStreamer - NoClassDefFoundError for HiveDriver

hikiyoung opened a new issue #1499: [SUPPORT] DeltaStreamer - NoClassDefFoundError for HiveDriver
URL: https://github.com/apache/incubator-hudi/issues/1499
 
 
   **Describe the problem you faced**
   
   Running DeltaStreamer with --enable-hive-sync fails with NoSuchMethodError: org.apache.hadoop.hive.ql.metadata.Hive.get(Lorg/apache/hudi/org/apache/hadoop_hive/conf/HiveConf;)Lorg/apache/hadoop/hive/ql/metadata/Hive;
   Should I change something in the default compilation process to include this class?
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Properties file
   ```
   include=base.properties
   hoodie.datasource.write.recordkey.field=ORDERNUMBER
   hoodie.datasource.write.partitionpath.field=PARTITIONPATH
   hoodie.datasource.hive_sync.assume_date_partitioning=false
   hoodie.deltastreamer.schemaprovider.source.schema.file=file:///home/hadoop/hudi/config/orders_hudi_schema.avro
   hoodie.deltastreamer.schemaprovider.target.schema.file=file:///home/hadoop/hudi/config/orders_hudi_schema.avro
   
   hoodie.deltastreamer.source.kafka.topic=orders_hudi_v1
   bootstrap.servers=kafka-broker-1:9092
   auto.offset.reset=smallest
   
   hoodie.datasource.hive_sync.database=hudi
   hoodie.datasource.hive_sync.table=orders_hudi_cow
   hoodie.datasource.hive_sync.jdbcurl=jdbc:hive2://localhost:10000
   hoodie.datasource.hive_sync.username=hive
   hoodie.datasource.hive_sync.password=hive
   hoodie.datasource.hive_sync.partition_fields=PARTITIONPATH
   ionValueExtractor
   ```
   2. Launch script with HoodieDeltaStreamer
   ```
   TARGET_DATABASE="hudi"
   TRAGET_TABLE="orders_hudi"
   HUDI_UTILITIES_BUNDLE="file:///usr/lib/hudi/hudi-utilities-bundle.jar"
   TARGET_BASE_PATH="s3://data-store/$TARGET_DATABASE/$TRAGET_TABLE"
   PROPS="file:///home/hadoop/hudi/config/kafka-source.properties"
   CHECKPOINT_BASE_PATH="s3://data-store/checkpoint/$TARGET_DATABASE/$TRAGET_TABLE"
   
   spark-submit \
     --conf 'spark.jars=/usr/lib/hudi/hudi-hadoop-mr-bundle.jar,/usr/lib/hudi/hudi-hive-bundle.jar,/usr/lib/hudi/hudi-presto-bundle.jar,/usr/lib/hudi/hudi-spark-bundle.jar,/usr/lib/hudi/hudi-timeline-server-bundle.jar'  \
     --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
     --master yarn \
     --deploy-mode client \
     --jars /usr/lib/spark/jars/httpclient-4.5.9.jar,/usr/lib/hudi/hudi-spark-bundle.jar,/usr/lib/spark/external/lib/spark-avro.jar \
     --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE \
     --storage-type MERGE_ON_READ \
     --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
     --target-base-path $TARGET_BASE_PATH \
     --target-table "$TARGET_DATABASE.$TRAGET_TABLE" \
     --source-ordering-field UPDATEDATE \
     --enable-hive-sync \
     --continuous \
     --props $PROPS \
     --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider
   ```
   
   
   **Expected behavior**
   
   The DeltaStreamer output is synced to Hive.
   
   **Environment Description**
   EMR 2.59.0
   
   * Hudi version : 0.5.0-inc
   
   * Spark version : 2.4.4
   
   * Hive version : 2.3.6
   
   * Hadoop version : 2.8.5
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no
   
   
   **Stacktrace**
   
   ```
   20/04/08 17:54:22 INFO YarnScheduler: Removed TaskSet 39.0, whose tasks have all completed, from pool
   20/04/08 17:54:22 INFO DAGScheduler: ResultStage 39 (collect at HoodieRealtimeTableCompactor.java:200) finished in 3.432 s
   20/04/08 17:54:22 INFO DAGScheduler: Job 13 finished: collect at HoodieRealtimeTableCompactor.java:200, took 3.436397 s
   20/04/08 17:54:22 INFO MultipartUploadOutputStream: close closed:false s3://data-store/hudi/orders_hudi_cow/.hoodie/.aux/20200408175418.compaction.requested
   20/04/08 17:54:22 INFO MultipartUploadOutputStream: close closed:false s3://data-store/hudi/orders_hudi_cow/.hoodie/.aux/20200408175418.compaction.requested
   20/04/08 17:54:22 INFO MultipartUploadOutputStream: close closed:false s3://data-store/hudi/orders_hudi_cow/.hoodie/20200408175418.compaction.requested
   20/04/08 17:54:22 INFO MultipartUploadOutputStream: close closed:false s3://data-store/hudi/orders_hudi_cow/.hoodie/20200408175418.compaction.requested
   20/04/08 17:54:22 INFO S3NativeFileSystem: Opening 's3://data-store/hudi/orders_hudi_cow/.hoodie/hoodie.properties' for reading
   20/04/08 17:49:45 INFO Utils: Supplied authorities: localhost:10000
   20/04/08 17:49:45 INFO Utils: Resolved authority: localhost:10000
   20/04/08 17:49:45 INFO HiveConnection: Will try to open client transport with JDBC Uri: jdbc:hive2://localhost:10000
   20/04/08 17:49:46 WARN AbstractDeltaStreamerService: Gracefully shutting down compactor
   
   
   
   20/04/08 17:50:27 ERROR AbstractDeltaStreamerService: Service shutdown with error
   java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.metadata.Hive.get(Lorg/apache/hudi/org/apache/hadoop_hive/conf/HiveConf;)Lorg/apache/hadoop/hive/ql/metadata/Hive;
           at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
           at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
           at org.apache.hudi.utilities.deltastreamer.AbstractDeltaStreamerService.waitForShutdown(AbstractDeltaStreamerService.java:70)
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:116)
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:292)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:498)
           at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
           at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:853)
           at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
           at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
           at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
           at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:928)
           at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:937)
           at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.metadata.Hive.get(Lorg/apache/hudi/org/apache/hadoop_hive/conf/HiveConf;)Lorg/apache/hadoop/hive/ql/metadata/Hive;
           at org.apache.hudi.hive.HoodieHiveClient.<init>(HoodieHiveClient.java:111)
           at org.apache.hudi.hive.HiveSyncTool.<init>(HiveSyncTool.java:60)
           at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncHive(DeltaSync.java:440)
           at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:382)
           at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:226)
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:390)
           at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:748)
   20/04/08 17:50:27 ERROR AbstractDeltaStreamerService: Monitor noticed one or more threads failed. Requesting graceful shutdown of other threads
   java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.metadata.Hive.get(Lorg/apache/hudi/org/apache/hadoop_hive/conf/HiveConf;)Lorg/apache/hadoop/hive/ql/metadata/Hive;
           at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
           at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
           at org.apache.hudi.utilities.deltastreamer.AbstractDeltaStreamerService.lambda$monitorThreads$0(AbstractDeltaStreamerService.java:134)
           at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
           at java.util.concurrent.FutureTask.run(FutureTask.java:266)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:748)
   Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.metadata.Hive.get(Lorg/apache/hudi/org/apache/hadoop_hive/conf/HiveConf;)Lorg/apache/hadoop/hive/ql/metadata/Hive;
           at org.apache.hudi.hive.HoodieHiveClient.<init>(HoodieHiveClient.java:111)
           at org.apache.hudi.hive.HiveSyncTool.<init>(HiveSyncTool.java:60)
           at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncHive(DeltaSync.java:440)
           at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:382)
           at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:226)
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:390)
           at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
           ... 3 more
   20/04/08 17:50:27 INFO Javalin: Stopping Javalin ...
   20/04/08 17:50:27 INFO SparkUI: Stopped Spark web UI at http://xxx.yyy.compute.internal:4040
   20/04/08 17:50:27 INFO Javalin: Javalin has stopped
   20/04/08 17:50:27 INFO YarnClientSchedulerBackend: Interrupting monitor thread
   20/04/08 17:50:27 INFO YarnClientSchedulerBackend: Shutting down all executors
   20/04/08 17:50:27 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
   20/04/08 17:50:27 INFO SchedulerExtensionServices: Stopping SchedulerExtensionServices
   (serviceOption=None,
    services=List(),
    started=false)
   20/04/08 17:50:27 INFO YarnClientSchedulerBackend: Stopped
   20/04/08 17:50:27 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
   20/04/08 17:50:27 INFO MemoryStore: MemoryStore cleared
   20/04/08 17:50:27 INFO BlockManager: BlockManager stopped
   20/04/08 17:50:27 INFO BlockManagerMaster: BlockManagerMaster stopped
   20/04/08 17:50:27 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
   20/04/08 17:50:27 INFO SparkContext: Successfully stopped SparkContext
   Exception in thread "main" java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.metadata.Hive.get(Lorg/apache/hudi/org/apache/hadoop_hive/conf/HiveConf;)Lorg/apache/hadoop/hive/ql/metadata/Hive;
           at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
           at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
           at org.apache.hudi.utilities.deltastreamer.AbstractDeltaStreamerService.waitForShutdown(AbstractDeltaStreamerService.java:70)
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:116)
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:292)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:498)
           at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
           at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:853)
           at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
           at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
           at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
           at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:928)
           at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:937)
           at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.metadata.Hive.get(Lorg/apache/hudi/org/apache/hadoop_hive/conf/HiveConf;)Lorg/apache/hadoop/hive/ql/metadata/Hive;
           at org.apache.hudi.hive.HoodieHiveClient.<init>(HoodieHiveClient.java:111)
           at org.apache.hudi.hive.HiveSyncTool.<init>(HiveSyncTool.java:60)
           at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncHive(DeltaSync.java:440)
           at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:382)
           at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:226)
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:390)
           at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:748)
   ```
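
A note on reading the error above: the text in parentheses is a JVM method descriptor, and the parameter type it names is `org/apache/hudi/org/apache/hadoop_hive/conf/HiveConf`, a relocated (shaded) name rather than Hive's own `org.apache.hadoop.hive.conf.HiveConf`. That suggests the calling code was built against a shaded `HiveConf`, while the `Hive` class found at runtime presumably takes the unshaded one. The shell sketch below only extracts the parameter class from such a descriptor; `descriptor_param_class` is a hypothetical helper name, and it assumes a single object-typed parameter, as in this trace.

```shell
# Hypothetical helper: print the parameter class named in a JVM method
# descriptor such as "pkg.Cls.method(Lsome/pkg/Type;)Lother/Type;".
# Assumes exactly one object-typed ("L...;") parameter.
descriptor_param_class() {
  # First sed expression drops everything through "(L";
  # the second drops everything from the first ";" onward.
  # tr then turns the JVM's "/" separators into the "." used in Java source.
  printf '%s\n' "$1" | sed -e 's/^.*(L//' -e 's/;.*$//' | tr '/' '.'
}

# Example call (commented out so the block only defines the function):
# descriptor_param_class 'org.apache.hadoop.hive.ql.metadata.Hive.get(Lorg/apache/hudi/org/apache/hadoop_hive/conf/HiveConf;)Lorg/apache/hadoop/hive/ql/metadata/Hive;'
```

For the descriptor in this trace the helper prints `org.apache.hudi.org.apache.hadoop_hive.conf.HiveConf`, which makes the shaded package prefix easy to spot.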
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] lamber-ken commented on issue #1499: [SUPPORT] DeltaStreamer - NoClassDefFoundError for HiveDriver

Posted by GitBox <gi...@apache.org>.
lamber-ken commented on issue #1499: [SUPPORT] DeltaStreamer - NoClassDefFoundError for HiveDriver
URL: https://github.com/apache/incubator-hudi/issues/1499#issuecomment-611627711
 
 
   > It works with the way #1. Thank you so much @lamber-ken .
   
   You're welcome : )


[GitHub] [incubator-hudi] hikiyoung commented on issue #1499: [SUPPORT] DeltaStreamer - NoClassDefFoundError for HiveDriver

Posted by GitBox <gi...@apache.org>.
hikiyoung commented on issue #1499: [SUPPORT] DeltaStreamer - NoClassDefFoundError for HiveDriver
URL: https://github.com/apache/incubator-hudi/issues/1499#issuecomment-611117191
 
 
   @lamber-ken 
   Thanks for the quick response. Let me try.


[GitHub] [incubator-hudi] lamber-ken commented on issue #1499: [SUPPORT] DeltaStreamer - NoClassDefFoundError for HiveDriver

Posted by GitBox <gi...@apache.org>.
lamber-ken commented on issue #1499: [SUPPORT] DeltaStreamer - NoClassDefFoundError for HiveDriver
URL: https://github.com/apache/incubator-hudi/issues/1499#issuecomment-611114936
 
 
   Hi @hikiyoung, there are two ways to solve your problem:
   
   1. Stay on Hudi 0.5.0 and rebuild the package
   
   ```
   1. Check out the release-0.5.0 branch
   https://github.com/apache/incubator-hudi/tree/release-0.5.0
   
   2. Remove the following relocations from
   https://github.com/apache/incubator-hudi/blob/release-0.5.0/packaging/hudi-spark-bundle/pom.xml
   
   <relocation>
     <pattern>org.apache.hive.jdbc.</pattern>
     <shadedPattern>org.apache.hudi.org.apache.hive.jdbc.</shadedPattern>
   </relocation>
   <relocation>
     <pattern>org.apache.hadoop.hive.metastore.</pattern>
     <shadedPattern>org.apache.hudi.org.apache.hadoop_hive.metastore.</shadedPattern>
   </relocation>
   <relocation>
     <pattern>org.apache.hive.common.</pattern>
     <shadedPattern>org.apache.hudi.org.apache.hive.common.</shadedPattern>
   </relocation>
   <relocation>
     <pattern>org.apache.hadoop.hive.common.</pattern>
     <shadedPattern>org.apache.hudi.org.apache.hadoop_hive.common.</shadedPattern>
   </relocation>
   <relocation>
     <pattern>org.apache.hadoop.hive.conf.</pattern>
     <shadedPattern>org.apache.hudi.org.apache.hadoop_hive.conf.</shadedPattern>
   </relocation>
   <relocation>
     <pattern>org.apache.hive.service.</pattern>
     <shadedPattern>org.apache.hudi.org.apache.hive.service.</shadedPattern>
   </relocation>
   <relocation>
     <pattern>org.apache.hadoop.hive.service.</pattern>
     <shadedPattern>org.apache.hudi.org.apache.hadoop_hive.service.</shadedPattern>
   </relocation>
   
   3. compile hudi package
   ```
   
   2. Upgrade Hudi to version 0.5.2
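
Before recompiling for option 1, it can help to confirm that the relocation entries are really gone from the bundle's pom.xml. The following is a minimal shell sketch, not part of Hudi: `count_hive_relocations` is a hypothetical helper that counts lines whose `<shadedPattern>` starts with `org.apache.hudi.org.apache.hive.` or `org.apache.hudi.org.apache.hadoop_hive.`, the two prefixes used by the entries listed above. It assumes each `<shadedPattern>` element sits on its own line, as in that snippet.

```shell
# Hypothetical helper: count the Hive relocation entries left in a pom.xml.
# $1 is the path to the pom file to inspect.
count_hive_relocations() {
  # grep -c prints the number of matching lines. It exits with status 1
  # when that number is 0, so "|| true" keeps the function's status at 0.
  grep -cE '<shadedPattern>org\.apache\.hudi\.org\.apache\.(hive|hadoop_hive)\.' "$1" || true
}

# Example call from the root of a release-0.5.0 checkout
# (commented out so the block only defines the function):
# count_hive_relocations packaging/hudi-spark-bundle/pom.xml
```

A result of 0 means no relocation with those prefixes remains in the file; a non-zero count means some entries are still present.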


[GitHub] [incubator-hudi] lamber-ken closed issue #1499: [SUPPORT] DeltaStreamer - NoClassDefFoundError for HiveDriver

Posted by GitBox <gi...@apache.org>.
lamber-ken closed issue #1499: [SUPPORT] DeltaStreamer - NoClassDefFoundError for HiveDriver
URL: https://github.com/apache/incubator-hudi/issues/1499
 
 
   


[GitHub] [incubator-hudi] hikiyoung commented on issue #1499: [SUPPORT] DeltaStreamer - NoClassDefFoundError for HiveDriver

Posted by GitBox <gi...@apache.org>.
hikiyoung commented on issue #1499: [SUPPORT] DeltaStreamer - NoClassDefFoundError for HiveDriver
URL: https://github.com/apache/incubator-hudi/issues/1499#issuecomment-611452422
 
 
   It works with the way #1. Thank you so much @lamber-ken .
