Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/09/07 11:38:11 UTC

[GitHub] [hudi] praveenkmr opened a new issue, #6623: java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener with HBase Index [SUPPORT]

praveenkmr opened a new issue, #6623:
URL: https://github.com/apache/hudi/issues/6623

   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
   
   - Join the mailing list to engage in conversations and get faster support at dev-subscribe@hudi.apache.org.
   
   - If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   
   While upgrading our Hudi pipelines from v0.6.0 to v0.9.0, I am facing a ClassNotFoundException in the pipelines where HBase is used as the index.
   
   Hudi v0.6.0 was running fine on EMR v5.31.0, but the pipeline with the same configuration fails with Hudi v0.9.0 on an EMR v5.35.0 cluster. HBase is hosted on a separate EMR v5.31.0 cluster.
   
   From the Spark CLI I am able to connect to HBase and write the data, but the same job fails when run via spark-submit.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Create an HBase Cluster in EMR v5.31.0.
   2. Trigger spark-submit on EMR v5.35.0 to load the data into the Hudi table. The job will fail with a ClassNotFoundException.
   
   **Expected behavior**
   
   The job should be able to connect to HBase and load data into the Hudi table.
   
   **Environment Description**
   
   * Hudi version: 0.9.0
   
   * Spark version: 2.4.8
   
   * Hive version: 2.3.9
   
   * Hadoop version: 2.10.1
   
   * Storage (HDFS/S3/GCS..) : s3
   
   * Running on Docker? (yes/no) : No
   
   
   **Additional context**
   
   The Hudi configuration being used:
   
   ```
    "hudi_properties": {
      "hoodie.clean.async": "false",
      "hoodie.clean.automatic": "true",
      "hoodie.cleaner.commits.retained": "10",
      "hoodie.cleaner.parallelism": "500",
      "hoodie.consistency.check.enabled": "true",
      "hoodie.datasource.hive_sync.enable": "true",
      "hoodie.datasource.hive_sync.database": "<hudi_database>",
      "hoodie.datasource.hive_sync.table": "<hudi_table_name>",
      "hoodie.datasource.hive_sync.partition_fields": "<partition_key>",
      "hoodie.datasource.hive_sync.assume_date_partitioning": "false",
      "hoodie.datasource.hive_sync.partition_extractor_class": "org.apache.hudi.hive.MultiPartKeysValueExtractor",   
      "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
      "hoodie.datasource.write.operation": "upsert",
      "hoodie.datasource.write.partitionpath.field": "<partition_key>",
      "hoodie.datasource.write.precombine.field": "<precombine_key>",
      "hoodie.datasource.write.recordkey.field": "<primary_key>",
      "hoodie.datasource.write.streaming.ignore.failed.batch": "false",
      "hoodie.datasource.write.hive_style_partitioning": "true",
      "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
      "hoodie.hbase.index.update.partition.path": "true",
      "hoodie.index.hbase.get.batch.size": "1000",
      "hoodie.index.hbase.max.qps.per.region.server": "1000",
      "hoodie.index.hbase.put.batch.size": "1000",
      "hoodie.index.hbase.qps.allocator.class": "org.apache.hudi.index.hbase.DefaultHBaseQPSResourceAllocator",
      "hoodie.index.hbase.qps.fraction": "0.5",
      "hoodie.index.hbase.rollback.sync": "true",
      "hoodie.index.hbase.table": "<hbase_table_name>",
      "hoodie.index.hbase.zknode.path": "/hbase",
      "hoodie.index.hbase.zkport": "2181",
      "hoodie.index.hbase.zkquorum": "<hudi_hbase_cluster_private_dns>",
      "hoodie.index.type": "HBASE",
      "hoodie.memory.compaction.fraction": "0.8",
      "hoodie.parquet.block.size": "152043520",
      "hoodie.parquet.compression.codec": "snappy",
      "hoodie.parquet.max.file.size": "152043520",
      "hoodie.parquet.small.file.limit": "104857600",
      "hoodie.table.name": "<hudi_table_name>",
      "hoodie.upsert.shuffle.parallelism": "20"
    }
   ```
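   
   For illustration only (not part of the original report), here is a minimal PySpark sketch of how a properties map like the one above is typically handed to the Hudi DataSource writer; the DataFrame, property values, and target path are hypothetical placeholders:
   
   ```
   from pyspark.sql import SparkSession
   
   # Hypothetical session and input data; in the reported setup the Hudi and
   # HBase client jars must already be on the driver/executor classpath.
   spark = SparkSession.builder.appName("hudi-hbase-index-sketch").getOrCreate()
   df = spark.createDataFrame(
       [("id-1", "2022-09-07", "payload")],
       ["primary_key", "partition_key", "precombine_key"],
   )
   
   hudi_properties = {
       "hoodie.table.name": "my_hudi_table",                    # placeholder
       "hoodie.datasource.write.recordkey.field": "primary_key",
       "hoodie.datasource.write.partitionpath.field": "partition_key",
       "hoodie.datasource.write.precombine.field": "precombine_key",
       "hoodie.datasource.write.operation": "upsert",
       "hoodie.index.type": "HBASE",
       "hoodie.index.hbase.zkquorum": "hbase-master.internal",  # placeholder
       "hoodie.index.hbase.zkport": "2181",
       "hoodie.index.hbase.zknode.path": "/hbase",
       "hoodie.index.hbase.table": "my_hbase_index_table",      # placeholder
       # ... plus the remaining properties from the block above ...
   }
   
   # DataFrameWriter.options accepts the map as keyword arguments.
   (df.write.format("hudi")
       .options(**hudi_properties)
       .mode("append")
       .save("s3://my-bucket/hudi/my_hudi_table"))
   ```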
   
   **Stacktrace**
   
   ``` 
   Caused by: org.apache.hudi.exception.HoodieDependentSystemUnavailableException: System HBASE unavailable. Tried to connect to ip-xxx-xx-xx-xx.ec2.internal:2181
    	at org.apache.hudi.index.hbase.SparkHoodieHBaseIndex.getHBaseConnection(SparkHoodieHBaseIndex.java:153)
    	at org.apache.hudi.index.hbase.SparkHoodieHBaseIndex.lambda$locationTagFunction$eda54cbe$1(SparkHoodieHBaseIndex.java:217)
    	at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
    	at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
    	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:875)
    	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:875)
    	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    	at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:359)
    	at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:357)
    	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1181)
    	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1155)
    	at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1090)
    	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1155)
    	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:881)
    	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:357)
    	at org.apache.spark.rdd.RDD.iterator(RDD.scala:308)
    	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    	at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    	at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:95)
    	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
    	at org.apache.spark.scheduler.Task.run(Task.scala:123)
    	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:411)
    	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1405)
    	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:417)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    	... 1 more
    Caused by: java.io.IOException: java.lang.reflect.InvocationTargetException
    	at org.apache.hudi.org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
    	at org.apache.hudi.org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:218)
    	at org.apache.hudi.org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:119)
    	at org.apache.hudi.index.hbase.SparkHoodieHBaseIndex.getHBaseConnection(SparkHoodieHBaseIndex.java:151)
    	... 31 more
    Caused by: java.lang.reflect.InvocationTargetException
    	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    	at org.apache.hudi.org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
    	... 34 more
    Caused by: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener not found
    	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2460)
    	at org.apache.hudi.org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:656)
    	... 39 more
    Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener not found
    	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2428)
    	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2452)
    	... 40 more
    Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener not found
    	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2332)
    	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2426)
    	... 41 more 
   ```
   
   




[GitHub] [hudi] yihua closed issue #6623: [SUPPORT] java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener with HBase Index

Posted by GitBox <gi...@apache.org>.
yihua closed issue #6623: [SUPPORT] java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener with HBase Index
URL: https://github.com/apache/hudi/issues/6623




[GitHub] [hudi] yihua commented on issue #6623: [SUPPORT] java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener with HBase Index

Posted by GitBox <gi...@apache.org>.
yihua commented on issue #6623:
URL: https://github.com/apache/hudi/issues/6623#issuecomment-1241060251

   @praveenkmr Great to hear that.
   
   For the upgrade to OSS Hudi 0.12.0 (the latest release), using hudi-spark-bundle should be sufficient, as the OSS Hudi 0.12.0 bundle jars work out of the box in the EMR environment. Prior to this version, you need to follow the workaround above.
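   
   As an illustration only (not from this thread), a minimal PySpark sketch of pulling the self-contained 0.12.0 bundle via spark.jars.packages instead of listing individual jars; the artifact coordinates assume Spark 2.4 / Scala 2.11 and should be verified against your Spark version:
   
   ```
   from pyspark.sql import SparkSession
   
   # Sketch: resolve the Hudi 0.12.0 Spark bundle from Maven instead of
   # passing individual jars with --jars. The coordinates assume Spark 2.4 /
   # Scala 2.11; pick the bundle matching your Spark/Scala version.
   spark = (
       SparkSession.builder
       .appName("hudi-bundle-sketch")
       .config("spark.jars.packages",
               "org.apache.hudi:hudi-spark2.4-bundle_2.11:0.12.0")
       .config("spark.serializer",
               "org.apache.spark.serializer.KryoSerializer")
       .getOrCreate()
   )
   ```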




[GitHub] [hudi] praveenkmr commented on issue #6623: [SUPPORT] java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener with HBase Index

Posted by GitBox <gi...@apache.org>.
praveenkmr commented on issue #6623:
URL: https://github.com/apache/hudi/issues/6623#issuecomment-1241989216

   @yihua Thanks for the confirmation on v0.12.0 (the latest version). I will also try testing the same code on the latest release.




[GitHub] [hudi] yihua commented on issue #6623: [SUPPORT] java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener with HBase Index

Posted by GitBox <gi...@apache.org>.
yihua commented on issue #6623:
URL: https://github.com/apache/hudi/issues/6623#issuecomment-1239698510

   cc @umehrot2 @rahil-c @CTTY 




[GitHub] [hudi] yihua commented on issue #6623: [SUPPORT] java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener with HBase Index

Posted by GitBox <gi...@apache.org>.
yihua commented on issue #6623:
URL: https://github.com/apache/hudi/issues/6623#issuecomment-1239696874

   @praveenkmr have you tried spark-submit or spark-shell with the suggested workaround in https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hudi-considerations.html?
   ```
   spark-submit \
   --jars /usr/lib/spark/external/lib/spark-avro.jar,/usr/lib/hudi/cli/lib/*.jar \
   --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
   --conf "spark.sql.hive.convertMetastoreParquet=false"
   ```
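   
   (Presumably this works because the EMR-installed Hudi jars under /usr/lib/hudi/ include the HBase client classes, such as ClusterStatusListener$MulticastListener, that the OSS hudi-spark-bundle did not package at the time, so supplying them via --jars puts the missing classes back on the executor classpath.)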




[GitHub] [hudi] praveenkmr commented on issue #6623: [SUPPORT] java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener with HBase Index

Posted by GitBox <gi...@apache.org>.
praveenkmr commented on issue #6623:
URL: https://github.com/apache/hudi/issues/6623#issuecomment-1240213423

   @yihua Thanks a lot, Ethan. I tried the suggestion and it worked fine. I am still wondering: for future upgrades, do we need to follow the same approach of loading all the jars during spark-submit, or is there scope in the latest version to use hudi-spark-bundle.jar directly?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org