Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/06/26 06:58:31 UTC

[GitHub] [hudi] kumudkumartirupati opened a new issue, #5976: [SUPPORT] col_stats raising `java.lang.NoClassDefFoundError` exception

kumudkumartirupati opened a new issue, #5976:
URL: https://github.com/apache/hudi/issues/5976

   - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
   - YES
   
   **Describe the problem you faced**
   Getting an exception when col_stats is enabled for tables that have decimal values.
   
   **A clear and concise description of the problem**
   Many of the classes from the `org.apache.hudi.avro.model` package that `HoodieMetadataPayload.java` uses could not be found in the Hudi Spark bundle.
   
   **To Reproduce**
   ```
   /opt/spark/bin/spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer local:/opt/spark/jars/hudi-utilities-slim-bundle_2.12-0.11.1.jar \
       --source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
       --payload-class org.apache.hudi.common.model.OverwriteWithLatestAvroPayload \
       --schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
       --table-type COPY_ON_WRITE \
       --target-table default \
       --props s3a://bucket/hudi-defaults.conf \
       --config-folder s3a://bucket/configs \
       --base-path-prefix s3a://bucket/dbs \
       --source-ordering-field ts_ms \
       --op UPSERT \
       --sync-tool-classes org.apache.hudi.hive.HiveSyncTool,org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool \
       --enable-sync
   ```
   
   **Expected behavior**
   
   col_stats indexing should work when enabled on new tables for the first time.
   
   **Environment Description**
   
   * Hudi version : 0.11.1
   
   * Spark version : 3.2.1
   
   * Hadoop version : 3.3.1
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : yes
   
   **Stacktrace**
   
   ```
   22/06/25 16:09:47 ERROR HoodieMultiTableDeltaStreamer: error while running MultiTableDeltaStreamer for table: test
   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 314.0 failed 4 times, most recent failure: Lost task 0.3 in stage 314.0 (TID 7224) (10.11.19.222 executor 1): java.lang.NoClassDefFoundError: Could not initialize class org.apache.hudi.avro.model.DecimalWrapper
   	at org.apache.hudi.metadata.HoodieMetadataPayload.wrapStatisticValue(HoodieMetadataPayload.java:686)
   	at org.apache.hudi.metadata.HoodieMetadataPayload.lambda$createColumnStatsRecords$13(HoodieMetadataPayload.java:595)
   	at java.base/java.util.stream.ReferencePipeline$3$1.accept(Unknown Source)
   	at java.base/java.util.ArrayList$ArrayListSpliterator.tryAdvance(Unknown Source)
   	at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(Unknown Source)
   	at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(Unknown Source)
   	at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(Unknown Source)
   	at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(Unknown Source)
   	at java.base/java.util.Spliterators$1Adapter.hasNext(Unknown Source)
   	at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
   	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
   	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
   	at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:223)
   	at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:352)
   	at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1498)
   	at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1408)
   	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1472)
   	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1295)
   	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:384)
   	at org.apache.spark.rdd.RDD.iterator(RDD.scala:335)
   	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
   	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
   	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
   	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
   	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
   	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
   	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
   	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
   	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
   	at org.apache.spark.scheduler.Task.run(Task.scala:131)
   	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
   	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
   	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
   	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
   	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   	at java.base/java.lang.Thread.run(Unknown Source)
   ```
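
The message `Could not initialize class org.apache.hudi.avro.model.DecimalWrapper` is typically what the JVM reports when a class was located but an earlier attempt at its static initialization failed, so it can help to confirm whether the class is physically present in the bundle jar. A minimal Python sketch of such a check follows. It only reads the jar's zip listing, the helper names `classes_in_package` and `has_class` are hypothetical, and it says nothing about whether the class is compatible with the Avro version on the classpath.

```python
import zipfile


def classes_in_package(jar, package):
    """Return the sorted simple names of .class entries directly inside `package`.

    `jar` is a path or a binary file-like object; a jar is an ordinary zip archive.
    """
    prefix = package.replace(".", "/") + "/"
    suffix = ".class"
    names = []
    with zipfile.ZipFile(jar) as archive:
        for entry in archive.namelist():
            if entry.startswith(prefix) and entry.endswith(suffix):
                rest = entry[len(prefix):-len(suffix)]
                # Skip classes that live in sub-packages of `package`.
                if "/" not in rest:
                    names.append(rest)
    return sorted(names)


def has_class(jar, fully_qualified_name):
    """Return True if the jar contains a .class entry for the given class name."""
    entry = fully_qualified_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar) as archive:
        return entry in archive.namelist()
```

For example, `has_class("/opt/spark/jars/hudi-utilities-slim-bundle_2.12-0.11.1.jar", "org.apache.hudi.avro.model.DecimalWrapper")` would report whether the jar from the command above ships that class, assuming the jar exists at that path on the machine running the check.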
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] kumudkumartirupati commented on issue #5976: [SUPPORT] column stats raising `java.lang.NoClassDefFoundError` exception when enabled

Posted by GitBox <gi...@apache.org>.

kumudkumartirupati commented on issue #5976:
URL: https://github.com/apache/hudi/issues/5976#issuecomment-1168407428

   Looks like this is an issue with the mvn packaging. I rebuilt the jar myself from scratch from `release-0.11.1` and it worked fine. I believe this has something to do with incompatible generated source classes for the Avro models being used during packaging.
   I can see that these generated classes are not cleaned up by `mvn clean`; they are still present in the target directory afterwards. This could be the issue.



[GitHub] [hudi] kumudkumartirupati closed issue #5976: [SUPPORT] column stats raising `java.lang.NoClassDefFoundError` exception when enabled

Posted by GitBox <gi...@apache.org>.

kumudkumartirupati closed issue #5976: [SUPPORT] column stats raising `java.lang.NoClassDefFoundError` exception when enabled
URL: https://github.com/apache/hudi/issues/5976



[GitHub] [hudi] kumud-hs commented on issue #5976: [SUPPORT] column stats raising `java.lang.NoClassDefFoundError` exception when enabled

Posted by GitBox <gi...@apache.org>.

kumud-hs commented on issue #5976:
URL: https://github.com/apache/hudi/issues/5976#issuecomment-1169607929

   Thanks, @yihua, for the clarification. I somehow missed that in the release notes and was confused by the default Spark version used in the bundle, `3.2`, as per the quick start guide (https://hudi.apache.org/docs/quick-start-guide). Closing this in favor of #5641.



[GitHub] [hudi] yihua commented on issue #5976: [SUPPORT] column stats raising `java.lang.NoClassDefFoundError` exception when enabled

Posted by GitBox <gi...@apache.org>.

yihua commented on issue #5976:
URL: https://github.com/apache/hudi/issues/5976#issuecomment-1169093342

   @kumudkumartirupati Thanks for bubbling this up. This is a known issue: the `hudi-utilities-slim-bundle` introduced in the 0.11.0 release is not yet fully compatible with Spark 3.2 and hudi-spark3.2-bundle. There are more details on this in the [release notes](https://hudi.apache.org/releases/release-0.11.0#slim-utilities-bundle) of 0.11.0.
   
   This is because some of the dependencies, including Avro, target different versions in Spark 3.1 vs 3.2. The published artifact of `hudi-utilities-slim-bundle` is built to work with Spark 3.1 and 2.4. If you build the slim bundle jar using the Spark 3.2 profile in mvn, it should work with Spark 3.2 and hudi-spark3.2-bundle. We have addressed this issue in #5641 for the next major release (0.12.0).



[GitHub] [hudi] kumudkumartirupati commented on issue #5976: [SUPPORT] column stats raising `java.lang.NoClassDefFoundError` exception when enabled

Posted by GitBox <gi...@apache.org>.

kumudkumartirupati commented on issue #5976:
URL: https://github.com/apache/hudi/issues/5976#issuecomment-1169609417

   Thanks, @yihua, for the clarification. I somehow missed that in the release notes and was confused by the default Spark version used in the bundle, 3.2, as per the quick start guide (https://hudi.apache.org/docs/quick-start-guide). Closing this in favor of #5641.

