Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/11/29 13:00:07 UTC
[GitHub] [hudi] jasondavindev edited a comment on issue #4122: [SUPPORT] UPDATE command doest not working on Spark SQL
jasondavindev edited a comment on issue #4122:
URL: https://github.com/apache/hudi/issues/4122#issuecomment-981610020
@xushiyan Thanks! I built the image, but when I try to write a dataframe, I receive the following error:
```text
>>> df.write.format('hudi').options(**hudi_options).save('/tmp/data/sample')
37491 [Thread-3] WARN org.apache.hudi.common.config.DFSPropertiesConfiguration - Cannot find HUDI_CONF_DIR, please set it as the dir of hudi-defaults.conf
37500 [Thread-3] ERROR org.apache.hudi.common.config.DFSPropertiesConfiguration - Error reading in properties from dfs
37500 [Thread-3] WARN org.apache.hudi.common.config.DFSPropertiesConfiguration - Didn't find config file under default conf file dir: file:/etc/hudi/conf
38382 [Thread-3] WARN org.apache.hudi.metadata.HoodieBackedTableMetadata - Metadata table was not found at path /tmp/data/sample/.hoodie/metadata
38400 [Thread-3] WARN org.apache.hudi.metadata.HoodieBackedTableMetadata - Metadata table was not found at path /tmp/data/sample/.hoodie/metadata
41212 [Thread-3] WARN org.apache.hudi.metadata.HoodieBackedTableMetadata - Metadata table was not found at path /tmp/data/sample/.hoodie/metadata
41217 [Thread-3] WARN org.apache.hudi.metadata.HoodieBackedTableMetadata - Metadata table was not found at path /tmp/data/sample/.hoodie/metadata
41972 [Executor task launch worker for task 0.0 in stage 49.0 (TID 44)] ERROR org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor - Error upserting bucketType UPDATE for partition :0
java.lang.ExceptionInInitializerError
at org.apache.hadoop.hbase.io.hfile.LruBlockCache.<clinit>(LruBlockCache.java:935)
at org.apache.hadoop.hbase.io.hfile.CacheConfig.getL1(CacheConfig.java:553)
at org.apache.hadoop.hbase.io.hfile.CacheConfig.instantiateBlockCache(CacheConfig.java:660)
at org.apache.hadoop.hbase.io.hfile.CacheConfig.<init>(CacheConfig.java:246)
at org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.serializeRecords(HoodieHFileDataBlock.java:100)
at org.apache.hudi.common.table.log.block.HoodieDataBlock.getContentBytes(HoodieDataBlock.java:120)
at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:164)
at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:375)
at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:353)
at org.apache.hudi.table.action.deltacommit.AbstractSparkDeltaCommitActionExecutor.handleUpdate(AbstractSparkDeltaCommitActionExecutor.java:84)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:313)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$execute$ecf5068c$1(BaseSparkCommitActionExecutor.java:172)
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:915)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:915)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:386)
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1440)
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1350)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1414)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1237)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:384)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:335)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.RuntimeException: Unexpected version format: 11.0.13
at org.apache.hadoop.hbase.util.ClassSize.<clinit>(ClassSize.java:119)
... 39 more
```
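The root cause appears to be the `Caused by: java.lang.RuntimeException: Unexpected version format: 11.0.13` line: the bundled HBase `ClassSize` static initializer parses the `java.version` system property assuming the pre-JDK 9 `1.<major>` scheme (e.g. `1.8.0_292`), so the JDK 11 scheme (`11.0.13`) trips it. A simplified, reconstructed sketch of that parsing logic (for illustration only, not the actual HBase code) shows why it fails:

```python
# Simplified reconstruction of the legacy version parsing done in HBase's
# ClassSize static initializer (illustrative, not the actual implementation).
def parse_legacy_java_version(version: str) -> int:
    parts = version.split(".")
    if parts[0] == "1":
        # Pre-JDK 9 scheme: "1.8.0_292" -> major version 8
        return int(parts[1])
    # JDK 9+ scheme ("11.0.13") is not handled, matching the stack trace
    raise RuntimeError(f"Unexpected version format: {version}")

parse_legacy_java_version("1.8.0_292")  # returns 8
# parse_legacy_java_version("11.0.13") raises RuntimeError, as in the trace
```

This suggests the failure is tied to running the job on JDK 11 (`11.0.13`) rather than to the Hudi options themselves.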
I found an issue related to this error, but it was a compatibility problem in an older release (version 0.4.x).
You can see my application here: https://github.com/jasondavindev/delta-lake-dms-cdc/blob/main/apps/hudi_update.py
Using version `0.9.0`, the same write succeeds.