Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/01/31 21:21:39 UTC
[GitHub] [hudi] rubenssoto opened a new issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition
rubenssoto opened a new issue #2508:
URL: https://github.com/apache/hudi/issues/2508
Hello,
Hudi Version: 0.7.0
Spark: 3.0.1
EMR: 6.2.0
Spark Submit: spark-submit --deploy-mode cluster --conf spark.executor.cores=5 --conf spark.executor.memoryOverhead=3000 --conf spark.executor.memory=32g --conf spark.yarn.maxAppAttempts=1 --conf spark.serializer=org.apache.spark.serializer.KryoSerializer --jars s3://dl/lib/spark-daria_2.12-0.38.2.jar --packages org.apache.spark:spark-avro_2.12:2.4.4,org.apache.hudi:hudi-spark-bundle_2.12:0.7.0 --class TableProcessorWrapper s3://dl/code/projects/data_projects/batch_processor_engine/batch-processor-engine_2.12-3.0.1_0.5.jar courier_api_group02
Hudi Options:
`Map(hoodie.datasource.hive_sync.database -> raw_courier_api_hudi,
hoodie.parquet.small.file.limit -> 67108864,
hoodie.copyonwrite.record.size.estimate -> 1024,
hoodie.datasource.write.precombine.field -> LineCreatedTimestamp,
hoodie.datasource.hive_sync.partition_fields -> created_year_month_brt_partition,
hoodie.datasource.hive_sync.partition_extractor_class -> org.apache.hudi.hive.MultiPartKeysValueExtractor,
hoodie.parquet.max.file.size -> 134217728,
hoodie.parquet.block.size -> 67108864,
hoodie.datasource.hive_sync.table -> order,
hoodie.datasource.write.operation -> upsert,
hoodie.datasource.hive_sync.enable -> true,
hoodie.datasource.write.recordkey.field -> id,
hoodie.table.name -> order,
hoodie.datasource.hive_sync.jdbcurl -> jdbc:hive2://emr:10000,
hoodie.datasource.write.hive_style_partitioning -> true,
hoodie.datasource.write.table.name -> order,
hoodie.datasource.write.keygenerator.class -> org.apache.hudi.keygen.SimpleKeyGenerator,
hoodie.upsert.shuffle.parallelism -> 50,
hoodie.datasource.write.partitionpath.field -> created_year_month_brt_partition)`
Error:
`diagnostics: User class threw exception: java.lang.Exception: Error on Table: order, Error Message: org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 in stage 28.0 failed 4 times, most recent failure: Lost task 7.3 in stage 28.0 (TID 530, ip-10-0-29-119.us-west-2.compute.internal, executor 5): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :7
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:279)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$execute$ecf5068c$1(BaseSparkCommitActionExecutor.java:135)
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:889)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:889)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:362)
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1388)
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1298)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1362)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1186)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:360)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:311)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:127)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: operation has failed
at org.apache.hudi.table.action.commit.SparkMergeHelper.runMerge(SparkMergeHelper.java:102)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdateInternal(BaseSparkCommitActionExecutor.java:308)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:299)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:272)
... 28 more
Caused by: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: operation has failed
at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:143)
at org.apache.hudi.table.action.commit.SparkMergeHelper.runMerge(SparkMergeHelper.java:100)
... 31 more
Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: operation has failed
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:141)
... 32 more
Caused by: org.apache.hudi.exception.HoodieException: operation has failed
at org.apache.hudi.common.util.queue.BoundedInMemoryQueue.throwExceptionIfFailed(BoundedInMemoryQueue.java:247)
at org.apache.hudi.common.util.queue.BoundedInMemoryQueue.readNextRecord(BoundedInMemoryQueue.java:226)
at org.apache.hudi.common.util.queue.BoundedInMemoryQueue.access$100(BoundedInMemoryQueue.java:52)
at org.apache.hudi.common.util.queue.BoundedInMemoryQueue$QueueIterator.hasNext(BoundedInMemoryQueue.java:277)
at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:36)
at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
... 3 more
Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file s3://31-ze-datalake-raw/courier_api/order/created_year_month_brt_partition=202012/a71490e9-d2e7-4ecf-b48a-6b7046770841-0_43-11441-0_20210131205623.parquet
at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251)
at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:132)
at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:136)
at org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:49)
at org.apache.hudi.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:45)
at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$0(BoundedInMemoryExecutor.java:92)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
... 4 more
Caused by: java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainLongDictionary
at org.apache.parquet.column.Dictionary.decodeToBinary(Dictionary.java:41)
at org.apache.parquet.avro.AvroConverters$BinaryConverter.setDictionary(AvroConverters.java:75)
at org.apache.parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:341)
at org.apache.parquet.column.impl.ColumnReadStoreImpl.newMemColumnReader(ColumnReadStoreImpl.java:80)
at org.apache.parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:75)
at org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:271)
at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:147)
at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:109)
at org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:165)
at org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:109)
at org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:137)
at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:222)
... 11 more
Driver stacktrace:
at jobs.TableProcessor.start(TableProcessor.scala:101)
at TableProcessorWrapper$.$anonfun$main$2(TableProcessorWrapper.scala:23)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
at scala.util.Success.$anonfun$map$1(Try.scala:255)
at scala.util.Success.map(Try.scala:213)
at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
ApplicationMaster host: ip-10-0-19-128.us-west-2.compute.internal
ApplicationMaster RPC port: 45559
queue: default
start time: 1612127355095
final status: FAILED
tracking URL: http://ip-10-0-29-186.us-west-2.compute.internal:20888/proxy/application_1612125097081_0004/
user: hadoop`
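The exception chain above is deeply nested, and the informative part is the innermost `Caused by:` entry. Here that is a `java.lang.UnsupportedOperationException` raised from `Dictionary.decodeToBinary` on a `PlainLongDictionary`, which suggests that the reader's Avro converter expected a binary (string-like) column while the Parquet file stores it with a long dictionary. A minimal sketch for pulling that innermost cause out of a pasted trace follows; the function name `root_cause` is hypothetical and the regular expression only assumes the standard Java `Caused by: <class>: <message>` layout.

```python
import re

# Matches lines such as "Caused by: pkg.SomeException: message text".
# "$" is allowed in the class name because of nested classes (Outer$Inner).
_CAUSED_BY = re.compile(r"^\s*Caused by: ([\w.$]+)(?:: (.*))?$", re.MULTILINE)


def root_cause(trace: str):
    """Return (exception_class, message) of the innermost 'Caused by:' line.

    Returns None when the trace contains no 'Caused by:' line at all.
    """
    causes = _CAUSED_BY.findall(trace)
    if not causes:
        return None
    # Java prints causes from outermost to innermost, so the last one wins.
    return causes[-1]


if __name__ == "__main__":
    sample = (
        "org.apache.hudi.exception.HoodieUpsertException: Error upserting\n"
        "\tat org.example.Foo.bar(Foo.java:1)\n"
        "Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value\n"
        "\tat org.example.Baz.qux(Baz.java:2)\n"
        "Caused by: java.lang.UnsupportedOperationException: "
        "org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainLongDictionary\n"
        "\tat org.example.Quux.run(Quux.java:3)\n"
    )
    print(root_cause(sample))
```

Reading the trace bottom-up this way points at the file's physical column type rather than at the upsert logic itself.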
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] rubenssoto commented on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition
Posted by GitBox <gi...@apache.org>.
rubenssoto commented on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-812936183
Hello,
Sorry for the very late response. I tried again and only one table fails; it is the same table with the same error:
21/04/03 22:09:32 WARN TaskSetManager: Lost task 0.0 in stage 31.0 (TID 1025, ip-10-0-49-182.us-west-2.compute.internal, executor 2): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:288)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$execute$ecf5068c$1(BaseSparkCommitActionExecutor.java:139)
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:889)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:889)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:362)
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1388)
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1298)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1362)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1186)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:360)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:311)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:127)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: operation has failed
at org.apache.hudi.table.action.commit.SparkMergeHelper.runMerge(SparkMergeHelper.java:102)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdateInternal(BaseSparkCommitActionExecutor.java:317)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:308)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:281)
... 28 more
Caused by: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: operation has failed
at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:143)
at org.apache.hudi.table.action.commit.SparkMergeHelper.runMerge(SparkMergeHelper.java:100)
... 31 more
Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: operation has failed
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:141)
... 32 more
Caused by: org.apache.hudi.exception.HoodieException: operation has failed
at org.apache.hudi.common.util.queue.BoundedInMemoryQueue.throwExceptionIfFailed(BoundedInMemoryQueue.java:247)
at org.apache.hudi.common.util.queue.BoundedInMemoryQueue.readNextRecord(BoundedInMemoryQueue.java:226)
at org.apache.hudi.common.util.queue.BoundedInMemoryQueue.access$100(BoundedInMemoryQueue.java:52)
at org.apache.hudi.common.util.queue.BoundedInMemoryQueue$QueueIterator.hasNext(BoundedInMemoryQueue.java:277)
at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:36)
at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
... 3 more
Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file s3://31-ze-datalake-raw/courier_api/order/7acd6348-88ab-46f9-adf9-b09293b2d8bd-0_46-10253-0_20210403213218.parquet
at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251)
at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:132)
at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:136)
at org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:49)
at org.apache.hudi.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:45)
at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$0(BoundedInMemoryExecutor.java:92)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
... 4 more
Caused by: java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainLongDictionary
at org.apache.parquet.column.Dictionary.decodeToBinary(Dictionary.java:41)
at org.apache.parquet.avro.AvroConverters$BinaryConverter.setDictionary(AvroConverters.java:75)
at org.apache.parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:341)
at org.apache.parquet.column.impl.ColumnReadStoreImpl.newMemColumnReader(ColumnReadStoreImpl.java:80)
at org.apache.parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:75)
at org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:271)
at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:147)
at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:109)
at org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:165)
at org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:109)
at org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:137)
at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:222)
... 11 more
21/04/03 22:09:32 INFO TaskSetManager: Starting task 0.1 in stage 31.0 (TID 1033, ip-10-0-49-182.us-west-2.compute.internal, executor 2, partition 0, NODE_LOCAL, 7132 bytes)
21/04/03 22:09:32 WARN TaskSetManager: Lost task 4.0 in stage 31.0 (TID 1029, ip-10-0-49-182.us-west-2.compute.internal, executor 2): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :4
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:288)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$execute$ecf5068c$1(BaseSparkCommitActionExecutor.java:139)
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:889)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:889)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:362)
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1388)
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1298)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1362)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1186)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:360)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:311)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:127)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: operation has failed
at org.apache.hudi.table.action.commit.SparkMergeHelper.runMerge(SparkMergeHelper.java:102)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdateInternal(BaseSparkCommitActionExecutor.java:317)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:308)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:281)
... 28 more
Caused by: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: operation has failed
at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:143)
at org.apache.hudi.table.action.commit.SparkMergeHelper.runMerge(SparkMergeHelper.java:100)
... 31 more
Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: operation has failed
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:141)
... 32 more
Caused by: org.apache.hudi.exception.HoodieException: operation has failed
at org.apache.hudi.common.util.queue.BoundedInMemoryQueue.throwExceptionIfFailed(BoundedInMemoryQueue.java:247)
at org.apache.hudi.common.util.queue.BoundedInMemoryQueue.readNextRecord(BoundedInMemoryQueue.java:226)
at org.apache.hudi.common.util.queue.BoundedInMemoryQueue.access$100(BoundedInMemoryQueue.java:52)
at org.apache.hudi.common.util.queue.BoundedInMemoryQueue$QueueIterator.hasNext(BoundedInMemoryQueue.java:277)
at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:36)
at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
... 3 more
Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file s3://31-ze-datalake-raw/courier_api/order/c2b48006-dfb3-4715-86d5-09692665089d-0_45-10252-0_20210403213218.parquet
at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251)
at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:132)
at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:136)
at org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:49)
at org.apache.hudi.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:45)
at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$0(BoundedInMemoryExecutor.java:92)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
... 4 more
Caused by: java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainLongDictionary
at org.apache.parquet.column.Dictionary.decodeToBinary(Dictionary.java:41)
at org.apache.parquet.avro.AvroConverters$BinaryConverter.setDictionary(AvroConverters.java:75)
at org.apache.parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:341)
at org.apache.parquet.column.impl.ColumnReadStoreImpl.newMemColumnReader(ColumnReadStoreImpl.java:80)
at org.apache.parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:75)
at org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:271)
at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:147)
at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:109)
at org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:165)
at org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:109)
at org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:137)
at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:222)
... 11 more
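The Parquet files named in the `ParquetDecodingException` lines above can be tied back to the commit that wrote them. The sketch below assumes Hudi's usual base-file naming layout of `<fileId>_<writeToken>_<instantTime>.parquet` with a 14-digit `yyyyMMddHHmmss` instant; the function name `parse_base_file` is hypothetical.

```python
import re
from datetime import datetime

# Assumed layout: <fileId>_<writeToken>_<instantTime>.parquet
_BASE_FILE = re.compile(
    r"(?P<file_id>.+)_(?P<write_token>[^_]+)_(?P<instant>\d{14})\.parquet"
)


def parse_base_file(path: str) -> dict:
    """Split a Hudi base-file path into file id, write token and instant time."""
    name = path.rsplit("/", 1)[-1]  # drop bucket and directories
    match = _BASE_FILE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a recognised base file name: {name}")
    return {
        "file_id": match.group("file_id"),
        "write_token": match.group("write_token"),
        "instant": datetime.strptime(match.group("instant"), "%Y%m%d%H%M%S"),
    }


if __name__ == "__main__":
    info = parse_base_file(
        "s3://31-ze-datalake-raw/courier_api/order/"
        "7acd6348-88ab-46f9-adf9-b09293b2d8bd-0_46-10253-0_20210403213218.parquet"
    )
    print(info["file_id"], info["write_token"], info["instant"].isoformat())
```

Both files in the traces above carry the same instant, so under this assumption they were written by a single commit shortly before the failing upsert, which is the write to inspect.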
[GitHub] [hudi] rubenssoto commented on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition
Posted by GitBox <gi...@apache.org>.
rubenssoto commented on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-770483807
I changed the column type to string and the problem was not solved; the behavior is the same.
[GitHub] [hudi] nsivabalan commented on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition
Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-810436034
Closing this out due to no activity. Please reopen or create a new ticket if required.
[GitHub] [hudi] rubenssoto commented on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition
Posted by GitBox <gi...@apache.org>.
rubenssoto commented on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-770475266
I ran more tests, and the problem only occurs when my bulk insert operation sets the option hoodie.datasource.write.row.writer.enable to true.
It only happens on this table, which has a column of type array.
With that option set to false, the column in Hive is:
`order_details_line_items` array<struct<product_variant_id:int,item_inventory_id:int,price:decimal(14,2),total_price:decimal(14,2),subtotal_price:decimal(14,2),total_item_price:decimal(14,2),total_tax:decimal(14,2),total_coupon_discount:decimal(14,2),total_offer_discount:decimal(14,2),total_discount:decimal(14,2),quantity:int,description:string,short_description:string,image_url:string,title:string,subtitle:string,product:struct<product_id:int,title:string,tags:string,label:string,image_url:string,description:string,short_description:string,rgb:boolean,has_fixed_price:boolean>,category:struct<id:int,title:string>,brand:struct<id:int,title:string>,applicable_discount:struct<discount_value:decimal(14,2),discount_type:string,discount_value_type:int,presented_discount_value:decimal(14,2),final_price:decimal(14,2),final_unit_price:decimal(14,2)>>>
With that option set to true:
`order_details_line_items` array<struct<product_variant_id:int,item_inventory_id:int,price:decimal(14,2),total_price:decimal(14,2),subtotal_price:decimal(14,2),total_item_price:decimal(14,2),total_tax:decimal(14,2),total_coupon_discount:decimal(14,2),total_offer_discount:decimal(14,2),total_discount:decimal(14,2),quantity:int,description:string,short_description:string,image_url:string,title:string,subtitle:string,product:struct<product_id:int,title:string,tags:string,label:string,image_url:string,description:string,short_description:string,rgb:boolean,has_fixed_price:boolean>,category:struct<id:int,title:string>,brand:struct<id:int,title:string>,applicable_discount:struct<discount_value:decimal(14,2),discount_type:string,discount_value_type:int,presented_discount_value:decimal(14,2),final_price:decimal(14,2),final_unit_price:decimal(14,2)>>>
I will try converting the column to a string...
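The comparison described in this comment comes down to two bulk insert runs that differ in a single option. A minimal sketch of those two option sets follows, assuming the tests reused the table name, record key, precombine field and partition field from the options in the original report. The helper name `bulk_insert_options` is hypothetical; the option keys are the ones quoted in this thread.

```python
def bulk_insert_options(row_writer_enabled: bool) -> dict:
    """Build Hudi write options for a bulk insert, toggling only the row writer."""
    return {
        # Values below are copied from the options in the original report.
        "hoodie.table.name": "order",
        "hoodie.datasource.write.recordkey.field": "id",
        "hoodie.datasource.write.precombine.field": "LineCreatedTimestamp",
        "hoodie.datasource.write.partitionpath.field": "created_year_month_brt_partition",
        # The initial load is a bulk insert rather than an upsert.
        "hoodie.datasource.write.operation": "bulk_insert",
        # The single setting that differs between the failing and working runs.
        "hoodie.datasource.write.row.writer.enable": "true" if row_writer_enabled else "false",
    }


failing = bulk_insert_options(True)   # later upserts hit the decoding error
working = bulk_insert_options(False)  # later upserts succeed
changed = sorted(key for key in failing if failing[key] != working[key])

if __name__ == "__main__":
    print(changed)
```

With PySpark, a dict like this would typically be passed as `df.write.format("hudi").options(**working).mode("append").save(base_path)`, where `df` and `base_path` are placeholders not taken from this thread.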
[GitHub] [hudi] rubenssoto commented on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition
Posted by GitBox <gi...@apache.org>.
rubenssoto commented on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-812936268
Now I'm testing with Hudi 0.8.0-rc1.
[GitHub] [hudi] rubenssoto commented on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition
Posted by GitBox <gi...@apache.org>.
rubenssoto commented on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-770546985
I followed the same procedure both times. The only difference was that
with hoodie.datasource.write.row.writer.enable set to true it didn't work, and with the same config set to false it worked.
[GitHub] [hudi] n3nash commented on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition
Posted by GitBox <gi...@apache.org>.
n3nash commented on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-771413944
@nsivabalan Do you think this may have something to do with the Encoders needed in the row write path?
[GitHub] [hudi] rubenssoto commented on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition
Posted by GitBox <gi...@apache.org>.
rubenssoto commented on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-812936246
When I don't enable the row writer on bulk insert, I have no problems.
[GitHub] [hudi] rubenssoto edited a comment on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition
Posted by GitBox <gi...@apache.org>.
rubenssoto edited a comment on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-770483807
I changed the column type to string and the problem was not solved; the behavior is the same.
Could it be a bug?
[GitHub] [hudi] rubenssoto edited a comment on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition
Posted by GitBox <gi...@apache.org>.
rubenssoto edited a comment on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-770475266
I ran more tests and only hit the problem when my bulk insert operation had the option hoodie.datasource.write.row.writer.enable set to true.
I only had this problem on this table, which has a column of type array.
With that option set to false, the column in Hive is:
`order_details_line_items` array<struct<product_variant_id:int,item_inventory_id:int,price:decimal(14,2),total_price:decimal(14,2),subtotal_price:decimal(14,2),total_item_price:decimal(14,2),total_tax:decimal(14,2),total_coupon_discount:decimal(14,2),total_offer_discount:decimal(14,2),total_discount:decimal(14,2),quantity:int,description:string,short_description:string,image_url:string,title:string,subtitle:string,product:struct<product_id:int,title:string,tags:string,label:string,image_url:string,description:string,short_description:string,rgb:boolean,has_fixed_price:boolean>,category:struct<id:int,title:string>,brand:struct<id:int,title:string>,applicable_discount:struct<discount_value:decimal(14,2),discount_type:string,discount_value_type:int,presented_discount_value:decimal(14,2),final_price:decimal(14,2),final_unit_price:decimal(14,2)>>>
With that option set to true:
`order_details_line_items` array<struct<product_variant_id:int,item_inventory_id:int,price:decimal(14,2),total_price:decimal(14,2),subtotal_price:decimal(14,2),total_item_price:decimal(14,2),total_tax:decimal(14,2),total_coupon_discount:decimal(14,2),total_offer_discount:decimal(14,2),total_discount:decimal(14,2),quantity:int,description:string,short_description:string,image_url:string,title:string,subtitle:string,product:struct<product_id:int,title:string,tags:string,label:string,image_url:string,description:string,short_description:string,rgb:boolean,has_fixed_price:boolean>,category:struct<id:int,title:string>,brand:struct<id:int,title:string>,applicable_discount:struct<discount_value:decimal(14,2),discount_type:string,discount_value_type:int,presented_discount_value:decimal(14,2),final_price:decimal(14,2),final_unit_price:decimal(14,2)>>>
The original column is JSON, and I parse it into a struct in Spark with this schema:
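import org.apache.spark.sql.types._

StructField(
  "line_items",
  ArrayType(
    StructType(
      List(
        StructField("product_variant_id", IntegerType),
        StructField("item_inventory_id", IntegerType),
        StructField("price", DecimalType(14, 2)),
        StructField("total_price", DecimalType(14, 2)),
        StructField("subtotal_price", DecimalType(14, 2)),
        StructField("total_item_price", DecimalType(14, 2)),
        StructField("total_tax", DecimalType(14, 2)),
        StructField("total_coupon_discount", DecimalType(14, 2)),
        StructField("total_offer_discount", DecimalType(14, 2)),
        StructField("total_discount", DecimalType(14, 2)),
        StructField("quantity", IntegerType),
        StructField("description", StringType),
        StructField("short_description", StringType),
        StructField("image_url", StringType),
        StructField("title", StringType),
        StructField("subtitle", StringType),
        StructField(
          "product",
          StructType(
            List(
              StructField("product_id", IntegerType),
              StructField("title", StringType),
              StructField("tags", StringType),
              StructField("label", StringType),
              StructField("image_url", StringType),
              StructField("description", StringType),
              StructField("short_description", StringType),
              StructField("rgb", BooleanType),
              StructField("has_fixed_price", BooleanType)
            )
          )
        ),
        // The pasted schema is cut off after "product"; the fields below are an
        // assumption, reconstructed from the Hive column type shown above.
        StructField(
          "category",
          StructType(List(StructField("id", IntegerType), StructField("title", StringType)))
        ),
        StructField(
          "brand",
          StructType(List(StructField("id", IntegerType), StructField("title", StringType)))
        ),
        StructField(
          "applicable_discount",
          StructType(
            List(
              StructField("discount_value", DecimalType(14, 2)),
              StructField("discount_type", StringType),
              StructField("discount_value_type", IntegerType),
              StructField("presented_discount_value", DecimalType(14, 2)),
              StructField("final_price", DecimalType(14, 2)),
              StructField("final_unit_price", DecimalType(14, 2))
            )
          )
        )
      )
    )
  )
)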
I will try to convert the column to string....
[GitHub] [hudi] nsivabalan edited a comment on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition
Posted by GitBox <gi...@apache.org>.
nsivabalan edited a comment on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-772655440
I ran some local tests with an array column and things are fine. The schema I tested locally is given below. Also, from the stack trace, I don't think encoders are the issue, but I don't have much of an idea as of now either. I need more time to triage.
@n3nash / @bvaradar: Can you spot any issues based on the information given by the author?
@rubenssoto: I am sure you have tested the schema, and the stack trace does not hint at a schema problem either, but I wanted to confirm once: did you ensure the schema is valid (and backwards compatible)?
```
diagnostics: User class threw exception: java.lang.Exception: Error on Table: order, Error Message: org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 in stage 28.0 failed 4 times, most recent failure: Lost task 7.3 in stage 28.0 (TID 530, ip-10-0-29-119.us-west-2.compute.internal, executor 5): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :7
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:279)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$execute$ecf5068c$1(BaseSparkCommitActionExecutor.java:135)
```
```
public static final StructType STRUCT_TYPE = new StructType(new StructField[] {
    new StructField(HoodieRecord.COMMIT_TIME_METADATA_FIELD, DataTypes.StringType, false, Metadata.empty()),
    new StructField(HoodieRecord.COMMIT_SEQNO_METADATA_FIELD, DataTypes.StringType, false, Metadata.empty()),
    new StructField(HoodieRecord.RECORD_KEY_METADATA_FIELD, DataTypes.StringType, false, Metadata.empty()),
    new StructField(HoodieRecord.PARTITION_PATH_METADATA_FIELD, DataTypes.StringType, false, Metadata.empty()),
    new StructField(HoodieRecord.FILENAME_METADATA_FIELD, DataTypes.StringType, false, Metadata.empty()),
    new StructField("randomInt", DataTypes.IntegerType, false, Metadata.empty()),
    new StructField("randomLong", DataTypes.LongType, false, Metadata.empty()),
    new StructField("array_long", DataTypes.createArrayType(DataTypes.LongType), false, Metadata.empty())
});
```
[GitHub] [hudi] nsivabalan commented on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition
Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-772582307
Sure, I will check it out. In the meantime, can you (@rubenssoto) confirm that you are not blocked, and that you were only looking to get better performance with row writing, so in general you are fine for now?
[GitHub] [hudi] rubenssoto commented on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition
Posted by GitBox <gi...@apache.org>.
rubenssoto commented on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-770541514
No, the same schema, no changes.
[GitHub] [hudi] rubenssoto commented on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition
Posted by GitBox <gi...@apache.org>.
rubenssoto commented on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-772661097
Yeah, it is valid. Tonight I will test again, just to be sure.
[GitHub] [hudi] nsivabalan commented on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition
Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-813124159
Sure. Keep us posted.
[GitHub] [hudi] nsivabalan commented on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition
Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-908911135
Closing this due to no activity and since we could not reproduce. Please re-open if you are still having issues.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] nsivabalan closed issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition
Posted by GitBox <gi...@apache.org>.
nsivabalan closed issue #2508:
URL: https://github.com/apache/hudi/issues/2508
[GitHub] [hudi] n3nash commented on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition
Posted by GitBox <gi...@apache.org>.
n3nash commented on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-770535798
@rubenssoto Have you changed the schema since the last time you did a bulkInsert into this table?
[GitHub] [hudi] nsivabalan commented on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition
Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-813499168
For example, I tried an array of records with some fields, and it looks good with bulk insert and the row writer enabled.
```
{
  "type" : "record",
  "name" : "triprec",
  "fields" : [ {
    "name" : "timestamp",
    "type" : "long"
  }, {
    "name" : "_row_key",
    "type" : "string"
  }, {
    "name" : "rider",
    "type" : "string"
  }, {
    "name" : "tip_history",
    "type" : {
      "type" : "array",
      "items" : {
        "type" : "record",
        "name" : "tip_history",
        "fields" : [ {
          "name" : "amount",
          "type" : "double"
        }, {
          "name" : "currency",
          "type" : "string"
        } ]
      },
      "default" : [ ]
    }
  } ]
}
```
[GitHub] [hudi] nsivabalan commented on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition
Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-779368051
@rubenssoto: would you mind responding to this whenever you can? If it worked for you, do let us know so that we can close the ticket.
[GitHub] [hudi] nsivabalan commented on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition
Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-813477549
@rubenssoto: can you give us the Avro schema for which you are running into issues? You can change column names if privacy is required. Or better, if you know exactly which field is having issues, let us know, e.g. whether it is an array (a record with a list of strings) or something of that sort. I mean, you don't need to give us the full schema, just the fields that are having issues. If you can't narrow it down, never mind; give us the entire schema and we can investigate.
[GitHub] [hudi] nsivabalan removed a comment on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition
Posted by GitBox <gi...@apache.org>.
nsivabalan removed a comment on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-813124159
Sure. Keep us posted.
[GitHub] [hudi] rubenssoto commented on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition
Posted by GitBox <gi...@apache.org>.
rubenssoto commented on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-772598997
@nsivabalan I only have this problem in one table, so it would be good if it works in the future, but for now it's fine.
Thanks for asking, you are the best! :)