Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/01/31 21:21:39 UTC

[GitHub] [hudi] rubenssoto opened a new issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition

rubenssoto opened a new issue #2508:
URL: https://github.com/apache/hudi/issues/2508


   Hello,
   
   Hudi Version: 0.7.0
   Spark: 3.0.1
   Emr 6.2.0
   
   Spark Submit: spark-submit --deploy-mode cluster --conf spark.executor.cores=5 --conf spark.executor.memoryOverhead=3000 --conf spark.executor.memory=32g --conf spark.yarn.maxAppAttempts=1 --conf spark.serializer=org.apache.spark.serializer.KryoSerializer --jars s3://dl/lib/spark-daria_2.12-0.38.2.jar --packages org.apache.spark:spark-avro_2.12:2.4.4,org.apache.hudi:hudi-spark-bundle_2.12:0.7.0 --class TableProcessorWrapper s3://dl/code/projects/data_projects/batch_processor_engine/batch-processor-engine_2.12-3.0.1_0.5.jar courier_api_group02
   
   Hudi Options:
   `Map(hoodie.datasource.hive_sync.database -> raw_courier_api_hudi, 
   hoodie.parquet.small.file.limit -> 67108864, 
   hoodie.copyonwrite.record.size.estimate -> 1024, 
   hoodie.datasource.write.precombine.field -> LineCreatedTimestamp, 
   hoodie.datasource.hive_sync.partition_fields -> created_year_month_brt_partition, 
   hoodie.datasource.hive_sync.partition_extractor_class -> org.apache.hudi.hive.MultiPartKeysValueExtractor, 
   hoodie.parquet.max.file.size -> 134217728, 
   hoodie.parquet.block.size -> 67108864, 
   hoodie.datasource.hive_sync.table -> order, 
   hoodie.datasource.write.operation -> upsert, 
   hoodie.datasource.hive_sync.enable -> true, 
   hoodie.datasource.write.recordkey.field -> id, 
   hoodie.table.name -> order, 
   hoodie.datasource.hive_sync.jdbcurl -> jdbc:hive2://emr:10000, 
   hoodie.datasource.write.hive_style_partitioning -> true, 
   hoodie.datasource.write.table.name -> order, 
   hoodie.datasource.write.keygenerator.class -> org.apache.hudi.keygen.SimpleKeyGenerator, 
   hoodie.upsert.shuffle.parallelism -> 50, 
   hoodie.datasource.write.partitionpath.field -> created_year_month_brt_partition)`
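   
   For reference, a minimal sketch of how a map of options like this is typically passed to the Hudi datasource from Scala. The staging path and target path below are hypothetical placeholders, not taken from the job above:
   
   ```
   import org.apache.spark.sql.{SaveMode, SparkSession}
   
   object UpsertExample {
     def main(args: Array[String]): Unit = {
       val spark = SparkSession.builder().appName("hudi-upsert-example").getOrCreate()
   
       // A subset of the options above; the keys are the same ones this job uses.
       val hudiOptions = Map(
         "hoodie.table.name" -> "order",
         "hoodie.datasource.write.operation" -> "upsert",
         "hoodie.datasource.write.recordkey.field" -> "id",
         "hoodie.datasource.write.precombine.field" -> "LineCreatedTimestamp",
         "hoodie.datasource.write.partitionpath.field" -> "created_year_month_brt_partition"
       )
   
       val df = spark.read.parquet("s3://example-bucket/staging/order")
   
       df.write
         .format("hudi")
         .options(hudiOptions)
         .mode(SaveMode.Append)
         .save("s3://example-bucket/raw/order")
     }
   }
   ```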
   
   Error:
   `diagnostics: User class threw exception: java.lang.Exception: Error on Table: order, Error Message: org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 in stage 28.0 failed 4 times, most recent failure: Lost task 7.3 in stage 28.0 (TID 530, ip-10-0-29-119.us-west-2.compute.internal, executor 5): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :7
   	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:279)
   	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$execute$ecf5068c$1(BaseSparkCommitActionExecutor.java:135)
   	at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
   	at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
   	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:889)
   	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:889)
   	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
   	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
   	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
   	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
   	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
   	at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:362)
   	at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1388)
   	at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1298)
   	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1362)
   	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1186)
   	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:360)
   	at org.apache.spark.rdd.RDD.iterator(RDD.scala:311)
   	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
   	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
   	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
   	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
   	at org.apache.spark.scheduler.Task.run(Task.scala:127)
   	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
   	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
   	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:748)
   Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: operation has failed
   	at org.apache.hudi.table.action.commit.SparkMergeHelper.runMerge(SparkMergeHelper.java:102)
   	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdateInternal(BaseSparkCommitActionExecutor.java:308)
   	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:299)
   	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:272)
   	... 28 more
   Caused by: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: operation has failed
   	at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:143)
   	at org.apache.hudi.table.action.commit.SparkMergeHelper.runMerge(SparkMergeHelper.java:100)
   	... 31 more
   Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: operation has failed
   	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
   	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
   	at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:141)
   	... 32 more
   Caused by: org.apache.hudi.exception.HoodieException: operation has failed
   	at org.apache.hudi.common.util.queue.BoundedInMemoryQueue.throwExceptionIfFailed(BoundedInMemoryQueue.java:247)
   	at org.apache.hudi.common.util.queue.BoundedInMemoryQueue.readNextRecord(BoundedInMemoryQueue.java:226)
   	at org.apache.hudi.common.util.queue.BoundedInMemoryQueue.access$100(BoundedInMemoryQueue.java:52)
   	at org.apache.hudi.common.util.queue.BoundedInMemoryQueue$QueueIterator.hasNext(BoundedInMemoryQueue.java:277)
   	at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:36)
   	at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121)
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   	... 3 more
   Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file s3://31-ze-datalake-raw/courier_api/order/created_year_month_brt_partition=202012/a71490e9-d2e7-4ecf-b48a-6b7046770841-0_43-11441-0_20210131205623.parquet
   	at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251)
   	at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:132)
   	at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:136)
   	at org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:49)
   	at org.apache.hudi.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:45)
   	at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$0(BoundedInMemoryExecutor.java:92)
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
   	... 4 more
   Caused by: java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainLongDictionary
   	at org.apache.parquet.column.Dictionary.decodeToBinary(Dictionary.java:41)
   	at org.apache.parquet.avro.AvroConverters$BinaryConverter.setDictionary(AvroConverters.java:75)
   	at org.apache.parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:341)
   	at org.apache.parquet.column.impl.ColumnReadStoreImpl.newMemColumnReader(ColumnReadStoreImpl.java:80)
   	at org.apache.parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:75)
   	at org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:271)
   	at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:147)
   	at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:109)
   	at org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:165)
   	at org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:109)
   	at org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:137)
   	at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:222)
   	... 11 more
   
   Driver stacktrace:
   	at jobs.TableProcessor.start(TableProcessor.scala:101)
   	at TableProcessorWrapper$.$anonfun$main$2(TableProcessorWrapper.scala:23)
   	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
   	at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
   	at scala.util.Success.$anonfun$map$1(Try.scala:255)
   	at scala.util.Success.map(Try.scala:213)
   	at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
   	at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
   	at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
   	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
   	at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
   	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
   	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
   	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
   	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
   
   	 ApplicationMaster host: ip-10-0-19-128.us-west-2.compute.internal
   	 ApplicationMaster RPC port: 45559
   	 queue: default
   	 start time: 1612127355095
   	 final status: FAILED
   	 tracking URL: http://ip-10-0-29-186.us-west-2.compute.internal:20888/proxy/application_1612125097081_0004/
   	 user: hadoop`
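   
   The deepest frame above (Dictionary.decodeToBinary on a PlainValuesDictionary$PlainLongDictionary) suggests a column that was physically written as Parquet INT64 is being decoded by an Avro converter that expects binary/string, i.e. the file's schema and the schema used during merge disagree on that column's type. A minimal sketch for checking what the failing file actually contains, using the path from the trace:
   
   ```
   import org.apache.spark.sql.SparkSession
   
   val spark = SparkSession.builder().appName("inspect-parquet-schema").getOrCreate()
   
   // Path taken from the stack trace above; any single Parquet file works here.
   val path = "s3://31-ze-datalake-raw/courier_api/order/created_year_month_brt_partition=202012/" +
     "a71490e9-d2e7-4ecf-b48a-6b7046770841-0_43-11441-0_20210131205623.parquet"
   
   // Print the schema Spark infers from the file footer, to compare each column's
   // physical type against the table schema Hudi uses during the merge.
   spark.read.parquet(path).printSchema()
   ```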
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] rubenssoto commented on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition

Posted by GitBox <gi...@apache.org>.
rubenssoto commented on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-812936183


   Hello,
   
   Sorry for the very late response. I tried again, and I still have the problem with only one table: same table, same error:
   
   21/04/03 22:09:32 WARN TaskSetManager: Lost task 0.0 in stage 31.0 (TID 1025, ip-10-0-49-182.us-west-2.compute.internal, executor 2): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0
   	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:288)
   	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$execute$ecf5068c$1(BaseSparkCommitActionExecutor.java:139)
   	at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
   	at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
   	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:889)
   	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:889)
   	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
   	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
   	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
   	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
   	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
   	at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:362)
   	at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1388)
   	at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1298)
   	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1362)
   	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1186)
   	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:360)
   	at org.apache.spark.rdd.RDD.iterator(RDD.scala:311)
   	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
   	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
   	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
   	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
   	at org.apache.spark.scheduler.Task.run(Task.scala:127)
   	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
   	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
   	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:748)
   Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: operation has failed
   	at org.apache.hudi.table.action.commit.SparkMergeHelper.runMerge(SparkMergeHelper.java:102)
   	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdateInternal(BaseSparkCommitActionExecutor.java:317)
   	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:308)
   	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:281)
   	... 28 more
   Caused by: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: operation has failed
   	at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:143)
   	at org.apache.hudi.table.action.commit.SparkMergeHelper.runMerge(SparkMergeHelper.java:100)
   	... 31 more
   Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: operation has failed
   	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
   	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
   	at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:141)
   	... 32 more
   Caused by: org.apache.hudi.exception.HoodieException: operation has failed
   	at org.apache.hudi.common.util.queue.BoundedInMemoryQueue.throwExceptionIfFailed(BoundedInMemoryQueue.java:247)
   	at org.apache.hudi.common.util.queue.BoundedInMemoryQueue.readNextRecord(BoundedInMemoryQueue.java:226)
   	at org.apache.hudi.common.util.queue.BoundedInMemoryQueue.access$100(BoundedInMemoryQueue.java:52)
   	at org.apache.hudi.common.util.queue.BoundedInMemoryQueue$QueueIterator.hasNext(BoundedInMemoryQueue.java:277)
   	at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:36)
   	at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121)
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   	... 3 more
   Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file s3://31-ze-datalake-raw/courier_api/order/7acd6348-88ab-46f9-adf9-b09293b2d8bd-0_46-10253-0_20210403213218.parquet
   	at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251)
   	at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:132)
   	at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:136)
   	at org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:49)
   	at org.apache.hudi.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:45)
   	at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$0(BoundedInMemoryExecutor.java:92)
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
   	... 4 more
   Caused by: java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainLongDictionary
   	at org.apache.parquet.column.Dictionary.decodeToBinary(Dictionary.java:41)
   	at org.apache.parquet.avro.AvroConverters$BinaryConverter.setDictionary(AvroConverters.java:75)
   	at org.apache.parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:341)
   	at org.apache.parquet.column.impl.ColumnReadStoreImpl.newMemColumnReader(ColumnReadStoreImpl.java:80)
   	at org.apache.parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:75)
   	at org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:271)
   	at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:147)
   	at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:109)
   	at org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:165)
   	at org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:109)
   	at org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:137)
   	at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:222)
   	... 11 more
   
   21/04/03 22:09:32 INFO TaskSetManager: Starting task 0.1 in stage 31.0 (TID 1033, ip-10-0-49-182.us-west-2.compute.internal, executor 2, partition 0, NODE_LOCAL, 7132 bytes)
   21/04/03 22:09:32 WARN TaskSetManager: Lost task 4.0 in stage 31.0 (TID 1029, ip-10-0-49-182.us-west-2.compute.internal, executor 2): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :4
   	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:288)
   	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$execute$ecf5068c$1(BaseSparkCommitActionExecutor.java:139)
   	at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
   	at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
   	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:889)
   	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:889)
   	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
   	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
   	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
   	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
   	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
   	at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:362)
   	at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1388)
   	at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1298)
   	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1362)
   	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1186)
   	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:360)
   	at org.apache.spark.rdd.RDD.iterator(RDD.scala:311)
   	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
   	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
   	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
   	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
   	at org.apache.spark.scheduler.Task.run(Task.scala:127)
   	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
   	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
   	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:748)
   Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: operation has failed
   	at org.apache.hudi.table.action.commit.SparkMergeHelper.runMerge(SparkMergeHelper.java:102)
   	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdateInternal(BaseSparkCommitActionExecutor.java:317)
   	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:308)
   	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:281)
   	... 28 more
   Caused by: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: operation has failed
   	at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:143)
   	at org.apache.hudi.table.action.commit.SparkMergeHelper.runMerge(SparkMergeHelper.java:100)
   	... 31 more
   Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: operation has failed
   	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
   	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
   	at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:141)
   	... 32 more
   Caused by: org.apache.hudi.exception.HoodieException: operation has failed
   	at org.apache.hudi.common.util.queue.BoundedInMemoryQueue.throwExceptionIfFailed(BoundedInMemoryQueue.java:247)
   	at org.apache.hudi.common.util.queue.BoundedInMemoryQueue.readNextRecord(BoundedInMemoryQueue.java:226)
   	at org.apache.hudi.common.util.queue.BoundedInMemoryQueue.access$100(BoundedInMemoryQueue.java:52)
   	at org.apache.hudi.common.util.queue.BoundedInMemoryQueue$QueueIterator.hasNext(BoundedInMemoryQueue.java:277)
   	at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:36)
   	at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121)
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   	... 3 more
   Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file s3://31-ze-datalake-raw/courier_api/order/c2b48006-dfb3-4715-86d5-09692665089d-0_45-10252-0_20210403213218.parquet
   	at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251)
   	at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:132)
   	at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:136)
   	at org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:49)
   	at org.apache.hudi.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:45)
   	at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$0(BoundedInMemoryExecutor.java:92)
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
   	... 4 more
   Caused by: java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainLongDictionary
   	at org.apache.parquet.column.Dictionary.decodeToBinary(Dictionary.java:41)
   	at org.apache.parquet.avro.AvroConverters$BinaryConverter.setDictionary(AvroConverters.java:75)
   	at org.apache.parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:341)
   	at org.apache.parquet.column.impl.ColumnReadStoreImpl.newMemColumnReader(ColumnReadStoreImpl.java:80)
   	at org.apache.parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:75)
   	at org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:271)
   	at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:147)
   	at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:109)
   	at org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:165)
   	at org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:109)
   	at org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:137)
   	at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:222)
   	... 11 more


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] rubenssoto commented on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition

Posted by GitBox <gi...@apache.org>.
rubenssoto commented on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-770483807


   I changed the type to string, but the problem was not solved; same behavior.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-810436034


   Closing this out due to no activity. Please re-open or create a new ticket if required.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] rubenssoto commented on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition

Posted by GitBox <gi...@apache.org>.
rubenssoto commented on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-770475266


   I ran more tests, and I only hit the problem when my bulk insert operation has the option hoodie.datasource.write.row.writer.enable set to true.
   
   I only have this problem on this table; it has a column of array type.
   With that option set to false, the column in Hive is:
   
   `order_details_line_items` array<struct<product_variant_id:int,item_inventory_id:int,price:decimal(14,2),total_price:decimal(14,2),subtotal_price:decimal(14,2),total_item_price:decimal(14,2),total_tax:decimal(14,2),total_coupon_discount:decimal(14,2),total_offer_discount:decimal(14,2),total_discount:decimal(14,2),quantity:int,description:string,short_description:string,image_url:string,title:string,subtitle:string,product:struct<product_id:int,title:string,tags:string,label:string,image_url:string,description:string,short_description:string,rgb:boolean,has_fixed_price:boolean>,category:struct<id:int,title:string>,brand:struct<id:int,title:string>,applicable_discount:struct<discount_value:decimal(14,2),discount_type:string,discount_value_type:int,presented_discount_value:decimal(14,2),final_price:decimal(14,2),final_unit_price:decimal(14,2)>>>
   
   
   With that option set to true:
   `order_details_line_items` array<struct<product_variant_id:int,item_inventory_id:int,price:decimal(14,2),total_price:decimal(14,2),subtotal_price:decimal(14,2),total_item_price:decimal(14,2),total_tax:decimal(14,2),total_coupon_discount:decimal(14,2),total_offer_discount:decimal(14,2),total_discount:decimal(14,2),quantity:int,description:string,short_description:string,image_url:string,title:string,subtitle:string,product:struct<product_id:int,title:string,tags:string,label:string,image_url:string,description:string,short_description:string,rgb:boolean,has_fixed_price:boolean>,category:struct<id:int,title:string>,brand:struct<id:int,title:string>,applicable_discount:struct<discount_value:decimal(14,2),discount_type:string,discount_value_type:int,presented_discount_value:decimal(14,2),final_price:decimal(14,2),final_unit_price:decimal(14,2)>>>
   
   
   I will try to convert the column to string....
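   
   (A minimal sketch of one way to do that conversion, using Spark's built-in to_json; df here is a hypothetical DataFrame holding the table's data:)
   
   ```
   import org.apache.spark.sql.functions.{col, to_json}
   
   // Serialize the nested array<struct<...>> column to a plain JSON string column,
   // replacing the original column in place.
   val flattened = df.withColumn(
     "order_details_line_items",
     to_json(col("order_details_line_items"))
   )
   ```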


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] rubenssoto commented on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition

Posted by GitBox <gi...@apache.org>.
rubenssoto commented on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-812936268


   Now I'm testing with Hudi 0.8.0-rc1.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] rubenssoto commented on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition

Posted by GitBox <gi...@apache.org>.
rubenssoto commented on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-770546985


   I followed the same procedure; the only difference was that one run had hoodie.datasource.write.row.writer.enable set to true and failed, while another run with the same config set to false worked.
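   
   (For anyone hitting the same thing, the workaround described here amounts to disabling the row writer on the bulk insert. A sketch; df, hudiOptions, and basePath are hypothetical placeholders:)
   
   ```
   import org.apache.spark.sql.SaveMode
   
   // Workaround sketch: bulk insert with the row writer explicitly disabled.
   df.write
     .format("hudi")
     .options(hudiOptions)
     .option("hoodie.datasource.write.operation", "bulk_insert")
     .option("hoodie.datasource.write.row.writer.enable", "false")
     .mode(SaveMode.Overwrite)
     .save(basePath)
   ```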


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] n3nash commented on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition

Posted by GitBox <gi...@apache.org>.
n3nash commented on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-771413944


   @nsivabalan Do you think this may have something to do with the Encoders needed in the row write path?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] rubenssoto commented on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition

Posted by GitBox <gi...@apache.org>.
rubenssoto commented on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-812936246


   When I don't use the row writer on bulk insert, I have no problems.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] rubenssoto edited a comment on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition

Posted by GitBox <gi...@apache.org>.
rubenssoto edited a comment on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-770483807


   I changed the type to string, but the problem was not solved; same behavior.
   Could it be a bug?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] rubenssoto edited a comment on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition

Posted by GitBox <gi...@apache.org>.
rubenssoto edited a comment on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-770475266


   I ran more tests, and I only hit the problem when my bulk insert operation has the option hoodie.datasource.write.row.writer.enable set to true.
   
   I only have this problem on this table; it has a column of array type.
   With that option set to false, the column in Hive is:
   
   `order_details_line_items` array<struct<product_variant_id:int,item_inventory_id:int,price:decimal(14,2),total_price:decimal(14,2),subtotal_price:decimal(14,2),total_item_price:decimal(14,2),total_tax:decimal(14,2),total_coupon_discount:decimal(14,2),total_offer_discount:decimal(14,2),total_discount:decimal(14,2),quantity:int,description:string,short_description:string,image_url:string,title:string,subtitle:string,product:struct<product_id:int,title:string,tags:string,label:string,image_url:string,description:string,short_description:string,rgb:boolean,has_fixed_price:boolean>,category:struct<id:int,title:string>,brand:struct<id:int,title:string>,applicable_discount:struct<discount_value:decimal(14,2),discount_type:string,discount_value_type:int,presented_discount_value:decimal(14,2),final_price:decimal(14,2),final_unit_price:decimal(14,2)>>>
   
   
   With that option set to true:
   `order_details_line_items` array<struct<product_variant_id:int,item_inventory_id:int,price:decimal(14,2),total_price:decimal(14,2),subtotal_price:decimal(14,2),total_item_price:decimal(14,2),total_tax:decimal(14,2),total_coupon_discount:decimal(14,2),total_offer_discount:decimal(14,2),total_discount:decimal(14,2),quantity:int,description:string,short_description:string,image_url:string,title:string,subtitle:string,product:struct<product_id:int,title:string,tags:string,label:string,image_url:string,description:string,short_description:string,rgb:boolean,has_fixed_price:boolean>,category:struct<id:int,title:string>,brand:struct<id:int,title:string>,applicable_discount:struct<discount_value:decimal(14,2),discount_type:string,discount_value_type:int,presented_discount_value:decimal(14,2),final_price:decimal(14,2),final_unit_price:decimal(14,2)>>>
   
   The original column is JSON, and I build the struct column in Spark with this schema:
   StructField(
             "line_items",
             ArrayType(
               StructType(
                 List(
                   StructField("product_variant_id", IntegerType),
                   StructField("item_inventory_id", IntegerType),
                   StructField("price", DecimalType(14, 2)),
                   StructField("total_price", DecimalType(14, 2)),
                   StructField("subtotal_price", DecimalType(14, 2)),
                   StructField("total_item_price", DecimalType(14, 2)),
                   StructField("total_tax", DecimalType(14, 2)),
                   StructField("total_coupon_discount", DecimalType(14, 2)),
                   StructField("total_offer_discount", DecimalType(14, 2)),
                   StructField("total_discount", DecimalType(14, 2)),
                   StructField("quantity", IntegerType),
                   StructField("description", StringType),
                   StructField("short_description", StringType),
                   StructField("image_url", StringType),
                   StructField("title", StringType),
                   StructField("subtitle", StringType),
                   StructField(
                     "product",
                     StructType(
                       List(
                         StructField("product_id", IntegerType),
                         StructField("title", StringType),
                         StructField("tags", StringType),
                         StructField("label", StringType),
                         StructField("image_url", StringType),
                         StructField("description", StringType),
                         StructField("short_description", StringType),
                         StructField("rgb", BooleanType),
                         StructField("has_fixed_price", BooleanType)
                       )
                     )
                    ),
                    StructField(
                      "category",
                      StructType(
                        List(
                          StructField("id", IntegerType),
                          StructField("title", StringType)
                        )
                      )
                    ),
                    StructField(
                      "brand",
                      StructType(
                        List(
                          StructField("id", IntegerType),
                          StructField("title", StringType)
                        )
                      )
                    ),
                    StructField(
                      "applicable_discount",
                      StructType(
                        List(
                          StructField("discount_value", DecimalType(14, 2)),
                          StructField("discount_type", StringType),
                          StructField("discount_value_type", IntegerType),
                          StructField("presented_discount_value", DecimalType(14, 2)),
                          StructField("final_price", DecimalType(14, 2)),
                          StructField("final_unit_price", DecimalType(14, 2))
                        )
                      )
                    )
                  )
                )
              )
   )

   I will try to convert the column to string....


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan edited a comment on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition

Posted by GitBox <gi...@apache.org>.
nsivabalan edited a comment on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-772655440


   I ran some local tests with an array column and things are fine; given below is the schema I tested locally. Also, from the stack trace, I don't think encoders are the issue, but I don't have much of an idea as of now; I need more time to triage.
   @n3nash / @bvaradar: can you spot any issues based on the information given by the author?
   @rubenssoto: I am sure you might have tested the schema, and the stack trace does not hint at that either, but I wanted to confirm once: did you ensure the schema is valid (and backwards compatible)?
   
   ```
   diagnostics: User class threw exception: java.lang.Exception: Error on Table: order, Error Message: org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 in stage 28.0 failed 4 times, most recent failure: Lost task 7.3 in stage 28.0 (TID 530, ip-10-0-29-119.us-west-2.compute.internal, executor 5): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :7
   at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:279)
   at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$execute$ecf5068c$1(BaseSparkCommitActionExecutor.java:135)
   ```
   
   ```
   public static final StructType STRUCT_TYPE = new StructType(new StructField[] {
         new StructField(HoodieRecord.COMMIT_TIME_METADATA_FIELD, DataTypes.StringType, false, Metadata.empty()),
         new StructField(HoodieRecord.COMMIT_SEQNO_METADATA_FIELD, DataTypes.StringType, false, Metadata.empty()),
         new StructField(HoodieRecord.RECORD_KEY_METADATA_FIELD, DataTypes.StringType, false, Metadata.empty()),
         new StructField(HoodieRecord.PARTITION_PATH_METADATA_FIELD, DataTypes.StringType, false, Metadata.empty()),
         new StructField(HoodieRecord.FILENAME_METADATA_FIELD, DataTypes.StringType, false, Metadata.empty()),
         new StructField("randomInt", DataTypes.IntegerType, false, Metadata.empty()),
         new StructField("randomLong", DataTypes.LongType, false, Metadata.empty()),
         new StructField("array_long", DataTypes.createArrayType(DataTypes.LongType), false, Metadata.empty())
      });
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-772582307


   Sure, I will check it out. But in the meantime, can you (@rubenssoto) confirm that you are not blocked as such and that you were looking to get better performance with row writing, but in general you are fine for now?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] rubenssoto commented on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition

Posted by GitBox <gi...@apache.org>.
rubenssoto commented on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-770541514


   No, the same schema, no changes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] rubenssoto commented on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition

Posted by GitBox <gi...@apache.org>.
rubenssoto commented on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-772661097


   Yeah, it is valid. I will test again tonight, just to be sure.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-813124159


   Sure. Keep us posted. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-908911135


   Closing this due to no activity and since we could not reproduce. Please re-open if you are still having issues.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan closed issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition

Posted by GitBox <gi...@apache.org>.
nsivabalan closed issue #2508:
URL: https://github.com/apache/hudi/issues/2508


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-772655440


   I ran some local tests with an array column and things are fine; given below is the schema I tested locally. Also, from the stack trace, I don't think encoders are the issue, but I don't have much of an idea as of now; I need more time to triage.
   @n3nash / @bvaradar: can you spot any issues based on the information given by the author?
   
   ```
   diagnostics: User class threw exception: java.lang.Exception: Error on Table: order, Error Message: org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 in stage 28.0 failed 4 times, most recent failure: Lost task 7.3 in stage 28.0 (TID 530, ip-10-0-29-119.us-west-2.compute.internal, executor 5): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :7
   at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:279)
   at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$execute$ecf5068c$1(BaseSparkCommitActionExecutor.java:135)
   ```
   
   ```
   public static final StructType STRUCT_TYPE = new StructType(new StructField[] {
         new StructField(HoodieRecord.COMMIT_TIME_METADATA_FIELD, DataTypes.StringType, false, Metadata.empty()),
         new StructField(HoodieRecord.COMMIT_SEQNO_METADATA_FIELD, DataTypes.StringType, false, Metadata.empty()),
         new StructField(HoodieRecord.RECORD_KEY_METADATA_FIELD, DataTypes.StringType, false, Metadata.empty()),
         new StructField(HoodieRecord.PARTITION_PATH_METADATA_FIELD, DataTypes.StringType, false, Metadata.empty()),
         new StructField(HoodieRecord.FILENAME_METADATA_FIELD, DataTypes.StringType, false, Metadata.empty()),
         new StructField("randomInt", DataTypes.IntegerType, false, Metadata.empty()),
         new StructField("randomLong", DataTypes.LongType, false, Metadata.empty()),
         new StructField("array_long", DataTypes.createArrayType(DataTypes.LongType), false, Metadata.empty())
      });
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] n3nash commented on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition

Posted by GitBox <gi...@apache.org>.
n3nash commented on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-770535798


   @rubenssoto Have you changed the schema from the last time you did a bulkInsert into this table ?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-813499168


   For example, I tried an array of records with some fields, and it looks good with bulk insert and the row writer enabled.
   ```
   {
     "type" : "record",
     "name" : "triprec",
     "fields" : [ {
       "name" : "timestamp",
       "type" : "long"
     }, {
       "name" : "_row_key",
       "type" : "string"
     }, {
       "name" : "rider",
       "type" : "string"
     }, {
       "name" : "tip_history",
       "type" : {
         "type" : "array",
         "items" : {
           "type" : "record",
           "name" : "tip_history",
           "fields" : [ {
             "name" : "amount",
             "type" : "double"
           }, {
             "name" : "currency",
             "type" : "string"
           } ]
         },
         "default" : [ ]
       }
     } ]
   }
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-779368051


   @rubenssoto: would you mind responding to this whenever you can? If it worked for you, do let us know so that we can close the ticket.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-813477549


   @rubenssoto: can you give us the Avro schema for which you are running into issues? You can change column names for privacy if required. Or better, if you know exactly which field is having issues, let us know; for example, is it an array (a record with a list of strings) or something of this sort? You don't need to give us the full schema, just the fields that are having issues. If you can't narrow it down, never mind; give us the entire schema and we can investigate.
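   
   (If it helps, a minimal sketch of one way to produce an Avro schema from a Spark DataFrame, via spark-avro's SchemaConverters; df here is a hypothetical DataFrame holding the table's data:)
   
   ```
   import org.apache.spark.sql.avro.SchemaConverters
   
   // Convert the DataFrame's Catalyst schema to an Avro schema and pretty-print it.
   val avroSchema = SchemaConverters.toAvroType(df.schema, nullable = false, recordName = "order")
   println(avroSchema.toString(true))
   ```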


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan removed a comment on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition

Posted by GitBox <gi...@apache.org>.
nsivabalan removed a comment on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-813124159


   Sure. Keep us posted. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] rubenssoto commented on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition

Posted by GitBox <gi...@apache.org>.
rubenssoto commented on issue #2508:
URL: https://github.com/apache/hudi/issues/2508#issuecomment-772598997


   @nsivabalan I only have this problem in one table, so it would be good if it works in the future, but for now it's fine.
   
   thanks for asking, you are the best! :)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


