Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/03/08 06:47:21 UTC

[GitHub] [hudi] nleena123 opened a new issue #4980: unable reprocess the data with old schema through Apache hudi ,Caused by: HoodieUpsertException: Error upserting bucketType UPDATE for partition :34

nleena123 opened a new issue #4980:
URL: https://github.com/apache/hudi/issues/4980


   Hi All,
   
   I am unable to re-process data with the old schema through Apache Hudi.
   
   I am getting the below exception while running the job.
   
   ************************************ERROR*******************************
   Job aborted due to stage failure.
   Caused by: HoodieUpsertException: Error upserting bucketType UPDATE for partition :34
   Caused by: HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: operation has failed
   Caused by: HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: operation has failed
   Caused by: ExecutionException: org.apache.hudi.exception.HoodieException: operation has failed
   Caused by: HoodieException: operation has failed
   Caused by: InvalidRecordException: Parquet/Avro schema mismatch: Avro field 'EVAR106' not found
   at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:288)
   	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$execute$ecf5068c$1(BaseSparkCommitActionExecutor.java:139)
   	at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:105)
   	at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:105)
   	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:920)
   	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:920)
   	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
   	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:380)
   	at org.apache.spark.rdd.RDD.iterator(RDD.scala:344)
   	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
   	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:380)
   	at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:393)
   	at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1445)
   	at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1355)
   	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1419)
   	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1238)
   	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:391)
   	at org.apache.spark.rdd.RDD.iterator(RDD.scala:342)
   	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
   	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:380)
   	at org.apache.spark.rdd.RDD.iterator(RDD.scala:344)
   	at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$3(ResultTask.scala:75)
   	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
   	at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$1(ResultTask.scala:75)
   	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
   	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:55)
   	at org.apache.spark.scheduler.Task.doRunTask(Task.scala:150)
   	at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:119)
   	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
   	at org.apache.spark.scheduler.Task.run(Task.scala:91)
   	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$13(Executor.scala:812)
   	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1643)
   	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:815)
   	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
   	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
   	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:671)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:748)
   Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: operation has failed
   	at org.apache.hudi.table.action.commit.SparkMergeHelper.runMerge(SparkMergeHelper.java:102)
   	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdateInternal(BaseSparkCommitActionExecutor.java:317)
   	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:308)
   	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:281)
   	... 38 more
   Caused by: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: operation has failed
   	at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:143)
   	at org.apache.hudi.table.action.commit.SparkMergeHelper.runMerge(SparkMergeHelper.java:100)
   	... 41 more
   Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: operation has failed
   
   I have tried setting the hoodie.datasource.write.reconcile.schema=true property, but it didn't work for me.
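   
   For context, a minimal sketch of how that option is passed on the upsert (the table path, key fields, and column values below are placeholders, not my actual job):
   
      import org.apache.spark.sql.{SaveMode, SparkSession}
      
      val spark = SparkSession.builder().appName("reconcile-sketch").getOrCreate()
      import spark.implicits._
      
      // A toy batch written with the *old* schema (placeholder columns).
      val inputDf = Seq((1, 1646696841L, "a")).toDF("id", "ts", "col1")
      
      inputDf.write.format("hudi").
        option("hoodie.table.name", "my_table").
        option("hoodie.datasource.write.operation", "upsert").
        option("hoodie.datasource.write.recordkey.field", "id").
        option("hoodie.datasource.write.precombine.field", "ts").
        option("hoodie.datasource.write.reconcile.schema", "true").
        mode(SaveMode.Append).
        save("/path/to/hudi/table")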
   
   Please let me know the resolution for this issue.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on issue #4980: unable reprocess the data with old schema through Apache hudi ,Caused by: HoodieUpsertException: Error upserting bucketType UPDATE for partition :34

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4980:
URL: https://github.com/apache/hudi/issues/4980#issuecomment-1063713326


   The expectation with reconcile schema is this: let's say the initial schema is schemaA, and then the table got upgraded to schemaB (by adding new columns at the end).
   
   After this upgrade, if you try to ingest records with schemaA, it will fail by default. But if you set the reconcile schema config, it will succeed.
   
   But do remember that schemaB should be backwards compatible with schemaA, as in the sketch below.
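   
   To make that concrete, a minimal sketch (the record and field names are made up, not taken from this issue): schemaB only appends a nullable field with a default, so Avro can still resolve records written with schemaA against it.
   
      import org.apache.avro.{Schema, SchemaCompatibility}
      
      // schemaA: the initial table schema (placeholder fields).
      val schemaA = new Schema.Parser().parse(
        """{"type":"record","name":"rec","fields":[
          |  {"name":"id","type":"string"},
          |  {"name":"ts","type":"long"}
          |]}""".stripMargin)
      
      // schemaB: backwards compatible -- same fields in the same order,
      // plus one nullable field with a default appended at the end.
      val schemaB = new Schema.Parser().parse(
        """{"type":"record","name":"rec","fields":[
          |  {"name":"id","type":"string"},
          |  {"name":"ts","type":"long"},
          |  {"name":"evar106","type":["null","string"],"default":null}
          |]}""".stripMargin)
      
      // Avro's own check: can a reader using schemaB read data written
      // with schemaA? It can, because the new field has a default.
      val result = SchemaCompatibility.checkReaderWriterCompatibility(schemaB, schemaA)
      println(result.getType) // COMPATIBLE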





[GitHub] [hudi] scxwhite commented on issue #4980: unable reprocess the data with old schema through Apache hudi ,Caused by: HoodieUpsertException: Error upserting bucketType UPDATE for partition :34

Posted by GitBox <gi...@apache.org>.
scxwhite commented on issue #4980:
URL: https://github.com/apache/hudi/issues/4980#issuecomment-1061643403


   Schema evolution needs to follow https://hudi.apache.org/docs/schema_evolution.
   For data with the old schema, it is recommended that you manually convert it to the new schema before merging, e.g. along the lines of the sketch below.
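   
   A minimal sketch of one way to do that conversion in Spark (the helper name and DataFrames here are hypothetical, not a Hudi API): pad the old-schema batch with the missing columns as typed nulls, then reorder the columns to match the table.
   
      import org.apache.spark.sql.DataFrame
      import org.apache.spark.sql.functions.lit
      
      // Pad `oldBatch` up to `target`'s schema: add each missing column
      // as a typed null, then select columns in the target's order.
      def conformToSchema(oldBatch: DataFrame, target: DataFrame): DataFrame = {
        val padded = target.schema.fields.foldLeft(oldBatch) { (df, f) =>
          if (df.columns.contains(f.name)) df
          else df.withColumn(f.name, lit(null).cast(f.dataType))
        }
        padded.select(target.columns.map(padded.col): _*)
      }
      
      // Usage: conform the incoming batch to the current table schema
      // before the upsert (path and `oldBatchDf` are placeholders).
      val tableDf = spark.read.format("hudi").load("/path/to/hudi/table")
      val fixedBatch = conformToSchema(oldBatchDf, tableDf)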





[GitHub] [hudi] nsivabalan commented on issue #4980: unable reprocess the data with old schema through Apache hudi ,Caused by: HoodieUpsertException: Error upserting bucketType UPDATE for partition :34

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4980:
URL: https://github.com/apache/hudi/issues/4980#issuecomment-1063721849


   If you can provide us with the table schema and the schema of the incoming batch, we can see what the issue is. A sketch of how to print both is below.
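   
   For example (a minimal sketch; the table path and the incoming DataFrame are placeholders):
   
      // Latest table schema as Spark sees it.
      spark.read.format("hudi").load("/path/to/hudi/table").printSchema()
      
      // Schema of the batch you are trying to upsert.
      incomingDf.printSchema()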





[GitHub] [hudi] nsivabalan commented on issue #4980: unable reprocess the data with old schema through Apache hudi ,Caused by: HoodieUpsertException: Error upserting bucketType UPDATE for partition :34

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4980:
URL: https://github.com/apache/hudi/issues/4980#issuecomment-1073048120


   @nleena123: any updates to follow up on here? If you got it resolved, please close the GitHub issue.

