Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/03/08 06:47:21 UTC
[GitHub] [hudi] nleena123 opened a new issue #4980: unable reprocess the data with old schema through Apache hudi ,Caused by: HoodieUpsertException: Error upserting bucketType UPDATE for partition :34
nleena123 opened a new issue #4980:
URL: https://github.com/apache/hudi/issues/4980
Hi All,
I am unable to reprocess data that uses an old schema through Apache Hudi.
The job fails with the exception below.
************************************ERROR*******************************
Job aborted due to stage failure.
Caused by: HoodieUpsertException: Error upserting bucketType UPDATE for partition :34
Caused by: HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: operation has failed
Caused by: HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: operation has failed
Caused by: ExecutionException: org.apache.hudi.exception.HoodieException: operation has failed
Caused by: HoodieException: operation has failed
Caused by: InvalidRecordException: Parquet/Avro schema mismatch: Avro field 'EVAR106' not found
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:288)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$execute$ecf5068c$1(BaseSparkCommitActionExecutor.java:139)
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:105)
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:105)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:920)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:920)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:380)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:344)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:380)
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:393)
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1445)
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1355)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1419)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1238)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:391)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:342)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:380)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:344)
at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$3(ResultTask.scala:75)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$1(ResultTask.scala:75)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:55)
at org.apache.spark.scheduler.Task.doRunTask(Task.scala:150)
at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:119)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.Task.run(Task.scala:91)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$13(Executor.scala:812)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1643)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:815)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:671)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: operation has failed
at org.apache.hudi.table.action.commit.SparkMergeHelper.runMerge(SparkMergeHelper.java:102)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdateInternal(BaseSparkCommitActionExecutor.java:317)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:308)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:281)
... 38 more
Caused by: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: operation has failed
at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:143)
at org.apache.hudi.table.action.commit.SparkMergeHelper.runMerge(SparkMergeHelper.java:100)
... 41 more
Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: operation has failed
I also tried setting hoodie.datasource.write.reconcile.schema=true, but it did not work for me.
Please let me know how to resolve this issue.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] nsivabalan commented on issue #4980: unable reprocess the data with old schema through Apache hudi ,Caused by: HoodieUpsertException: Error upserting bucketType UPDATE for partition :34
Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4980:
URL: https://github.com/apache/hudi/issues/4980#issuecomment-1063713326
The expectation with the reconcile schema config is as follows.
Let's say the initial schema is schemaA,
and the table was then upgraded to schemaB (by adding new columns at the end).
After this upgrade, if you try to ingest records with schemaA, the write fails by default, but if you set the reconcile schema config, it succeeds.
Do remember, though, that schemaB should be backward compatible with schemaA.
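
To make "adding new columns to the end" concrete, here is a minimal sketch in plain Python. It is not Hudi's own compatibility check; the schemas, the `id` and `ts` fields, and the helper `is_appended_evolution` are hypothetical, and `EVAR106` is borrowed from the error message purely for illustration, since the thread does not show the real schemas. The sketch assumes that "backward compatible" means schemaB keeps schemaA's fields in order and only appends nullable fields that carry a default.

```python
# Hypothetical Avro-style schemas; the thread does not show the real ones.
SCHEMA_A = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "ts", "type": "long"},
    ],
}

# schemaB = schemaA plus one nullable column appended at the end.
SCHEMA_B = {
    "type": "record",
    "name": "Event",
    "fields": SCHEMA_A["fields"] + [
        {"name": "EVAR106", "type": ["null", "string"], "default": None},
    ],
}


def is_appended_evolution(old_schema, new_schema):
    """Return True if new_schema starts with old_schema's fields (same order)
    and every extra field is a nullable union that declares a default."""
    old_names = [f["name"] for f in old_schema["fields"]]
    new_fields = new_schema["fields"]
    prefix = [f["name"] for f in new_fields[:len(old_names)]]
    if prefix != old_names:
        return False
    for field in new_fields[len(old_names):]:
        nullable = isinstance(field["type"], list) and "null" in field["type"]
        if not nullable or "default" not in field:
            return False
    return True


print(is_appended_evolution(SCHEMA_A, SCHEMA_B))  # A -> B only appends a nullable column
print(is_appended_evolution(SCHEMA_B, SCHEMA_A))  # B -> A drops a column
```

Under these assumptions the first call prints True and the second prints False, which matches the direction described above: the table may move from schemaA to schemaB, not the other way round.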
[GitHub] [hudi] scxwhite commented on issue #4980: unable reprocess the data with old schema through Apache hudi ,Caused by: HoodieUpsertException: Error upserting bucketType UPDATE for partition :34
Posted by GitBox <gi...@apache.org>.
scxwhite commented on issue #4980:
URL: https://github.com/apache/hudi/issues/4980#issuecomment-1061643403
Schema evolution needs to follow https://hudi.apache.org/docs/schema_evolution.
For data in the old schema, it is recommended that you manually convert it to the new schema before merging.
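
One way to read "manually convert" is to pad each old-schema record with the defaults of the fields it lacks before writing. The sketch below does this on plain Python dicts; it is an illustration under assumptions, not a Hudi API. The schema, the `id` and `ts` fields, and the helper `conform_record` are hypothetical, and `EVAR106` is taken from the error message only as an example of a column missing from the old data.

```python
# Hypothetical target (new) schema in Avro-style dict form.
NEW_SCHEMA = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "ts", "type": "long"},
        {"name": "EVAR106", "type": ["null", "string"], "default": None},
    ],
}


def conform_record(record, target_schema):
    """Return a new dict holding every field of target_schema in schema order.
    Fields absent from the record take the schema default; a field that is
    absent and has no default raises ValueError."""
    out = {}
    for field in target_schema["fields"]:
        name = field["name"]
        if name in record:
            out[name] = record[name]
        elif "default" in field:
            out[name] = field["default"]
        else:
            raise ValueError(f"field {name!r} is missing and has no default")
    return out


old_record = {"id": "a1", "ts": 1646722041}  # written with the old schema
print(conform_record(old_record, NEW_SCHEMA))
```

In a Spark job, the comparable step would be to add each missing column as a typed null (for example with `withColumn` and `lit(None).cast(...)`) before the upsert, assuming the new columns are nullable in the table schema.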
[GitHub] [hudi] nsivabalan commented on issue #4980: unable reprocess the data with old schema through Apache hudi ,Caused by: HoodieUpsertException: Error upserting bucketType UPDATE for partition :34
Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4980:
URL: https://github.com/apache/hudi/issues/4980#issuecomment-1063721849
If you can provide us with the table schema and the schema of the incoming batch, we can see what the issue is.
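
When collecting the two schemas, a quick field-name diff can already show where they disagree. The following is a small sketch on Avro-style dicts; both schemas and the helper `diff_fields` are hypothetical stand-ins, since the actual table and batch schemas were not posted in this thread.

```python
# Hypothetical schemas standing in for the table and the incoming batch.
table_schema = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "ts", "type": "long"},
        {"name": "EVAR106", "type": ["null", "string"], "default": None},
    ],
}
incoming_schema = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "ts", "type": "long"},
    ],
}


def diff_fields(table, incoming):
    """Compare top-level field names only (types and nesting are ignored)."""
    table_names = {f["name"] for f in table["fields"]}
    incoming_names = {f["name"] for f in incoming["fields"]}
    return {
        "missing_in_incoming": sorted(table_names - incoming_names),
        "missing_in_table": sorted(incoming_names - table_names),
    }


print(diff_fields(table_schema, incoming_schema))
```

With these example inputs, `EVAR106` shows up under `missing_in_incoming`, the kind of gap that would be consistent with an "Avro field ... not found" mismatch. A fuller comparison would also need to look at field types and nested records.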
[GitHub] [hudi] nsivabalan commented on issue #4980: unable reprocess the data with old schema through Apache hudi ,Caused by: HoodieUpsertException: Error upserting bucketType UPDATE for partition :34
Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4980:
URL: https://github.com/apache/hudi/issues/4980#issuecomment-1073048120
@nleena123: any updates to follow up on here? If you got it resolved, please close the GitHub issue.