Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2023/01/10 18:55:12 UTC

[GitHub] [hudi] leobiscassi commented on issue #7533: [SUPPORT] Recreate deleted metadata table

leobiscassi commented on issue #7533:
URL: https://github.com/apache/hudi/issues/7533#issuecomment-1377707695

   I'm experiencing a similar situation. I upgraded my tables to EMR 6.9 with Hudi 0.12, my pipelines broke, so I downgraded the tables back to EMR 6.5 and Hudi 0.9. Since then, even with the metadata table config enabled, I'm not able to see the metadata tables on S3. I've tried the following (consolidated into a session sketch after the list):
   
   - Started EMR 6.5 cluster
   - Executed hudi cli with the command `sudo /usr/lib/hudi/cli/bin/hudi-cli.sh`
   - Connected to my table using `connect --path <S3-PATH>`
   - Executed the command `metadata create`
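
   For reference, here is the whole session as a single sketch (the S3 path below is a placeholder for my actual table path):
   
   ```shell
   # Start the Hudi CLI bundled with EMR 6.5
   sudo /usr/lib/hudi/cli/bin/hudi-cli.sh
   
   # Then, at the hudi-> prompt:
   connect --path s3://my-bucket/path/to/table    # placeholder path
   metadata create
   ```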
   
   The command fails, but it seems to create an empty metadata table. This is the stack trace:
   
   ```shell
   2023-01-10 18:47:36,061 INFO scheduler.DAGScheduler: ResultStage 0 (collect at HoodieSparkEngineContext.java:73) failed in 0.607 s due to Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (ip-172-31-2-164.us-west-2.compute.internal executor 1): java.lang.IllegalStateException: unread block data
           at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2934)
           at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1704)
           at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
           at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
           at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
           at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
           at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
           at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
           at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
           at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
           at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:457)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:750)
   
   Driver stacktrace:
   2023-01-10 18:47:36,064 INFO scheduler.DAGScheduler: Job 0 failed: collect at HoodieSparkEngineContext.java:73, took 0.652691 s
   2023-01-10 18:47:36,065 ERROR core.SimpleExecutionStrategy: Command failed java.lang.reflect.UndeclaredThrowableException
   2023-01-10 18:47:36,066 WARN JLineShellComponent.exceptions: 
   java.lang.reflect.UndeclaredThrowableException
           at org.springframework.util.ReflectionUtils.rethrowRuntimeException(ReflectionUtils.java:315)
           at org.springframework.util.ReflectionUtils.handleInvocationTargetException(ReflectionUtils.java:295)
           at org.springframework.util.ReflectionUtils.handleReflectionException(ReflectionUtils.java:279)
           at org.springframework.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:219)
           at org.springframework.shell.core.SimpleExecutionStrategy.invoke(SimpleExecutionStrategy.java:68)
           at org.springframework.shell.core.SimpleExecutionStrategy.execute(SimpleExecutionStrategy.java:59)
           at org.springframework.shell.core.AbstractShell.executeCommand(AbstractShell.java:134)
           at org.springframework.shell.core.JLineShell.promptLoop(JLineShell.java:533)
           at org.springframework.shell.core.JLineShell.run(JLineShell.java:179)
           at java.lang.Thread.run(Thread.java:750)
   Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (ip-172-31-2-164.us-west-2.compute.internal executor 1): java.lang.IllegalStateException: unread block data
           at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2934)
           at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1704)
           at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
           at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
           at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
           at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
           at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
           at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
           at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
           at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
           at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:457)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:750)
   
   Driver stacktrace:
           at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2470)
           at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2419)
           at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2418)
           at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
           at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
           at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
           at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2418)
           at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1125)
           at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1125)
           at scala.Option.foreach(Option.scala:407)
           at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1125)
           at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2684)
           at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2626)
           at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2615)
           at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
           at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:914)
           at org.apache.spark.SparkContext.runJob(SparkContext.scala:2241)
           at org.apache.spark.SparkContext.runJob(SparkContext.scala:2262)
           at org.apache.spark.SparkContext.runJob(SparkContext.scala:2281)
           at org.apache.spark.SparkContext.runJob(SparkContext.scala:2306)
           at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1030)
           at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
           at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
           at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
           at org.apache.spark.rdd.RDD.collect(RDD.scala:1029)
           at org.apache.spark.api.java.JavaRDDLike.collect(JavaRDDLike.scala:362)
           at org.apache.spark.api.java.JavaRDDLike.collect$(JavaRDDLike.scala:361)
           at org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45)
           at org.apache.hudi.client.common.HoodieSparkEngineContext.map(HoodieSparkEngineContext.java:73)
           at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.getPartitionsToFilesMapping(HoodieBackedTableMetadataWriter.java:365)
           at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.bootstrapFromFilesystem(HoodieBackedTableMetadataWriter.java:313)
           at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.bootstrapIfNeeded(HoodieBackedTableMetadataWriter.java:272)
           at org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.initialize(SparkHoodieBackedTableMetadataWriter.java:91)
           at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.<init>(HoodieBackedTableMetadataWriter.java:114)
           at org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.<init>(SparkHoodieBackedTableMetadataWriter.java:62)
           at org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.create(SparkHoodieBackedTableMetadataWriter.java:58)
           at org.apache.hudi.cli.commands.MetadataCommand.create(MetadataCommand.java:104)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:498)
           at org.springframework.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:216)
           ... 6 more
   Caused by: java.lang.IllegalStateException: unread block data
           at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2934)
           at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1704)
           at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
           at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
           at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
           at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
           at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
           at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
           at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
           at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
           at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:457)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           ... 1 more
   ```
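
   For context, this is how I'm checking for the metadata table on S3; when it exists, it lives under the table's `.hoodie/metadata` directory (the bucket and prefix below are placeholders):
   
   ```shell
   # List the metadata table directory under the table base path (placeholder bucket/prefix)
   aws s3 ls s3://my-bucket/path/to/table/.hoodie/metadata/ --recursive
   ```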
   
   Does anyone have a clue why this is happening?

