Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/01/24 07:29:25 UTC

[GitHub] [hudi] duanyongvictory opened a new issue #2482: [SUPPORT]

duanyongvictory opened a new issue #2482:
URL: https://github.com/apache/hudi/issues/2482


   hudi version:
   hudi-spark-bundle_2.11-0.5.2-incubating.jar
   
   data sample:
   {"data_version":"123", "p_sn":"3456e", "gender":"女","pix":[{"p_sn":"161"}]}
   
   When I use Spark to write this data in Hudi format, Spark raises the following error:
   
   : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, hadoop6.gd7.yiducloud.cn, executor 1): java.io.IOException: Could not create payload for class: org.apache.hudi.common.model.OverwriteWithLatestAvroPayload
           at org.apache.hudi.DataSourceUtils.createPayload(DataSourceUtils.java:127)
           at org.apache.hudi.DataSourceUtils.createHoodieRecord(DataSourceUtils.java:180)
           at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$1.apply(HoodieSparkSqlWriter.scala:103)
           at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$1.apply(HoodieSparkSqlWriter.scala:100)
           at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
           at scala.collection.Iterator$$anon$10.next(Iterator.scala:394)
           at scala.collection.Iterator$class.foreach(Iterator.scala:891)
           at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
           at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
           at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
           at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
           at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
           at scala.collection.AbstractIterator.to(Iterator.scala:1334)
           at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
           at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1334)
           at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
           at scala.collection.AbstractIterator.toArray(Iterator.scala:1334)
           at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$31.apply(RDD.scala:1409)
           at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$31.apply(RDD.scala:1409)
           at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
           at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
           at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
           at org.apache.spark.scheduler.Task.run(Task.scala:123)
           at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
           at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
           at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:748)
   Caused by: org.apache.hudi.exception.HoodieException: Unable to instantiate class 
           at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:80)
           at org.apache.hudi.DataSourceUtils.createPayload(DataSourceUtils.java:124)
           ... 28 more
   Caused by: java.lang.reflect.InvocationTargetException
           at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
           at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
           at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
           at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
           at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:78)
           ... 29 more
   Caused by: org.apache.avro.UnresolvedUnionException: Not in union [{"type":"record","name":"pix","namespace":"hoodie.table.table_record","fields":[{"name":"p_sn","type":["string","null"]}]},"null"]: {"p_sn": "161"}
           at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:740)
           at org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:205)
           at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:123)
           at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
           at org.apache.avro.generic.GenericDatumWriter.writeArray(GenericDatumWriter.java:192)
           at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:120)
           at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
           at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:125)
           at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
           at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
           at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
           at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
           at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
           at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:62)
           at org.apache.hudi.common.util.HoodieAvroUtils.avroToBytes(HoodieAvroUtils.java:76)
           at org.apache.hudi.common.model.BaseAvroPayload.<init>(BaseAvroPayload.java:52)
           at org.apache.hudi.common.model.OverwriteWithLatestAvroPayload.<init>(OverwriteWithLatestAvroPayload.java:43)
           ... 34 more
   
   Driver stacktrace:
           at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1891)
           at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1879)
           at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1878)
           at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
           at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
           at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1878)
           at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:927)
           at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:927)
           at scala.Option.foreach(Option.scala:257)
           at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:927)
           at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2112)
           at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2061)
           at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2050)
           at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
           at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:738)
           at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
           at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
           at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
           at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1409)
           at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
           at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
           at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
           at org.apache.spark.rdd.RDD.take(RDD.scala:1382)
           at org.apache.spark.rdd.RDD$$anonfun$isEmpty$1.apply$mcZ$sp(RDD.scala:1517)
           at org.apache.spark.rdd.RDD$$anonfun$isEmpty$1.apply(RDD.scala:1517)
           at org.apache.spark.rdd.RDD$$anonfun$isEmpty$1.apply(RDD.scala:1517)
           at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
           at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
           at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
           at org.apache.spark.rdd.RDD.isEmpty(RDD.scala:1516)
           at org.apache.spark.api.java.JavaRDDLike$class.isEmpty(JavaRDDLike.scala:544)
           at org.apache.spark.api.java.AbstractJavaRDDLike.isEmpty(JavaRDDLike.scala:45)
           at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:142)
           at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
           at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
           at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
           at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
           at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
           at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
           at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
           at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
           at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
           at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
           at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
           at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
           at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81)
           at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
           at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
           at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
           at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
           at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
           at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
           at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
           at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
           at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:498)
           at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
           at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
           at py4j.Gateway.invoke(Gateway.java:282)
           at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
           at py4j.commands.CallCommand.execute(CallCommand.java:79)
           at py4j.GatewayConnection.run(GatewayConnection.java:238)
           at java.lang.Thread.run(Thread.java:745)
   
   But when I delete the key "pix" from the data, the Spark job works fine.
   I think the root cause is:
   
   Caused by: org.apache.avro.UnresolvedUnionException: Not in union [{"type":"record","name":"pix","namespace":"hoodie.table.table_record","fields":[{"name":"p_sn","type":["string","null"]}]},"null"]: {"p_sn": "161"}
   
   It looks like Hudi does not support a list of objects.
   I searched through a lot of information and found nothing useful.
   
   Can anyone help me with this? Thanks.
   This is urgent.
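   For reference, Avro resolves a union by picking the first branch the datum matches; Java Avro's GenericData.resolveUnion matches record branches by their full name, so a datum whose record type or namespace does not line up with the branch named in the trace (hoodie.table.table_record.pix) fails with UnresolvedUnionException even when the field shapes look identical. Below is a minimal, simplified Python model of that resolution rule; it is illustrative only (resolve_union and the schema dicts are not the real Avro library, and the sketch matches records by field shape rather than by full name):

   ```python
   def resolve_union(branches, datum):
       """Return the index of the first union branch that `datum` matches.

       `branches` is a list of simplified Avro schemas: the string "null",
       or a dict like {"type": "record", "name": ..., "fields": [...]}.
       Raises ValueError (standing in for Avro's UnresolvedUnionException)
       when no branch matches.
       """
       for i, branch in enumerate(branches):
           if branch == "null" and datum is None:
               return i
           if isinstance(branch, dict) and branch.get("type") == "record":
               # Java Avro matches records by their full name; this sketch
               # only checks that the datum's keys fit the record's fields.
               names = {f["name"] for f in branch["fields"]}
               if isinstance(datum, dict) and set(datum) <= names:
                   return i
       raise ValueError("Not in union %r: %r" % (branches, datum))

   # The union shape from the stack trace, for the element type of "pix":
   pix_union = [
       {"type": "record", "name": "pix",
        "fields": [{"name": "p_sn", "type": ["string", "null"]}]},
       "null",
   ]
   ```

   Under this simplified rule, {"p_sn": "161"} matches the record branch and None matches the "null" branch; the real resolver's stricter name-based matching is what makes the same-looking datum fail in the report above.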


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] n3nash closed issue #2482: [SUPPORT]

Posted by GitBox <gi...@apache.org>.
n3nash closed issue #2482:
URL: https://github.com/apache/hudi/issues/2482


   



[GitHub] [hudi] bvaradar commented on issue #2482: [SUPPORT]

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #2482:
URL: https://github.com/apache/hudi/issues/2482#issuecomment-767390730


   0.5.2 is a really old version of Hudi. Can you try with 0.6.0?
   
   Also, did the column "pix" already exist in the existing dataset? Hudi relies on Avro schema compatibility to write data.
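   To illustrate the compatibility point: when a new column such as "pix" shows up in incoming data, Avro-style backward compatibility generally requires the added field to carry a default (for nullable fields, typically "default": null with "null" first in the union) so that records written under the old schema can still be resolved. A rough sketch of that check, assuming nothing about Hudi's internals (can_add_fields, the field lists, and the "pix_record" placeholder are all illustrative, not Hudi or Avro APIs):

   ```python
   def can_add_fields(old_fields, new_fields):
       """Rough backward-compatibility check for adding fields to a record.

       A newly added field is considered safe only if it has a default
       value that readers can fill in for old data. This is a simplified
       sketch, not Avro's full schema-resolution algorithm.
       """
       old_names = {f["name"] for f in old_fields}
       added = [f for f in new_fields if f["name"] not in old_names]
       return all("default" in f for f in added)

   old = [{"name": "data_version", "type": "string"},
          {"name": "p_sn", "type": "string"},
          {"name": "gender", "type": "string"}]

   # Adding "pix" without a default is not a compatible evolution:
   incompatible = old + [{"name": "pix",
                          "type": {"type": "array", "items": "pix_record"}}]

   # Adding it as a nullable field with a null default is:
   compatible = old + [{"name": "pix",
                        "type": ["null",
                                 {"type": "array", "items": "pix_record"}],
                        "default": None}]
   ```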





[GitHub] [hudi] nsivabalan commented on issue #2482: [SUPPORT]

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #2482:
URL: https://github.com/apache/hudi/issues/2482#issuecomment-810438101


   Gentle ping. Once you respond, can you please remove the "awaiting-user-response" label from the issue and, if possible, add the "awaiting-community-help" label?





[GitHub] [hudi] n3nash commented on issue #2482: [SUPPORT]

Posted by GitBox <gi...@apache.org>.
n3nash commented on issue #2482:
URL: https://github.com/apache/hudi/issues/2482#issuecomment-771410764


   @duanyongvictory Were you able to try the latest release, 0.7.0, and see whether it resolves your issue?





[GitHub] [hudi] nsivabalan commented on issue #2482: [SUPPORT]

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #2482:
URL: https://github.com/apache/hudi/issues/2482#issuecomment-774507168


   @duanyongvictory : it would be nice if you could fill in the details from the issue-creation template; that would help us triage better. For example:
   
   ```
   A clear and concise description of the problem.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1.
   2.
   3.
   4.
   
   **Expected behavior**
   
   A clear and concise description of what you expected to happen.
   
   **Environment Description**
   
   * Hudi version :
   
   * Spark version :
   
   * Hive version :
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) :
   
   * Running on Docker? (yes/no) :
   
   - Were you running an older version of Hudi and encountered this after upgrading? In other words, were you able to run successfully with an older Hudi version, and with 0.7.0 there is a bug?
   
   - Is this affecting your production? We are trying to gauge the severity.
   
   - Or are you trying out a POC, and is this the first time trying Hudi?
   
   **Additional context**
   
   Add any other context about the problem here.
   
   ```



[GitHub] [hudi] n3nash commented on issue #2482: [SUPPORT]

Posted by GitBox <gi...@apache.org>.
n3nash commented on issue #2482:
URL: https://github.com/apache/hudi/issues/2482#issuecomment-824526117


   @duanyongvictory Closing this ticket due to inactivity. Please re-open if your issue persists.

