Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/08/13 05:51:58 UTC

[GitHub] [hudi] brandon-stanley opened a new issue #1960: [SUPPORT]

brandon-stanley opened a new issue #1960:
URL: https://github.com/apache/hudi/issues/1960


   **Describe the problem you faced**
   
   I am trying to create a `COPY_ON_WRITE` table on S3 without having to specify a `hoodie.datasource.write.precombine.field` (https://hudi.apache.org/docs/configurations.html#PRECOMBINE_FIELD_OPT_KEY) value, as the table I am creating does not have an appropriate field for tie-breaking. The `hoodie.datasource.write.operation` attribute is `upsert`. I have done some digging and found that you can alter the `hoodie.datasource.write.payload.class` (https://hudi.apache.org/docs/configurations.html#PAYLOAD_CLASS_OPT_KEY) property. I have tried specifying its value as `org.apache.hudi.common.model.HoodieAvroPayload`, since that payload class does not appear to perform the `compareTo`-based precombine.
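   
   For reference, this is a minimal sketch of roughly how the options are being passed from PySpark (the table name, key column, sample data, and S3 path are placeholders, not the real values):
   
   ```python
   from pyspark.sql import SparkSession
   
   spark = SparkSession.builder.appName("hudi-upsert-example").getOrCreate()
   
   # Placeholder source data; the real table has no natural tie-breaker column.
   df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
   
   hudi_options = {
       "hoodie.table.name": "my_table",
       "hoodie.datasource.write.recordkey.field": "id",
       "hoodie.datasource.write.operation": "upsert",
       # No hoodie.datasource.write.precombine.field is set, and the payload
       # class is switched away from the default.
       "hoodie.datasource.write.payload.class": "org.apache.hudi.common.model.HoodieAvroPayload",
   }
   
   (df.write.format("org.apache.hudi")
      .options(**hudi_options)
      .mode("append")
      .save("s3://my-bucket/hudi/my_table/"))
   ```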
   
   When I run the job, I receive the following error:
   ```
   Caused by: org.apache.hudi.exception.HoodieException: ts(Part -ts) field not found in record. Acceptable fields were...
   ```
   
   I am assuming this means that the configuration value is still set to the previous default class, `org.apache.hudi.common.model.OverwriteWithLatestAvroPayload`. Can you please provide guidance on resolving this issue?
   
   **Environment Description**
   I am running `Hudi 0.5.2` on `Spark 2.4.3`.
   
   * Hudi version : 0.5.2
   
   * Spark version : 2.4.3
   
   * Storage (HDFS/S3/GCS..) : S3
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] bhasudha edited a comment on issue #1960: How do you change the 'hoodie.datasource.write.payload.class' configuration property?

Posted by GitBox <gi...@apache.org>.
bhasudha edited a comment on issue #1960:
URL: https://github.com/apache/hudi/issues/1960#issuecomment-673336838


   @brandon-stanley The `hoodie.datasource.write.precombine.field` is a mandatory field. If it is not specified, a default field name of `ts` is assumed. Since your table does not have this field, you are seeing the above error. The payload class invocation is not the issue, because the stack trace you are pointing to occurs well before the payload class is invoked. You might want to point `hoodie.datasource.write.precombine.field` to a valid column in the table and also pass in a payload class that ignores the precombine field. You can try it that way.
   
   That aside, does your dataset not have duplicates?





[GitHub] [hudi] bvaradar commented on issue #1960: How do you change the 'hoodie.datasource.write.payload.class' configuration property?

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #1960:
URL: https://github.com/apache/hudi/issues/1960#issuecomment-679236183


   Closing this issue as we have a JIRA to track it.





[GitHub] [hudi] brandonstanley-rci commented on issue #1960: How do you change the 'hoodie.datasource.write.payload.class' configuration property?

Posted by GitBox <gi...@apache.org>.
brandonstanley-rci commented on issue #1960:
URL: https://github.com/apache/hudi/issues/1960#issuecomment-674082705


   @bhasudha I was considering that approach, but I want to avoid altering the raw data as much as possible and do not want to add additional columns to the data. Is there a way to drop the column so that it does not show up in the Apache Hudi table? If not, is there a default payload class that does not contain precombine logic?


[GitHub] [hudi] bhasudha commented on issue #1960: How do you change the 'hoodie.datasource.write.payload.class' configuration property?

Posted by GitBox <gi...@apache.org>.
bhasudha commented on issue #1960:
URL: https://github.com/apache/hudi/issues/1960#issuecomment-673910051


   @brandon-stanley Based on your description above, you could try this:
   
   Instead of skipping the precombine field, you could add `COALESCE(update_date, create_date)` as a new column before writing to Hudi and pass that new column in as the precombine field; I think you could use `withColumn()` in Spark to do this (see the sketch below). Duplicates are then handled based on the latest value of the precombine field, which is the `COALESCE()` described above, and you wouldn't need to worry about the payload class at all.
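   
   Something like this untested sketch (the table name, key column, and S3 path are placeholders):
   
   ```python
   from pyspark.sql.functions import coalesce, col
   
   # df is the DataFrame you are about to write to Hudi.
   # Derive a non-null ordering column from the two date columns.
   df = df.withColumn("precombine_ts", coalesce(col("update_date"), col("create_date")))
   
   hudi_options = {
       "hoodie.table.name": "my_table",
       "hoodie.datasource.write.recordkey.field": "id",
       "hoodie.datasource.write.precombine.field": "precombine_ts",
       "hoodie.datasource.write.operation": "upsert",
   }
   
   (df.write.format("org.apache.hudi")
      .options(**hudi_options)
      .mode("append")
      .save("s3://my-bucket/hudi/my_table/"))
   ```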
   
   Please correct me if I am missing something.
   
   





[GitHub] [hudi] bvaradar closed issue #1960: How do you change the 'hoodie.datasource.write.payload.class' configuration property?

Posted by GitBox <gi...@apache.org>.
bvaradar closed issue #1960:
URL: https://github.com/apache/hudi/issues/1960


   





[GitHub] [hudi] bvaradar commented on issue #1960: How do you change the 'hoodie.datasource.write.payload.class' configuration property?

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #1960:
URL: https://github.com/apache/hudi/issues/1960#issuecomment-678172498


   @brandon-stanley : Thanks for bringing this up. I have filed a JIRA to track it: https://issues.apache.org/jira/browse/HUDI-1208
   As a workaround, you can disable precombine, but the problem is that the writer logic currently requires a non-null field to be specified for ordering. You can simply pass the record key field itself as the ordering field and turn off precombine (as mentioned in the post you pointed to), and this should work. A rough sketch is below.
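   
   An untested sketch of that combination (table, key column, and path are placeholders; this assumes `hoodie.combine.before.upsert` is the writer config used to turn off the precombine step):
   
   ```python
   hudi_options = {
       "hoodie.table.name": "my_table",
       "hoodie.datasource.write.recordkey.field": "id",
       # Reuse the record key itself as the ordering/precombine field.
       "hoodie.datasource.write.precombine.field": "id",
       "hoodie.datasource.write.operation": "upsert",
       # Assumed writer config for skipping the combine/de-duplication step on upsert.
       "hoodie.combine.before.upsert": "false",
   }
   
   (df.write.format("org.apache.hudi")   # df is the DataFrame being written
      .options(**hudi_options)
      .mode("append")
      .save("s3://my-bucket/hudi/my_table/"))
   ```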
   





[GitHub] [hudi] brandon-stanley commented on issue #1960: How do you change the 'hoodie.datasource.write.payload.class' configuration property?

Posted by GitBox <gi...@apache.org>.
brandon-stanley commented on issue #1960:
URL: https://github.com/apache/hudi/issues/1960#issuecomment-677957741


   @bhasudha This [post](https://github.com/apache/hudi/issues/1986) points out that the precombine logic can be disabled. Is this true?





[GitHub] [hudi] brandon-stanley edited a comment on issue #1960: How do you change the 'hoodie.datasource.write.payload.class' configuration property?

Posted by GitBox <gi...@apache.org>.
brandon-stanley edited a comment on issue #1960:
URL: https://github.com/apache/hudi/issues/1960#issuecomment-673462785


   @bhasudha Thanks for the response. Does the precombine field have to be a non-nullable field/column as well? My dataset may have duplicates, but I have implemented custom logic to deduplicate it, since there are two columns within my dataset that are used to determine which is the latest record: COALESCE(update_date, create_date). I implemented it this way because it is an [SCD Type 2 table](https://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2:_add_new_row).
   
   Also, how would I specify a payload class that ignores the precombine field? I receive the following error when specifying the `hoodie.datasource.write.payload.class` configuration property as `org.apache.hudi.common.model.HoodieAvroPayload`. Do I need to create a custom class that implements the [HoodieRecordPayload interface](https://github.com/apache/hudi/blob/release-0.5.2/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecordPayload.java)?
   
   ```
   py4j.protocol.Py4JJavaError: An error occurred while calling o152.save.
   : org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 21.0 failed 1 times, most recent failure: Lost task 1.0 in stage 21.0 (TID 529, localhost, executor driver): java.io.IOException: Could not create payload for class: org.apache.hudi.common.model.HoodieAvroPayload
           at org.apache.hudi.DataSourceUtils.createPayload(DataSourceUtils.java:128)
           at org.apache.hudi.DataSourceUtils.createHoodieRecord(DataSourceUtils.java:181)
           at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$1.apply(HoodieSparkSqlWriter.scala:103)
           at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$1.apply(HoodieSparkSqlWriter.scala:100)
           at scala.collection.Iterator$$anon$11.next(Iterator.scala:363)
           at scala.collection.Iterator$$anon$10.next(Iterator.scala:347)
           at scala.collection.Iterator$class.foreach(Iterator.scala:743)
           at scala.collection.AbstractIterator.foreach(Iterator.scala:1174)
           at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
           at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
           at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
           at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:296)
           at scala.collection.AbstractIterator.to(Iterator.scala:1174)
           at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:288)
           at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1174)
           at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:275)
           at scala.collection.AbstractIterator.toArray(Iterator.scala:1174)
           at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1364)
           at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1364)
           at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
           at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
           at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
           at org.apache.spark.scheduler.Task.run(Task.scala:121)
           at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
           at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
           at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:748)
   Caused by: org.apache.hudi.exception.HoodieException: Unable to instantiate class
           at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:80)
           at org.apache.hudi.DataSourceUtils.createPayload(DataSourceUtils.java:125)
           ... 28 more
   Caused by: java.lang.NoSuchMethodException: org.apache.hudi.common.model.HoodieAvroPayload.<init>(org.apache.avro.generic.GenericRecord, java.lang.Comparable)
           at java.lang.Class.getConstructor0(Class.java:3082)
           at java.lang.Class.getConstructor(Class.java:1825)
           at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:78)
           ... 29 more
   
   Driver stacktrace:
           at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1889)
           at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1877)
           at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1876)
           at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
           at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
           at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1876)
           at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
           at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
           at scala.Option.foreach(Option.scala:245)
           at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926)
           at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2110)
           at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2059)
           at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2048)
           at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
           at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737)
           at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
           at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
           at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
           at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1364)
           at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
           at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
           at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
           at org.apache.spark.rdd.RDD.take(RDD.scala:1337)
           at org.apache.spark.rdd.RDD$$anonfun$isEmpty$1.apply$mcZ$sp(RDD.scala:1472)
           at org.apache.spark.rdd.RDD$$anonfun$isEmpty$1.apply(RDD.scala:1472)
           at org.apache.spark.rdd.RDD$$anonfun$isEmpty$1.apply(RDD.scala:1472)
           at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
           at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
           at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
           at org.apache.spark.rdd.RDD.isEmpty(RDD.scala:1471)
           at org.apache.spark.api.java.JavaRDDLike$class.isEmpty(JavaRDDLike.scala:544)
           at org.apache.spark.api.java.AbstractJavaRDDLike.isEmpty(JavaRDDLike.scala:45)
           at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:142)
           at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
           at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
           at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
           at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
           at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
           at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
           at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
           at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
           at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
           at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
           at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
           at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
           at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
           at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
           at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
           at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
           at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
           at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
           at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
           at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
           at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
           at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:498)
           at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
           at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
           at py4j.Gateway.invoke(Gateway.java:282)
           at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
           at py4j.commands.CallCommand.execute(CallCommand.java:79)
           at py4j.GatewayConnection.run(GatewayConnection.java:238)
           at java.lang.Thread.run(Thread.java:748)
   Caused by: java.io.IOException: Could not create payload for class: org.apache.hudi.common.model.HoodieAvroPayload
           at org.apache.hudi.DataSourceUtils.createPayload(DataSourceUtils.java:128)
           at org.apache.hudi.DataSourceUtils.createHoodieRecord(DataSourceUtils.java:181)
           at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$1.apply(HoodieSparkSqlWriter.scala:103)
           at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$1.apply(HoodieSparkSqlWriter.scala:100)
           at scala.collection.Iterator$$anon$11.next(Iterator.scala:363)
           at scala.collection.Iterator$$anon$10.next(Iterator.scala:347)
           at scala.collection.Iterator$class.foreach(Iterator.scala:743)
           at scala.collection.AbstractIterator.foreach(Iterator.scala:1174)
           at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
           at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
           at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
           at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:296)
           at scala.collection.AbstractIterator.to(Iterator.scala:1174)
           at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:288)
           at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1174)
           at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:275)
           at scala.collection.AbstractIterator.toArray(Iterator.scala:1174)
           at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1364)
           at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1364)
           at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
           at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
           at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
           at org.apache.spark.scheduler.Task.run(Task.scala:121)
           at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
           at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
           at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           ... 1 more
   Caused by: org.apache.hudi.exception.HoodieException: Unable to instantiate class
           at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:80)
           at org.apache.hudi.DataSourceUtils.createPayload(DataSourceUtils.java:125)
           ... 28 more
   Caused by: java.lang.NoSuchMethodException: org.apache.hudi.common.model.HoodieAvroPayload.<init>(org.apache.avro.generic.GenericRecord, java.lang.Comparable)
           at java.lang.Class.getConstructor0(Class.java:3082)
           at java.lang.Class.getConstructor(Class.java:1825)
           at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:78)
           ... 29 more
   ```




