Posted to commits@hudi.apache.org by "Danny Chen (Jira)" <ji...@apache.org> on 2023/03/29 04:39:00 UTC
[jira] [Resolved] (HUDI-5986) empty preCombineKey should never be stored in hoodie.properties

     [ https://issues.apache.org/jira/browse/HUDI-5986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Chen resolved HUDI-5986.
------------------------------

> empty preCombineKey should never be stored in hoodie.properties
> ---------------------------------------------------------------
>
>                 Key: HUDI-5986
>                 URL: https://issues.apache.org/jira/browse/HUDI-5986
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: hudi-utilities
>            Reporter: Wechar
>            Priority: Major
>              Labels: pull-request-available
>
> *Overview:*
> We found that {{hoodie.properties}} keeps an empty preCombineKey entry when the table has no preCombineKey. The empty preCombineKey then causes the following exception when inserting data:
> {code:bash}
> Caused by: org.apache.hudi.exception.HoodieException: (Part -) field not found in record. Acceptable fields were :[id, name, price]
> 	at org.apache.hudi.avro.HoodieAvroUtils.getNestedFieldVal(HoodieAvroUtils.java:557)
> 	at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$createHoodieRecordRdd$1$$anonfun$apply$5.apply(HoodieSparkSqlWriter.scala:1134)
> 	at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$createHoodieRecordRdd$1$$anonfun$apply$5.apply(HoodieSparkSqlWriter.scala:1127)
> 	at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
> 	at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
> 	at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:193)
> 	at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:62)
> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:123)
> 	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
> 	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> {code}
> *Steps to Reproduce:*
> {code:sql}
> -- 1. create a table without preCombineKey
> CREATE TABLE default.test_hudi_default_cm (
>   uuid int,
>   name string,
>   price double
> ) USING hudi
> options (
>  primaryKey='uuid');
> -- 2. set the write operation to insert
> set hoodie.datasource.write.operation=insert;
> set hoodie.merge.allow.duplicate.on.inserts=true;
> -- 3. insert data
> insert into default.test_hudi_default_cm select 1, 'name1', 1.1;
> -- 4. insert overwrite
> insert overwrite table default.test_hudi_default_cm select 2, 'name3', 1.1;
> -- 5. this insert throws the exception
> insert into default.test_hudi_default_cm select 1, 'name3', 1.1;
> {code}
> *Root Cause:*
> Hudi reconstructs the table when *insert overwrite table* is run in SQL while the configured write operation is something else ({{insert}} in the steps above). During that reconstruction it stores the default empty preCombineKey in {{hoodie.properties}}.
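
The root cause above suggests one possible guard: drop a blank preCombineKey before the table properties are persisted. The following is a minimal sketch of that idea in Python, not Hudi's actual fix. The helper names are hypothetical, and the property name {{hoodie.table.precombine.field}} is assumed to be the key under which the preCombineKey is stored.

```python
# Hypothetical sketch of the guard described above; not Hudi's implementation.
# Assumption: the preCombineKey is stored under this property name.
PRECOMBINE_KEY = "hoodie.table.precombine.field"


def sanitize_table_properties(props):
    """Return a copy of props with a blank preCombineKey entry removed."""
    cleaned = dict(props)
    value = cleaned.get(PRECOMBINE_KEY)
    # Treat None, "" and whitespace-only values as "no preCombineKey".
    if value is None or value.strip() == "":
        cleaned.pop(PRECOMBINE_KEY, None)
    return cleaned


def render_properties(props):
    """Render properties as sorted key=value lines, one per line."""
    return "\n".join(f"{key}={value}" for key, value in sorted(props.items()))


if __name__ == "__main__":
    # Properties as they might look after the table is reconstructed.
    rebuilt = {
        "hoodie.table.name": "test_hudi_default_cm",
        PRECOMBINE_KEY: "",
    }
    print(render_properties(sanitize_table_properties(rebuilt)))
```

With a guard like this, a table created without a preCombineKey would have no preCombineKey entry at all in its stored properties, so a later insert would have no empty field name to look up in the record.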



--
This message was sent by Atlassian Jira
(v8.20.10#820010)