Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/10/19 12:49:59 UTC

[GitHub] [hudi] KnightChess commented on a diff in pull request #6824: [HUDI-4946] fix merge into with no preCombineField has dup row by onl…

KnightChess commented on code in PR #6824:
URL: https://github.com/apache/hudi/pull/6824#discussion_r999401951


##########
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/MergeIntoHoodieTableCommand.scala:
##########
@@ -160,7 +167,7 @@ case class MergeIntoHoodieTableCommand(mergeInto: MergeIntoTable) extends Hoodie
 
       // column order changed after left anti join , we should keep column order of source dataframe
       val cols = removeMetaFields(sourceDF).columns
-      executeInsertOnly(insertSourceDF.select(cols.head, cols.tail:_*), parameters)
+      executeInsertOnly(insertSourceDF.select(cols.head, cols.tail:_*), writeParam)

Review Comment:
   Yes, setting `hoodie.combine.before.insert` will de-duplicate, but this is not user-friendly.
   When a user creates a table with a preCombine field and uses MERGE INTO SQL to upsert data, the write may produce duplicate records depending on how the merge statement is written. To avoid that, the user has to set `hoodie.combine.before.insert` in the one case where the statement has only a not-matched branch. This will confuse users: for a table with a preCombine key, a merge statement sometimes behaves as `upsert` and sometimes as `insert`.
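
A minimal sketch of the behavior described above, using plain Scala collections rather than Spark or Hudi. Everything in it is an assumption made for illustration: `Row`, `insertOnly`, and `combineBeforeInsert` are hypothetical names, `ts` stands in for the preCombine field, and the logic only models the insert-only path (a left anti join followed by an append), not Hudi's actual write path.

```scala
// Hypothetical model of the insert-only path; not Hudi's implementation.
// `id` plays the role of the record key, `ts` the role of the preCombine field.
final case class Row(id: Int, value: String, ts: Long)

object MergeIntoSketch {
  // Models the left anti join plus plain insert: every source row whose key is
  // absent from the target is appended, including repeated keys in the source.
  def insertOnly(target: Seq[Row], source: Seq[Row]): Seq[Row] = {
    val existingKeys = target.map(_.id).toSet
    target ++ source.filterNot(r => existingKeys.contains(r.id))
  }

  // Models combining before insert: keep one row per key, the one with the
  // largest `ts`.
  def combineBeforeInsert(source: Seq[Row]): Seq[Row] =
    source.groupBy(_.id).values.map(_.maxBy(_.ts)).toSeq.sortBy(_.id)

  def main(args: Array[String]): Unit = {
    val target = Seq(Row(1, "a", 1L))
    // The source carries the same new key twice.
    val source = Seq(Row(2, "b1", 1L), Row(2, "b2", 2L))

    println(insertOnly(target, source))
    println(insertOnly(target, combineBeforeInsert(source)))
  }
}
```

With this input, the plain insert leaves two rows for key 2 in the result, while combining first leaves only the row with the larger `ts`. In the model, the user gets upsert-like results only after opting in to the combine step, which is the inconsistency the comment objects to.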


