Posted to commits@hudi.apache.org by "kazdy (via GitHub)" <gi...@apache.org> on 2023/03/25 22:19:08 UTC

[GitHub] [hudi] kazdy commented on pull request #7514: [HUDI-5418] Remove misleading line about mor precombine from quickstart spark-sql guide

kazdy commented on PR #7514:
URL: https://github.com/apache/hudi/pull/7514#issuecomment-1483935259

   > for mOR its mandatory, and for COW, its mandatory if users are using "upserts". for immutable workloads in COW, it may not be required.
   
   @nsivabalan  @jonvex 
   That's not exactly true. CoW allows mutable workloads without a precombine field when the MERGE INTO statement is used:
   https://github.com/apache/hudi/blob/1a526eea748d93f28f8cd4a786d5357d218c392c/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/MergeIntoHoodieTableCommand.scala#L349-L354
   Here the INSERT operation is used to support WHEN MATCHED UPDATE ... records.
   This is inconsistent with how SQL UPDATE works, but for some reason this is how it is.
   
   Another example: we can also update records when no precombine field is specified if we use the Spark datasource insert in upsert mode. It will update existing records, so one can argue it is no longer an immutable workload.
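
To make the point concrete, here is a toy sketch in plain Python. It is not Hudi code and calls no Hudi API; `merge_into` and the dict-based table are hypothetical stand-ins. It assumes the source has at most one row per key, so there is nothing to order, and it shows that a write keyed by record key can replace an existing row without any precombine field:

```python
# Toy model only: NOT Hudi code. The "table" is a dict keyed by record key,
# and no precombine (ordering) field exists anywhere in this sketch.

def merge_into(table, source, key="id"):
    """Mimic MERGE INTO with WHEN MATCHED UPDATE and WHEN NOT MATCHED INSERT."""
    result = dict(table)  # copy-on-write: the old snapshot is left untouched
    for row in source:
        # A matched key is overwritten; an unmatched key is added.
        result[row[key]] = row
    return result


table = {1: {"id": 1, "name": "a"}, 2: {"id": 2, "name": "b"}}
source = [{"id": 2, "name": "b2"}, {"id": 3, "name": "c"}]
merged = merge_into(table, source)
```

In this sketch the row with key 2 is updated and key 3 is inserted, while the original `table` snapshot is unchanged. Whether a workload written this way still counts as immutable is the question raised above.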

