You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by "scxwhite (via GitHub)" <gi...@apache.org> on 2023/02/07 09:22:09 UTC

[GitHub] [hudi] scxwhite commented on a diff in pull request #7842: [HUDI-5695]Fix the wrong use of order field in PartialUpdateAvroPayload class.

scxwhite commented on code in PR #7842:
URL: https://github.com/apache/hudi/pull/7842#discussion_r1098391650


##########
hudi-common/src/main/java/org/apache/hudi/common/model/PartialUpdateAvroPayload.java:
##########
@@ -196,30 +196,40 @@ protected Option<IndexedRecord> mergeDisorderRecordsWithMetadata(
   /**
    * Returns whether the given record is newer than the record of this payload.
    *
-   * @param orderingVal
-   * @param record The record
-   * @param prop   The payload properties
-   *
+   * @param schema      The schema
+   * @param record      The record
+   * @param prop        The payload properties
    * @return true if the given record is newer
    */
-  private static boolean isRecordNewer(Comparable orderingVal, IndexedRecord record, Properties prop) {
+  private boolean isRecordNewer(Schema schema, IndexedRecord record, Properties prop) throws IOException {
     String orderingField = prop.getProperty(HoodiePayloadProps.PAYLOAD_ORDERING_FIELD_PROP_KEY);
+    String preCombineField = prop.getProperty("hoodie.datasource.write.precombine.field");
+
     if (!StringUtils.isNullOrEmpty(orderingField)) {
       boolean consistentLogicalTimestampEnabled = Boolean.parseBoolean(prop.getProperty(
           KeyGeneratorOptions.KEYGENERATOR_CONSISTENT_LOGICAL_TIMESTAMP_ENABLED.key(),
           KeyGeneratorOptions.KEYGENERATOR_CONSISTENT_LOGICAL_TIMESTAMP_ENABLED.defaultValue()));
 
+      Comparable incomingOrderingVal;
+      if (orderingField.equals(preCombineField)) {
+        incomingOrderingVal = orderingVal;

Review Comment:
   @danny0405  Thank you for your review.
   If I understand correctly, the value of orderingVal variable defined in BaseAvroPayload is precombine configuration. You can refer to this part of code.
   https://github.com/apache/hudi/blob/6436ef3ee2cb887cf25643c5c99f1fe34a64ec68/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala#L1095
   
   In the process of using Spark, I found that precombine is configured by hoodie.datasource.write.precombine.field configuration item, and orderfield is configured by hoodie.datasource.write.precombine.field configuration item.There is no restriction that precombine and orderfield must be consistent.
   
   
   https://github.com/apache/hudi/blob/6436ef3ee2cb887cf25643c5c99f1fe34a64ec68/hudi-common/src/main/java/org/apache/hudi/common/model/DefaultHoodieRecordPayload.java#L126
   
   And in the DefaultHoodieRecordPayload class, we can also find that the orderingVal used is the value of the configuration item hoodie.payload.ordering.field.The orderingVal variable in the BaseAvroPayload class is not used.
   
   As for the reason why I compare the fields of the two to see if they are consistent, as you just mentioned, it is to avoid the time of data serialization.
   
   I look forward to your answer. Thank you again.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org