You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by "lokeshj1703 (via GitHub)" <gi...@apache.org> on 2023/02/15 09:03:22 UTC

[GitHub] [hudi] lokeshj1703 opened a new pull request, #7961: [HUDI-5802] Allow configuration for deletes in DefaultHoodieRecordPayload

lokeshj1703 opened a new pull request, #7961:
URL: https://github.com/apache/hudi/pull/7961

   ### Change Logs
   
   Modify DefaultHoodieRecordPayload to be able to handle a configured delete key and marker
   
   ### Impact
   
   NA
   
   ### Risk level (write none, low medium or high below)
   
   Low
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
     ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make
     changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #7961: [HUDI-5802] Allow configuration for deletes in DefaultHoodieRecordPayload

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7961:
URL: https://github.com/apache/hudi/pull/7961#issuecomment-1431022184

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "29189395a4d407c331c89c11b1e70e989d704b20",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "29189395a4d407c331c89c11b1e70e989d704b20",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 29189395a4d407c331c89c11b1e70e989d704b20 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] lokeshj1703 commented on a diff in pull request #7961: [HUDI-5802] Allow configuration for deletes in DefaultHoodieRecordPayload

Posted by "lokeshj1703 (via GitHub)" <gi...@apache.org>.
lokeshj1703 commented on code in PR #7961:
URL: https://github.com/apache/hudi/pull/7961#discussion_r1109527842


##########
hudi-common/src/main/java/org/apache/hudi/common/model/DefaultHoodieRecordPayload.java:
##########
@@ -37,8 +37,10 @@
  * 1. preCombine - Picks the latest delta record for a key, based on an ordering field 2. combineAndGetUpdateValue/getInsertValue - Chooses the latest record based on ordering field value.
  */
 public class DefaultHoodieRecordPayload extends OverwriteWithLatestAvroPayload {
-
+  private static final String DEFAULT_DELETE_MARKER = "false";

Review Comment:
   Deleted it in the latest commit.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #7961: [HUDI-5802] Allow configuration for deletes in DefaultHoodieRecordPayload

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7961:
URL: https://github.com/apache/hudi/pull/7961#issuecomment-1431033302

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "29189395a4d407c331c89c11b1e70e989d704b20",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15201",
       "triggerID" : "29189395a4d407c331c89c11b1e70e989d704b20",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 29189395a4d407c331c89c11b1e70e989d704b20 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15201) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #7961: [HUDI-5802] Allow configuration for deletes in DefaultHoodieRecordPayload

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7961:
URL: https://github.com/apache/hudi/pull/7961#issuecomment-1434395965

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "29189395a4d407c331c89c11b1e70e989d704b20",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15201",
       "triggerID" : "29189395a4d407c331c89c11b1e70e989d704b20",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c8ac28edb845302c9f3afbc980b03782c0605564",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15204",
       "triggerID" : "c8ac28edb845302c9f3afbc980b03782c0605564",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fe33fb9b643bdcab9f1690b8079eb0c5dfc7f123",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15244",
       "triggerID" : "fe33fb9b643bdcab9f1690b8079eb0c5dfc7f123",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e514bf1d6609d802bc0bf417abbd133f5e674d90",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "e514bf1d6609d802bc0bf417abbd133f5e674d90",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * fe33fb9b643bdcab9f1690b8079eb0c5dfc7f123 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15244) 
   * e514bf1d6609d802bc0bf417abbd133f5e674d90 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] codope commented on a diff in pull request #7961: [HUDI-5802] Allow configuration for deletes in DefaultHoodieRecordPayload

Posted by "codope (via GitHub)" <gi...@apache.org>.
codope commented on code in PR #7961:
URL: https://github.com/apache/hudi/pull/7961#discussion_r1108758019


##########
hudi-common/src/main/java/org/apache/hudi/common/model/DefaultHoodieRecordPayload.java:
##########
@@ -37,8 +37,10 @@
  * 1. preCombine - Picks the latest delta record for a key, based on an ordering field 2. combineAndGetUpdateValue/getInsertValue - Chooses the latest record based on ordering field value.
  */
 public class DefaultHoodieRecordPayload extends OverwriteWithLatestAvroPayload {
-
+  private static final String DEFAULT_DELETE_MARKER = "false";
   public static final String METADATA_EVENT_TIME_KEY = "metadata.event_time.key";
+  public static final String DELETE_KEY = "hoodie.payload.delete.field";
+  public static final String DELETE_MARKER = "hoodie.payload.delete.marker";

Review Comment:
   I think we should have a validation that if DELETE_KEY is set then DELETE_MARKER should also be set? Is there any point of one being present without the other?



##########
hudi-common/src/main/java/org/apache/hudi/common/model/DefaultHoodieRecordPayload.java:
##########
@@ -71,18 +73,38 @@ public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue
     /*
      * Now check if the incoming record is a delete record.
      */
-    return Option.of(incomingRecord);
+    return isDeleteRecord(incomingRecord, properties) ? Option.empty() : Option.of(incomingRecord);
   }
 
   @Override
   public Option<IndexedRecord> getInsertValue(Schema schema, Properties properties) throws IOException {
-    if (recordBytes.length == 0 || isDeletedRecord) {
+    if (recordBytes.length == 0) {
       return Option.empty();
     }
     GenericRecord incomingRecord = HoodieAvroUtils.bytesToAvro(recordBytes, schema);
     eventTime = updateEventTime(incomingRecord, properties);
 
-    return Option.of(incomingRecord);
+    return isDeleteRecord(incomingRecord, properties) ? Option.empty() : Option.of(incomingRecord);
+  }
+
+  /**
+   * @param genericRecord instance of {@link GenericRecord} of interest.
+   * @param properties payload related properties
+   * @returns {@code true} if record represents a delete record. {@code false} otherwise.
+   */
+  protected boolean isDeleteRecord(GenericRecord genericRecord, Properties properties) {
+    final String deleteKey = properties.getProperty(DELETE_KEY);
+    if (deleteKey == null) {
+      return super.isDeleteRecord(genericRecord);

Review Comment:
   Agree, DELETE_MARKER should be set if DELETE_KEY is not null. However, if it is null then calling the parent implementation is fine.



##########
hudi-common/src/main/java/org/apache/hudi/common/model/DefaultHoodieRecordPayload.java:
##########
@@ -71,18 +73,38 @@ public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue
     /*
      * Now check if the incoming record is a delete record.
      */
-    return Option.of(incomingRecord);
+    return isDeleteRecord(incomingRecord, properties) ? Option.empty() : Option.of(incomingRecord);
   }
 
   @Override
   public Option<IndexedRecord> getInsertValue(Schema schema, Properties properties) throws IOException {
-    if (recordBytes.length == 0 || isDeletedRecord) {
+    if (recordBytes.length == 0) {
       return Option.empty();
     }
     GenericRecord incomingRecord = HoodieAvroUtils.bytesToAvro(recordBytes, schema);
     eventTime = updateEventTime(incomingRecord, properties);
 
-    return Option.of(incomingRecord);
+    return isDeleteRecord(incomingRecord, properties) ? Option.empty() : Option.of(incomingRecord);
+  }
+
+  /**
+   * @param genericRecord instance of {@link GenericRecord} of interest.
+   * @param properties payload related properties
+   * @returns {@code true} if record represents a delete record. {@code false} otherwise.
+   */
+  protected boolean isDeleteRecord(GenericRecord genericRecord, Properties properties) {

Review Comment:
   Can we add a test to cover this scenario?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #7961: [HUDI-5802] Allow configuration for deletes in DefaultHoodieRecordPayload

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7961:
URL: https://github.com/apache/hudi/pull/7961#issuecomment-1433035339

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "29189395a4d407c331c89c11b1e70e989d704b20",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15201",
       "triggerID" : "29189395a4d407c331c89c11b1e70e989d704b20",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c8ac28edb845302c9f3afbc980b03782c0605564",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15204",
       "triggerID" : "c8ac28edb845302c9f3afbc980b03782c0605564",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fe33fb9b643bdcab9f1690b8079eb0c5dfc7f123",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "fe33fb9b643bdcab9f1690b8079eb0c5dfc7f123",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c8ac28edb845302c9f3afbc980b03782c0605564 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15204) 
   * fe33fb9b643bdcab9f1690b8079eb0c5dfc7f123 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] lokeshj1703 commented on a diff in pull request #7961: [HUDI-5802] Allow configuration for deletes in DefaultHoodieRecordPayload

Posted by "lokeshj1703 (via GitHub)" <gi...@apache.org>.
lokeshj1703 commented on code in PR #7961:
URL: https://github.com/apache/hudi/pull/7961#discussion_r1109528075


##########
hudi-common/src/main/java/org/apache/hudi/common/model/DefaultHoodieRecordPayload.java:
##########
@@ -71,18 +73,38 @@ public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue
     /*
      * Now check if the incoming record is a delete record.
      */
-    return Option.of(incomingRecord);
+    return isDeleteRecord(incomingRecord, properties) ? Option.empty() : Option.of(incomingRecord);
   }
 
   @Override
   public Option<IndexedRecord> getInsertValue(Schema schema, Properties properties) throws IOException {
-    if (recordBytes.length == 0 || isDeletedRecord) {
+    if (recordBytes.length == 0) {
       return Option.empty();
     }
     GenericRecord incomingRecord = HoodieAvroUtils.bytesToAvro(recordBytes, schema);
     eventTime = updateEventTime(incomingRecord, properties);
 
-    return Option.of(incomingRecord);
+    return isDeleteRecord(incomingRecord, properties) ? Option.empty() : Option.of(incomingRecord);
+  }
+
+  /**
+   * @param genericRecord instance of {@link GenericRecord} of interest.
+   * @param properties payload related properties
+   * @returns {@code true} if record represents a delete record. {@code false} otherwise.
+   */
+  protected boolean isDeleteRecord(GenericRecord genericRecord, Properties properties) {

Review Comment:
   Added a test in the latest commit.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] lokeshj1703 commented on a diff in pull request #7961: [HUDI-5802] Allow configuration for deletes in DefaultHoodieRecordPayload

Posted by "lokeshj1703 (via GitHub)" <gi...@apache.org>.
lokeshj1703 commented on code in PR #7961:
URL: https://github.com/apache/hudi/pull/7961#discussion_r1109538804


##########
hudi-common/src/main/java/org/apache/hudi/common/model/DefaultHoodieRecordPayload.java:
##########
@@ -71,18 +74,41 @@ public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue
     /*
      * Now check if the incoming record is a delete record.
      */
-    return Option.of(incomingRecord);
+    return isDeleteRecord(incomingRecord, properties) ? Option.empty() : Option.of(incomingRecord);
   }
 
   @Override
   public Option<IndexedRecord> getInsertValue(Schema schema, Properties properties) throws IOException {
-    if (recordBytes.length == 0 || isDeletedRecord) {
+    if (recordBytes.length == 0) {
       return Option.empty();
     }
     GenericRecord incomingRecord = HoodieAvroUtils.bytesToAvro(recordBytes, schema);
     eventTime = updateEventTime(incomingRecord, properties);
 
-    return Option.of(incomingRecord);
+    return isDeleteRecord(incomingRecord, properties) ? Option.empty() : Option.of(incomingRecord);
+  }
+
+  /**
+   * @param genericRecord instance of {@link GenericRecord} of interest.
+   * @param properties payload related properties
+   * @returns {@code true} if record represents a delete record. {@code false} otherwise.
+   */
+  protected boolean isDeleteRecord(GenericRecord genericRecord, Properties properties) {
+    final String deleteKey = properties.getProperty(DELETE_KEY);
+    if (StringUtils.isNullOrEmpty(deleteKey)) {
+      return isDeleteRecord(genericRecord);
+    }
+
+    ValidationUtils.checkArgument(!StringUtils.isNullOrEmpty(properties.getProperty(DELETE_MARKER)),
+        () -> DELETE_MARKER + " should be configured with " + DELETE_KEY);
+    // Modify to be compatible with new version Avro.
+    // The new version Avro throws for GenericRecord.get if the field name
+    // does not exist in the schema.
+    if (genericRecord.getSchema().getField(deleteKey) == null) {
+      return false;
+    }

Review Comment:
   Should we also validate that schema should have the configured deleteKey? @codope 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] codope merged pull request #7961: [HUDI-5802] Allow configuration for deletes in DefaultHoodieRecordPayload

Posted by "codope (via GitHub)" <gi...@apache.org>.
codope merged PR #7961:
URL: https://github.com/apache/hudi/pull/7961


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] lokeshj1703 commented on a diff in pull request #7961: [HUDI-5802] Allow configuration for deletes in DefaultHoodieRecordPayload

Posted by "lokeshj1703 (via GitHub)" <gi...@apache.org>.
lokeshj1703 commented on code in PR #7961:
URL: https://github.com/apache/hudi/pull/7961#discussion_r1106842865


##########
hudi-common/src/main/java/org/apache/hudi/common/model/DefaultHoodieRecordPayload.java:
##########
@@ -71,18 +73,38 @@ public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue
     /*
      * Now check if the incoming record is a delete record.
      */
-    return Option.of(incomingRecord);
+    return isDeleteRecord(incomingRecord, properties) ? Option.empty() : Option.of(incomingRecord);
   }
 
   @Override
   public Option<IndexedRecord> getInsertValue(Schema schema, Properties properties) throws IOException {
-    if (recordBytes.length == 0 || isDeletedRecord) {
+    if (recordBytes.length == 0) {
       return Option.empty();
     }
     GenericRecord incomingRecord = HoodieAvroUtils.bytesToAvro(recordBytes, schema);
     eventTime = updateEventTime(incomingRecord, properties);
 
-    return Option.of(incomingRecord);
+    return isDeleteRecord(incomingRecord, properties) ? Option.empty() : Option.of(incomingRecord);
+  }
+
+  /**
+   * @param genericRecord instance of {@link GenericRecord} of interest.
+   * @param properties payload related properties
+   * @returns {@code true} if record represents a delete record. {@code false} otherwise.
+   */
+  protected boolean isDeleteRecord(GenericRecord genericRecord, Properties properties) {
+    final String deleteKey = properties.getProperty(DELETE_KEY);
+    if (deleteKey == null) {
+      return super.isDeleteRecord(genericRecord);

Review Comment:
   If `DELETE_MARKER` property is not set, should we throw an exception here or fall back to default?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #7961: [HUDI-5802] Allow configuration for deletes in DefaultHoodieRecordPayload

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7961:
URL: https://github.com/apache/hudi/pull/7961#issuecomment-1431832295

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "29189395a4d407c331c89c11b1e70e989d704b20",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15201",
       "triggerID" : "29189395a4d407c331c89c11b1e70e989d704b20",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c8ac28edb845302c9f3afbc980b03782c0605564",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15204",
       "triggerID" : "c8ac28edb845302c9f3afbc980b03782c0605564",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c8ac28edb845302c9f3afbc980b03782c0605564 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15204) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #7961: [HUDI-5802] Allow configuration for deletes in DefaultHoodieRecordPayload

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7961:
URL: https://github.com/apache/hudi/pull/7961#issuecomment-1433046011

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "29189395a4d407c331c89c11b1e70e989d704b20",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15201",
       "triggerID" : "29189395a4d407c331c89c11b1e70e989d704b20",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c8ac28edb845302c9f3afbc980b03782c0605564",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15204",
       "triggerID" : "c8ac28edb845302c9f3afbc980b03782c0605564",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fe33fb9b643bdcab9f1690b8079eb0c5dfc7f123",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15244",
       "triggerID" : "fe33fb9b643bdcab9f1690b8079eb0c5dfc7f123",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c8ac28edb845302c9f3afbc980b03782c0605564 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15204) 
   * fe33fb9b643bdcab9f1690b8079eb0c5dfc7f123 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15244) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #7961: [HUDI-5802] Allow configuration for deletes in DefaultHoodieRecordPayload

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7961:
URL: https://github.com/apache/hudi/pull/7961#issuecomment-1433816066

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "29189395a4d407c331c89c11b1e70e989d704b20",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15201",
       "triggerID" : "29189395a4d407c331c89c11b1e70e989d704b20",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c8ac28edb845302c9f3afbc980b03782c0605564",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15204",
       "triggerID" : "c8ac28edb845302c9f3afbc980b03782c0605564",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fe33fb9b643bdcab9f1690b8079eb0c5dfc7f123",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15244",
       "triggerID" : "fe33fb9b643bdcab9f1690b8079eb0c5dfc7f123",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * fe33fb9b643bdcab9f1690b8079eb0c5dfc7f123 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15244) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #7961: [HUDI-5802] Allow configuration for deletes in DefaultHoodieRecordPayload

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7961:
URL: https://github.com/apache/hudi/pull/7961#issuecomment-1434404483

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "29189395a4d407c331c89c11b1e70e989d704b20",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15201",
       "triggerID" : "29189395a4d407c331c89c11b1e70e989d704b20",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c8ac28edb845302c9f3afbc980b03782c0605564",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15204",
       "triggerID" : "c8ac28edb845302c9f3afbc980b03782c0605564",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fe33fb9b643bdcab9f1690b8079eb0c5dfc7f123",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15244",
       "triggerID" : "fe33fb9b643bdcab9f1690b8079eb0c5dfc7f123",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e514bf1d6609d802bc0bf417abbd133f5e674d90",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15274",
       "triggerID" : "e514bf1d6609d802bc0bf417abbd133f5e674d90",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * fe33fb9b643bdcab9f1690b8079eb0c5dfc7f123 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15244) 
   * e514bf1d6609d802bc0bf417abbd133f5e674d90 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15274) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #7961: [HUDI-5802] Allow configuration for deletes in DefaultHoodieRecordPayload

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7961:
URL: https://github.com/apache/hudi/pull/7961#issuecomment-1435362119

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "29189395a4d407c331c89c11b1e70e989d704b20",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15201",
       "triggerID" : "29189395a4d407c331c89c11b1e70e989d704b20",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c8ac28edb845302c9f3afbc980b03782c0605564",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15204",
       "triggerID" : "c8ac28edb845302c9f3afbc980b03782c0605564",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fe33fb9b643bdcab9f1690b8079eb0c5dfc7f123",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15244",
       "triggerID" : "fe33fb9b643bdcab9f1690b8079eb0c5dfc7f123",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e514bf1d6609d802bc0bf417abbd133f5e674d90",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15274",
       "triggerID" : "e514bf1d6609d802bc0bf417abbd133f5e674d90",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e514bf1d6609d802bc0bf417abbd133f5e674d90 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15274) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #7961: [HUDI-5802] Allow configuration for deletes in DefaultHoodieRecordPayload

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7961:
URL: https://github.com/apache/hudi/pull/7961#issuecomment-1431111782

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "29189395a4d407c331c89c11b1e70e989d704b20",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15201",
       "triggerID" : "29189395a4d407c331c89c11b1e70e989d704b20",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c8ac28edb845302c9f3afbc980b03782c0605564",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "c8ac28edb845302c9f3afbc980b03782c0605564",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 29189395a4d407c331c89c11b1e70e989d704b20 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15201) 
   * c8ac28edb845302c9f3afbc980b03782c0605564 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #7961: [HUDI-5802] Allow configuration for deletes in DefaultHoodieRecordPayload

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7961:
URL: https://github.com/apache/hudi/pull/7961#issuecomment-1431125334

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "29189395a4d407c331c89c11b1e70e989d704b20",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15201",
       "triggerID" : "29189395a4d407c331c89c11b1e70e989d704b20",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c8ac28edb845302c9f3afbc980b03782c0605564",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15204",
       "triggerID" : "c8ac28edb845302c9f3afbc980b03782c0605564",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 29189395a4d407c331c89c11b1e70e989d704b20 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15201) 
   * c8ac28edb845302c9f3afbc980b03782c0605564 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15204) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] codope commented on a diff in pull request #7961: [HUDI-5802] Allow configuration for deletes in DefaultHoodieRecordPayload

Posted by "codope (via GitHub)" <gi...@apache.org>.
codope commented on code in PR #7961:
URL: https://github.com/apache/hudi/pull/7961#discussion_r1108765694


##########
hudi-common/src/main/java/org/apache/hudi/common/model/DefaultHoodieRecordPayload.java:
##########
@@ -37,8 +37,10 @@
  * 1. preCombine - Picks the latest delta record for a key, based on an ordering field 2. combineAndGetUpdateValue/getInsertValue - Chooses the latest record based on ordering field value.
  */
 public class DefaultHoodieRecordPayload extends OverwriteWithLatestAvroPayload {
-
+  private static final String DEFAULT_DELETE_MARKER = "false";

Review Comment:
   is it being used somewhere?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] codope commented on a diff in pull request #7961: [HUDI-5802] Allow configuration for deletes in DefaultHoodieRecordPayload

Posted by "codope (via GitHub)" <gi...@apache.org>.
codope commented on code in PR #7961:
URL: https://github.com/apache/hudi/pull/7961#discussion_r1109604433


##########
hudi-common/src/main/java/org/apache/hudi/common/model/DefaultHoodieRecordPayload.java:
##########
@@ -71,18 +74,41 @@ public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue
     /*
      * Now check if the incoming record is a delete record.
      */
-    return Option.of(incomingRecord);
+    return isDeleteRecord(incomingRecord, properties) ? Option.empty() : Option.of(incomingRecord);
   }
 
   @Override
   public Option<IndexedRecord> getInsertValue(Schema schema, Properties properties) throws IOException {
-    if (recordBytes.length == 0 || isDeletedRecord) {
+    if (recordBytes.length == 0) {
       return Option.empty();
     }
     GenericRecord incomingRecord = HoodieAvroUtils.bytesToAvro(recordBytes, schema);
     eventTime = updateEventTime(incomingRecord, properties);
 
-    return Option.of(incomingRecord);
+    return isDeleteRecord(incomingRecord, properties) ? Option.empty() : Option.of(incomingRecord);
+  }
+
+  /**
+   * @param genericRecord instance of {@link GenericRecord} of interest.
+   * @param properties payload related properties
+   * @returns {@code true} if record represents a delete record. {@code false} otherwise.
+   */
+  protected boolean isDeleteRecord(GenericRecord genericRecord, Properties properties) {
+    final String deleteKey = properties.getProperty(DELETE_KEY);
+    if (StringUtils.isNullOrEmpty(deleteKey)) {
+      return isDeleteRecord(genericRecord);
+    }
+
+    ValidationUtils.checkArgument(!StringUtils.isNullOrEmpty(properties.getProperty(DELETE_MARKER)),
+        () -> DELETE_MARKER + " should be configured with " + DELETE_KEY);
+    // Modify to be compatible with new version Avro.
+    // The new version Avro throws for GenericRecord.get if the field name
+    // does not exist in the schema.
+    if (genericRecord.getSchema().getField(deleteKey) == null) {
+      return false;
+    }

Review Comment:
   No not necessary.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org