You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/08/31 08:34:22 UTC

[GitHub] [hudi] Karl-WangSK opened a new pull request #2056: [HUDI-1255]:Add new Payload(OverwriteMulColAvroPayload) for updating specified fields in storage

Karl-WangSK opened a new pull request #2056:
URL: https://github.com/apache/hudi/pull/2056


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a pull request.*
   
   ## What is the purpose of the pull request
   
   Add new Payload(OverwriteMulColAvroPayload) for updating specified fields in storage
   
   ## Brief change log
   
   update current value for several fields that you want to change.
   
   The default payload OverwriteWithLatestAvroPayload overwrite the whole record when 
   
   compared to orderingVal.This doesn't meet our need when we just want to change specified fields.
   For example: (suppose Default value is null)
   ```
   current Value 
   Field:      name   age   gender
   Value:     karl     20    male
   ```
   ```
   insert Value
   Field:      name   age   gender
   Value:     null     30    null
   ```
   ```
   After insert:
   Field:      name   age   gender
   Value:     karl     30    male
   ```
   ## Verify this pull request
   
   Added TestOverwriteMulColAvroPayload to verify the change.
   
   ## Committer checklist
   
   - [x]  Has a corresponding JIRA in PR title & commit
   
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] Karl-WangSK commented on a change in pull request #2056: [HUDI-1255] Add new Payload(OverwriteMulColAvroPayload) for updating specified fields in storage

Posted by GitBox <gi...@apache.org>.
Karl-WangSK commented on a change in pull request #2056:
URL: https://github.com/apache/hudi/pull/2056#discussion_r482964009



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithLatestAvroPayload.java
##########
@@ -79,8 +79,18 @@ public OverwriteWithLatestAvroPayload preCombine(OverwriteWithLatestAvroPayload
    * @param genericRecord instance of {@link GenericRecord} of interest.
    * @returns {@code true} if record represents a delete record. {@code false} otherwise.
    */
-  private boolean isDeleteRecord(GenericRecord genericRecord) {
+  protected boolean isDeleteRecord(GenericRecord genericRecord) {
     Object deleteMarker = genericRecord.get("_hoodie_is_deleted");
     return (deleteMarker instanceof Boolean && (boolean) deleteMarker);
   }
+
+  /**
+   *
+   * @param value value in Insert Value
+   * @param defaultValue defaultValue of the field
+   * @return {@code true} if value equals defaultValue {@code false} otherwise.
+   */
+  public Boolean ovewriteField(Object value, Object defaultValue) {

Review comment:
       done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] Karl-WangSK commented on a change in pull request #2056: [HUDI-1255]:Add new Payload(OverwriteMulColAvroPayload) for updating specified fields in storage

Posted by GitBox <gi...@apache.org>.
Karl-WangSK commented on a change in pull request #2056:
URL: https://github.com/apache/hudi/pull/2056#discussion_r480679600



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteMulColAvroPayload.java
##########
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.hudi.common.util.Option;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.IndexedRecord;
+
+import java.io.IOException;
+import java.util.List;
+
+/**
+ * subclass of OverwriteWithLatestAvroPayload used for delta streamer.
+ * <p>
+ * 1. preCombine - Picks the latest delta record for a key, based on an ordering field.
+ * 2. combineAndGetUpdateValue/getInsertValue - overwrite storage for specified fields
+ * that doesn't equal defaultValue.
+ */
+public class OverwriteMulColAvroPayload extends OverwriteWithLatestAvroPayload {
+
+  /**
+   * @param record      Generic record for the payload.
+   * @param orderingVal {@link Comparable} to be used in pre combine.
+   */
+  public OverwriteMulColAvroPayload(GenericRecord record, Comparable orderingVal) {
+    super(record, orderingVal);
+  }
+
+  public OverwriteMulColAvroPayload(Option<GenericRecord> record) {
+    this(record.isPresent() ? record.get() : null, (record1) -> 0); // natural order
+  }
+
+  @Override
+  public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema) throws IOException {
+
+    Option<IndexedRecord> recordOption = getInsertValue(schema);
+    if (!recordOption.isPresent()) {
+      return Option.empty();
+    }
+
+    GenericRecord insertRecord = (GenericRecord) recordOption.get();
+    GenericRecord currentRecord = (GenericRecord) currentValue;
+
+    Object deleteMarker = insertRecord.get("_hoodie_is_deleted");

Review comment:
       done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] yanghua commented on a change in pull request #2056: [HUDI-1255] Add new Payload(OverwriteMulColAvroPayload) for updating specified fields in storage

Posted by GitBox <gi...@apache.org>.
yanghua commented on a change in pull request #2056:
URL: https://github.com/apache/hudi/pull/2056#discussion_r482897786



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithLatestAvroPayload.java
##########
@@ -79,8 +79,18 @@ public OverwriteWithLatestAvroPayload preCombine(OverwriteWithLatestAvroPayload
    * @param genericRecord instance of {@link GenericRecord} of interest.
    * @returns {@code true} if record represents a delete record. {@code false} otherwise.
    */
-  private boolean isDeleteRecord(GenericRecord genericRecord) {
+  protected boolean isDeleteRecord(GenericRecord genericRecord) {
     Object deleteMarker = genericRecord.get("_hoodie_is_deleted");
     return (deleteMarker instanceof Boolean && (boolean) deleteMarker);
   }
+
+  /**
+   *
+   * @param value value in Insert Value

Review comment:
       ditto

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteNonDefaultsWithLatestAvroPayload.java
##########
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.hudi.common.util.Option;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.IndexedRecord;
+
+import java.io.IOException;
+import java.util.List;
+
+/**
+ * subclass of OverwriteWithLatestAvroPayload used for delta streamer.
+ * <p>
+ * 1. preCombine - Picks the latest delta record for a key, based on an ordering field.
+ * 2. combineAndGetUpdateValue/getInsertValue - overwrite storage for specified fields
+ * that doesn't equal defaultValue.
+ */
+public class OverwriteNonDefaultsWithLatestAvroPayload extends OverwriteWithLatestAvroPayload {
+
+  /**
+   * @param record      Generic record for the payload.

Review comment:
       If you want to add java doc, it would be better to add a description for the method?  

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithLatestAvroPayload.java
##########
@@ -79,8 +79,18 @@ public OverwriteWithLatestAvroPayload preCombine(OverwriteWithLatestAvroPayload
    * @param genericRecord instance of {@link GenericRecord} of interest.
    * @returns {@code true} if record represents a delete record. {@code false} otherwise.
    */
-  private boolean isDeleteRecord(GenericRecord genericRecord) {
+  protected boolean isDeleteRecord(GenericRecord genericRecord) {
     Object deleteMarker = genericRecord.get("_hoodie_is_deleted");
     return (deleteMarker instanceof Boolean && (boolean) deleteMarker);
   }
+
+  /**
+   *
+   * @param value value in Insert Value
+   * @param defaultValue defaultValue of the field
+   * @return {@code true} if value equals defaultValue {@code false} otherwise.
+   */
+  public Boolean ovewriteField(Object value, Object defaultValue) {

Review comment:
       `overwriteField`?

##########
File path: hudi-common/src/test/java/org/apache/hudi/common/model/TestOverwriteNonDefaultsWithLatestAvroPayload.java
##########
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertFalse;
+
+/**
+ * Unit tests {@link TestOverwriteNonDefaultsWithLatestAvroPayload}.
+ */
+public class TestOverwriteNonDefaultsWithLatestAvroPayload {
+  private Schema schema;
+
+  @BeforeEach
+  public void setUp() throws Exception {
+    schema = Schema.createRecord(Arrays.asList(
+            new Schema.Field("id", Schema.create(Schema.Type.STRING), "", null),
+            new Schema.Field("partition", Schema.create(Schema.Type.STRING), "", ""),
+            new Schema.Field("ts", Schema.create(Schema.Type.LONG), "", null),
+            new Schema.Field("_hoodie_is_deleted", Schema.create(Schema.Type.BOOLEAN), "", false)

Review comment:
       Can we add a use case for default value which neither `null` nor `''`?

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithLatestAvroPayload.java
##########
@@ -79,8 +79,18 @@ public OverwriteWithLatestAvroPayload preCombine(OverwriteWithLatestAvroPayload
    * @param genericRecord instance of {@link GenericRecord} of interest.
    * @returns {@code true} if record represents a delete record. {@code false} otherwise.
    */
-  private boolean isDeleteRecord(GenericRecord genericRecord) {
+  protected boolean isDeleteRecord(GenericRecord genericRecord) {
     Object deleteMarker = genericRecord.get("_hoodie_is_deleted");
     return (deleteMarker instanceof Boolean && (boolean) deleteMarker);
   }
+
+  /**
+   *
+   * @param value value in Insert Value
+   * @param defaultValue defaultValue of the field
+   * @return {@code true} if value equals defaultValue {@code false} otherwise.
+   */
+  public Boolean ovewriteField(Object value, Object defaultValue) {
+    return defaultValue == null ? value == defaultValue : defaultValue.toString().equals(value.toString());

Review comment:
       change to : `return defaultValue == null ? value == null : defaultValue.toString().equals(value.toString());`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] Karl-WangSK commented on a change in pull request #2056: [HUDI-1255] Add new Payload(OverwriteMulColAvroPayload) for updating specified fields in storage

Posted by GitBox <gi...@apache.org>.
Karl-WangSK commented on a change in pull request #2056:
URL: https://github.com/apache/hudi/pull/2056#discussion_r481098135



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteNonDefaultsWithLatestAvroPayload.java
##########
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.hudi.common.util.Option;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.IndexedRecord;
+
+import java.io.IOException;
+import java.util.List;
+
+/**
+ * subclass of OverwriteWithLatestAvroPayload used for delta streamer.
+ * <p>
+ * 1. preCombine - Picks the latest delta record for a key, based on an ordering field.
+ * 2. combineAndGetUpdateValue/getInsertValue - overwrite storage for specified fields
+ * that doesn't equal defaultValue.
+ */
+public class OverwriteNonDefaultsWithLatestAvroPayload extends OverwriteWithLatestAvroPayload {
+
+  /**
+   * @param record      Generic record for the payload.
+   * @param orderingVal {@link Comparable} to be used in pre combine.
+   */
+  public OverwriteNonDefaultsWithLatestAvroPayload(GenericRecord record, Comparable orderingVal) {
+    super(record, orderingVal);
+  }
+
+  public OverwriteNonDefaultsWithLatestAvroPayload(Option<GenericRecord> record) {
+    super(record); // natural order
+  }
+
+  @Override
+  public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema) throws IOException {
+
+    Option<IndexedRecord> recordOption = getInsertValue(schema);
+    if (!recordOption.isPresent()) {
+      return Option.empty();
+    }
+
+    GenericRecord insertRecord = (GenericRecord) recordOption.get();
+    GenericRecord currentRecord = (GenericRecord) currentValue;
+
+    if (isDeleteRecord(insertRecord)) {
+      return Option.empty();
+    } else {
+      List<Schema.Field> fields = schema.getFields();
+      for (int i = 0; i < fields.size(); i++) {

Review comment:
       done!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] Karl-WangSK commented on pull request #2056: [HUDI-1255] Add new Payload(OverwriteMulColAvroPayload) for updating specified fields in storage

Posted by GitBox <gi...@apache.org>.
Karl-WangSK commented on pull request #2056:
URL: https://github.com/apache/hudi/pull/2056#issuecomment-686328982


   hi, ready to merge if no other problems @yanghua  @vinothchandar.Thanks!!!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on pull request #2056: [HUDI-1255] Add new Payload(OverwriteNonDefaultsWithLatestAvroPayload) for updating specified fields in storage

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #2056:
URL: https://github.com/apache/hudi/pull/2056#issuecomment-689884262


   sorry, long weekend here in the states. will take a look today 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] yanghua commented on pull request #2056: [HUDI-1255] Add new Payload(OverwriteNonDefaultsWithLatestAvroPayload) for updating specified fields in storage

Posted by GitBox <gi...@apache.org>.
yanghua commented on pull request #2056:
URL: https://github.com/apache/hudi/pull/2056#issuecomment-686541845


   > > @Karl-WangSK thanks for your contribution! LGTM now, let's wait for a moment to see if @vinothchandar has any concern, OK?
   > 
   > OK,Thanks for your time and review! @yanghua
   
   These are not worth mentioning. It is lucky for the community to have contributors like you.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] yanghua commented on pull request #2056: [HUDI-1255] Add new Payload(OverwriteNonDefaultsWithLatestAvroPayload) for updating specified fields in storage

Posted by GitBox <gi...@apache.org>.
yanghua commented on pull request #2056:
URL: https://github.com/apache/hudi/pull/2056#issuecomment-687951066


   @vinothchandar Any feedback on this PR?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] Karl-WangSK commented on a change in pull request #2056: [HUDI-1255]:Add new Payload(OverwriteMulColAvroPayload) for updating specified fields in storage

Posted by GitBox <gi...@apache.org>.
Karl-WangSK commented on a change in pull request #2056:
URL: https://github.com/apache/hudi/pull/2056#discussion_r480673740



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteMulColAvroPayload.java
##########
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.hudi.common.util.Option;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.IndexedRecord;
+
+import java.io.IOException;
+import java.util.List;
+
+/**
+ * subclass of OverwriteWithLatestAvroPayload used for delta streamer.
+ * <p>
+ * 1. preCombine - Picks the latest delta record for a key, based on an ordering field.
+ * 2. combineAndGetUpdateValue/getInsertValue - overwrite storage for specified fields
+ * that doesn't equal defaultValue.
+ */
+public class OverwriteMulColAvroPayload extends OverwriteWithLatestAvroPayload {

Review comment:
       ok.I think it's great




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] Karl-WangSK commented on a change in pull request #2056: [HUDI-1255]:Add new Payload(OverwriteMulColAvroPayload) for updating specified fields in storage

Posted by GitBox <gi...@apache.org>.
Karl-WangSK commented on a change in pull request #2056:
URL: https://github.com/apache/hudi/pull/2056#discussion_r480715969



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithLatestAvroPayload.java
##########
@@ -79,8 +79,18 @@ public OverwriteWithLatestAvroPayload preCombine(OverwriteWithLatestAvroPayload
    * @param genericRecord instance of {@link GenericRecord} of interest.
    * @returns {@code true} if record represents a delete record. {@code false} otherwise.
    */
-  private boolean isDeleteRecord(GenericRecord genericRecord) {
+  public boolean isDeleteRecord(GenericRecord genericRecord) {
     Object deleteMarker = genericRecord.get("_hoodie_is_deleted");
     return (deleteMarker instanceof Boolean && (boolean) deleteMarker);
   }
+
+  /**
+   *
+   * @param value value in Insert Value
+   * @param defaultValue defaultValue of the field
+   * @return {@code true} if value equals defaultValue {@code false} otherwise.
+   */
+  public Boolean fieldJudge(Object value,Object defaultValue) {
+    return  defaultValue == null ? value == defaultValue : value.toString().equals(defaultValue.toString());

Review comment:
       1) `Object.equals()` is compare their memory address. 
   ```
   public boolean equals(Object obj) {
           return (this == obj);
       }
   ```
   2) value and defaultValue they are same field from same schema, so their data type must be the same, and have checked data type before using this method. so we just simply check if their values are the same using `toString()`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar merged pull request #2056: [HUDI-1255] Add new Payload(OverwriteNonDefaultsWithLatestAvroPayload) for updating specified fields in storage

Posted by GitBox <gi...@apache.org>.
vinothchandar merged pull request #2056:
URL: https://github.com/apache/hudi/pull/2056


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] Karl-WangSK commented on a change in pull request #2056: [HUDI-1255]:Add new Payload(OverwriteMulColAvroPayload) for updating specified fields in storage

Posted by GitBox <gi...@apache.org>.
Karl-WangSK commented on a change in pull request #2056:
URL: https://github.com/apache/hudi/pull/2056#discussion_r480715969



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithLatestAvroPayload.java
##########
@@ -79,8 +79,18 @@ public OverwriteWithLatestAvroPayload preCombine(OverwriteWithLatestAvroPayload
    * @param genericRecord instance of {@link GenericRecord} of interest.
    * @returns {@code true} if record represents a delete record. {@code false} otherwise.
    */
-  private boolean isDeleteRecord(GenericRecord genericRecord) {
+  public boolean isDeleteRecord(GenericRecord genericRecord) {
     Object deleteMarker = genericRecord.get("_hoodie_is_deleted");
     return (deleteMarker instanceof Boolean && (boolean) deleteMarker);
   }
+
+  /**
+   *
+   * @param value value in Insert Value
+   * @param defaultValue defaultValue of the field
+   * @return {@code true} if value equals defaultValue {@code false} otherwise.
+   */
+  public Boolean fieldJudge(Object value,Object defaultValue) {
+    return  defaultValue == null ? value == defaultValue : value.toString().equals(defaultValue.toString());

Review comment:
       1)`Object.equals()` is compare their memory address. 
   ```
   public boolean equals(Object obj) {
           return (this == obj);
       }
   ```
   2) value and defaultValue they are same field from same schema, so their data type must be the same, and have checked data type before using this method. so we just simply check if their values are the same using `toString()`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] Karl-WangSK commented on pull request #2056: [HUDI-1255] Add new Payload(OverwriteNonDefaultsWithLatestAvroPayload) for updating specified fields in storage

Posted by GitBox <gi...@apache.org>.
Karl-WangSK commented on pull request #2056:
URL: https://github.com/apache/hudi/pull/2056#issuecomment-686538072


   > @Karl-WangSK thanks for your contribution! LGTM now, let's wait for a moment to see if @vinothchandar has any concern, OK?
   
   OK,Thanks for your time and review!  @yanghua 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] Karl-WangSK commented on a change in pull request #2056: [HUDI-1255]:Add new Payload(OverwriteMulColAvroPayload) for updating specified fields in storage

Posted by GitBox <gi...@apache.org>.
Karl-WangSK commented on a change in pull request #2056:
URL: https://github.com/apache/hudi/pull/2056#discussion_r480715969



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithLatestAvroPayload.java
##########
@@ -79,8 +79,18 @@ public OverwriteWithLatestAvroPayload preCombine(OverwriteWithLatestAvroPayload
    * @param genericRecord instance of {@link GenericRecord} of interest.
    * @returns {@code true} if record represents a delete record. {@code false} otherwise.
    */
-  private boolean isDeleteRecord(GenericRecord genericRecord) {
+  public boolean isDeleteRecord(GenericRecord genericRecord) {
     Object deleteMarker = genericRecord.get("_hoodie_is_deleted");
     return (deleteMarker instanceof Boolean && (boolean) deleteMarker);
   }
+
+  /**
+   *
+   * @param value value in Insert Value
+   * @param defaultValue defaultValue of the field
+   * @return {@code true} if value equals defaultValue {@code false} otherwise.
+   */
+  public Boolean fieldJudge(Object value,Object defaultValue) {
+    return  defaultValue == null ? value == defaultValue : value.toString().equals(defaultValue.toString());

Review comment:
       1) `Object.equals()` is compare their memory address. So it will return false even though like value="1" and defaultValue="1"
   ```
   public boolean equals(Object obj) {
           return (this == obj);
       }
   ```
   2) value and defaultValue they are same field from same schema, so their data type must be the same, and have checked data type before using this method. so we just simply check if their values are the same using `toString()`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] Karl-WangSK edited a comment on pull request #2056: [HUDI-1255] Add new Payload(OverwriteMulColAvroPayload) for updating specified fields in storage

Posted by GitBox <gi...@apache.org>.
Karl-WangSK edited a comment on pull request #2056:
URL: https://github.com/apache/hudi/pull/2056#issuecomment-686328982


   hi, ready to merge if no other problems @yanghua  @vinothchandar .Thanks!!!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] Karl-WangSK commented on a change in pull request #2056: [HUDI-1255]:Add new Payload(OverwriteMulColAvroPayload) for updating specified fields in storage

Posted by GitBox <gi...@apache.org>.
Karl-WangSK commented on a change in pull request #2056:
URL: https://github.com/apache/hudi/pull/2056#discussion_r480675442



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithLatestAvroPayload.java
##########
@@ -79,8 +79,18 @@ public OverwriteWithLatestAvroPayload preCombine(OverwriteWithLatestAvroPayload
    * @param genericRecord instance of {@link GenericRecord} of interest.
    * @returns {@code true} if record represents a delete record. {@code false} otherwise.
    */
-  private boolean isDeleteRecord(GenericRecord genericRecord) {
+  public boolean isDeleteRecord(GenericRecord genericRecord) {
     Object deleteMarker = genericRecord.get("_hoodie_is_deleted");
     return (deleteMarker instanceof Boolean && (boolean) deleteMarker);
   }
+
+  /**
+   *
+   * @param value value in Insert Value
+   * @param defaultValue defaultValue of the field
+   * @return {@code true} if value equals defaultValue {@code false} otherwise.
+   */
+  public Boolean fieldJudge(Object value,Object defaultValue) {

Review comment:
       done!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] Karl-WangSK commented on a change in pull request #2056: [HUDI-1255] Add new Payload(OverwriteMulColAvroPayload) for updating specified fields in storage

Posted by GitBox <gi...@apache.org>.
Karl-WangSK commented on a change in pull request #2056:
URL: https://github.com/apache/hudi/pull/2056#discussion_r482964165



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithLatestAvroPayload.java
##########
@@ -79,8 +79,18 @@ public OverwriteWithLatestAvroPayload preCombine(OverwriteWithLatestAvroPayload
    * @param genericRecord instance of {@link GenericRecord} of interest.
    * @returns {@code true} if record represents a delete record. {@code false} otherwise.
    */
-  private boolean isDeleteRecord(GenericRecord genericRecord) {
+  protected boolean isDeleteRecord(GenericRecord genericRecord) {
     Object deleteMarker = genericRecord.get("_hoodie_is_deleted");
     return (deleteMarker instanceof Boolean && (boolean) deleteMarker);
   }
+
+  /**
+   *
+   * @param value value in Insert Value
+   * @param defaultValue defaultValue of the field
+   * @return {@code true} if value equals defaultValue {@code false} otherwise.
+   */
+  public Boolean ovewriteField(Object value, Object defaultValue) {
+    return defaultValue == null ? value == defaultValue : defaultValue.toString().equals(value.toString());

Review comment:
       ok!!!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] Karl-WangSK commented on pull request #2056: [HUDI-1255]:Add new Payload(OverwriteMulColAvroPayload) for updating specified fields in storage

Posted by GitBox <gi...@apache.org>.
Karl-WangSK commented on pull request #2056:
URL: https://github.com/apache/hudi/pull/2056#issuecomment-683650816


   hi,  can u take a look when freeee ? @yanghua 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on a change in pull request #2056: [HUDI-1255]:Add new Payload(OverwriteMulColAvroPayload) for updating specified fields in storage

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #2056:
URL: https://github.com/apache/hudi/pull/2056#discussion_r480265954



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteMulColAvroPayload.java
##########
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.hudi.common.util.Option;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.IndexedRecord;
+
+import java.io.IOException;
+import java.util.List;
+
+/**
+ * subclass of OverwriteWithLatestAvroPayload used for delta streamer.
+ * <p>
+ * 1. preCombine - Picks the latest delta record for a key, based on an ordering field.
+ * 2. combineAndGetUpdateValue/getInsertValue - overwrite storage for specified fields
+ * that doesn't equal defaultValue.
+ */
+public class OverwriteMulColAvroPayload extends OverwriteWithLatestAvroPayload {

Review comment:
       can we name this something like `OverwriteNonDefaultsWithLatestAvroPayload` 
   
   Trying to make sure it captures the fact that default value is used to decide whether we overwrite or not. Initially, I thought this was doing a partial merge (which is also something we should add IMO) - which just updates some columns. We eventually need soemthing like that to support a SQL MERGE 

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithLatestAvroPayload.java
##########
@@ -79,8 +79,18 @@ public OverwriteWithLatestAvroPayload preCombine(OverwriteWithLatestAvroPayload
    * @param genericRecord instance of {@link GenericRecord} of interest.
    * @returns {@code true} if record represents a delete record. {@code false} otherwise.
    */
-  private boolean isDeleteRecord(GenericRecord genericRecord) {
+  public boolean isDeleteRecord(GenericRecord genericRecord) {
     Object deleteMarker = genericRecord.get("_hoodie_is_deleted");
     return (deleteMarker instanceof Boolean && (boolean) deleteMarker);
   }
+
+  /**
+   *
+   * @param value value in Insert Value
+   * @param defaultValue defaultValue of the field
+   * @return {@code true} if value equals defaultValue {@code false} otherwise.
+   */
+  public Boolean fieldJudge(Object value,Object defaultValue) {
+    return  defaultValue == null ? value == defaultValue : value.toString().equals(defaultValue.toString());

Review comment:
       can we just compare without the .toString() .. coz otherwise data type mismatches may be masked. i.e "1" vs 1L , which will both probably get the same string representation

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithLatestAvroPayload.java
##########
@@ -79,8 +79,18 @@ public OverwriteWithLatestAvroPayload preCombine(OverwriteWithLatestAvroPayload
    * @param genericRecord instance of {@link GenericRecord} of interest.
    * @returns {@code true} if record represents a delete record. {@code false} otherwise.
    */
-  private boolean isDeleteRecord(GenericRecord genericRecord) {
+  public boolean isDeleteRecord(GenericRecord genericRecord) {
     Object deleteMarker = genericRecord.get("_hoodie_is_deleted");
     return (deleteMarker instanceof Boolean && (boolean) deleteMarker);
   }
+
+  /**
+   *
+   * @param value value in Insert Value
+   * @param defaultValue defaultValue of the field
+   * @return {@code true} if value equals defaultValue {@code false} otherwise.
+   */
+  public Boolean fieldJudge(Object value,Object defaultValue) {

Review comment:
       would renaming to `ovewriteField()` be more understandable? 

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithLatestAvroPayload.java
##########
@@ -79,8 +79,18 @@ public OverwriteWithLatestAvroPayload preCombine(OverwriteWithLatestAvroPayload
    * @param genericRecord instance of {@link GenericRecord} of interest.
    * @returns {@code true} if record represents a delete record. {@code false} otherwise.
    */
-  private boolean isDeleteRecord(GenericRecord genericRecord) {
+  public boolean isDeleteRecord(GenericRecord genericRecord) {
     Object deleteMarker = genericRecord.get("_hoodie_is_deleted");
     return (deleteMarker instanceof Boolean && (boolean) deleteMarker);
   }
+
+  /**
+   *
+   * @param value value in Insert Value
+   * @param defaultValue defaultValue of the field
+   * @return {@code true} if value equals defaultValue {@code false} otherwise.
+   */
+  public Boolean fieldJudge(Object value,Object defaultValue) {

Review comment:
       surprised checkstyle is happy with space after comma after `Object value,` 

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteMulColAvroPayload.java
##########
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.hudi.common.util.Option;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.IndexedRecord;
+
+import java.io.IOException;
+import java.util.List;
+
+/**
+ * subclass of OverwriteWithLatestAvroPayload used for delta streamer.
+ * <p>
+ * 1. preCombine - Picks the latest delta record for a key, based on an ordering field.
+ * 2. combineAndGetUpdateValue/getInsertValue - overwrite storage for specified fields
+ * that doesn't equal defaultValue.
+ */
+public class OverwriteMulColAvroPayload extends OverwriteWithLatestAvroPayload {
+
+  /**
+   * @param record      Generic record for the payload.
+   * @param orderingVal {@link Comparable} to be used in pre combine.
+   */
+  public OverwriteMulColAvroPayload(GenericRecord record, Comparable orderingVal) {
+    super(record, orderingVal);
+  }
+
+  public OverwriteMulColAvroPayload(Option<GenericRecord> record) {
+    this(record.isPresent() ? record.get() : null, (record1) -> 0); // natural order
+  }
+
+  @Override
+  public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema) throws IOException {
+
+    Option<IndexedRecord> recordOption = getInsertValue(schema);
+    if (!recordOption.isPresent()) {
+      return Option.empty();
+    }
+
+    GenericRecord insertRecord = (GenericRecord) recordOption.get();
+    GenericRecord currentRecord = (GenericRecord) currentValue;
+
+    Object deleteMarker = insertRecord.get("_hoodie_is_deleted");

Review comment:
       this check can be done using the `isDeleteRecord()` method. we can make that protected . wdyt?

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteMulColAvroPayload.java
##########
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.hudi.common.util.Option;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.IndexedRecord;
+
+import java.io.IOException;
+import java.util.List;
+
+/**
+ * subclass of OverwriteWithLatestAvroPayload used for delta streamer.
+ * <p>
+ * 1. preCombine - Picks the latest delta record for a key, based on an ordering field.
+ * 2. combineAndGetUpdateValue/getInsertValue - overwrite storage for specified fields
+ * that doesn't equal defaultValue.
+ */
+public class OverwriteMulColAvroPayload extends OverwriteWithLatestAvroPayload {
+
+  /**
+   * @param record      Generic record for the payload.
+   * @param orderingVal {@link Comparable} to be used in pre combine.
+   */
+  public OverwriteMulColAvroPayload(GenericRecord record, Comparable orderingVal) {
+    super(record, orderingVal);
+  }
+
+  public OverwriteMulColAvroPayload(Option<GenericRecord> record) {
+    this(record.isPresent() ? record.get() : null, (record1) -> 0); // natural order

Review comment:
       can we call super(..) here

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithLatestAvroPayload.java
##########
@@ -79,8 +79,18 @@ public OverwriteWithLatestAvroPayload preCombine(OverwriteWithLatestAvroPayload
    * @param genericRecord instance of {@link GenericRecord} of interest.
    * @returns {@code true} if record represents a delete record. {@code false} otherwise.
    */
-  private boolean isDeleteRecord(GenericRecord genericRecord) {
+  public boolean isDeleteRecord(GenericRecord genericRecord) {
     Object deleteMarker = genericRecord.get("_hoodie_is_deleted");
     return (deleteMarker instanceof Boolean && (boolean) deleteMarker);
   }
+
+  /**
+   *
+   * @param value value in Insert Value
+   * @param defaultValue defaultValue of the field
+   * @return {@code true} if value equals defaultValue {@code false} otherwise.
+   */
+  public Boolean fieldJudge(Object value,Object defaultValue) {
+    return  defaultValue == null ? value == defaultValue : value.toString().equals(defaultValue.toString());

Review comment:
       would `Objects.equals()` help us compare more easily.. it can deal with one argument being null. we can then avoid this whole method.  




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] Karl-WangSK commented on a change in pull request #2056: [HUDI-1255] Add new Payload(OverwriteMulColAvroPayload) for updating specified fields in storage

Posted by GitBox <gi...@apache.org>.
Karl-WangSK commented on a change in pull request #2056:
URL: https://github.com/apache/hudi/pull/2056#discussion_r482964417



##########
File path: hudi-common/src/test/java/org/apache/hudi/common/model/TestOverwriteNonDefaultsWithLatestAvroPayload.java
##########
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertFalse;
+
+/**
+ * Unit tests {@link TestOverwriteNonDefaultsWithLatestAvroPayload}.
+ */
+public class TestOverwriteNonDefaultsWithLatestAvroPayload {
+  private Schema schema;
+
+  @BeforeEach
+  public void setUp() throws Exception {
+    schema = Schema.createRecord(Arrays.asList(
+            new Schema.Field("id", Schema.create(Schema.Type.STRING), "", null),
+            new Schema.Field("partition", Schema.create(Schema.Type.STRING), "", ""),
+            new Schema.Field("ts", Schema.create(Schema.Type.LONG), "", null),
+            new Schema.Field("_hoodie_is_deleted", Schema.create(Schema.Type.BOOLEAN), "", false)

Review comment:
       ok! added




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] Karl-WangSK commented on a change in pull request #2056: [HUDI-1255]:Add new Payload(OverwriteMulColAvroPayload) for updating specified fields in storage

Posted by GitBox <gi...@apache.org>.
Karl-WangSK commented on a change in pull request #2056:
URL: https://github.com/apache/hudi/pull/2056#discussion_r480715969



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithLatestAvroPayload.java
##########
@@ -79,8 +79,18 @@ public OverwriteWithLatestAvroPayload preCombine(OverwriteWithLatestAvroPayload
    * @param genericRecord instance of {@link GenericRecord} of interest.
    * @returns {@code true} if record represents a delete record. {@code false} otherwise.
    */
-  private boolean isDeleteRecord(GenericRecord genericRecord) {
+  public boolean isDeleteRecord(GenericRecord genericRecord) {
     Object deleteMarker = genericRecord.get("_hoodie_is_deleted");
     return (deleteMarker instanceof Boolean && (boolean) deleteMarker);
   }
+
+  /**
+   *
+   * @param value value in Insert Value
+   * @param defaultValue defaultValue of the field
+   * @return {@code true} if value equals defaultValue {@code false} otherwise.
+   */
+  public Boolean fieldJudge(Object value,Object defaultValue) {
+    return  defaultValue == null ? value == defaultValue : value.toString().equals(defaultValue.toString());

Review comment:
       1) `Object.equals()` is compare their memory address. So it will return false even though value="1" and defaultValue="1"
   ```
   public boolean equals(Object obj) {
           return (this == obj);
       }
   ```
   2) value and defaultValue they are same field from same schema, so their data type must be the same, and have checked data type before using this method. so we just simply check if their values are the same using `toString()`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] yanghua commented on a change in pull request #2056: [HUDI-1255]:Add new Payload(OverwriteMulColAvroPayload) for updating specified fields in storage

Posted by GitBox <gi...@apache.org>.
yanghua commented on a change in pull request #2056:
URL: https://github.com/apache/hudi/pull/2056#discussion_r481070616



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteNonDefaultsWithLatestAvroPayload.java
##########
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.hudi.common.util.Option;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.IndexedRecord;
+
+import java.io.IOException;
+import java.util.List;
+
+/**
+ * subclass of OverwriteWithLatestAvroPayload used for delta streamer.
+ * <p>
+ * 1. preCombine - Picks the latest delta record for a key, based on an ordering field.
+ * 2. combineAndGetUpdateValue/getInsertValue - overwrite storage for specified fields
+ * that doesn't equal defaultValue.
+ */
+public class OverwriteNonDefaultsWithLatestAvroPayload extends OverwriteWithLatestAvroPayload {
+
+  /**
+   * @param record      Generic record for the payload.
+   * @param orderingVal {@link Comparable} to be used in pre combine.
+   */
+  public OverwriteNonDefaultsWithLatestAvroPayload(GenericRecord record, Comparable orderingVal) {
+    super(record, orderingVal);
+  }
+
+  public OverwriteNonDefaultsWithLatestAvroPayload(Option<GenericRecord> record) {
+    super(record); // natural order
+  }
+
+  @Override
+  public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema) throws IOException {
+
+    Option<IndexedRecord> recordOption = getInsertValue(schema);
+    if (!recordOption.isPresent()) {
+      return Option.empty();
+    }
+
+    GenericRecord insertRecord = (GenericRecord) recordOption.get();
+    GenericRecord currentRecord = (GenericRecord) currentValue;
+
+    if (isDeleteRecord(insertRecord)) {
+      return Option.empty();
+    } else {
+      List<Schema.Field> fields = schema.getFields();
+      for (int i = 0; i < fields.size(); i++) {

Review comment:
       `foreach ` grammar is more graceful than `for` loop.

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteNonDefaultsWithLatestAvroPayload.java
##########
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.hudi.common.util.Option;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.IndexedRecord;
+
+import java.io.IOException;
+import java.util.List;
+
+/**
+ * subclass of OverwriteWithLatestAvroPayload used for delta streamer.
+ * <p>
+ * 1. preCombine - Picks the latest delta record for a key, based on an ordering field.
+ * 2. combineAndGetUpdateValue/getInsertValue - overwrite storage for specified fields
+ * that doesn't equal defaultValue.
+ */
+public class OverwriteNonDefaultsWithLatestAvroPayload extends OverwriteWithLatestAvroPayload {
+
+  /**
+   * @param record      Generic record for the payload.
+   * @param orderingVal {@link Comparable} to be used in pre combine.
+   */
+  public OverwriteNonDefaultsWithLatestAvroPayload(GenericRecord record, Comparable orderingVal) {
+    super(record, orderingVal);
+  }
+
+  public OverwriteNonDefaultsWithLatestAvroPayload(Option<GenericRecord> record) {
+    super(record); // natural order
+  }
+
+  @Override
+  public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema) throws IOException {
+
+    Option<IndexedRecord> recordOption = getInsertValue(schema);
+    if (!recordOption.isPresent()) {
+      return Option.empty();
+    }
+
+    GenericRecord insertRecord = (GenericRecord) recordOption.get();
+    GenericRecord currentRecord = (GenericRecord) currentValue;
+
+    if (isDeleteRecord(insertRecord)) {
+      return Option.empty();
+    } else {
+      List<Schema.Field> fields = schema.getFields();
+      for (int i = 0; i < fields.size(); i++) {
+        Object value = insertRecord.get(fields.get(i).name());
+        Object defaultValue = fields.get(i).defaultVal();
+        if (ovewriteField(value, defaultValue)) {
+          continue;

Review comment:
       Here `continue` is unnecessary.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] yanghua commented on pull request #2056: [HUDI-1255] Add new Payload(OverwriteMulColAvroPayload) for updating specified fields in storage

Posted by GitBox <gi...@apache.org>.
yanghua commented on pull request #2056:
URL: https://github.com/apache/hudi/pull/2056#issuecomment-686397274


   > hi, ready to merge if no other problems @yanghua @vinothchandar.Thanks!!!
   
   @Karl-WangSK I will review again soon.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org