You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/06/05 06:55:02 UTC

[GitHub] [hudi] fanaticjo opened a new pull request #3035: HUDI-1936 Introduce a optional property for conditional upsert

fanaticjo opened a new pull request #3035:
URL: https://github.com/apache/hudi/pull/3035


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a pull request.*
   
   ## What is the purpose of the pull request
   If anyone wants to use custom upsert logic then they have to override the Latest avro payload class which is only possible in java or scala . 
   
   Python developers have no such option . 
   
   Will be introducing a new payload class and a new key which can work in java , scala and python 
   
   This class will be responsible for custom upsert logic and a new key hoodie.update.key which will accept the columns which only need to be updated 
   
    
   
   "hoodie.update.keys": "admission_date,name",  #comma seperated key 
   "hoodie.datasource.write.payload.class": "com.hudiUpsert.hudiCustomUpsert" #custom upsert key 
   
    
   
   so this will only update the column admission_date and name in the target table 
   
   
   
   ## Brief change log
   
   *(for example:)*
     - added hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithCustomAvroPayload.java
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
   
     - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] fanaticjo commented on a change in pull request #3035: [HUDI-1936] Introduce a optional property for conditional upsert

Posted by GitBox <gi...@apache.org>.
fanaticjo commented on a change in pull request #3035:
URL: https://github.com/apache/hudi/pull/3035#discussion_r658767757



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithCustomAvroPayload.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.exception.ColumnNotFoundException;
+import org.apache.hudi.exception.UpdateKeyNotFoundException;
+import org.apache.hudi.exception.WriteOperationException;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.IndexedRecord;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Properties;
+import java.util.stream.Collectors;
+
+/**
+ * subclass of OverwriteWithLatestAvroPayload used for delta streamer.
+ *
+ * <ol>
+ * <li> combineAndGetUpdateValue - Accepts the column names to be updated;
+ * <li> splitKeys - Split keys based upon keys;
+ * </ol>
+ */
+public class OverwriteWithCustomAvroPayload extends OverwriteWithLatestAvroPayload {
+
+  public OverwriteWithCustomAvroPayload(GenericRecord record, Comparable orderingVal) {
+    super(record, orderingVal);
+  }
+
+  /**
+   * split keys over.
+   */
+  public List<String> splitKeys(String keys) throws UpdateKeyNotFoundException {
+    if (keys == null) {
+      throw new UpdateKeyNotFoundException("keys cannot be null");
+    } else if (keys.equals("")) {
+      throw new UpdateKeyNotFoundException("keys cannot be blank");
+    } else {
+      return Arrays.stream(keys.split(",")).collect(Collectors.toList());
+    }
+  }
+
+  /**
+   * check column exi.
+   */
+  public boolean checkColumnExists(List<String> keys, Schema schema) {
+    List<Schema.Field> field = schema.getFields();
+    List<Schema.Field> common = new ArrayList<>();
+    for (Schema.Field columns : field) {
+      if (keys.contains(columns.name())) {
+        common.add(columns);
+      }
+    }
+    return common.size() == keys.size();
+  }
+
+  @Override
+  public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema, Properties properties)
+      throws WriteOperationException, IOException, ColumnNotFoundException, UpdateKeyNotFoundException {
+
+    if (!properties.getProperty("hoodie.datasource.write.operation").equals("upsert")) {
+      throw new WriteOperationException("write should be upsert");
+    }
+
+    Option<IndexedRecord> recordOption = getInsertValue(schema);
+
+    if (!recordOption.isPresent()) {
+      return Option.empty();
+    }
+

Review comment:
       yes user can set different values for different batches for cow it working , mor will test 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on pull request #3035: [HUDI-1936] Introduce a optional property for conditional upsert

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #3035:
URL: https://github.com/apache/hudi/pull/3035#issuecomment-896336864


   In some sense, with the Spark SQL support now, python users can do custom merges? does that satisfy your requirements?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot edited a comment on pull request #3035: [HUDI-1936] Introduce a optional property for conditional upsert

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3035:
URL: https://github.com/apache/hudi/pull/3035#issuecomment-862017744


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "26dadb6627c90c9f06e66fba0b8bd24e5579665f",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=209",
       "triggerID" : "26dadb6627c90c9f06e66fba0b8bd24e5579665f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 26dadb6627c90c9f06e66fba0b8bd24e5579665f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=209) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] fanaticjo commented on a change in pull request #3035: [HUDI-1936] Introduce a optional property for conditional upsert

Posted by GitBox <gi...@apache.org>.
fanaticjo commented on a change in pull request #3035:
URL: https://github.com/apache/hudi/pull/3035#discussion_r658876300



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithCustomAvroPayload.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.exception.ColumnNotFoundException;
+import org.apache.hudi.exception.UpdateKeyNotFoundException;
+import org.apache.hudi.exception.WriteOperationException;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.IndexedRecord;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Properties;
+import java.util.stream.Collectors;
+
+/**
+ * subclass of OverwriteWithLatestAvroPayload used for delta streamer.
+ *
+ * <ol>
+ * <li> combineAndGetUpdateValue - Accepts the column names to be updated;
+ * <li> splitKeys - Split keys based upon keys;
+ * </ol>
+ */
+public class OverwriteWithCustomAvroPayload extends OverwriteWithLatestAvroPayload {
+
+  public OverwriteWithCustomAvroPayload(GenericRecord record, Comparable orderingVal) {
+    super(record, orderingVal);
+  }
+
+  /**
+   * split keys over.
+   */
+  public List<String> splitKeys(String keys) throws UpdateKeyNotFoundException {
+    if (keys == null) {
+      throw new UpdateKeyNotFoundException("keys cannot be null");
+    } else if (keys.equals("")) {
+      throw new UpdateKeyNotFoundException("keys cannot be blank");
+    } else {
+      return Arrays.stream(keys.split(",")).collect(Collectors.toList());
+    }
+  }
+
+  /**
+   * check column exi.
+   */
+  public boolean checkColumnExists(List<String> keys, Schema schema) {
+    List<Schema.Field> field = schema.getFields();
+    List<Schema.Field> common = new ArrayList<>();
+    for (Schema.Field columns : field) {
+      if (keys.contains(columns.name())) {
+        common.add(columns);
+      }
+    }
+    return common.size() == keys.size();
+  }
+
+  @Override
+  public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema, Properties properties)
+      throws WriteOperationException, IOException, ColumnNotFoundException, UpdateKeyNotFoundException {
+
+    if (!properties.getProperty("hoodie.datasource.write.operation").equals("upsert")) {
+      throw new WriteOperationException("write should be upsert");
+    }
+
+    Option<IndexedRecord> recordOption = getInsertValue(schema);
+
+    if (!recordOption.isPresent()) {
+      return Option.empty();
+    }
+
+    GenericRecord existingRecord = (GenericRecord) currentValue;
+    GenericRecord incomingRecord = (GenericRecord) recordOption.get();
+    List<String> keys = splitKeys(properties.getProperty("hoodie.update.keys"));

Review comment:
       if the record is deleted and the same record comes again it is considered as a new record and treated as insert which i think is correct only 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #3035: [HUDI-1936] Introduce a optional property for conditional upsert

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #3035:
URL: https://github.com/apache/hudi/pull/3035#discussion_r652332012



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithCustomAvroPayload.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.exception.ColumnNotFoundException;
+import org.apache.hudi.exception.UpdateKeyNotFoundException;
+import org.apache.hudi.exception.WriteOperationException;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.IndexedRecord;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Properties;
+import java.util.stream.Collectors;
+
+/**
+ * subclass of OverwriteWithLatestAvroPayload used for delta streamer.
+ *
+ * <ol>
+ * <li> combineAndGetUpdateValue - Accepts the column names to be updated;
+ * <li> splitKeys - Split keys based upon keys;
+ * </ol>
+ */
+public class OverwriteWithCustomAvroPayload extends OverwriteWithLatestAvroPayload {
+
+  public OverwriteWithCustomAvroPayload(GenericRecord record, Comparable orderingVal) {
+    super(record, orderingVal);
+  }
+
+  /**
+   * split keys over.
+   */
+  public List<String> splitKeys(String keys) throws UpdateKeyNotFoundException {
+    if (keys == null) {
+      throw new UpdateKeyNotFoundException("keys cannot be null");
+    } else if (keys.equals("")) {
+      throw new UpdateKeyNotFoundException("keys cannot be blank");
+    } else {
+      return Arrays.stream(keys.split(",")).collect(Collectors.toList());
+    }
+  }
+
+  /**
+   * check column exi.
+   */
+  public boolean checkColumnExists(List<String> keys, Schema schema) {
+    List<Schema.Field> field = schema.getFields();
+    List<Schema.Field> common = new ArrayList<>();
+    for (Schema.Field columns : field) {
+      if (keys.contains(columns.name())) {
+        common.add(columns);
+      }
+    }
+    return common.size() == keys.size();
+  }
+
+  @Override
+  public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema, Properties properties)
+      throws WriteOperationException, IOException, ColumnNotFoundException, UpdateKeyNotFoundException {
+
+    if (!properties.getProperty("hoodie.datasource.write.operation").equals("upsert")) {
+      throw new WriteOperationException("write should be upsert");
+    }
+
+    Option<IndexedRecord> recordOption = getInsertValue(schema);
+
+    if (!recordOption.isPresent()) {
+      return Option.empty();
+    }
+
+    GenericRecord existingRecord = (GenericRecord) currentValue;
+    GenericRecord incomingRecord = (GenericRecord) recordOption.get();
+    List<String> keys = splitKeys(properties.getProperty("hoodie.update.keys"));

Review comment:
       Also, lets add this config to DataSourceWriteOptions. may be we can name it as "hoodie.datasource.write.partial.fields.to.update"

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithCustomAvroPayload.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.exception.ColumnNotFoundException;
+import org.apache.hudi.exception.UpdateKeyNotFoundException;
+import org.apache.hudi.exception.WriteOperationException;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.IndexedRecord;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Properties;
+import java.util.stream.Collectors;
+
+/**
+ * subclass of OverwriteWithLatestAvroPayload used for delta streamer.
+ *
+ * <ol>
+ * <li> combineAndGetUpdateValue - Accepts the column names to be updated;
+ * <li> splitKeys - Split keys based upon keys;
+ * </ol>
+ */
+public class OverwriteWithCustomAvroPayload extends OverwriteWithLatestAvroPayload {
+
+  public OverwriteWithCustomAvroPayload(GenericRecord record, Comparable orderingVal) {
+    super(record, orderingVal);
+  }
+
+  /**
+   * split keys over.
+   */
+  public List<String> splitKeys(String keys) throws UpdateKeyNotFoundException {
+    if (keys == null) {
+      throw new UpdateKeyNotFoundException("keys cannot be null");
+    } else if (keys.equals("")) {
+      throw new UpdateKeyNotFoundException("keys cannot be blank");
+    } else {
+      return Arrays.stream(keys.split(",")).collect(Collectors.toList());
+    }
+  }
+
+  /**
+   * check column exi.
+   */
+  public boolean checkColumnExists(List<String> keys, Schema schema) {
+    List<Schema.Field> field = schema.getFields();
+    List<Schema.Field> common = new ArrayList<>();
+    for (Schema.Field columns : field) {
+      if (keys.contains(columns.name())) {
+        common.add(columns);
+      }
+    }
+    return common.size() == keys.size();
+  }
+
+  @Override
+  public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema, Properties properties)
+      throws WriteOperationException, IOException, ColumnNotFoundException, UpdateKeyNotFoundException {
+
+    if (!properties.getProperty("hoodie.datasource.write.operation").equals("upsert")) {
+      throw new WriteOperationException("write should be upsert");
+    }
+
+    Option<IndexedRecord> recordOption = getInsertValue(schema);
+
+    if (!recordOption.isPresent()) {
+      return Option.empty();
+    }
+
+    GenericRecord existingRecord = (GenericRecord) currentValue;
+    GenericRecord incomingRecord = (GenericRecord) recordOption.get();
+    List<String> keys = splitKeys(properties.getProperty("hoodie.update.keys"));

Review comment:
       Do you know what happens if a record is deleted and then an insert/upsert is sent to hudi for this record. We need to understand if combineAndGetUpdateValue() will be invoked or the new incoming record will be treated as a completely new record. 
   

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithCustomAvroPayload.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.exception.ColumnNotFoundException;
+import org.apache.hudi.exception.UpdateKeyNotFoundException;
+import org.apache.hudi.exception.WriteOperationException;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.IndexedRecord;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Properties;
+import java.util.stream.Collectors;
+
+/**
+ * subclass of OverwriteWithLatestAvroPayload used for delta streamer.
+ *
+ * <ol>
+ * <li> combineAndGetUpdateValue - Accepts the column names to be updated;
+ * <li> splitKeys - Split keys based upon keys;
+ * </ol>
+ */
+public class OverwriteWithCustomAvroPayload extends OverwriteWithLatestAvroPayload {
+
+  public OverwriteWithCustomAvroPayload(GenericRecord record, Comparable orderingVal) {
+    super(record, orderingVal);
+  }
+
+  /**
+   * split keys over.
+   */
+  public List<String> splitKeys(String keys) throws UpdateKeyNotFoundException {
+    if (keys == null) {
+      throw new UpdateKeyNotFoundException("keys cannot be null");
+    } else if (keys.equals("")) {
+      throw new UpdateKeyNotFoundException("keys cannot be blank");
+    } else {
+      return Arrays.stream(keys.split(",")).collect(Collectors.toList());
+    }
+  }
+
+  /**
+   * check column exi.
+   */
+  public boolean checkColumnExists(List<String> keys, Schema schema) {
+    List<Schema.Field> field = schema.getFields();
+    List<Schema.Field> common = new ArrayList<>();
+    for (Schema.Field columns : field) {
+      if (keys.contains(columns.name())) {
+        common.add(columns);
+      }
+    }
+    return common.size() == keys.size();
+  }
+
+  @Override
+  public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema, Properties properties)
+      throws WriteOperationException, IOException, ColumnNotFoundException, UpdateKeyNotFoundException {
+
+    if (!properties.getProperty("hoodie.datasource.write.operation").equals("upsert")) {
+      throw new WriteOperationException("write should be upsert");
+    }
+
+    Option<IndexedRecord> recordOption = getInsertValue(schema);
+
+    if (!recordOption.isPresent()) {
+      return Option.empty();
+    }
+

Review comment:
       we might have to check for delete record as well. 
   something like 
   ```
   isDeleteRecord((GenericRecord) indexedRecord)
   ```
   Please do check OverwriteWithLatestAvroPayload impl




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] fanaticjo commented on a change in pull request #3035: [HUDI-1936] Introduce a optional property for conditional upsert

Posted by GitBox <gi...@apache.org>.
fanaticjo commented on a change in pull request #3035:
URL: https://github.com/apache/hudi/pull/3035#discussion_r658879988



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithCustomAvroPayload.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.exception.ColumnNotFoundException;
+import org.apache.hudi.exception.UpdateKeyNotFoundException;
+import org.apache.hudi.exception.WriteOperationException;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.IndexedRecord;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Properties;
+import java.util.stream.Collectors;
+
+/**
+ * subclass of OverwriteWithLatestAvroPayload used for delta streamer.
+ *
+ * <ol>
+ * <li> combineAndGetUpdateValue - Accepts the column names to be updated;
+ * <li> splitKeys - Split keys based upon keys;
+ * </ol>
+ */
+public class OverwriteWithCustomAvroPayload extends OverwriteWithLatestAvroPayload {
+
+  public OverwriteWithCustomAvroPayload(GenericRecord record, Comparable orderingVal) {
+    super(record, orderingVal);
+  }
+
+  /**
+   * split keys over.
+   */
+  public List<String> splitKeys(String keys) throws UpdateKeyNotFoundException {
+    if (keys == null) {
+      throw new UpdateKeyNotFoundException("keys cannot be null");
+    } else if (keys.equals("")) {
+      throw new UpdateKeyNotFoundException("keys cannot be blank");
+    } else {
+      return Arrays.stream(keys.split(",")).collect(Collectors.toList());
+    }
+  }
+
+  /**
+   * check column exi.
+   */
+  public boolean checkColumnExists(List<String> keys, Schema schema) {
+    List<Schema.Field> field = schema.getFields();
+    List<Schema.Field> common = new ArrayList<>();
+    for (Schema.Field columns : field) {
+      if (keys.contains(columns.name())) {
+        common.add(columns);
+      }
+    }
+    return common.size() == keys.size();
+  }
+
+  @Override
+  public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema, Properties properties)
+      throws WriteOperationException, IOException, ColumnNotFoundException, UpdateKeyNotFoundException {
+
+    if (!properties.getProperty("hoodie.datasource.write.operation").equals("upsert")) {
+      throw new WriteOperationException("write should be upsert");
+    }
+
+    Option<IndexedRecord> recordOption = getInsertValue(schema);
+
+    if (!recordOption.isPresent()) {
+      return Option.empty();
+    }
+

Review comment:
       i see one thing as a blocker is if a new column is introduced it makes it as null , any idea how to tackle this ? one idea i have is check what schema is mismatch and add that in the properties only in that new column will get values or is there any hudi way for that 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codecov-commenter commented on pull request #3035: [HUDI-1936] Introduce a optional property for conditional upsert

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #3035:
URL: https://github.com/apache/hudi/pull/3035#issuecomment-855229410


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/3035?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#3035](https://codecov.io/gh/apache/hudi/pull/3035?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (26dadb6) into [master](https://codecov.io/gh/apache/hudi/commit/974b476180e61fac58cd87e78699428d4108a482?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (974b476) will **decrease** coverage by `45.74%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/3035/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3035?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master   #3035       +/-   ##
   ============================================
   - Coverage     55.01%   9.27%   -45.75%     
   + Complexity     3850      48     -3802     
   ============================================
     Files           485      54      -431     
     Lines         23467    2016    -21451     
     Branches       2497     241     -2256     
   ============================================
   - Hits          12911     187    -12724     
   + Misses         9405    1816     -7589     
   + Partials       1151      13     -1138     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `?` | |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `?` | |
   | huditimelineservice | `?` | |
   | hudiutilities | `9.27% <ø> (-61.56%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3035?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/3035/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/3035/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/3035/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/3035/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/3035/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/3035/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/3035/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/3035/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/3035/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...lities/schema/SchemaProviderWithPostProcessor.java](https://codecov.io/gh/apache/hudi/pull/3035/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFQcm92aWRlcldpdGhQb3N0UHJvY2Vzc29yLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | ... and [462 more](https://codecov.io/gh/apache/hudi/pull/3035/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on pull request #3035: [HUDI-1936] Introduce a optional property for conditional upsert

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #3035:
URL: https://github.com/apache/hudi/pull/3035#issuecomment-864355261


   cc @vingov do you mind taking a review at this, given its a python benefiting change


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #3035: [HUDI-1936] Introduce a optional property for conditional upsert

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #3035:
URL: https://github.com/apache/hudi/pull/3035#issuecomment-862017744


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "26dadb6627c90c9f06e66fba0b8bd24e5579665f",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=209",
       "triggerID" : "26dadb6627c90c9f06e66fba0b8bd24e5579665f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 26dadb6627c90c9f06e66fba0b8bd24e5579665f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=209) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] fanaticjo commented on a change in pull request #3035: [HUDI-1936] Introduce a optional property for conditional upsert

Posted by GitBox <gi...@apache.org>.
fanaticjo commented on a change in pull request #3035:
URL: https://github.com/apache/hudi/pull/3035#discussion_r658855139



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithCustomAvroPayload.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.exception.ColumnNotFoundException;
+import org.apache.hudi.exception.UpdateKeyNotFoundException;
+import org.apache.hudi.exception.WriteOperationException;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.IndexedRecord;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Properties;
+import java.util.stream.Collectors;
+
+/**
+ * subclass of OverwriteWithLatestAvroPayload used for delta streamer.
+ *
+ * <ol>
+ * <li> combineAndGetUpdateValue - Accepts the column names to be updated;
+ * <li> splitKeys - Split keys based upon keys;
+ * </ol>
+ */
+public class OverwriteWithCustomAvroPayload extends OverwriteWithLatestAvroPayload {
+
+  public OverwriteWithCustomAvroPayload(GenericRecord record, Comparable orderingVal) {
+    super(record, orderingVal);
+  }
+
+  /**
+   * split keys over.
+   */
+  public List<String> splitKeys(String keys) throws UpdateKeyNotFoundException {
+    if (keys == null) {
+      throw new UpdateKeyNotFoundException("keys cannot be null");
+    } else if (keys.equals("")) {
+      throw new UpdateKeyNotFoundException("keys cannot be blank");
+    } else {
+      return Arrays.stream(keys.split(",")).collect(Collectors.toList());
+    }
+  }
+
+  /**
+   * check column exi.
+   */
+  public boolean checkColumnExists(List<String> keys, Schema schema) {
+    List<Schema.Field> field = schema.getFields();
+    List<Schema.Field> common = new ArrayList<>();
+    for (Schema.Field columns : field) {
+      if (keys.contains(columns.name())) {
+        common.add(columns);
+      }
+    }
+    return common.size() == keys.size();
+  }
+
+  @Override
+  public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema, Properties properties)
+      throws WriteOperationException, IOException, ColumnNotFoundException, UpdateKeyNotFoundException {
+
+    if (!properties.getProperty("hoodie.datasource.write.operation").equals("upsert")) {
+      throw new WriteOperationException("write should be upsert");
+    }
+
+    Option<IndexedRecord> recordOption = getInsertValue(schema);
+
+    if (!recordOption.isPresent()) {
+      return Option.empty();
+    }
+
+    GenericRecord existingRecord = (GenericRecord) currentValue;
+    GenericRecord incomingRecord = (GenericRecord) recordOption.get();
+    List<String> keys = splitKeys(properties.getProperty("hoodie.update.keys"));

Review comment:
       i dont think DataSourceWriteOptions is accessible from model package 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot edited a comment on pull request #3035: [HUDI-1936] Introduce a optional property for conditional upsert

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3035:
URL: https://github.com/apache/hudi/pull/3035#issuecomment-862017744


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "26dadb6627c90c9f06e66fba0b8bd24e5579665f",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=209",
       "triggerID" : "26dadb6627c90c9f06e66fba0b8bd24e5579665f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 26dadb6627c90c9f06e66fba0b8bd24e5579665f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=209) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #3035: [HUDI-1936] Introduce a optional property for conditional upsert

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #3035:
URL: https://github.com/apache/hudi/pull/3035#issuecomment-862017744


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "26dadb6627c90c9f06e66fba0b8bd24e5579665f",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=209",
       "triggerID" : "26dadb6627c90c9f06e66fba0b8bd24e5579665f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 26dadb6627c90c9f06e66fba0b8bd24e5579665f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=209) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #3035: [HUDI-1936] Introduce a optional property for conditional upsert

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #3035:
URL: https://github.com/apache/hudi/pull/3035#issuecomment-961587786


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "26dadb6627c90c9f06e66fba0b8bd24e5579665f",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=209",
       "triggerID" : "26dadb6627c90c9f06e66fba0b8bd24e5579665f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 26dadb6627c90c9f06e66fba0b8bd24e5579665f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=209) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #3035: [HUDI-1936] Introduce a optional property for conditional upsert

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #3035:
URL: https://github.com/apache/hudi/pull/3035#issuecomment-862017744


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "26dadb6627c90c9f06e66fba0b8bd24e5579665f",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "26dadb6627c90c9f06e66fba0b8bd24e5579665f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 26dadb6627c90c9f06e66fba0b8bd24e5579665f UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] fanaticjo commented on a change in pull request #3035: [HUDI-1936] Introduce a optional property for conditional upsert

Posted by GitBox <gi...@apache.org>.
fanaticjo commented on a change in pull request #3035:
URL: https://github.com/apache/hudi/pull/3035#discussion_r676157680



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithCustomAvroPayload.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.exception.ColumnNotFoundException;
+import org.apache.hudi.exception.UpdateKeyNotFoundException;
+import org.apache.hudi.exception.WriteOperationException;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.IndexedRecord;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Properties;
+import java.util.stream.Collectors;
+
+/**
+ * subclass of OverwriteWithLatestAvroPayload used for delta streamer.
+ *
+ * <ol>
+ * <li> combineAndGetUpdateValue - Accepts the column names to be updated;
+ * <li> splitKeys - Split keys based upon keys;
+ * </ol>
+ */
+public class OverwriteWithCustomAvroPayload extends OverwriteWithLatestAvroPayload {
+
+  public OverwriteWithCustomAvroPayload(GenericRecord record, Comparable orderingVal) {
+    super(record, orderingVal);
+  }
+
+  /**
+   * split keys over.
+   */
+  public List<String> splitKeys(String keys) throws UpdateKeyNotFoundException {
+    if (keys == null) {
+      throw new UpdateKeyNotFoundException("keys cannot be null");
+    } else if (keys.equals("")) {
+      throw new UpdateKeyNotFoundException("keys cannot be blank");
+    } else {
+      return Arrays.stream(keys.split(",")).collect(Collectors.toList());
+    }
+  }
+
+  /**
+   * check column exi.
+   */
+  public boolean checkColumnExists(List<String> keys, Schema schema) {
+    List<Schema.Field> field = schema.getFields();
+    List<Schema.Field> common = new ArrayList<>();
+    for (Schema.Field columns : field) {
+      if (keys.contains(columns.name())) {
+        common.add(columns);
+      }
+    }
+    return common.size() == keys.size();
+  }
+
+  @Override
+  public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema, Properties properties)
+      throws WriteOperationException, IOException, ColumnNotFoundException, UpdateKeyNotFoundException {
+
+    if (!properties.getProperty("hoodie.datasource.write.operation").equals("upsert")) {
+      throw new WriteOperationException("write should be upsert");
+    }
+
+    Option<IndexedRecord> recordOption = getInsertValue(schema);
+
+    if (!recordOption.isPresent()) {
+      return Option.empty();
+    }
+

Review comment:
       @nsivabalan  any updates ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] fanaticjo commented on a change in pull request #3035: [HUDI-1936] Introduce a optional property for conditional upsert

Posted by GitBox <gi...@apache.org>.
fanaticjo commented on a change in pull request #3035:
URL: https://github.com/apache/hudi/pull/3035#discussion_r654948338



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithCustomAvroPayload.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.exception.ColumnNotFoundException;
+import org.apache.hudi.exception.UpdateKeyNotFoundException;
+import org.apache.hudi.exception.WriteOperationException;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.IndexedRecord;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Properties;
+import java.util.stream.Collectors;
+
+/**
+ * subclass of OverwriteWithLatestAvroPayload used for delta streamer.
+ *
+ * <ol>
+ * <li> combineAndGetUpdateValue - Accepts the column names to be updated;
+ * <li> splitKeys - Split keys based upon keys;
+ * </ol>
+ */
+public class OverwriteWithCustomAvroPayload extends OverwriteWithLatestAvroPayload {
+
+  public OverwriteWithCustomAvroPayload(GenericRecord record, Comparable orderingVal) {
+    super(record, orderingVal);
+  }
+
+  /**
+   * split keys over.
+   */
+  public List<String> splitKeys(String keys) throws UpdateKeyNotFoundException {
+    if (keys == null) {
+      throw new UpdateKeyNotFoundException("keys cannot be null");
+    } else if (keys.equals("")) {
+      throw new UpdateKeyNotFoundException("keys cannot be blank");
+    } else {
+      return Arrays.stream(keys.split(",")).collect(Collectors.toList());
+    }
+  }
+
+  /**
+   * check column exi.
+   */
+  public boolean checkColumnExists(List<String> keys, Schema schema) {
+    List<Schema.Field> field = schema.getFields();
+    List<Schema.Field> common = new ArrayList<>();
+    for (Schema.Field columns : field) {
+      if (keys.contains(columns.name())) {
+        common.add(columns);
+      }
+    }
+    return common.size() == keys.size();
+  }
+
+  @Override
+  public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema, Properties properties)
+      throws WriteOperationException, IOException, ColumnNotFoundException, UpdateKeyNotFoundException {
+
+    if (!properties.getProperty("hoodie.datasource.write.operation").equals("upsert")) {
+      throw new WriteOperationException("write should be upsert");
+    }
+
+    Option<IndexedRecord> recordOption = getInsertValue(schema);
+
+    if (!recordOption.isPresent()) {
+      return Option.empty();
+    }
+
+    GenericRecord existingRecord = (GenericRecord) currentValue;
+    GenericRecord incomingRecord = (GenericRecord) recordOption.get();
+    List<String> keys = splitKeys(properties.getProperty("hoodie.update.keys"));

Review comment:
       have not checked this use case , have to check this 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #3035: [HUDI-1936] Introduce a optional property for conditional upsert

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #3035:
URL: https://github.com/apache/hudi/pull/3035#issuecomment-862017744


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "26dadb6627c90c9f06e66fba0b8bd24e5579665f",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=209",
       "triggerID" : "26dadb6627c90c9f06e66fba0b8bd24e5579665f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 26dadb6627c90c9f06e66fba0b8bd24e5579665f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=209) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] fanaticjo commented on a change in pull request #3035: [HUDI-1936] Introduce a optional property for conditional upsert

Posted by GitBox <gi...@apache.org>.
fanaticjo commented on a change in pull request #3035:
URL: https://github.com/apache/hudi/pull/3035#discussion_r658876838



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithCustomAvroPayload.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.exception.ColumnNotFoundException;
+import org.apache.hudi.exception.UpdateKeyNotFoundException;
+import org.apache.hudi.exception.WriteOperationException;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.IndexedRecord;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Properties;
+import java.util.stream.Collectors;
+
+/**
+ * subclass of OverwriteWithLatestAvroPayload used for delta streamer.
+ *
+ * <ol>
+ * <li> combineAndGetUpdateValue - Accepts the column names to be updated;
+ * <li> splitKeys - Split keys based upon keys;
+ * </ol>
+ */
+public class OverwriteWithCustomAvroPayload extends OverwriteWithLatestAvroPayload {
+
+  public OverwriteWithCustomAvroPayload(GenericRecord record, Comparable orderingVal) {
+    super(record, orderingVal);
+  }
+
+  /**
+   * split keys over.
+   */
+  public List<String> splitKeys(String keys) throws UpdateKeyNotFoundException {
+    if (keys == null) {
+      throw new UpdateKeyNotFoundException("keys cannot be null");
+    } else if (keys.equals("")) {
+      throw new UpdateKeyNotFoundException("keys cannot be blank");
+    } else {
+      return Arrays.stream(keys.split(",")).collect(Collectors.toList());
+    }
+  }
+
+  /**
+   * check column exi.
+   */
+  public boolean checkColumnExists(List<String> keys, Schema schema) {
+    List<Schema.Field> field = schema.getFields();
+    List<Schema.Field> common = new ArrayList<>();
+    for (Schema.Field columns : field) {
+      if (keys.contains(columns.name())) {
+        common.add(columns);
+      }
+    }
+    return common.size() == keys.size();
+  }
+
+  @Override
+  public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema, Properties properties)
+      throws WriteOperationException, IOException, ColumnNotFoundException, UpdateKeyNotFoundException {
+
+    if (!properties.getProperty("hoodie.datasource.write.operation").equals("upsert")) {
+      throw new WriteOperationException("write should be upsert");
+    }
+
+    Option<IndexedRecord> recordOption = getInsertValue(schema);
+
+    if (!recordOption.isPresent()) {
+      return Option.empty();
+    }
+

Review comment:
       This class should only be considered only for upserts , so why delete is required ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot edited a comment on pull request #3035: [HUDI-1936] Introduce a optional property for conditional upsert

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3035:
URL: https://github.com/apache/hudi/pull/3035#issuecomment-862017744


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "26dadb6627c90c9f06e66fba0b8bd24e5579665f",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=209",
       "triggerID" : "26dadb6627c90c9f06e66fba0b8bd24e5579665f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 26dadb6627c90c9f06e66fba0b8bd24e5579665f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=209) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codecov-commenter edited a comment on pull request #3035: [HUDI-1936] Introduce a optional property for conditional upsert

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #3035:
URL: https://github.com/apache/hudi/pull/3035#issuecomment-855229410


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/3035?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#3035](https://codecov.io/gh/apache/hudi/pull/3035?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (26dadb6) into [master](https://codecov.io/gh/apache/hudi/commit/974b476180e61fac58cd87e78699428d4108a482?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (974b476) will **increase** coverage by `15.81%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/3035/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3035?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #3035       +/-   ##
   =============================================
   + Coverage     55.01%   70.83%   +15.81%     
   + Complexity     3850      385     -3465     
   =============================================
     Files           485       54      -431     
     Lines         23467     2016    -21451     
     Branches       2497      241     -2256     
   =============================================
   - Hits          12911     1428    -11483     
   + Misses         9405      454     -8951     
   + Partials       1151      134     -1017     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `?` | |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `?` | |
   | huditimelineservice | `?` | |
   | hudiutilities | `70.83% <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3035?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...di-cli/src/main/java/org/apache/hudi/cli/Main.java](https://codecov.io/gh/apache/hudi/pull/3035/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL01haW4uamF2YQ==) | | |
   | [.../java/org/apache/hudi/common/metrics/Registry.java](https://codecov.io/gh/apache/hudi/pull/3035/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21ldHJpY3MvUmVnaXN0cnkuamF2YQ==) | | |
   | [...e/hudi/common/util/queue/BoundedInMemoryQueue.java](https://codecov.io/gh/apache/hudi/pull/3035/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvcXVldWUvQm91bmRlZEluTWVtb3J5UXVldWUuamF2YQ==) | | |
   | [...i/common/util/collection/ExternalSpillableMap.java](https://codecov.io/gh/apache/hudi/pull/3035/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvY29sbGVjdGlvbi9FeHRlcm5hbFNwaWxsYWJsZU1hcC5qYXZh) | | |
   | [...rg/apache/hudi/metadata/HoodieMetadataMetrics.java](https://codecov.io/gh/apache/hudi/pull/3035/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0YWRhdGEvSG9vZGllTWV0YWRhdGFNZXRyaWNzLmphdmE=) | | |
   | [...c/main/java/org/apache/hudi/dla/DLASyncConfig.java](https://codecov.io/gh/apache/hudi/pull/3035/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktZGxhLXN5bmMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZGxhL0RMQVN5bmNDb25maWcuamF2YQ==) | | |
   | [...java/org/apache/hudi/common/util/NumericUtils.java](https://codecov.io/gh/apache/hudi/pull/3035/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvTnVtZXJpY1V0aWxzLmphdmE=) | | |
   | [...i/hive/SlashEncodedDayPartitionValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/3035/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvU2xhc2hFbmNvZGVkRGF5UGFydGl0aW9uVmFsdWVFeHRyYWN0b3IuamF2YQ==) | | |
   | [.../hive/SlashEncodedHourPartitionValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/3035/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvU2xhc2hFbmNvZGVkSG91clBhcnRpdGlvblZhbHVlRXh0cmFjdG9yLmphdmE=) | | |
   | [.../java/org/apache/hudi/common/model/ActionType.java](https://codecov.io/gh/apache/hudi/pull/3035/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0FjdGlvblR5cGUuamF2YQ==) | | |
   | ... and [421 more](https://codecov.io/gh/apache/hudi/pull/3035/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codecov-commenter edited a comment on pull request #3035: [HUDI-1936] Introduce a optional property for conditional upsert

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #3035:
URL: https://github.com/apache/hudi/pull/3035#issuecomment-855229410






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] fanaticjo commented on a change in pull request #3035: [HUDI-1936] Introduce a optional property for conditional upsert

Posted by GitBox <gi...@apache.org>.
fanaticjo commented on a change in pull request #3035:
URL: https://github.com/apache/hudi/pull/3035#discussion_r658767757



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithCustomAvroPayload.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.exception.ColumnNotFoundException;
+import org.apache.hudi.exception.UpdateKeyNotFoundException;
+import org.apache.hudi.exception.WriteOperationException;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.IndexedRecord;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Properties;
+import java.util.stream.Collectors;
+
+/**
+ * subclass of OverwriteWithLatestAvroPayload used for delta streamer.
+ *
+ * <ol>
+ * <li> combineAndGetUpdateValue - Accepts the column names to be updated;
+ * <li> splitKeys - Split keys based upon keys;
+ * </ol>
+ */
+public class OverwriteWithCustomAvroPayload extends OverwriteWithLatestAvroPayload {
+
+  public OverwriteWithCustomAvroPayload(GenericRecord record, Comparable orderingVal) {
+    super(record, orderingVal);
+  }
+
+  /**
+   * split keys over.
+   */
+  public List<String> splitKeys(String keys) throws UpdateKeyNotFoundException {
+    if (keys == null) {
+      throw new UpdateKeyNotFoundException("keys cannot be null");
+    } else if (keys.equals("")) {
+      throw new UpdateKeyNotFoundException("keys cannot be blank");
+    } else {
+      return Arrays.stream(keys.split(",")).collect(Collectors.toList());
+    }
+  }
+
+  /**
+   * check column exi.
+   */
+  public boolean checkColumnExists(List<String> keys, Schema schema) {
+    List<Schema.Field> field = schema.getFields();
+    List<Schema.Field> common = new ArrayList<>();
+    for (Schema.Field columns : field) {
+      if (keys.contains(columns.name())) {
+        common.add(columns);
+      }
+    }
+    return common.size() == keys.size();
+  }
+
+  @Override
+  public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema, Properties properties)
+      throws WriteOperationException, IOException, ColumnNotFoundException, UpdateKeyNotFoundException {
+
+    if (!properties.getProperty("hoodie.datasource.write.operation").equals("upsert")) {
+      throw new WriteOperationException("write should be upsert");
+    }
+
+    Option<IndexedRecord> recordOption = getInsertValue(schema);
+
+    if (!recordOption.isPresent()) {
+      return Option.empty();
+    }
+

Review comment:
       yes user can set different values for different batches 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #3035: [HUDI-1936] Introduce a optional property for conditional upsert

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #3035:
URL: https://github.com/apache/hudi/pull/3035#issuecomment-961587786


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "26dadb6627c90c9f06e66fba0b8bd24e5579665f",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=209",
       "triggerID" : "26dadb6627c90c9f06e66fba0b8bd24e5579665f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 26dadb6627c90c9f06e66fba0b8bd24e5579665f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=209) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #3035: [HUDI-1936] Introduce a optional property for conditional upsert

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #3035:
URL: https://github.com/apache/hudi/pull/3035#issuecomment-961587786


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "26dadb6627c90c9f06e66fba0b8bd24e5579665f",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=209",
       "triggerID" : "26dadb6627c90c9f06e66fba0b8bd24e5579665f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 26dadb6627c90c9f06e66fba0b8bd24e5579665f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=209) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot edited a comment on pull request #3035: [HUDI-1936] Introduce a optional property for conditional upsert

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3035:
URL: https://github.com/apache/hudi/pull/3035#issuecomment-862017744


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "26dadb6627c90c9f06e66fba0b8bd24e5579665f",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=209",
       "triggerID" : "26dadb6627c90c9f06e66fba0b8bd24e5579665f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 26dadb6627c90c9f06e66fba0b8bd24e5579665f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=209) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org