You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/08/07 12:56:32 UTC

[GitHub] [hudi] leesf opened a new pull request #1929: [HUDI-1160] Support update partial fields for CoW table

leesf opened a new pull request #1929:
URL: https://github.com/apache/hudi/pull/1929


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
     - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
     - *Added integration tests for end-to-end.*
     - *Added HoodieClientWriteTest to verify the change.*
     - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on pull request #1929: [HUDI-1160] Support update partial fields for CoW table

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on pull request #1929:
URL: https://github.com/apache/hudi/pull/1929#issuecomment-751319239


   @leesf : do you think we can get this in by next week for upcoming release cut? 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] SteNicholas commented on pull request #1929: [HUDI-1160] Support update partial fields for CoW table

Posted by GitBox <gi...@apache.org>.
SteNicholas commented on pull request #1929:
URL: https://github.com/apache/hudi/pull/1929#issuecomment-703244010


   > > @leesf Any update? Let me know if you need any help here
   > 
   > ack, will update the PR ASAP
   
   @leesf Any update for the comment mentioned by @satishkotha? I would like to refer to your pull request for the work of [HUDI-1161](https://issues.apache.org/jira/browse/HUDI-1161).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] leesf commented on a change in pull request #1929: [HUDI-1160] Support update partial fields for CoW table

Posted by GitBox <gi...@apache.org>.
leesf commented on a change in pull request #1929:
URL: https://github.com/apache/hudi/pull/1929#discussion_r487137203



##########
File path: hudi-client/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java
##########
@@ -90,9 +92,19 @@ protected HoodieWriteHandle(HoodieWriteConfig config, String instantTime, String
    * @param config Write Config
    * @return
    */
-  protected static Pair<Schema, Schema> getWriterSchemaIncludingAndExcludingMetadataPair(HoodieWriteConfig config) {
+  protected static Pair<Schema, Schema> getWriterSchemaIncludingAndExcludingMetadataPair(HoodieWriteConfig config, HoodieTable hoodieTable) {
     Schema originalSchema = new Schema.Parser().parse(config.getSchema());
     Schema hoodieSchema = HoodieAvroUtils.addMetadataFields(originalSchema);
+    boolean updatePartialFields = config.updatePartialFields();
+    if (updatePartialFields) {
+      try {
+        TableSchemaResolver resolver = new TableSchemaResolver(hoodieTable.getMetaClient());

Review comment:
       sounds reasonable




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] leesf commented on a change in pull request #1929: [HUDI-1160] Support update partial fields for CoW table

Posted by GitBox <gi...@apache.org>.
leesf commented on a change in pull request #1929:
URL: https://github.com/apache/hudi/pull/1929#discussion_r487126922



##########
File path: hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##########
@@ -117,7 +118,17 @@ public boolean commitStats(String instantTime, List<HoodieWriteStat> stats, Opti
     if (extraMetadata.isPresent()) {
       extraMetadata.get().forEach(metadata::addMetadata);
     }
-    metadata.addMetadata(HoodieCommitMetadata.SCHEMA_KEY, config.getSchema());
+    String schema = config.getSchema();
+    if (config.updatePartialFields()) {
+      try {
+        TableSchemaResolver resolver = new TableSchemaResolver(table.getMetaClient());

Review comment:
       > Do you need to create resolver again? Does config.getLastSchema() work here?
   
   I tested it E2E and found that would not get the lastSchema from config since the config object are different

##########
File path: hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##########
@@ -117,7 +118,17 @@ public boolean commitStats(String instantTime, List<HoodieWriteStat> stats, Opti
     if (extraMetadata.isPresent()) {
       extraMetadata.get().forEach(metadata::addMetadata);
     }
-    metadata.addMetadata(HoodieCommitMetadata.SCHEMA_KEY, config.getSchema());
+    String schema = config.getSchema();
+    if (config.updatePartialFields()) {
+      try {
+        TableSchemaResolver resolver = new TableSchemaResolver(table.getMetaClient());
+        schema = resolver.getTableAvroSchemaWithoutMetadataFields().toString();
+      } catch (Exception e) {
+        // ignore exception.
+        schema = config.getSchema();

Review comment:
       > We are potentially reducing schema here, so I think this can lead to issues. Can we throw error? At the least, can you add a LOG here to make sure this gets noticed?
   
   it handles the case that users config update partial fields in the first time, my original idea is not to throw error in this case, and LOG here sounds reasonable to me.

##########
File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
##########
@@ -94,6 +95,11 @@
   public static final String BULKINSERT_SORT_MODE = "hoodie.bulkinsert.sort.mode";
   public static final String DEFAULT_BULKINSERT_SORT_MODE = BulkInsertSortMode.GLOBAL_SORT
       .toString();
+  public static final String DELETE_MARKER_FIELD_PROP = "hoodie.write.delete.marker.field";

Review comment:
       good catch, not needed. will revert

##########
File path: hudi-client/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java
##########
@@ -90,9 +92,19 @@ protected HoodieWriteHandle(HoodieWriteConfig config, String instantTime, String
    * @param config Write Config
    * @return
    */
-  protected static Pair<Schema, Schema> getWriterSchemaIncludingAndExcludingMetadataPair(HoodieWriteConfig config) {
+  protected static Pair<Schema, Schema> getWriterSchemaIncludingAndExcludingMetadataPair(HoodieWriteConfig config, HoodieTable hoodieTable) {
     Schema originalSchema = new Schema.Parser().parse(config.getSchema());
     Schema hoodieSchema = HoodieAvroUtils.addMetadataFields(originalSchema);
+    boolean updatePartialFields = config.updatePartialFields();
+    if (updatePartialFields) {
+      try {
+        TableSchemaResolver resolver = new TableSchemaResolver(hoodieTable.getMetaClient());

Review comment:
       sounds reasonable

##########
File path: hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##########
@@ -117,7 +118,17 @@ public boolean commitStats(String instantTime, List<HoodieWriteStat> stats, Opti
     if (extraMetadata.isPresent()) {
       extraMetadata.get().forEach(metadata::addMetadata);
     }
-    metadata.addMetadata(HoodieCommitMetadata.SCHEMA_KEY, config.getSchema());
+    String schema = config.getSchema();
+    if (config.updatePartialFields()) {
+      try {
+        TableSchemaResolver resolver = new TableSchemaResolver(table.getMetaClient());

Review comment:
       > Do you need to create resolver again? Does config.getLastSchema() work here?
   
   I tested it E2E and found that would not get the lastSchema from config since the config object are different

##########
File path: hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##########
@@ -117,7 +118,17 @@ public boolean commitStats(String instantTime, List<HoodieWriteStat> stats, Opti
     if (extraMetadata.isPresent()) {
       extraMetadata.get().forEach(metadata::addMetadata);
     }
-    metadata.addMetadata(HoodieCommitMetadata.SCHEMA_KEY, config.getSchema());
+    String schema = config.getSchema();
+    if (config.updatePartialFields()) {
+      try {
+        TableSchemaResolver resolver = new TableSchemaResolver(table.getMetaClient());
+        schema = resolver.getTableAvroSchemaWithoutMetadataFields().toString();
+      } catch (Exception e) {
+        // ignore exception.
+        schema = config.getSchema();

Review comment:
       > We are potentially reducing schema here, so I think this can lead to issues. Can we throw error? At the least, can you add a LOG here to make sure this gets noticed?
   
   it handles the case that users config update partial fields in the first time, my original idea is not to throw error in this case, and LOG here sounds reasonable to me.

##########
File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
##########
@@ -94,6 +95,11 @@
   public static final String BULKINSERT_SORT_MODE = "hoodie.bulkinsert.sort.mode";
   public static final String DEFAULT_BULKINSERT_SORT_MODE = BulkInsertSortMode.GLOBAL_SORT
       .toString();
+  public static final String DELETE_MARKER_FIELD_PROP = "hoodie.write.delete.marker.field";

Review comment:
       good catch, not needed. will revert

##########
File path: hudi-client/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java
##########
@@ -90,9 +92,19 @@ protected HoodieWriteHandle(HoodieWriteConfig config, String instantTime, String
    * @param config Write Config
    * @return
    */
-  protected static Pair<Schema, Schema> getWriterSchemaIncludingAndExcludingMetadataPair(HoodieWriteConfig config) {
+  protected static Pair<Schema, Schema> getWriterSchemaIncludingAndExcludingMetadataPair(HoodieWriteConfig config, HoodieTable hoodieTable) {
     Schema originalSchema = new Schema.Parser().parse(config.getSchema());
     Schema hoodieSchema = HoodieAvroUtils.addMetadataFields(originalSchema);
+    boolean updatePartialFields = config.updatePartialFields();
+    if (updatePartialFields) {
+      try {
+        TableSchemaResolver resolver = new TableSchemaResolver(hoodieTable.getMetaClient());

Review comment:
       sounds reasonable

##########
File path: hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##########
@@ -117,7 +118,17 @@ public boolean commitStats(String instantTime, List<HoodieWriteStat> stats, Opti
     if (extraMetadata.isPresent()) {
       extraMetadata.get().forEach(metadata::addMetadata);
     }
-    metadata.addMetadata(HoodieCommitMetadata.SCHEMA_KEY, config.getSchema());
+    String schema = config.getSchema();
+    if (config.updatePartialFields()) {
+      try {
+        TableSchemaResolver resolver = new TableSchemaResolver(table.getMetaClient());

Review comment:
       > Do you need to create resolver again? Does config.getLastSchema() work here?
   
   I tested it E2E and found that would not get the lastSchema from config since the config object are different

##########
File path: hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##########
@@ -117,7 +118,17 @@ public boolean commitStats(String instantTime, List<HoodieWriteStat> stats, Opti
     if (extraMetadata.isPresent()) {
       extraMetadata.get().forEach(metadata::addMetadata);
     }
-    metadata.addMetadata(HoodieCommitMetadata.SCHEMA_KEY, config.getSchema());
+    String schema = config.getSchema();
+    if (config.updatePartialFields()) {
+      try {
+        TableSchemaResolver resolver = new TableSchemaResolver(table.getMetaClient());
+        schema = resolver.getTableAvroSchemaWithoutMetadataFields().toString();
+      } catch (Exception e) {
+        // ignore exception.
+        schema = config.getSchema();

Review comment:
       > We are potentially reducing schema here, so I think this can lead to issues. Can we throw error? At the least, can you add a LOG here to make sure this gets noticed?
   
   it handles the case that users config update partial fields in the first time, my original idea is not to throw error in this case, and LOG here sounds reasonable to me.

##########
File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
##########
@@ -94,6 +95,11 @@
   public static final String BULKINSERT_SORT_MODE = "hoodie.bulkinsert.sort.mode";
   public static final String DEFAULT_BULKINSERT_SORT_MODE = BulkInsertSortMode.GLOBAL_SORT
       .toString();
+  public static final String DELETE_MARKER_FIELD_PROP = "hoodie.write.delete.marker.field";

Review comment:
       good catch, not needed. will revert

##########
File path: hudi-client/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java
##########
@@ -90,9 +92,19 @@ protected HoodieWriteHandle(HoodieWriteConfig config, String instantTime, String
    * @param config Write Config
    * @return
    */
-  protected static Pair<Schema, Schema> getWriterSchemaIncludingAndExcludingMetadataPair(HoodieWriteConfig config) {
+  protected static Pair<Schema, Schema> getWriterSchemaIncludingAndExcludingMetadataPair(HoodieWriteConfig config, HoodieTable hoodieTable) {
     Schema originalSchema = new Schema.Parser().parse(config.getSchema());
     Schema hoodieSchema = HoodieAvroUtils.addMetadataFields(originalSchema);
+    boolean updatePartialFields = config.updatePartialFields();
+    if (updatePartialFields) {
+      try {
+        TableSchemaResolver resolver = new TableSchemaResolver(hoodieTable.getMetaClient());

Review comment:
       sounds reasonable




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] leesf commented on a change in pull request #1929: [HUDI-1160] Support update partial fields for CoW table

Posted by GitBox <gi...@apache.org>.
leesf commented on a change in pull request #1929:
URL: https://github.com/apache/hudi/pull/1929#discussion_r487126922



##########
File path: hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##########
@@ -117,7 +118,17 @@ public boolean commitStats(String instantTime, List<HoodieWriteStat> stats, Opti
     if (extraMetadata.isPresent()) {
       extraMetadata.get().forEach(metadata::addMetadata);
     }
-    metadata.addMetadata(HoodieCommitMetadata.SCHEMA_KEY, config.getSchema());
+    String schema = config.getSchema();
+    if (config.updatePartialFields()) {
+      try {
+        TableSchemaResolver resolver = new TableSchemaResolver(table.getMetaClient());

Review comment:
       > Do you need to create resolver again? Does config.getLastSchema() work here?
   
   I tested it E2E and found that would not get the lastSchema from config since the config object are different

##########
File path: hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##########
@@ -117,7 +118,17 @@ public boolean commitStats(String instantTime, List<HoodieWriteStat> stats, Opti
     if (extraMetadata.isPresent()) {
       extraMetadata.get().forEach(metadata::addMetadata);
     }
-    metadata.addMetadata(HoodieCommitMetadata.SCHEMA_KEY, config.getSchema());
+    String schema = config.getSchema();
+    if (config.updatePartialFields()) {
+      try {
+        TableSchemaResolver resolver = new TableSchemaResolver(table.getMetaClient());
+        schema = resolver.getTableAvroSchemaWithoutMetadataFields().toString();
+      } catch (Exception e) {
+        // ignore exception.
+        schema = config.getSchema();

Review comment:
       > We are potentially reducing schema here, so I think this can lead to issues. Can we throw error? At the least, can you add a LOG here to make sure this gets noticed?
   
   it handles the case that users config update partial fields in the first time, my original idea is not to throw error in this case, and LOG here sounds reasonable to me.

##########
File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
##########
@@ -94,6 +95,11 @@
   public static final String BULKINSERT_SORT_MODE = "hoodie.bulkinsert.sort.mode";
   public static final String DEFAULT_BULKINSERT_SORT_MODE = BulkInsertSortMode.GLOBAL_SORT
       .toString();
+  public static final String DELETE_MARKER_FIELD_PROP = "hoodie.write.delete.marker.field";

Review comment:
       good catch, not needed. will revert

##########
File path: hudi-client/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java
##########
@@ -90,9 +92,19 @@ protected HoodieWriteHandle(HoodieWriteConfig config, String instantTime, String
    * @param config Write Config
    * @return
    */
-  protected static Pair<Schema, Schema> getWriterSchemaIncludingAndExcludingMetadataPair(HoodieWriteConfig config) {
+  protected static Pair<Schema, Schema> getWriterSchemaIncludingAndExcludingMetadataPair(HoodieWriteConfig config, HoodieTable hoodieTable) {
     Schema originalSchema = new Schema.Parser().parse(config.getSchema());
     Schema hoodieSchema = HoodieAvroUtils.addMetadataFields(originalSchema);
+    boolean updatePartialFields = config.updatePartialFields();
+    if (updatePartialFields) {
+      try {
+        TableSchemaResolver resolver = new TableSchemaResolver(hoodieTable.getMetaClient());

Review comment:
       sounds reasonable

##########
File path: hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##########
@@ -117,7 +118,17 @@ public boolean commitStats(String instantTime, List<HoodieWriteStat> stats, Opti
     if (extraMetadata.isPresent()) {
       extraMetadata.get().forEach(metadata::addMetadata);
     }
-    metadata.addMetadata(HoodieCommitMetadata.SCHEMA_KEY, config.getSchema());
+    String schema = config.getSchema();
+    if (config.updatePartialFields()) {
+      try {
+        TableSchemaResolver resolver = new TableSchemaResolver(table.getMetaClient());

Review comment:
       > Do you need to create resolver again? Does config.getLastSchema() work here?
   
   I tested it E2E and found that would not get the lastSchema from config since the config object are different

##########
File path: hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##########
@@ -117,7 +118,17 @@ public boolean commitStats(String instantTime, List<HoodieWriteStat> stats, Opti
     if (extraMetadata.isPresent()) {
       extraMetadata.get().forEach(metadata::addMetadata);
     }
-    metadata.addMetadata(HoodieCommitMetadata.SCHEMA_KEY, config.getSchema());
+    String schema = config.getSchema();
+    if (config.updatePartialFields()) {
+      try {
+        TableSchemaResolver resolver = new TableSchemaResolver(table.getMetaClient());
+        schema = resolver.getTableAvroSchemaWithoutMetadataFields().toString();
+      } catch (Exception e) {
+        // ignore exception.
+        schema = config.getSchema();

Review comment:
       > We are potentially reducing schema here, so I think this can lead to issues. Can we throw error? At the least, can you add a LOG here to make sure this gets noticed?
   
   it handles the case that users config update partial fields in the first time, my original idea is not to throw error in this case, and LOG here sounds reasonable to me.

##########
File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
##########
@@ -94,6 +95,11 @@
   public static final String BULKINSERT_SORT_MODE = "hoodie.bulkinsert.sort.mode";
   public static final String DEFAULT_BULKINSERT_SORT_MODE = BulkInsertSortMode.GLOBAL_SORT
       .toString();
+  public static final String DELETE_MARKER_FIELD_PROP = "hoodie.write.delete.marker.field";

Review comment:
       good catch, not needed. will revert

##########
File path: hudi-client/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java
##########
@@ -90,9 +92,19 @@ protected HoodieWriteHandle(HoodieWriteConfig config, String instantTime, String
    * @param config Write Config
    * @return
    */
-  protected static Pair<Schema, Schema> getWriterSchemaIncludingAndExcludingMetadataPair(HoodieWriteConfig config) {
+  protected static Pair<Schema, Schema> getWriterSchemaIncludingAndExcludingMetadataPair(HoodieWriteConfig config, HoodieTable hoodieTable) {
     Schema originalSchema = new Schema.Parser().parse(config.getSchema());
     Schema hoodieSchema = HoodieAvroUtils.addMetadataFields(originalSchema);
+    boolean updatePartialFields = config.updatePartialFields();
+    if (updatePartialFields) {
+      try {
+        TableSchemaResolver resolver = new TableSchemaResolver(hoodieTable.getMetaClient());

Review comment:
       sounds reasonable

##########
File path: hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##########
@@ -117,7 +118,17 @@ public boolean commitStats(String instantTime, List<HoodieWriteStat> stats, Opti
     if (extraMetadata.isPresent()) {
       extraMetadata.get().forEach(metadata::addMetadata);
     }
-    metadata.addMetadata(HoodieCommitMetadata.SCHEMA_KEY, config.getSchema());
+    String schema = config.getSchema();
+    if (config.updatePartialFields()) {
+      try {
+        TableSchemaResolver resolver = new TableSchemaResolver(table.getMetaClient());

Review comment:
       > Do you need to create resolver again? Does config.getLastSchema() work here?
   
   I tested it E2E and found that would not get the lastSchema from config since the config object are different

##########
File path: hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##########
@@ -117,7 +118,17 @@ public boolean commitStats(String instantTime, List<HoodieWriteStat> stats, Opti
     if (extraMetadata.isPresent()) {
       extraMetadata.get().forEach(metadata::addMetadata);
     }
-    metadata.addMetadata(HoodieCommitMetadata.SCHEMA_KEY, config.getSchema());
+    String schema = config.getSchema();
+    if (config.updatePartialFields()) {
+      try {
+        TableSchemaResolver resolver = new TableSchemaResolver(table.getMetaClient());
+        schema = resolver.getTableAvroSchemaWithoutMetadataFields().toString();
+      } catch (Exception e) {
+        // ignore exception.
+        schema = config.getSchema();

Review comment:
       > We are potentially reducing schema here, so I think this can lead to issues. Can we throw error? At the least, can you add a LOG here to make sure this gets noticed?
   
   it handles the case that users config update partial fields in the first time, my original idea is not to throw error in this case, and LOG here sounds reasonable to me.

##########
File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
##########
@@ -94,6 +95,11 @@
   public static final String BULKINSERT_SORT_MODE = "hoodie.bulkinsert.sort.mode";
   public static final String DEFAULT_BULKINSERT_SORT_MODE = BulkInsertSortMode.GLOBAL_SORT
       .toString();
+  public static final String DELETE_MARKER_FIELD_PROP = "hoodie.write.delete.marker.field";

Review comment:
       good catch, not needed. will revert

##########
File path: hudi-client/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java
##########
@@ -90,9 +92,19 @@ protected HoodieWriteHandle(HoodieWriteConfig config, String instantTime, String
    * @param config Write Config
    * @return
    */
-  protected static Pair<Schema, Schema> getWriterSchemaIncludingAndExcludingMetadataPair(HoodieWriteConfig config) {
+  protected static Pair<Schema, Schema> getWriterSchemaIncludingAndExcludingMetadataPair(HoodieWriteConfig config, HoodieTable hoodieTable) {
     Schema originalSchema = new Schema.Parser().parse(config.getSchema());
     Schema hoodieSchema = HoodieAvroUtils.addMetadataFields(originalSchema);
+    boolean updatePartialFields = config.updatePartialFields();
+    if (updatePartialFields) {
+      try {
+        TableSchemaResolver resolver = new TableSchemaResolver(hoodieTable.getMetaClient());

Review comment:
       sounds reasonable

##########
File path: hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##########
@@ -117,7 +118,17 @@ public boolean commitStats(String instantTime, List<HoodieWriteStat> stats, Opti
     if (extraMetadata.isPresent()) {
       extraMetadata.get().forEach(metadata::addMetadata);
     }
-    metadata.addMetadata(HoodieCommitMetadata.SCHEMA_KEY, config.getSchema());
+    String schema = config.getSchema();
+    if (config.updatePartialFields()) {
+      try {
+        TableSchemaResolver resolver = new TableSchemaResolver(table.getMetaClient());

Review comment:
       > Do you need to create resolver again? Does config.getLastSchema() work here?
   
   I tested it E2E and found that would not get the lastSchema from config since the config object are different

##########
File path: hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##########
@@ -117,7 +118,17 @@ public boolean commitStats(String instantTime, List<HoodieWriteStat> stats, Opti
     if (extraMetadata.isPresent()) {
       extraMetadata.get().forEach(metadata::addMetadata);
     }
-    metadata.addMetadata(HoodieCommitMetadata.SCHEMA_KEY, config.getSchema());
+    String schema = config.getSchema();
+    if (config.updatePartialFields()) {
+      try {
+        TableSchemaResolver resolver = new TableSchemaResolver(table.getMetaClient());
+        schema = resolver.getTableAvroSchemaWithoutMetadataFields().toString();
+      } catch (Exception e) {
+        // ignore exception.
+        schema = config.getSchema();

Review comment:
       > We are potentially reducing schema here, so I think this can lead to issues. Can we throw error? At the least, can you add a LOG here to make sure this gets noticed?
   
   it handles the case that users config update partial fields in the first time, my original idea is not to throw error in this case, and LOG here sounds reasonable to me.

##########
File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
##########
@@ -94,6 +95,11 @@
   public static final String BULKINSERT_SORT_MODE = "hoodie.bulkinsert.sort.mode";
   public static final String DEFAULT_BULKINSERT_SORT_MODE = BulkInsertSortMode.GLOBAL_SORT
       .toString();
+  public static final String DELETE_MARKER_FIELD_PROP = "hoodie.write.delete.marker.field";

Review comment:
       good catch, not needed. will revert

##########
File path: hudi-client/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java
##########
@@ -90,9 +92,19 @@ protected HoodieWriteHandle(HoodieWriteConfig config, String instantTime, String
    * @param config Write Config
    * @return
    */
-  protected static Pair<Schema, Schema> getWriterSchemaIncludingAndExcludingMetadataPair(HoodieWriteConfig config) {
+  protected static Pair<Schema, Schema> getWriterSchemaIncludingAndExcludingMetadataPair(HoodieWriteConfig config, HoodieTable hoodieTable) {
     Schema originalSchema = new Schema.Parser().parse(config.getSchema());
     Schema hoodieSchema = HoodieAvroUtils.addMetadataFields(originalSchema);
+    boolean updatePartialFields = config.updatePartialFields();
+    if (updatePartialFields) {
+      try {
+        TableSchemaResolver resolver = new TableSchemaResolver(hoodieTable.getMetaClient());

Review comment:
       sounds reasonable

##########
File path: hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##########
@@ -117,7 +118,17 @@ public boolean commitStats(String instantTime, List<HoodieWriteStat> stats, Opti
     if (extraMetadata.isPresent()) {
       extraMetadata.get().forEach(metadata::addMetadata);
     }
-    metadata.addMetadata(HoodieCommitMetadata.SCHEMA_KEY, config.getSchema());
+    String schema = config.getSchema();
+    if (config.updatePartialFields()) {
+      try {
+        TableSchemaResolver resolver = new TableSchemaResolver(table.getMetaClient());

Review comment:
       > Do you need to create resolver again? Does config.getLastSchema() work here?
   
   I tested it E2E and found that would not get the lastSchema from config since the config object are different

##########
File path: hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##########
@@ -117,7 +118,17 @@ public boolean commitStats(String instantTime, List<HoodieWriteStat> stats, Opti
     if (extraMetadata.isPresent()) {
       extraMetadata.get().forEach(metadata::addMetadata);
     }
-    metadata.addMetadata(HoodieCommitMetadata.SCHEMA_KEY, config.getSchema());
+    String schema = config.getSchema();
+    if (config.updatePartialFields()) {
+      try {
+        TableSchemaResolver resolver = new TableSchemaResolver(table.getMetaClient());
+        schema = resolver.getTableAvroSchemaWithoutMetadataFields().toString();
+      } catch (Exception e) {
+        // ignore exception.
+        schema = config.getSchema();

Review comment:
       > We are potentially reducing schema here, so I think this can lead to issues. Can we throw error? At the least, can you add a LOG here to make sure this gets noticed?
   
   it handles the case that users config update partial fields in the first time, my original idea is not to throw error in this case, and LOG here sounds reasonable to me.

##########
File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
##########
@@ -94,6 +95,11 @@
   public static final String BULKINSERT_SORT_MODE = "hoodie.bulkinsert.sort.mode";
   public static final String DEFAULT_BULKINSERT_SORT_MODE = BulkInsertSortMode.GLOBAL_SORT
       .toString();
+  public static final String DELETE_MARKER_FIELD_PROP = "hoodie.write.delete.marker.field";

Review comment:
       good catch, not needed. will revert

##########
File path: hudi-client/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java
##########
@@ -90,9 +92,19 @@ protected HoodieWriteHandle(HoodieWriteConfig config, String instantTime, String
    * @param config Write Config
    * @return
    */
-  protected static Pair<Schema, Schema> getWriterSchemaIncludingAndExcludingMetadataPair(HoodieWriteConfig config) {
+  protected static Pair<Schema, Schema> getWriterSchemaIncludingAndExcludingMetadataPair(HoodieWriteConfig config, HoodieTable hoodieTable) {
     Schema originalSchema = new Schema.Parser().parse(config.getSchema());
     Schema hoodieSchema = HoodieAvroUtils.addMetadataFields(originalSchema);
+    boolean updatePartialFields = config.updatePartialFields();
+    if (updatePartialFields) {
+      try {
+        TableSchemaResolver resolver = new TableSchemaResolver(hoodieTable.getMetaClient());

Review comment:
       sounds reasonable




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] SteNicholas edited a comment on pull request #1929: [HUDI-1160] Support update partial fields for CoW table

Posted by GitBox <gi...@apache.org>.
SteNicholas edited a comment on pull request #1929:
URL: https://github.com/apache/hudi/pull/1929#issuecomment-703244010


   > > @leesf Any update? Let me know if you need any help here
   > 
   > ack, will update the PR ASAP
   
   @leesf Any update for the comment mentioned by @satishkotha? And IMO you should resolve the conflicts first. I would like to refer to your pull request for the work of [HUDI-1161](https://issues.apache.org/jira/browse/HUDI-1161).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] SteNicholas removed a comment on pull request #1929: [HUDI-1160] Support update partial fields for CoW table

Posted by GitBox <gi...@apache.org>.
SteNicholas removed a comment on pull request #1929:
URL: https://github.com/apache/hudi/pull/1929#issuecomment-703244010


   > > @leesf Any update? Let me know if you need any help here
   > 
   > ack, will update the PR ASAP
   
   @leesf Any update for the comment mentioned by @satishkotha? And IMO you should resolve the conflicts first. I would like to refer to your pull request for the work of [HUDI-1161](https://issues.apache.org/jira/browse/HUDI-1161).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on pull request #1929: [HUDI-1160] Support update partial fields for CoW table

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on pull request #1929:
URL: https://github.com/apache/hudi/pull/1929#issuecomment-751319542


   @vinothchandar : do you think we can make this a blocker? 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] leesf commented on pull request #1929: [HUDI-1160] Support update partial fields for CoW table

Posted by GitBox <gi...@apache.org>.
leesf commented on pull request #1929:
URL: https://github.com/apache/hudi/pull/1929#issuecomment-683302488


   @vinothchandar @bvaradar please take a look when free


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] liujinhui1994 removed a comment on pull request #1929: [HUDI-1160] Support update partial fields for CoW table

Posted by GitBox <gi...@apache.org>.
liujinhui1994 removed a comment on pull request #1929:
URL: https://github.com/apache/hudi/pull/1929#issuecomment-751544449


   I’m working on it now, but encountered some parquet-related issues in the process. Today I will PR to discuss
   @leesf 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #1929: [HUDI-1160] Support update partial fields for CoW table

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #1929:
URL: https://github.com/apache/hudi/pull/1929#discussion_r586541353



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/PartialUpdatePayload.java
##########
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.IndexedRecord;
+import org.apache.hudi.common.util.Option;
+
+import java.io.IOException;
+import java.util.List;
+
+public class PartialUpdatePayload extends OverwriteWithLatestAvroPayload {
+  public PartialUpdatePayload(GenericRecord record, Comparable orderingVal) {
+    super(record, orderingVal);
+  }
+
+  public PartialUpdatePayload(Option<GenericRecord> record) {
+    this(record.get(), (record1) -> 0); // natural order
+  }
+
+  @Override
+  public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema) throws IOException {
+    Option<IndexedRecord> recordOption = getInsertValue(schema);
+    if (recordOption.isPresent()) {
+      IndexedRecord record = recordOption.get();
+      GenericRecord current = (GenericRecord) record;

Review comment:
       this is not current, this is new incoming record. Can we fix the naming please.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on pull request #1929: [HUDI-1160] Support update partial fields for CoW table

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on pull request #1929:
URL: https://github.com/apache/hudi/pull/1929#issuecomment-811627648


   Closing in favor of https://github.com/apache/hudi/pull/2666


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on pull request #1929: [HUDI-1160] Support update partial fields for CoW table

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on pull request #1929:
URL: https://github.com/apache/hudi/pull/1929#issuecomment-771700273


   @liujinhui1994 : please let me know once you have addressed all comments and rebased w/ latest master. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on pull request #1929: [HUDI-1160] Support update partial fields for CoW table

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #1929:
URL: https://github.com/apache/hudi/pull/1929#issuecomment-689905849


   @satishkotha Can you please help review? 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] leesf commented on pull request #1929: [HUDI-1160] Support update partial fields for CoW table

Posted by GitBox <gi...@apache.org>.
leesf commented on pull request #1929:
URL: https://github.com/apache/hudi/pull/1929#issuecomment-702500795


   > @leesf Any update? Let me know if you need any help here
   
   ack, will update the PR ASAP


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on pull request #1929: [HUDI-1160] Support update partial fields for CoW table

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on pull request #1929:
URL: https://github.com/apache/hudi/pull/1929#issuecomment-771700273


   @liujinhui1994 : please let me know once you have addressed all comments and rebased w/ latest master. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] leesf commented on pull request #1929: [HUDI-1160] Support update partial fields for CoW table

Posted by GitBox <gi...@apache.org>.
leesf commented on pull request #1929:
URL: https://github.com/apache/hudi/pull/1929#issuecomment-751324318


   > @vinothchandar : do you think we can make this a blocker?
   
   @liujinhui1994 takes over this work, would you please share the progress


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on a change in pull request #1929: [HUDI-1160] Support update partial fields for CoW table

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1929:
URL: https://github.com/apache/hudi/pull/1929#discussion_r511534369



##########
File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
##########
@@ -94,6 +95,11 @@
   public static final String BULKINSERT_SORT_MODE = "hoodie.bulkinsert.sort.mode";
   public static final String DEFAULT_BULKINSERT_SORT_MODE = BulkInsertSortMode.GLOBAL_SORT
       .toString();
+  public static final String DELETE_MARKER_FIELD_PROP = "hoodie.write.delete.marker.field";

Review comment:
       +1 

##########
File path: hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##########
@@ -117,7 +118,17 @@ public boolean commitStats(String instantTime, List<HoodieWriteStat> stats, Opti
     if (extraMetadata.isPresent()) {
       extraMetadata.get().forEach(metadata::addMetadata);
     }
-    metadata.addMetadata(HoodieCommitMetadata.SCHEMA_KEY, config.getSchema());
+    String schema = config.getSchema();
+    if (config.updatePartialFields()) {

Review comment:
       Can we have this so that whether or not we can update partial fields is simply controlled by the payload impl. 
   I am concerned that a config like `config.updatePartialFields` is not inline with how we treat payloads. for e.g for MOR, there is no writeConfig when we merge the fields via record reader 

##########
File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
##########
@@ -94,6 +95,11 @@
   public static final String BULKINSERT_SORT_MODE = "hoodie.bulkinsert.sort.mode";
   public static final String DEFAULT_BULKINSERT_SORT_MODE = BulkInsertSortMode.GLOBAL_SORT
       .toString();
+  public static final String DELETE_MARKER_FIELD_PROP = "hoodie.write.delete.marker.field";
+  public static final String DEFAULT_DELETE_MARKER_FIELD = "_hoodie_is_deleted";
+
+  public static final String UPDATE_PARTIAL_FIELDS = "hoodie.update.partial.fields";

Review comment:
       javadocs on what these mean please. 

##########
File path: hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##########
@@ -117,7 +118,17 @@ public boolean commitStats(String instantTime, List<HoodieWriteStat> stats, Opti
     if (extraMetadata.isPresent()) {
       extraMetadata.get().forEach(metadata::addMetadata);
     }
-    metadata.addMetadata(HoodieCommitMetadata.SCHEMA_KEY, config.getSchema());
+    String schema = config.getSchema();
+    if (config.updatePartialFields()) {
+      try {
+        TableSchemaResolver resolver = new TableSchemaResolver(table.getMetaClient());

Review comment:
       We can get the last Schema from the commit metadata, right? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on pull request #1929: [HUDI-1160] Support update partial fields for CoW table

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #1929:
URL: https://github.com/apache/hudi/pull/1929#issuecomment-732324996


   @leesf are you still pursuing this?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] leesf commented on a change in pull request #1929: [HUDI-1160] Support update partial fields for CoW table

Posted by GitBox <gi...@apache.org>.
leesf commented on a change in pull request #1929:
URL: https://github.com/apache/hudi/pull/1929#discussion_r487128319



##########
File path: hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##########
@@ -117,7 +118,17 @@ public boolean commitStats(String instantTime, List<HoodieWriteStat> stats, Opti
     if (extraMetadata.isPresent()) {
       extraMetadata.get().forEach(metadata::addMetadata);
     }
-    metadata.addMetadata(HoodieCommitMetadata.SCHEMA_KEY, config.getSchema());
+    String schema = config.getSchema();
+    if (config.updatePartialFields()) {
+      try {
+        TableSchemaResolver resolver = new TableSchemaResolver(table.getMetaClient());
+        schema = resolver.getTableAvroSchemaWithoutMetadataFields().toString();
+      } catch (Exception e) {
+        // ignore exception.
+        schema = config.getSchema();

Review comment:
       > We are potentially reducing schema here, so I think this can lead to issues. Can we throw error? At the least, can you add a LOG here to make sure this gets noticed?
   
   it handles the case that users config update partial fields in the first time, my original idea is not to throw error in this case, and LOG here sounds reasonable to me.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on pull request #1929: [HUDI-1160] Support update partial fields for CoW table

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on pull request #1929:
URL: https://github.com/apache/hudi/pull/1929#issuecomment-789832412


   FYI: we have another [PR](https://github.com/apache/hudi/pull/2106) open to support similar feature just for preCombine. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] leesf commented on a change in pull request #1929: [HUDI-1160] Support update partial fields for CoW table

Posted by GitBox <gi...@apache.org>.
leesf commented on a change in pull request #1929:
URL: https://github.com/apache/hudi/pull/1929#discussion_r487126922



##########
File path: hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##########
@@ -117,7 +118,17 @@ public boolean commitStats(String instantTime, List<HoodieWriteStat> stats, Opti
     if (extraMetadata.isPresent()) {
       extraMetadata.get().forEach(metadata::addMetadata);
     }
-    metadata.addMetadata(HoodieCommitMetadata.SCHEMA_KEY, config.getSchema());
+    String schema = config.getSchema();
+    if (config.updatePartialFields()) {
+      try {
+        TableSchemaResolver resolver = new TableSchemaResolver(table.getMetaClient());

Review comment:
       > Do you need to create resolver again? Does config.getLastSchema() work here?
   
   I tested it E2E and found that would not get the lastSchema from config since the config object are different




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan edited a comment on pull request #1929: [HUDI-1160] Support update partial fields for CoW table

Posted by GitBox <gi...@apache.org>.
nsivabalan edited a comment on pull request #1929:
URL: https://github.com/apache/hudi/pull/1929#issuecomment-751319239


   @leesf : do you think we can get this in by next week for upcoming release cut? 
   also, we have introduced new apis for all methods in [HoodieRecordPayload](https://issues.apache.org/jira/browse/HUDI-898) with Properties arg to assist in special needs like this. Do you think we can leverage this new api. 
   Also, I see another PR on similar lines https://github.com/apache/hudi/pull/2106


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] leesf commented on pull request #1929: [HUDI-1160] Support update partial fields for CoW table

Posted by GitBox <gi...@apache.org>.
leesf commented on pull request #1929:
URL: https://github.com/apache/hudi/pull/1929#issuecomment-735362200


   > @leesf are you still pursuing this?
   
   yes, will update the PR when free


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] liujinhui1994 commented on pull request #1929: [HUDI-1160] Support update partial fields for CoW table

Posted by GitBox <gi...@apache.org>.
liujinhui1994 commented on pull request #1929:
URL: https://github.com/apache/hudi/pull/1929#issuecomment-751544449


   I’m working on it now, but encountered some parquet-related issues in the process. Today I will PR to discuss
   @leesf 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] leesf commented on a change in pull request #1929: [HUDI-1160] Support update partial fields for CoW table

Posted by GitBox <gi...@apache.org>.
leesf commented on a change in pull request #1929:
URL: https://github.com/apache/hudi/pull/1929#discussion_r487126922



##########
File path: hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##########
@@ -117,7 +118,17 @@ public boolean commitStats(String instantTime, List<HoodieWriteStat> stats, Opti
     if (extraMetadata.isPresent()) {
       extraMetadata.get().forEach(metadata::addMetadata);
     }
-    metadata.addMetadata(HoodieCommitMetadata.SCHEMA_KEY, config.getSchema());
+    String schema = config.getSchema();
+    if (config.updatePartialFields()) {
+      try {
+        TableSchemaResolver resolver = new TableSchemaResolver(table.getMetaClient());

Review comment:
       > Do you need to create resolver again? Does config.getLastSchema() work here?
   
   I tested it E2E and found that would not get the lastSchema from config since the config object are different

##########
File path: hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##########
@@ -117,7 +118,17 @@ public boolean commitStats(String instantTime, List<HoodieWriteStat> stats, Opti
     if (extraMetadata.isPresent()) {
       extraMetadata.get().forEach(metadata::addMetadata);
     }
-    metadata.addMetadata(HoodieCommitMetadata.SCHEMA_KEY, config.getSchema());
+    String schema = config.getSchema();
+    if (config.updatePartialFields()) {
+      try {
+        TableSchemaResolver resolver = new TableSchemaResolver(table.getMetaClient());
+        schema = resolver.getTableAvroSchemaWithoutMetadataFields().toString();
+      } catch (Exception e) {
+        // ignore exception.
+        schema = config.getSchema();

Review comment:
       > We are potentially reducing schema here, so I think this can lead to issues. Can we throw error? At the least, can you add a LOG here to make sure this gets noticed?
   
   it handles the case that users config update partial fields in the first time, my original idea is not to throw error in this case, and LOG here sounds reasonable to me.

##########
File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
##########
@@ -94,6 +95,11 @@
   public static final String BULKINSERT_SORT_MODE = "hoodie.bulkinsert.sort.mode";
   public static final String DEFAULT_BULKINSERT_SORT_MODE = BulkInsertSortMode.GLOBAL_SORT
       .toString();
+  public static final String DELETE_MARKER_FIELD_PROP = "hoodie.write.delete.marker.field";

Review comment:
       good catch, not needed. will revert

##########
File path: hudi-client/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java
##########
@@ -90,9 +92,19 @@ protected HoodieWriteHandle(HoodieWriteConfig config, String instantTime, String
    * @param config Write Config
    * @return
    */
-  protected static Pair<Schema, Schema> getWriterSchemaIncludingAndExcludingMetadataPair(HoodieWriteConfig config) {
+  protected static Pair<Schema, Schema> getWriterSchemaIncludingAndExcludingMetadataPair(HoodieWriteConfig config, HoodieTable hoodieTable) {
     Schema originalSchema = new Schema.Parser().parse(config.getSchema());
     Schema hoodieSchema = HoodieAvroUtils.addMetadataFields(originalSchema);
+    boolean updatePartialFields = config.updatePartialFields();
+    if (updatePartialFields) {
+      try {
+        TableSchemaResolver resolver = new TableSchemaResolver(hoodieTable.getMetaClient());

Review comment:
       sounds reasonable




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] satishkotha commented on pull request #1929: [HUDI-1160] Support update partial fields for CoW table

Posted by GitBox <gi...@apache.org>.
satishkotha commented on pull request #1929:
URL: https://github.com/apache/hudi/pull/1929#issuecomment-702404415


   @leesf Any update? Let me know if you need any help here


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] leesf commented on a change in pull request #1929: [HUDI-1160] Support update partial fields for CoW table

Posted by GitBox <gi...@apache.org>.
leesf commented on a change in pull request #1929:
URL: https://github.com/apache/hudi/pull/1929#discussion_r487128690



##########
File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
##########
@@ -94,6 +95,11 @@
   public static final String BULKINSERT_SORT_MODE = "hoodie.bulkinsert.sort.mode";
   public static final String DEFAULT_BULKINSERT_SORT_MODE = BulkInsertSortMode.GLOBAL_SORT
       .toString();
+  public static final String DELETE_MARKER_FIELD_PROP = "hoodie.write.delete.marker.field";

Review comment:
       good catch, not needed. will revert




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] liujinhui1994 commented on pull request #1929: [HUDI-1160] Support update partial fields for CoW table

Posted by GitBox <gi...@apache.org>.
liujinhui1994 commented on pull request #1929:
URL: https://github.com/apache/hudi/pull/1929#issuecomment-799175366


   @nsivabalan   This PR has been modified based on the comments,
   https://github.com/apache/hudi/pull/2666
   Please review


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] satishkotha commented on a change in pull request #1929: [HUDI-1160] Support update partial fields for CoW table

Posted by GitBox <gi...@apache.org>.
satishkotha commented on a change in pull request #1929:
URL: https://github.com/apache/hudi/pull/1929#discussion_r486525140



##########
File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
##########
@@ -94,6 +95,11 @@
   public static final String BULKINSERT_SORT_MODE = "hoodie.bulkinsert.sort.mode";
   public static final String DEFAULT_BULKINSERT_SORT_MODE = BulkInsertSortMode.GLOBAL_SORT
       .toString();
+  public static final String DELETE_MARKER_FIELD_PROP = "hoodie.write.delete.marker.field";

Review comment:
       Is this needed for this change? what is this used for? 

##########
File path: hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##########
@@ -117,7 +118,17 @@ public boolean commitStats(String instantTime, List<HoodieWriteStat> stats, Opti
     if (extraMetadata.isPresent()) {
       extraMetadata.get().forEach(metadata::addMetadata);
     }
-    metadata.addMetadata(HoodieCommitMetadata.SCHEMA_KEY, config.getSchema());
+    String schema = config.getSchema();
+    if (config.updatePartialFields()) {
+      try {
+        TableSchemaResolver resolver = new TableSchemaResolver(table.getMetaClient());
+        schema = resolver.getTableAvroSchemaWithoutMetadataFields().toString();
+      } catch (Exception e) {
+        // ignore exception.
+        schema = config.getSchema();

Review comment:
       We are potentially reducing schema here,  so I think this can lead to issues.  Can we throw error? At the least, can you add a LOG here to make sure this gets noticed? 

##########
File path: hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##########
@@ -117,7 +118,17 @@ public boolean commitStats(String instantTime, List<HoodieWriteStat> stats, Opti
     if (extraMetadata.isPresent()) {
       extraMetadata.get().forEach(metadata::addMetadata);
     }
-    metadata.addMetadata(HoodieCommitMetadata.SCHEMA_KEY, config.getSchema());
+    String schema = config.getSchema();
+    if (config.updatePartialFields()) {
+      try {
+        TableSchemaResolver resolver = new TableSchemaResolver(table.getMetaClient());

Review comment:
       Do you need to create resolver again? Does config.getLastSchema() work here?

##########
File path: hudi-client/src/main/java/org/apache/hudi/table/action/commit/MergeHelper.java
##########
@@ -73,7 +74,11 @@
     } else {
       gReader = null;
       gWriter = null;
-      readSchema = upsertHandle.getWriterSchemaWithMetafields();
+      if (table.getConfig().updatePartialFields() && !StringUtils.isNullOrEmpty(table.getConfig().getLastSchema())) {
+        readSchema = new Schema.Parser().parse(table.getConfig().getLastSchema());

Review comment:
       similar comment as before. if we make config.getSchema() to always track full table schema, this can be simplified.
   

##########
File path: hudi-client/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java
##########
@@ -90,9 +92,19 @@ protected HoodieWriteHandle(HoodieWriteConfig config, String instantTime, String
    * @param config Write Config
    * @return
    */
-  protected static Pair<Schema, Schema> getWriterSchemaIncludingAndExcludingMetadataPair(HoodieWriteConfig config) {
+  protected static Pair<Schema, Schema> getWriterSchemaIncludingAndExcludingMetadataPair(HoodieWriteConfig config, HoodieTable hoodieTable) {
     Schema originalSchema = new Schema.Parser().parse(config.getSchema());
     Schema hoodieSchema = HoodieAvroUtils.addMetadataFields(originalSchema);
+    boolean updatePartialFields = config.updatePartialFields();
+    if (updatePartialFields) {
+      try {
+        TableSchemaResolver resolver = new TableSchemaResolver(hoodieTable.getMetaClient());

Review comment:
       This is only applicable for MergeHandle if i understand correctly. Do you think its better to override this in MergeHandle?

##########
File path: hudi-client/src/main/java/org/apache/hudi/table/action/commit/MergeHelper.java
##########
@@ -73,7 +74,11 @@
     } else {
       gReader = null;
       gWriter = null;
-      readSchema = upsertHandle.getWriterSchemaWithMetafields();
+      if (table.getConfig().updatePartialFields() && !StringUtils.isNullOrEmpty(table.getConfig().getLastSchema())) {
+        readSchema = new Schema.Parser().parse(table.getConfig().getLastSchema());
+      } else {
+        readSchema = upsertHandle.getWriterSchemaWithMetafields();

Review comment:
       we are also calling getWriterSchemaWithMetafields in other places in this class (example: line 163). Dont we need to read getLastSchema() there?

##########
File path: hudi-client/src/main/java/org/apache/hudi/table/action/commit/BaseCommitActionExecutor.java
##########
@@ -237,6 +238,9 @@ protected void finalizeWrite(String instantTime, List<HoodieWriteStat> stats, Ho
    * By default, return the writer schema in Write Config for storing in commit.
    */
   protected String getSchemaToStoreInCommit() {
+    if (config.updatePartialFields() && !StringUtils.isNullOrEmpty(config.getLastSchema())) {

Review comment:
       This check is repeated in multiple places. I usually find this pattern error-prone. Is it possible to reorganize? For example, we always expect config.getSchema() to represent full table schema. We add new config.getUpdateSchema() that tracks partial fields that are being updated.  
   
   With that approach, I think we can use getUpdateSchema only in MergeHandle/helpers. Storing schema would work as before as we store full table schema i.e., config.getSchema()

##########
File path: hudi-client/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java
##########
@@ -192,6 +208,10 @@ protected long getAttemptId() {
 
   protected HoodieFileWriter createNewFileWriter(String instantTime, Path path, HoodieTable<T> hoodieTable,
       HoodieWriteConfig config, Schema schema, SparkTaskContextSupplier sparkTaskContextSupplier) throws IOException {
-    return HoodieFileWriterFactory.getFileWriter(instantTime, path, hoodieTable, config, schema, sparkTaskContextSupplier);
+    if (config.updatePartialFields() && !StringUtils.isNullOrEmpty(config.getLastSchema())) {
+      return HoodieFileWriterFactory.getFileWriter(instantTime, path, hoodieTable, config, new Schema.Parser().parse(config.getLastSchema()), sparkTaskContextSupplier);

Review comment:
       same question as above, is it better to override this only in MergeHandle?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] liujinhui1994 commented on pull request #1929: [HUDI-1160] Support update partial fields for CoW table

Posted by GitBox <gi...@apache.org>.
liujinhui1994 commented on pull request #1929:
URL: https://github.com/apache/hudi/pull/1929#issuecomment-751622554


   Can be completed today, and PR
   @leesf 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] liujinhui1994 commented on pull request #1929: [HUDI-1160] Support update partial fields for CoW table

Posted by GitBox <gi...@apache.org>.
liujinhui1994 commented on pull request #1929:
URL: https://github.com/apache/hudi/pull/1929#issuecomment-793798733


   @nsivabalan  I’m sorry for the late reply. Recently I cannot access git due to network reasons. I will deal with this PR as suggested in the comments. I hope you will have time to review later.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan closed pull request #1929: [HUDI-1160] Support update partial fields for CoW table

Posted by GitBox <gi...@apache.org>.
nsivabalan closed pull request #1929:
URL: https://github.com/apache/hudi/pull/1929


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org