You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/07/04 14:01:09 UTC

[GitHub] [hudi] nsivabalan opened a new pull request #1792: [HUDI-802] Fixing deletes for inserts in same batch in write path

nsivabalan opened a new pull request #1792:
URL: https://github.com/apache/hudi/pull/1792


   ## What is the purpose of the pull request
   
   Fixing deletes for inserts in same write batch. 
   
   ## Brief change log
   
   If deletes are issued for records going to be inserted in same write batch, w/o this patch, deletes were ignored. So, the patch fixes the same.  
   
   ## Verify this pull request
   
   This change added tests and can be verified as follows:
   
     - *TestHoodieClientOnCopyOnWriteStorage#testDeletesForInsertsInSameBatch*
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on pull request #1792: [HUDI-802] Fixing deletes for inserts in same batch in write path

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #1792:
URL: https://github.com/apache/hudi/pull/1792#issuecomment-662763386


   ```
   [ERROR] Failures: 
   2552[ERROR] org.apache.hudi.client.TestHoodieClientOnCopyOnWriteStorage.testUpsertsUpdatePartitionPathGlobalBloom(HoodieIndex$IndexType)
   2553[INFO]   Run 1: PASS
   2554[ERROR]   Run 2: TestHoodieClientOnCopyOnWriteStorage.testUpsertsUpdatePartitionPathGlobalBloom:448->testUpsertsUpdatePartitionPath:506 expected: <1> but was: <0>
   ```
   
   @nsivabalan this was a recent test right?  this seems to be failing here. may be run this until failiure locally and go from there?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #1792: [HUDI-802] Fixing deletes for inserts in same batch in write path

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #1792:
URL: https://github.com/apache/hudi/pull/1792#discussion_r451583378



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithLatestAvroPayload.java
##########
@@ -59,24 +59,28 @@ public OverwriteWithLatestAvroPayload preCombine(OverwriteWithLatestAvroPayload
 
   @Override
   public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema) throws IOException {
+    return getInsertValue(schema);
+  }
 
-    Option<IndexedRecord> recordOption = getInsertValue(schema);
-    if (!recordOption.isPresent()) {
+  @Override
+  public Option<IndexedRecord> getInsertValue(Schema schema) throws IOException {

Review comment:
       have updated the description. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on pull request #1792: [HUDI-802] Fixing deletes for inserts in same batch in write path

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on pull request #1792:
URL: https://github.com/apache/hudi/pull/1792#issuecomment-660123935


   @vinothchandar : awaiting your review. if this is not doable (checking delete in getInsertValue()) for some reason, we might have to make some changes to https://github.com/apache/hudi/pull/1819 as well. so would appreciate your review. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on pull request #1792: [HUDI-802] Fixing deletes for inserts in same batch in write path

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #1792:
URL: https://github.com/apache/hudi/pull/1792#issuecomment-661682772


   @nsivabalan this does seem good to me. Can we add a Unit test specifically for `OverwriteWithLatestAvroPayload` which just tests these scenarios at the single class elvel? 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar merged pull request #1792: [HUDI-802] Fixing deletes for inserts in same batch in write path

Posted by GitBox <gi...@apache.org>.
vinothchandar merged pull request #1792:
URL: https://github.com/apache/hudi/pull/1792


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on a change in pull request #1792: [HUDI-802] Fixing deletes for inserts in same batch in write path

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1792:
URL: https://github.com/apache/hudi/pull/1792#discussion_r450343032



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithLatestAvroPayload.java
##########
@@ -59,24 +59,28 @@ public OverwriteWithLatestAvroPayload preCombine(OverwriteWithLatestAvroPayload
 
   @Override
   public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema) throws IOException {
+    return getInsertValue(schema);
+  }
 
-    Option<IndexedRecord> recordOption = getInsertValue(schema);
-    if (!recordOption.isPresent()) {
+  @Override
+  public Option<IndexedRecord> getInsertValue(Schema schema) throws IOException {

Review comment:
       Trying to understand what we are really fixing here... 
   If there were an insert and delete in the same batch, precombining favors delete, then we actually skip the record and not do the insert? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org