Posted to commits@hudi.apache.org by "peanut-chenzhong (via GitHub)" <gi...@apache.org> on 2023/02/24 01:58:18 UTC

[GitHub] [hudi] peanut-chenzhong opened a new issue, #8034: [SUPPORT]merge into didn`t reinsert the delete record

peanut-chenzhong opened a new issue, #8034:
URL: https://github.com/apache/hudi/issues/8034

   **Describe the problem you faced**
   
   A clear and concise description of the problem.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. create table if not exists hudi_table1 (id int, name string, price double) using hudi options (type = 'mor', primaryKey = 'id', preCombineField = 'price');
   2. create table if not exists hudi_table2 (id int, name string, price double) using hudi options (type = 'mor', primaryKey = 'id', preCombineField = 'price');
   3. set hoodie.parquet.small.file.limit=0;
   4. insert into hudi_table1 select 1,1,1;
   5. insert into hudi_table1 select 2,1,1;
   6. merge into hudi_table2 using (select * from hudi_table1) as b on (hudi_table2.id = b.id and hudi_table2.name = b.name) when not matched then insert *;
   7. select * from hudi_table2;
   8. delete from hudi_table2 where id=1;
   9. merge into hudi_table2 using (select * from hudi_table1) as b on (hudi_table2.id = b.id and hudi_table2.name = b.name) when not matched then insert *;
   10. select * from hudi_table2;
   
   
   **Expected behavior**
   
   After step 10, there should be two records in hudi_table2 (id=1 re-inserted by the second merge, plus id=2), but only one record is returned, the one with id=2.
   
   **Environment Description**
   
   * Hudi version : 0.14.0
   
   * Spark version : 3.3.1
   
   * Hive version : 3.1.1
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) :
   
   * Running on Docker? (yes/no) :
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] nsivabalan commented on issue #8034: [SUPPORT]merge into didn`t reinsert the delete record

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan commented on issue #8034:
URL: https://github.com/apache/hudi/issues/8034#issuecomment-1453965150

   I can explain what's happening under the hood.
   I'm not sure how we can fix it properly; it might need some deeper thought.
   
   After step 8 above, the delete of id=1 goes into a log file in hudi_table2. So if you do a snapshot read from hudi_table2, you will not see the id=1 record. But an index lookup might still show id=1 as belonging to hudi_table2 until compaction kicks in. So during step 9, the merge into triggers an index lookup (for the when not matched clause), both id=1 and id=2 are seen as valid records in hudi_table2, and nothing is re-inserted.
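
The behavior described above can be sketched with a toy model. This is not Hudi code: `ToyFileGroup`, its methods, and `reproduce` are hypothetical names invented for illustration, and the model simply assumes the explanation in this comment holds, namely that the index lookup only sees keys in the base file while a snapshot read also applies deletes from the log. It also matches on `id` alone, whereas the merge condition in the issue uses both `id` and `name`.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFileGroup:
    """Hypothetical stand-in for one MOR file group: a base file plus a delete log."""
    base_keys: set = field(default_factory=set)    # keys stored in the base file
    log_deletes: set = field(default_factory=set)  # keys deleted via log entries

    def insert(self, key):
        self.base_keys.add(key)

    def delete(self, key):
        # The delete is recorded in the log; the base file is left untouched.
        if key in self.base_keys:
            self.log_deletes.add(key)

    def snapshot_read(self):
        # A snapshot read merges base file and log, so deleted keys disappear.
        return self.base_keys - self.log_deletes

    def index_lookup(self, key):
        # Assumption taken from the comment above: only base-file keys are consulted.
        return key in self.base_keys

    def compact(self):
        # Compaction folds the logged deletes into a new base file.
        self.base_keys -= self.log_deletes
        self.log_deletes.clear()

    def merge_insert_not_matched(self, source_keys):
        """Insert every source key the index lookup does not find; return those keys."""
        inserted = [k for k in source_keys if not self.index_lookup(k)]
        for k in inserted:
            self.insert(k)
        return inserted


def reproduce(compact_before_second_merge):
    """Replay steps 6 to 10 of the issue against the toy model."""
    table2 = ToyFileGroup()
    source = [1, 2]                                       # ids in hudi_table1
    table2.merge_insert_not_matched(source)               # step 6
    table2.delete(1)                                      # step 8
    if compact_before_second_merge:
        table2.compact()
    reinserted = table2.merge_insert_not_matched(source)  # step 9
    return reinserted, sorted(table2.snapshot_read())     # step 10


print(reproduce(compact_before_second_merge=False))
print(reproduce(compact_before_second_merge=True))
```

In this model, the second merge re-inserts nothing unless compaction has run first, which mirrors the reported symptom. Whether the real index behaves this way is exactly what the thread is trying to establish.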
   
   
   
   




[GitHub] [hudi] nsivabalan commented on issue #8034: [SUPPORT]merge into didn`t reinsert the delete record

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan commented on issue #8034:
URL: https://github.com/apache/hudi/issues/8034#issuecomment-1453971798

   Created a ticket, https://issues.apache.org/jira/browse/HUDI-5875, to follow up.
   This will not be an issue with a COW table, or once compaction kicks in for the file group of interest.




[GitHub] [hudi] peanut-chenzhong commented on issue #8034: [SUPPORT]merge into didn`t reinsert the delete record

Posted by "peanut-chenzhong (via GitHub)" <gi...@apache.org>.
peanut-chenzhong commented on issue #8034:
URL: https://github.com/apache/hudi/issues/8034#issuecomment-1447477059

   @alexeykudinkin could you kindly help to take care of this ticket?




[GitHub] [hudi] codope commented on issue #8034: [SUPPORT]merge into didn`t reinsert the delete record

Posted by "codope (via GitHub)" <gi...@apache.org>.
codope commented on issue #8034:
URL: https://github.com/apache/hudi/issues/8034#issuecomment-1455057831

   > But if you do an index look up, it might show as though id=1 belongs to hudi_table2 until compaction kicks in.
   
   Isn't the index updated after a delete for MOR?

