Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2019/09/02 12:42:38 UTC

[GitHub] [incubator-hudi] smdahmed commented on issue #859: Hudi upsert after a delete in partition will cause valid records inserted to disappear.

smdahmed commented on issue #859: Hudi upsert after a delete in partition will cause valid records inserted to disappear.
URL: https://github.com/apache/incubator-hudi/issues/859#issuecomment-527134270
 
 
   Thanks again, Vinoth. I think I know why you are not seeing the issue. If you could kindly include a partition in your setup, you should see it too. As I suggested in the initial report, the issue reproduces only when a partitioned Hive table is involved.
   
   Steps: (Let the schema be: id, name, team)
   
   1. Insert data into a certain partition (e.g. p1) -> (1, kabeer, hudi | 2, vinoth, hudi)
   2. Delete record (1, kabeer, hudi)
   3. Upsert a new record: (3, balaji, hudi)
   
   Please treat the team column as the partition column, and kindly ensure that the Hive table partition path in which all three records land is <base_path_of_table>/team.
   
   Now, when you query the table, you should expect to see the records vinoth and balaji, but you will see only balaji.
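   To make the expected outcome concrete, here is a minimal in-memory sketch of the three steps. It is only a toy model of the semantics I expect, not Hudi's API: the class ToyPartitionedTable and the helper run_steps are hypothetical names made up for illustration.

```python
from collections import defaultdict


class ToyPartitionedTable:
    """In-memory stand-in for a partitioned table. Hypothetical; NOT Hudi's API."""

    def __init__(self, key_field, partition_field):
        self.key_field = key_field
        self.partition_field = partition_field
        # partition value -> {record key -> record}
        self.partitions = defaultdict(dict)

    def upsert(self, records):
        # Insert new keys and overwrite existing ones; other keys stay untouched.
        for rec in records:
            self.partitions[rec[self.partition_field]][rec[self.key_field]] = rec

    def delete(self, records):
        # Remove only the given keys from their partition.
        for rec in records:
            self.partitions[rec[self.partition_field]].pop(rec[self.key_field], None)

    def scan(self):
        # Return all live records, ordered by key for stable output.
        rows = [r for part in self.partitions.values() for r in part.values()]
        return sorted(rows, key=lambda r: r[self.key_field])


def run_steps(partition_field):
    """Replay insert -> delete -> upsert and return the surviving names."""
    table = ToyPartitionedTable(key_field="id", partition_field=partition_field)
    kabeer = {"id": 1, "name": "kabeer", "team": "hudi"}
    vinoth = {"id": 2, "name": "vinoth", "team": "hudi"}
    balaji = {"id": 3, "name": "balaji", "team": "hudi"}
    table.upsert([kabeer, vinoth])  # step 1: insert two records
    table.delete([kabeer])          # step 2: delete one of them
    table.upsert([balaji])          # step 3: upsert a new record
    return [r["name"] for r in table.scan()]


expected_names = run_steps(partition_field="team")
print(expected_names)  # ['vinoth', 'balaji']
```

   In this model vinoth survives the delete and the later upsert regardless of which column is used for partitioning, which is the behaviour I would expect from the real table as well.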
   -----------------------------------------------
   Some additional information based on your setup:
   I have repeated the above test with the id column as the partition column, i.e. each record lands in its own partition based on its ID: kabeer lands in partition 1, vinoth in partition 2 and balaji in partition 3.
   Since the record that gets deleted is kabeer, which sits alone in its partition, I can successfully see both vinoth and balaji. So I am now convinced that, within a given partition, an upsert that follows a delete causes all previously inserted records in that partition to vanish.
   
   More information:
   The upsert routine that I use after the insert and after the delete is the same. I can confirm that upserts after an insert give the expected behaviour, which indicates that I am not using SaveMode.Overwrite. It is only after a delete that the issue appears.
   
   Hopefully, once you extend your setup with the table's partition information, we can confirm that the issue exists. I am very grateful to you for all your support and am eagerly looking forward to seeing your further findings.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services