Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/09/23 15:24:11 UTC

[GitHub] [hudi] ranjani1993 opened a new issue, #6776: [SUPPORT] HUDI taking longer time for update

ranjani1993 opened a new issue, #6776:
URL: https://github.com/apache/hudi/issues/6776

   Hi Team,
   
   We are trying to implement HUDI for one of the workflows in our project.
   
   The problem we are facing is that we don't receive only the updated/changed records from the source. We receive the entire dataset (unchanged + updated + new records) from the source.
   
   Example:
   
   Source table has 1 billion records per partition
   Our target HUDI table has 1 billion records per partition
   
   Out of those 1 billion records in the source, a few records were updated. We don't know which records were updated.
   
   So when we perform a HUDI upsert of the 1 billion source records against the 1 billion records in the target, HUDI takes longer than a regular overwrite operation (in which we overwrite the entire partition in the target table).
   
   We tried to optimise this by changing the index type to SIMPLE and tuning other parallelism and Spark configs, but we could not achieve the expected result.
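   
   For reference, a minimal sketch of the kind of upsert configuration described above. The table name, field names, and parallelism value are hypothetical placeholders, not our actual settings; the option keys are standard Hudi datasource write options. The helper function itself is only for illustration.

```python
def hudi_upsert_options(table_name, record_key, partition_field,
                        precombine_field, index_type="SIMPLE", parallelism=200):
    """Build a dict of Hudi write options for an upsert.

    All argument values are placeholders chosen for illustration.
    """
    opts = {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.index.type": index_type,
        # Spark options are passed as strings
        "hoodie.upsert.shuffle.parallelism": str(parallelism),
    }
    if index_type == "SIMPLE":
        # Parallelism used by the SIMPLE index lookup
        opts["hoodie.simple.index.parallelism"] = str(parallelism)
    return opts


# Hypothetical table and column names
upsert_opts = hudi_upsert_options("target_table", "id", "dt", "updated_at")

# With a Spark DataFrame `df` and a table path `base_path` (both assumed):
# df.write.format("hudi").options(**upsert_opts).mode("append").save(base_path)
```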
   
   We just wanted to check whether HUDI would be suitable for our use case.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] ranjani1993 closed issue #6776: [SUPPORT] HUDI taking longer time for update

Posted by GitBox <gi...@apache.org>.
ranjani1993 closed issue #6776: [SUPPORT] HUDI taking longer time for update
URL: https://github.com/apache/hudi/issues/6776

