Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/11/12 17:30:31 UTC

[GitHub] [hudi] xushiyan commented on issue #3975: [SUPPORT] Question on hudi's delete statment taking too long

xushiyan commented on issue #3975:
URL: https://github.com/apache/hudi/issues/3975#issuecomment-967290309


   @dmenin A few things:
   
   > GLOBAL_INDEX, which prevents data duplication, but is not scalable: as the amount of data grows, the load time also increases.
   
   Do you have any numbers? It would be valuable to see how slow it is for you.
   
   > In other words, if I have key 123 on partition 10 and I receive key 123 again on partition 11, I delete the record from 10 and insert the one from 11.
   
   I think you're replicating the same logic that the global index already implements when this flag is turned on: https://hudi.apache.org/docs/configurations/#hoodiesimpleindexupdatepartitionpath
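
For illustration, here is a minimal sketch of write options that turn this flag on. The linked anchor refers to `hoodie.simple.index.update.partition.path`, which applies to the `GLOBAL_SIMPLE` index type; for `GLOBAL_BLOOM` the counterpart is `hoodie.bloom.index.update.partition.path`. The helper name, table name, and column names (`key`, `partition`, `ts`) are assumptions made up for this example, not taken from the issue.

```python
def global_simple_index_options(table_name, record_key, partition_field, precombine_field):
    """Build Hudi write options for a global simple index that moves a record
    to its new partition when the same key arrives with a different partition path.

    Hypothetical helper for illustration only; the option keys are Hudi configs,
    the argument values are placeholders supplied by the caller.
    """
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.precombine.field": precombine_field,
        # Global index: key uniqueness is enforced across all partitions.
        "hoodie.index.type": "GLOBAL_SIMPLE",
        # When the partition path of an existing key changes, delete the old
        # record and insert the incoming one into the new partition.
        "hoodie.simple.index.update.partition.path": "true",
    }


# Placeholder table and column names (assumptions, not from the issue).
opts = global_simple_index_options("events", "key", "partition", "ts")

# With a SparkSession and a DataFrame `df`, the options could be applied as:
#   df.write.format("hudi").options(**opts).mode("append").save(base_path)
```

With these options, an upsert carrying key 123 in partition 11 should replace the copy held in partition 10, so the manual delete step would likely be unnecessary.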
   
   If you want to improve lookup performance, can you try the HBase index?
   
   > HBASE index can be employed, if the operational overhead is acceptable and would provide much better lookup times for these tables.
   
   https://hudi.apache.org/blog/2020/11/11/hudi-indexing-mechanisms
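
As a rough sketch of what switching to the HBase index might look like, the options below select the `HBASE` index type and point Hudi at a ZooKeeper quorum and an HBase table. The helper name, host names, and HBase table name are placeholders invented for this example; a real setup needs a running HBase cluster with the index table created beforehand, and may need further tuning options not shown here.

```python
def hbase_index_options(zk_quorum, zk_port, hbase_table):
    """Build Hudi index options for an HBase-backed index.

    Hypothetical helper for illustration only; the option keys are Hudi configs,
    the argument values are placeholders supplied by the caller.
    """
    return {
        "hoodie.index.type": "HBASE",
        # Comma-separated ZooKeeper hosts used by the HBase cluster.
        "hoodie.index.hbase.zkquorum": zk_quorum,
        # ZooKeeper client port, passed as a string like other Spark options.
        "hoodie.index.hbase.zkport": str(zk_port),
        # HBase table that stores the record-key to file-location mapping.
        "hoodie.index.hbase.table": hbase_table,
    }


# Placeholder hosts and table name (assumptions, not from the issue).
hbase_opts = hbase_index_options("zk1.example.com,zk2.example.com", 2181, "hudi_record_index")

# These would be merged with the regular write options, for example:
#   df.write.format("hudi").options(**write_opts, **hbase_opts).mode("append").save(base_path)
```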
   

