Posted to commits@hudi.apache.org by "machadoluiz (via GitHub)" <gi...@apache.org> on 2023/06/01 14:46:36 UTC

[GitHub] [hudi] machadoluiz commented on issue #8824: [SUPPORT] Performance and Data Integrity Issues with Hudi for Long-Term Data Retention

machadoluiz commented on issue #8824:
URL: https://github.com/apache/hudi/issues/8824#issuecomment-1572195137

   @ad1happy2go, the runtime increases gradually. In one specific example, it reached 2 minutes and 30 seconds at around 300 commits (about 10 months). This is a challenge for us, given that it represents less than a year's worth of data. Is there any way to improve this performance, or is this a trade-off we must accept?
   
   Does Hudi perform operations using actual data or just metadata in the background? 
   
   Does this mean that if the database grows, the cost/runtime of managing the metadata will increase proportionally? Or does it depend only on the filenames, in which case the cost would stay roughly constant regardless of the size of the database?

