Posted to commits@hudi.apache.org by "nsivabalan (via GitHub)" <gi...@apache.org> on 2023/03/03 18:06:11 UTC

[GitHub] [hudi] nsivabalan commented on issue #8085: [SUPPORT] deltacommit triggering criteria

nsivabalan commented on issue #8085:
URL: https://github.com/apache/hudi/issues/8085#issuecomment-1453904397

   Hi @tatiana-rackspace,
   Deltastreamer, as you might know, is a streaming ingestion tool.
   It has a source limit that caps how much it consumes in each batch.
   In the case of Kafka, the limit is the number of messages; in the case of DFS-based sources, it is the number of bytes.
   
   You can configure the source limit using `--source-limit`. More info can be found here: https://hudi.apache.org/docs/hoodie_deltastreamer
   
   Also, the size of each batch depends on how much data was available when sync() was called.
   Let's say you have configured the min sync interval to 30 mins (`--min-sync-interval-seconds`); Deltastreamer will then try to fetch data from the source and sync to Hudi once every 30 mins.
   So, at t0, it will consume from the source, adhering to the max limit you have configured. Then, after 30 mins, it will again consume from the source starting from the last checkpoint, again adhering to the source limit.
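
To make the interaction between the source limit and the sync interval concrete, here is a rough simulation in Python. It is only a sketch and not Hudi code: `simulate_syncs`, its parameters, and the constant arrival rate are all made up for illustration, and it assumes each sync finishes instantly and that the checkpoint is a simple record count.

```python
def simulate_syncs(arrival_rate_per_min, source_limit, sync_interval_min,
                   rounds, initial_backlog=0):
    """Toy model of batch sizing (hypothetical helper, not a Hudi API).

    Returns a list of (minute, records_consumed, checkpoint) tuples,
    one per sync round.
    """
    checkpoint = 0              # records consumed so far (last checkpoint)
    produced = initial_backlog  # total records written to the source so far
    results = []
    for i in range(rounds):
        minute = i * sync_interval_min
        if i > 0:
            # New records arrive while we wait for the next sync.
            produced += arrival_rate_per_min * sync_interval_min
        available = produced - checkpoint
        # Each batch takes whatever is available, capped at the source limit.
        consumed = min(source_limit, available)
        checkpoint += consumed
        results.append((minute, consumed, checkpoint))
    return results


if __name__ == "__main__":
    # 8000 records waiting at t0, 100 new records per minute,
    # source limit of 5000 records, sync every 30 minutes.
    for minute, consumed, checkpoint in simulate_syncs(100, 5000, 30, 3, 8000):
        print(f"t={minute:>3} min  consumed={consumed}  checkpoint={checkpoint}")
```

In this toy run, the first two batches are capped by the source limit (5000 records each), while the third picks up only the 4000 records that were available past the last checkpoint.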
   
   Let me know if this clarifies things. 
   

