Posted to commits@hudi.apache.org by "nsivabalan (via GitHub)" <gi...@apache.org> on 2023/03/03 18:06:11 UTC
[GitHub] [hudi] nsivabalan commented on issue #8085: [SUPPORT] deltacommit triggering criteria
nsivabalan commented on issue #8085:
URL: https://github.com/apache/hudi/issues/8085#issuecomment-1453904397
Hi @tatiana-rackspace:
Deltastreamer, as you might know, is a streaming ingestion tool.
It has a source limit that caps how much it consumes in each batch.
In the case of Kafka, the limit is a number of messages. In the case of DFS-based sources, it is a number of bytes.
You can configure the source limit using `--source-limit`. More info can be found here: https://hudi.apache.org/docs/hoodie_deltastreamer
The size of a batch also depends on how much data was available when sync() was called.
Let's say you have configured the min sync interval to 30 mins (`--min-sync-interval-seconds`). Deltastreamer will then try to fetch data from the source and sync to Hudi once every 30 mins.
So at t0, it will consume from the source, adhering to the max limit you have configured. Then, after 30 mins, it will consume from the source again, starting from the last checkpoint and again adhering to the source limit.
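To make the interplay between the source limit, the checkpoint, and the sync interval concrete, here is a rough Python sketch of the behaviour described above. It is a toy simulation, not Hudi code: `simulate_syncs`, `SyncResult`, and `available_at` are hypothetical names, the "source" is just a counter of records, and the time a sync itself takes is ignored.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SyncResult:
    start_time_s: int   # when this sync round started
    consumed: int       # records pulled in this round
    checkpoint: int     # position in the source after this round


def simulate_syncs(
    available_at: Callable[[int], int],  # hypothetical: total records in the source at time t
    source_limit: int,                   # plays the role of --source-limit
    min_sync_interval_s: int,            # plays the role of --min-sync-interval-seconds
    rounds: int,
) -> List[SyncResult]:
    checkpoint = 0
    results = []
    for i in range(rounds):
        t = i * min_sync_interval_s
        # Only data past the last checkpoint is eligible for this round.
        backlog = max(available_at(t) - checkpoint, 0)
        # A round takes whatever is available, capped by the source limit.
        consumed = min(backlog, source_limit)
        checkpoint += consumed
        results.append(SyncResult(t, consumed, checkpoint))
    return results


# Assumed workload: 5000 records waiting at t0, then one new record every 2 seconds.
results = simulate_syncs(
    available_at=lambda t: 5000 + t // 2,
    source_limit=3000,
    min_sync_interval_s=1800,
    rounds=3,
)
for r in results:
    print(r)
```

With these made-up numbers, the first round is capped by the source limit (3000 records), while the next two rounds are bounded by what was available at the time of the sync (2900, then 900 records).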
Let me know if this clarifies things.