Posted to dev@gobblin.apache.org by "umustafi (via GitHub)" <gi...@apache.org> on 2023/02/13 19:29:45 UTC

[GitHub] [gobblin] umustafi commented on pull request #3640: [GOBBLIN-1783] Initialize scheduler with batch gets instead of individual get per flow

umustafi commented on PR #3640:
URL: https://github.com/apache/gobblin/pull/3640#issuecomment-1428533867

   1. The current implementation adds the scheduler and then the specConsumer to the list of services. I considered switching the order, but the scheduler needs to be initialized before we consume specs and try to add them to the scheduler. I still need to confirm whether services are initialized in that order or concurrently. The specConsumer starts consuming from the latestOffset, so this should not miss any specs. The offset won't advance unless the service is up and able to accept requests and our consumer is processing.
   
   2. The problem can come up if we load flowSpecA with an old value and, while that batch is being processed, an API request updates the flow: the consumer calls onAdd with the newer value first, and then the scheduler calls it with the old value. It's very rare, but we may want to add a modified timestamp to avoid it. This technically _could_ have happened with the previous implementation too, although with individual gets the chance was much smaller, since the consumer would have had to process a newer spec version between the get and the add for that spec. If we want to use modification time, we need a bigger change to store the modified time with the spec, perhaps in `DagManager` or the `Scheduler` itself.
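
To make the modified-timestamp idea in point 2 concrete, here is a rough sketch of a "keep only the newest version" guard. It is not Gobblin code: `VersionedSpecStore`, `addIfNewer`, and the use of a plain `String` for the spec are hypothetical names chosen for illustration, and it assumes each spec carries a modification time in milliseconds.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical helper, not part of Gobblin: remembers the latest version of each
// flow spec and rejects writes that carry an older (or equal) modification time.
final class VersionedSpecStore {

  // A spec payload together with the time it was last modified.
  private static final class Entry {
    final String spec;
    final long modifiedTimeMillis;

    Entry(String spec, long modifiedTimeMillis) {
      this.spec = spec;
      this.modifiedTimeMillis = modifiedTimeMillis;
    }
  }

  private final ConcurrentMap<String, Entry> specs = new ConcurrentHashMap<>();

  // Stores the spec only if it is strictly newer than what is already held.
  // Returns true if the incoming spec was accepted, false if it was stale.
  boolean addIfNewer(String flowId, String spec, long modifiedTimeMillis) {
    Entry candidate = new Entry(spec, modifiedTimeMillis);
    // merge() is atomic per key on ConcurrentHashMap, so the batch loader and
    // the consumer can call this concurrently without extra locking.
    Entry winner = specs.merge(flowId, candidate,
        (existing, incoming) ->
            incoming.modifiedTimeMillis > existing.modifiedTimeMillis ? incoming : existing);
    return winner == candidate;
  }

  // Returns the currently stored spec for the flow, or null if none is stored.
  String get(String flowId) {
    Entry entry = specs.get(flowId);
    return entry == null ? null : entry.spec;
  }
}
```

With a guard like this, if the consumer adds the newer flowSpecA first and the batch load then tries to add the older value, the second call is simply ignored. Where the timestamp would actually live (`DagManager`, the `Scheduler`, or the spec itself) is still the open question above.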


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@gobblin.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org