Posted to commits@samza.apache.org by GitBox <gi...@apache.org> on 2021/12/22 00:24:39 UTC

[GitHub] [samza] prateekm opened a new pull request #1570: Made ContainerStorageManager restore non-blocking to prevent deadlock when num restoreExecutor threads <= num tasks

prateekm opened a new pull request #1570:
URL: https://github.com/apache/samza/pull/1570


   Symptom: Container can deadlock during state restore on startup.
    
   Cause: ContainerStorageManager kicks off TaskRestoreManager#restore on the restoreExecutor and blocks there until the restore completes. TaskRestoreManagers can also restore state asynchronously using ContainerStorageManager's restoreExecutor. If a TaskRestoreManager schedules additional asynchronous work on the restoreExecutor and then blocks (Future#get or CompletableFuture#join) until it completes, the container can deadlock when num restoreExecutor threads <= num tasks: every thread in the restoreExecutor is occupied by a TaskRestoreCallable waiting for its restore to finish, so the asynchronous work is never executed. The workaround of keeping num threads > num tasks can be inefficient for containers with a large number of tasks.
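
The starvation described above can be reproduced outside Samza with a plain fixed-size thread pool. The following is a minimal sketch, not Samza code: the class name BlockingRestoreSketch, the method blockingRestoreCompletes, and the allSubmitted latch are hypothetical and exist only for illustration. Each outer task stands in for a blocking restore that schedules inner work on the same executor and waits for it; a timeout is used to detect the hang instead of hanging forever.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BlockingRestoreSketch {
  // Returns true if every simulated restore finishes within timeoutMs,
  // false if the pool is starved and the restores never complete.
  static boolean blockingRestoreCompletes(int numThreads, int numTasks, long timeoutMs) {
    ExecutorService restoreExecutor = Executors.newFixedThreadPool(numThreads);
    // Hypothetical latch: holds every outer task until all have been submitted,
    // so the outcome does not depend on thread scheduling.
    CountDownLatch allSubmitted = new CountDownLatch(1);
    try {
      List<Future<?>> restores = new ArrayList<>();
      for (int i = 0; i < numTasks; i++) {
        restores.add(restoreExecutor.submit(() -> {
          allSubmitted.await();
          // Inner async work scheduled on the SAME executor...
          Future<?> inner = restoreExecutor.submit(() -> { });
          // ...and the outer task blocks a pool thread while waiting for it.
          inner.get();
          return null;
        }));
      }
      allSubmitted.countDown();
      long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
      for (Future<?> restore : restores) {
        restore.get(Math.max(0, deadline - System.nanoTime()), TimeUnit.NANOSECONDS);
      }
      return true;
    } catch (TimeoutException e) {
      return false; // all pool threads are blocked; inner work never ran
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      restoreExecutor.shutdownNow(); // interrupt any threads still blocked
    }
  }

  public static void main(String[] args) {
    System.out.println(blockingRestoreCompletes(2, 2, 300));  // threads <= tasks
    System.out.println(blockingRestoreCompletes(3, 2, 5000)); // threads > tasks
  }
}
```

In this sketch the first call is expected to time out, because both pool threads are held by outer tasks waiting on queued inner work, while the second call completes because a spare thread is available to run the inner work.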
    
   Changes: 
   a) Made TaskRestoreManager#restore return a future instead of blocking. Note that restore managers must still take care not to block on futures scheduled on the restore executor; this PR only ensures that the interface no longer forces them to.
   b) Made ContainerStorageManager block on completion of the restore futures on the main thread instead of on the restoreExecutor.
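
The shape of changes a) and b) can be sketched with CompletableFuture. This is an illustration under stated assumptions, not the actual Samza interfaces: RestoreManagerSketch, restoreAll, and the two stubbed restore steps are hypothetical names, and the real TaskRestoreManager#restore signature may differ. The point is that restore only chains asynchronous stages and returns, and the only blocking wait happens on the calling (main) thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class NonBlockingRestoreSketch {
  // Hypothetical stand-in for a restore manager whose restore() returns a future.
  interface RestoreManagerSketch {
    CompletableFuture<Void> restore(ExecutorService restoreExecutor);
  }

  // Runs numTasks simulated restores on a pool of numThreads threads and
  // returns how many restores completed.
  static int restoreAll(int numThreads, int numTasks) {
    ExecutorService restoreExecutor = Executors.newFixedThreadPool(numThreads);
    AtomicInteger restored = new AtomicInteger();
    try {
      List<CompletableFuture<Void>> restoreFutures = new ArrayList<>();
      for (int i = 0; i < numTasks; i++) {
        // Each stage runs on the restore executor, but no stage blocks a pool
        // thread waiting for another stage; stages are chained instead.
        RestoreManagerSketch manager = executor ->
            CompletableFuture.runAsync(() -> { /* stubbed first restore step */ }, executor)
                .thenRunAsync(restored::incrementAndGet, executor); // stubbed final step
        restoreFutures.add(manager.restore(restoreExecutor));
      }
      // The only blocking wait happens here, on the calling (main) thread.
      CompletableFuture.allOf(restoreFutures.toArray(new CompletableFuture[0])).join();
      return restored.get();
    } finally {
      restoreExecutor.shutdown();
    }
  }

  public static void main(String[] args) {
    // A single pool thread is enough for four tasks in this sketch.
    System.out.println(restoreAll(1, 4));
  }
}
```

Because no pool thread ever waits on another pool task here, the number of threads affects only throughput, not whether the restores finish.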
    
   Tests: Verified manually that ContainerStorageManager no longer blocks on the restore executor and that there is no deadlock when num threads <= num tasks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@samza.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [samza] prateekm merged pull request #1570: Made ContainerStorageManager restore non-blocking to prevent deadlock when num restoreExecutor threads <= num tasks

Posted by GitBox <gi...@apache.org>.
prateekm merged pull request #1570:
URL: https://github.com/apache/samza/pull/1570


   

