Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/03/12 18:53:03 UTC

[GitHub] [spark] dhruve commented on issue #24035: [SPARK-27112] : Spark Scheduler encounters two independent Deadlocks …

URL: https://github.com/apache/spark/pull/24035#issuecomment-472135426
 
 
   I think fixing the lock ordering across the involved threads will resolve the deadlocks.
   
   The order in which each thread currently acquires its locks is:
   
   TaskResultGetter Order:
   - Lock YarnClusterScheduler
   - Lock CoarseGrainedSchedulerBackend
   
   DispatcherEventLoop Order:
   - Lock CoarseGrainedSchedulerBackend
   - Lock YarnClusterScheduler
   
   SparkDynamicExecutorAllocation Order:
   - Lock ExecutorAllocationManager
   - Lock CoarseGrainedSchedulerBackend
   - Lock TaskSchedulerImpl/YarnClusterScheduler 
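   
   To make the conflict explicit, here is a minimal sketch that encodes the three acquisition orders listed above and reports any pair of locks that two threads take in opposite order. The class and method names (`LockOrderCheck`, `currentOrders`, `inversions`) are hypothetical and not part of Spark. It also assumes that TaskSchedulerImpl and YarnClusterScheduler refer to the same lock, and it only detects pairwise inversions, not longer cycles.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Spark: checks lock orders for pairwise inversions.
class LockOrderCheck {
    // Assumption: TaskSchedulerImpl and YarnClusterScheduler denote the same lock.
    static final String YCS = "YarnClusterScheduler";
    static final String CGSB = "CoarseGrainedSchedulerBackend";
    static final String EAM = "ExecutorAllocationManager";

    // The acquisition orders as listed in this comment, one entry per thread.
    static Map<String, List<String>> currentOrders() {
        Map<String, List<String>> orders = new LinkedHashMap<>();
        orders.put("TaskResultGetter", Arrays.asList(YCS, CGSB));
        orders.put("DispatcherEventLoop", Arrays.asList(CGSB, YCS));
        orders.put("SparkDynamicExecutorAllocation", Arrays.asList(EAM, CGSB, YCS));
        return orders;
    }

    // Returns every lock pair that some thread takes as A-then-B and another as B-then-A.
    static Set<String> inversions(Map<String, List<String>> orders) {
        Set<String> edges = new HashSet<>();
        for (List<String> seq : orders.values()) {
            for (int i = 0; i < seq.size(); i++) {
                for (int j = i + 1; j < seq.size(); j++) {
                    edges.add(seq.get(i) + ">" + seq.get(j));
                }
            }
        }
        Set<String> result = new TreeSet<>();
        for (String edge : edges) {
            String[] pair = edge.split(">");
            // Report each conflicting pair once, with the names in alphabetical order.
            if (pair[0].compareTo(pair[1]) < 0 && edges.contains(pair[1] + ">" + pair[0])) {
                result.add(pair[0] + " <-> " + pair[1]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(inversions(currentOrders()));
    }
}
```

   With the orders above, the only conflicting pair is the backend and the scheduler, so aligning those two locks should be enough to remove the inversion.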
   
   Solution:
   The methods that lead to the deadlocks are both driven by activity in the CoarseGrainedSchedulerBackend (CGSB).
   
   1. KillExecutors: The lock on TaskSchedulerImpl/YarnClusterScheduler (TSI/YCS) is needed only to check whether an executor is busy. We can move that idle-executor check ahead of the synchronization on CGSB, which fixes the lock order for the dynamic allocation thread.
   
   2. MakeOffers: This currently acquires the lock on CGSB to ensure that executors are not killed while a task is being offered to them, and then calls `resourceOffer` on the scheduler, where it acquires the second lock. I agree with @attilapiros's suggestion to fix this second lock-ordering issue by synchronizing on the scheduler first and then on the backend.
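   
   The two changes above could look roughly like the following sketch. It is not the actual Spark code: `Scheduler` and `Backend` are hypothetical stand-ins for TSI/YCS and CGSB, and the method bodies are placeholders. It also assumes that the idle check in `killExecutor` keeps holding the scheduler lock while the backend lock is taken. Releasing it first would also respect the ordering, but it would leave a window in which the executor could become busy.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not Spark code: every path locks the scheduler before the backend.
class LockOrderDemo {
    // Stand-in for TaskSchedulerImpl / YarnClusterScheduler.
    static class Scheduler {
        private final Set<String> busy = new HashSet<>();

        synchronized void markBusy(String execId) { busy.add(execId); }

        synchronized boolean isExecutorBusy(String execId) { return busy.contains(execId); }

        // Placeholder for the real offer logic; synchronized is reentrant for the caller.
        synchronized void resourceOffer(String execId) { busy.remove(execId); }
    }

    // Stand-in for CoarseGrainedSchedulerBackend.
    static class Backend {
        private final Scheduler scheduler;
        private final Set<String> executors = new HashSet<>();

        Backend(Scheduler scheduler) { this.scheduler = scheduler; }

        synchronized void addExecutor(String execId) { executors.add(execId); }

        // Change 2: take the scheduler lock first, then the backend lock.
        void makeOffers() {
            synchronized (scheduler) {
                synchronized (this) {
                    for (String execId : executors) {
                        scheduler.resourceOffer(execId);
                    }
                }
            }
        }

        // Change 1: do the busy check before synchronizing on the backend.
        boolean killExecutor(String execId) {
            synchronized (scheduler) {
                if (scheduler.isExecutorBusy(execId)) {
                    return false; // busy executors are not killed
                }
                synchronized (this) {
                    return executors.remove(execId);
                }
            }
        }
    }

    // Runs both code paths concurrently and reports whether both threads finished.
    static boolean runConcurrently(int rounds) {
        Scheduler scheduler = new Scheduler();
        Backend backend = new Backend(scheduler);
        for (int i = 0; i < 4; i++) {
            backend.addExecutor("exec-" + i);
        }
        Thread offers = new Thread(() -> {
            for (int i = 0; i < rounds; i++) backend.makeOffers();
        });
        Thread kills = new Thread(() -> {
            for (int i = 0; i < rounds; i++) backend.killExecutor("exec-" + (i % 4));
        });
        offers.setDaemon(true);
        kills.setDaemon(true);
        offers.start();
        kills.start();
        try {
            offers.join(5000);
            kills.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !offers.isAlive() && !kills.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("finished without deadlock: " + runConcurrently(10_000));
    }
}
```

   Because both `makeOffers` and `killExecutor` acquire the scheduler monitor before the backend monitor, neither thread can hold one lock while waiting for the other in the opposite order.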
   
   These two changes should align the lock ordering across the threads, and they are simple to reason about. I think this should solve the issue, but it would be good to have more contributors review the change.
   
