Posted to reviews@aurora.apache.org by Kevin Sweeney <ke...@apache.org> on 2014/10/07 21:28:34 UTC
Review Request 26422: Drop synchronized from JobUpdateEventSubscriber
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26422/
-----------------------------------------------------------
Review request for Aurora, Bill Farner and Zameer Manji.
Bugs: AURORA-801
https://issues.apache.org/jira/browse/AURORA-801
Repository: aurora
Description
-------
Drop synchronized from JobUpdateEventSubscriber
This fixes a startup deadlock.
Diffs
-----
src/main/java/org/apache/aurora/scheduler/updater/JobUpdateEventSubscriber.java 49d8b7a6c4adc4c58049c439bd09019c9e6885b1
Diff: https://reviews.apache.org/r/26422/diff/
Testing
-------
./gradlew -Pq build
Manually verified that all delegated calls to the JobUpdateController are already protected by the storage write-lock.
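To make the lock-ordering argument concrete, here is a minimal Java sketch. It assumes the deadlock had the classic two-lock shape (one thread holding the storage write lock while waiting for the subscriber's monitor, another holding the monitor while waiting for the storage lock); the thread does not spell out the exact call paths, and SubscriberSketch, Subscriber, onEvent and bothCallersFinish are hypothetical names, not Aurora code. The storage lock is modeled as a ReentrantReadWriteLock, which is also an assumption.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified model: these classes are illustrative and are not
// Aurora's actual JobUpdateEventSubscriber or storage implementation.
final class SubscriberSketch {
  // Stand-in for the scheduler's storage lock.
  static final ReentrantReadWriteLock STORAGE_LOCK = new ReentrantReadWriteLock();

  // Subscriber after the change: onEvent is NOT synchronized, so the storage
  // write lock is the only lock it takes. If it were synchronized, thread A below
  // (holding the write lock) could wait on this monitor while thread B (holding
  // the monitor) waits on the write lock: a lock-ordering deadlock.
  static final class Subscriber {
    private int handled = 0; // guarded by STORAGE_LOCK

    void onEvent() {
      STORAGE_LOCK.writeLock().lock(); // reentrant if the caller already holds it
      try {
        handled++; // delegated work runs under the storage write lock
      } finally {
        STORAGE_LOCK.writeLock().unlock();
      }
    }

    int handled() {
      STORAGE_LOCK.readLock().lock();
      try {
        return handled;
      } finally {
        STORAGE_LOCK.readLock().unlock();
      }
    }
  }

  // Runs two callers concurrently and reports whether both finished in time.
  static boolean bothCallersFinish(int iterations) {
    Subscriber subscriber = new Subscriber();
    // Thread A: like startup code that calls the subscriber while holding the write lock.
    Thread a = new Thread(() -> {
      for (int i = 0; i < iterations; i++) {
        STORAGE_LOCK.writeLock().lock();
        try {
          subscriber.onEvent();
        } finally {
          STORAGE_LOCK.writeLock().unlock();
        }
      }
    });
    // Thread B: like an event-bus thread that calls the subscriber directly.
    Thread b = new Thread(() -> {
      for (int i = 0; i < iterations; i++) {
        subscriber.onEvent();
      }
    });
    a.setDaemon(true); // daemon threads cannot keep the JVM alive if something hangs
    b.setDaemon(true);
    a.start();
    b.start();
    try {
      a.join(5_000);
      b.join(5_000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    // Only read the count once both threads are known to have finished.
    return !a.isAlive() && !b.isAlive() && subscriber.handled() == 2 * iterations;
  }
}
```

The point of the design is that a single reentrant lock cannot form a cycle on its own; adding synchronized back to onEvent reintroduces a second lock and, with it, the possibility of the two threads acquiring the locks in opposite orders.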
Rather than add a potentially flaky regression test (like the one added in https://reviews.apache.org/r/25556/), I'd prefer to prioritize adding runtime deadlock detection (https://issues.apache.org/jira/browse/AURORA-800).
Thanks,
Kevin Sweeney
Re: Review Request 26422: Drop synchronized from JobUpdateEventSubscriber
Posted by Bill Farner <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26422/#review55854
-----------------------------------------------------------
Ship it!
Thanks for the extra look!
- Bill Farner
Re: Review Request 26422: Drop synchronized from JobUpdateEventSubscriber
Posted by David McLaughlin <da...@dmclaughlin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26422/#review55827
-----------------------------------------------------------
Ship it!
Ship It!
- David McLaughlin
Re: Review Request 26422: Drop synchronized from JobUpdateEventSubscriber
Posted by Kevin Sweeney <ke...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26422/
-----------------------------------------------------------
(Updated Oct. 8, 2014, 10:27 a.m.)
Review request for Aurora, David McLaughlin, Bill Farner, and Zameer Manji.
Bugs: AURORA-801
https://issues.apache.org/jira/browse/AURORA-801
Repository: aurora
Description
-------
Drop synchronized from JobUpdateEventSubscriber
This fixes a startup deadlock.
Diffs
-----
src/main/java/org/apache/aurora/scheduler/updater/JobUpdateEventSubscriber.java 49d8b7a6c4adc4c58049c439bd09019c9e6885b1
Diff: https://reviews.apache.org/r/26422/diff/
Testing
-------
./gradlew -Pq build
Manually verified that all delegated calls to the JobUpdateController are already protected by the storage write-lock.
Rather than add a potentially flaky regression test (like the one added in https://reviews.apache.org/r/25556/), I'd prefer to prioritize adding runtime deadlock detection (https://issues.apache.org/jira/browse/AURORA-800).
Thanks,
Kevin Sweeney
Re: Review Request 26422: Drop synchronized from JobUpdateEventSubscriber
Posted by Kevin Sweeney <ke...@apache.org>.
> On Oct. 8, 2014, 9:19 a.m., Bill Farner wrote:
> > While our minds are on deadlock risks, it's a good idea to assess other potential vulnerabilities.
> >
> > A quick filter to find other potential sources deserving a glance:
> > $ grep -Rl synchronized src/main/java | xargs grep -l Storage
> > src/main/java/org/apache/aurora/scheduler/async/GcExecutorLauncher.java
> > src/main/java/org/apache/aurora/scheduler/async/Preemptor.java
> > src/main/java/org/apache/aurora/scheduler/async/TaskScheduler.java
> > src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java
> > src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java
> > src/main/java/org/apache/aurora/scheduler/TaskVars.java
> > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
>
> Kevin Sweeney wrote:
> My proposal is to add runtime deadlock detection for these cases via CycleDetectingLockFactory. I have runtime evidence that this deadlock exists and would like to keep this change small in scope. Happy to add this as a followup item to AURORA-800.
>
> Bill Farner wrote:
> That effort shouldn't cause us to skip due diligence of a skim for other places we're vulnerable.
A cursory look doesn't reveal any immediate concerns. Preemptor does acquire the storage lock in a synchronized method; however, the only caller of Preemptor always holds the storage write lock. The others use synchronization only to keep their internal state consistent.
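The Preemptor argument relies on a caller-side invariant: the storage write lock is always taken before the monitor, so the two locks are never acquired in the opposite order. A minimal sketch of how that invariant could be enforced at runtime, assuming the storage lock is a ReentrantReadWriteLock (GuardedPreemptor and attemptPreemption are hypothetical names, not Aurora's Preemptor API):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: GuardedPreemptor and attemptPreemption() are illustrative
// names, not Aurora's Preemptor API.
final class GuardedPreemptor {
  private final ReentrantReadWriteLock storageLock; // stand-in for the storage lock
  private int attempts = 0; // guarded by this object's monitor

  GuardedPreemptor(ReentrantReadWriteLock storageLock) {
    this.storageLock = storageLock;
  }

  synchronized int attemptPreemption() {
    // Turn the caller-side invariant into a runtime check: the storage write lock
    // must already be held, so the order is always storage lock -> this monitor.
    if (!storageLock.isWriteLockedByCurrentThread()) {
      throw new IllegalStateException("caller must hold the storage write lock");
    }
    storageLock.writeLock().lock(); // reentrant: returns at once, no new contention
    try {
      return ++attempts;
    } finally {
      storageLock.writeLock().unlock();
    }
  }
}
```

A caller that forgets the write lock fails fast with an exception instead of silently introducing the reverse lock order.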
Note: I grepped for 'synchronized ' (with a trailing space) to avoid matching synchronizedMap.
% grep -Rl 'synchronized ' src/main/java | xargs grep -lE '(.write|.consistentRead|.consistentFetchTasks)'
src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java
src/main/java/org/apache/aurora/scheduler/async/TaskScheduler.java
src/main/java/org/apache/aurora/scheduler/async/Preemptor.java
src/main/java/org/apache/aurora/scheduler/TaskVars.java
src/main/java/org/apache/aurora/scheduler/log/mesos/MesosLog.java
Of course, this doesn't reveal cases where a call to a dependency might cause the storage lock to be acquired, nor does it protect against the accidental introduction of new deadlocks, so AURORA-800 is still relevant.
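The grep filters in this thread can also be expressed per source file in Java. This is a hypothetical helper (SyncStorageScan and deservesAGlance are illustrative names), and like the grep it is only a heuristic: word boundaries exclude synchronizedMap without relying on a trailing space, and the escaped dot restricts matches to member calls such as storage.write(...).

```java
import java.util.regex.Pattern;

// Hypothetical helper mirroring the grep heuristic for a single source file.
final class SyncStorageScan {
  // \b on both sides matches the keyword "synchronized" but not identifiers such as
  // synchronizedMap, where a word character follows.
  private static final Pattern SYNCHRONIZED = Pattern.compile("\\bsynchronized\\b");

  // The dot is escaped so only member calls such as storage.write(...) match. In a
  // grep -E pattern an unescaped '.' matches any character, so '.write' also matches
  // text like "rewrite" or "storage.writeLock"; the trailing \b excludes the latter.
  private static final Pattern STORAGE_CALL =
      Pattern.compile("\\.(write|consistentRead|consistentFetchTasks)\\b");

  // True when a file both uses the synchronized keyword and calls into storage.
  static boolean deservesAGlance(String javaSource) {
    return SYNCHRONIZED.matcher(javaSource).find()
        && STORAGE_CALL.matcher(javaSource).find();
  }
}
```

As the thread notes, a textual filter like this cannot see indirect acquisitions of the storage lock through dependencies.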
- Kevin
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26422/#review55803
-----------------------------------------------------------
Re: Review Request 26422: Drop synchronized from JobUpdateEventSubscriber
Posted by Kevin Sweeney <ke...@apache.org>.
> On Oct. 8, 2014, 9:19 a.m., Bill Farner wrote:
> > While our minds are on deadlock risks, it's a good idea to assess other potential vulnerabilities.
> >
> > A quick filter to find other potential sources deserving a glance:
> > $ grep -Rl synchronized src/main/java | xargs grep -l Storage
> > src/main/java/org/apache/aurora/scheduler/async/GcExecutorLauncher.java
> > src/main/java/org/apache/aurora/scheduler/async/Preemptor.java
> > src/main/java/org/apache/aurora/scheduler/async/TaskScheduler.java
> > src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java
> > src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java
> > src/main/java/org/apache/aurora/scheduler/TaskVars.java
> > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
My proposal is to add runtime deadlock detection for these cases via CycleDetectingLockFactory. I have runtime evidence that this deadlock exists and would like to keep this change small in scope. Happy to add this as a followup item to AURORA-800.
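Guava's CycleDetectingLockFactory, as proposed here, reports a potential deadlock when locks are acquired in an order that forms a cycle, even before any thread hangs. As a JDK-only illustration of the weaker, after-the-fact form of runtime detection, the sketch below swaps in ThreadMXBean.findDeadlockedThreads(), which only finds deadlocks that have already happened; DeadlockWatch and its methods are hypothetical names, and this is not the approach proposed for AURORA-800.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// JDK-only stand-in for illustration; not Guava's CycleDetectingLockFactory.
final class DeadlockWatch {
  private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

  // Number of threads currently deadlocked on monitors or java.util.concurrent locks.
  static int deadlockedThreadCount() {
    long[] ids = THREADS.findDeadlockedThreads(); // null when no deadlock is found
    return ids == null ? 0 : ids.length;
  }

  // Polls until a deadlock is visible or the timeout expires; a real watchdog
  // would run this periodically and report the offending threads.
  static boolean awaitDeadlock(long timeoutMillis) {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (System.nanoTime() < deadline) {
      if (deadlockedThreadCount() >= 2) {
        return true;
      }
      try {
        Thread.sleep(50);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return false;
  }

  // Deliberately deadlocks two daemon threads: one takes a monitor and then the
  // write lock, the other takes them in the opposite order.
  static void provokeDeadlock() {
    Object monitor = new Object();
    ReentrantReadWriteLock storageLock = new ReentrantReadWriteLock();
    CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
    Thread a = new Thread(() -> {
      synchronized (monitor) {
        arriveAndWait(bothHoldFirstLock);
        storageLock.writeLock().lock(); // blocks forever: b owns the write lock
      }
    });
    Thread b = new Thread(() -> {
      storageLock.writeLock().lock();
      arriveAndWait(bothHoldFirstLock);
      synchronized (monitor) { } // blocks forever: a owns the monitor
    });
    a.setDaemon(true); // daemon threads let the JVM exit despite the deadlock
    b.setDaemon(true);
    a.start();
    b.start();
  }

  // Signals arrival, then waits until both threads hold their first lock.
  private static void arriveAndWait(CountDownLatch latch) {
    latch.countDown();
    try {
      latch.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

Detection after the fact only tells you the scheduler is already stuck; cycle detection at acquisition time can flag an inconsistent lock order on the first run that exercises it, which is why it is the stronger option.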
- Kevin
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26422/#review55803
-----------------------------------------------------------
Re: Review Request 26422: Drop synchronized from JobUpdateEventSubscriber
Posted by Bill Farner <wf...@apache.org>.
> On Oct. 8, 2014, 4:19 p.m., Bill Farner wrote:
> > While our minds are on deadlock risks, it's a good idea to assess other potential vulnerabilities.
> >
> > A quick filter to find other potential sources deserving a glance:
> > $ grep -Rl synchronized src/main/java | xargs grep -l Storage
> > src/main/java/org/apache/aurora/scheduler/async/GcExecutorLauncher.java
> > src/main/java/org/apache/aurora/scheduler/async/Preemptor.java
> > src/main/java/org/apache/aurora/scheduler/async/TaskScheduler.java
> > src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java
> > src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java
> > src/main/java/org/apache/aurora/scheduler/TaskVars.java
> > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
>
> Kevin Sweeney wrote:
> My proposal is to add runtime deadlock detection for these cases via CycleDetectingLockFactory. I have runtime evidence that this deadlock exists and would like to keep this change small in scope. Happy to add this as a followup item to AURORA-800.
That effort shouldn't cause us to skip due diligence of a skim for other places we're vulnerable.
- Bill
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26422/#review55803
-----------------------------------------------------------
Re: Review Request 26422: Drop synchronized from JobUpdateEventSubscriber
Posted by Bill Farner <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26422/#review55803
-----------------------------------------------------------
While our minds are on deadlock risks, it's a good idea to assess other potential vulnerabilities.
A quick filter to find other potential sources deserving a glance:
$ grep -Rl synchronized src/main/java | xargs grep -l Storage
src/main/java/org/apache/aurora/scheduler/async/GcExecutorLauncher.java
src/main/java/org/apache/aurora/scheduler/async/Preemptor.java
src/main/java/org/apache/aurora/scheduler/async/TaskScheduler.java
src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java
src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java
src/main/java/org/apache/aurora/scheduler/TaskVars.java
src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
- Bill Farner
Re: Review Request 26422: Drop synchronized from JobUpdateEventSubscriber
Posted by Zameer Manji <zm...@twopensource.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26422/#review55700
-----------------------------------------------------------
Ship it!
Ship It!
- Zameer Manji