Posted to reviews@aurora.apache.org by Kevin Sweeney <ke...@apache.org> on 2014/10/07 21:28:34 UTC

Review Request 26422: Drop synchronized from JobUpdateEventSubscriber

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26422/
-----------------------------------------------------------

Review request for Aurora, Bill Farner and Zameer Manji.


Bugs: AURORA-801
    https://issues.apache.org/jira/browse/AURORA-801


Repository: aurora


Description
-------

Drop synchronized from JobUpdateEventSubscriber

This fixes a startup deadlock.


Diffs
-----

  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateEventSubscriber.java 49d8b7a6c4adc4c58049c439bd09019c9e6885b1 

Diff: https://reviews.apache.org/r/26422/diff/


Testing
-------

./gradlew -Pq build

Manually verified that all delegated calls to the JobUpdateController are already protected by the storage write-lock.

Rather than add a potentially flaky regression test (like the one added in https://reviews.apache.org/r/25556/), I'd prefer to prioritize adding runtime deadlock detection (https://issues.apache.org/jira/browse/AURORA-800).
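
As a rough illustration of what runtime deadlock detection can look like, the sketch below uses the JDK's ThreadMXBean.findDeadlockedThreads(). This is a swapped-in technique chosen because it needs no extra dependencies; it is not necessarily what AURORA-800 will use. The class, method, and lock names (DeadlockDetectionSketch, subscriberLock, storageLock) are hypothetical and are not taken from Aurora. Two threads take the same pair of locks in opposite order, the JVM is polled until it reports the cycle, and the threads are then interrupted so the demo can exit.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class; none of these names come from Aurora.
class DeadlockDetectionSketch {

  // Starts a daemon thread that takes `first`, waits until both threads hold
  // their first lock, then blocks trying to take `second`.
  static Thread lockInOrder(ReentrantLock first, ReentrantLock second, CountDownLatch bothHeld) {
    Thread t = new Thread(() -> {
      try {
        first.lockInterruptibly();
        try {
          bothHeld.countDown();
          bothHeld.await();
          second.lockInterruptibly(); // blocks once the cycle has formed
          second.unlock();
        } finally {
          first.unlock();
        }
      } catch (InterruptedException e) {
        // Interrupted by the detector below; the finally block released `first`.
      }
    });
    t.setDaemon(true);
    t.start();
    return t;
  }

  // Builds a two-lock cycle, polls the JVM until it reports deadlocked threads
  // (or five seconds pass), then breaks the cycle by interrupting both threads.
  // Returns the number of threads the JVM reported as deadlocked.
  static int createAndDetect() {
    ReentrantLock subscriberLock = new ReentrantLock();
    ReentrantLock storageLock = new ReentrantLock();
    CountDownLatch bothHeld = new CountDownLatch(2);
    Thread a = lockInOrder(subscriberLock, storageLock, bothHeld);
    Thread b = lockInOrder(storageLock, subscriberLock, bothHeld);

    ThreadMXBean threads = ManagementFactory.getThreadMXBean();
    long[] deadlocked = null;
    long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
    try {
      while (deadlocked == null && System.nanoTime() - deadline < 0) {
        deadlocked = threads.findDeadlockedThreads(); // null when no cycle exists
        if (deadlocked == null) {
          Thread.sleep(10);
        }
      }
      a.interrupt();
      b.interrupt();
      a.join();
      b.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for the demo threads", e);
    }
    return deadlocked == null ? 0 : deadlocked.length;
  }

  public static void main(String[] args) {
    System.out.println("Deadlocked threads reported: " + createAndDetect());
  }
}
```

A check like this only reports a deadlock after it has happened, so in practice it would probably run on a background timer and log or abort, rather than prevent the cycle from forming.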


Thanks,

Kevin Sweeney


Re: Review Request 26422: Drop synchronized from JobUpdateEventSubscriber

Posted by Bill Farner <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26422/#review55854
-----------------------------------------------------------

Ship it!


Thanks for the extra look!

- Bill Farner




Re: Review Request 26422: Drop synchronized from JobUpdateEventSubscriber

Posted by David McLaughlin <da...@dmclaughlin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26422/#review55827
-----------------------------------------------------------

Ship it!


Ship It!

- David McLaughlin




Re: Review Request 26422: Drop synchronized from JobUpdateEventSubscriber

Posted by Kevin Sweeney <ke...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26422/
-----------------------------------------------------------

(Updated Oct. 8, 2014, 10:27 a.m.)


Review request for Aurora, David McLaughlin, Bill Farner, and Zameer Manji.


Bugs: AURORA-801
    https://issues.apache.org/jira/browse/AURORA-801


Repository: aurora


Description
-------

Drop synchronized from JobUpdateEventSubscriber

This fixes a startup deadlock.


Diffs
-----

  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateEventSubscriber.java 49d8b7a6c4adc4c58049c439bd09019c9e6885b1 

Diff: https://reviews.apache.org/r/26422/diff/


Testing
-------

./gradlew -Pq build

Manually verified that all delegated calls to the JobUpdateController are already protected by the storage write-lock.

Rather than add a potentially flaky regression test (like the one added in https://reviews.apache.org/r/25556/), I'd prefer to prioritize adding runtime deadlock detection (https://issues.apache.org/jira/browse/AURORA-800).


Thanks,

Kevin Sweeney


Re: Review Request 26422: Drop synchronized from JobUpdateEventSubscriber

Posted by Kevin Sweeney <ke...@apache.org>.

> On Oct. 8, 2014, 9:19 a.m., Bill Farner wrote:
> > While our minds are on deadlock risks, it's a good idea to assess other potential vulnerabilities.
> > 
> > A quick filter to find other potential sources deserving a glance:
> >     $ grep -Rl synchronized src/main/java | xargs grep -l Storage
> >     src/main/java/org/apache/aurora/scheduler/async/GcExecutorLauncher.java
> >     src/main/java/org/apache/aurora/scheduler/async/Preemptor.java
> >     src/main/java/org/apache/aurora/scheduler/async/TaskScheduler.java
> >     src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java
> >     src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java
> >     src/main/java/org/apache/aurora/scheduler/TaskVars.java
> >     src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
> 
> Kevin Sweeney wrote:
>     My proposal is to add runtime deadlock detection for these cases via CycleDetectingLockFactory. I have runtime evidence that this deadlock exists and would like to keep this change small in scope. Happy to add this as a followup item to AURORA-800.
> 
> Bill Farner wrote:
>     That effort shouldn't stop us from doing our due diligence and skimming for other places where we're vulnerable.

A cursory look through doesn't reveal any immediate concerns. Preemptor does acquire the storage lock in a synchronized method; however, the only caller of Preemptor always holds the storage write lock. The others just use synchronization to ensure consistent internal state.
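
To make the Preemptor reasoning concrete, here is a minimal sketch under stated assumptions: LockOrderSketch, synchronizedStep, and callerHoldingWriteLock are hypothetical stand-ins, not Aurora's Storage or Preemptor, and a plain ReentrantReadWriteLock plays the role of the storage write lock. Because every thread takes the storage write lock first and the monitor second, no thread can hold the monitor while waiting for the storage lock, so the opposite-order cycle cannot form.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-ins; not Aurora's Storage or Preemptor.
class LockOrderSketch {
  private final ReentrantReadWriteLock storageLock = new ReentrantReadWriteLock();
  private int attempts = 0;

  // Same shape as described above: a synchronized method that takes the storage lock.
  private synchronized void synchronizedStep() {
    storageLock.writeLock().lock(); // reentrant: the caller already holds it
    try {
      attempts++;
    } finally {
      storageLock.writeLock().unlock();
    }
  }

  // The only entry point: storage write lock first, monitor second, on every thread.
  void callerHoldingWriteLock() {
    storageLock.writeLock().lock();
    try {
      synchronizedStep();
    } finally {
      storageLock.writeLock().unlock();
    }
  }

  // Runs several threads through the single entry point and returns the total
  // number of completed calls once all of them have finished.
  static int runThreads(int threadCount, int callsPerThread) {
    LockOrderSketch sketch = new LockOrderSketch();
    Thread[] workers = new Thread[threadCount];
    for (int i = 0; i < threadCount; i++) {
      workers[i] = new Thread(() -> {
        for (int j = 0; j < callsPerThread; j++) {
          sketch.callerHoldingWriteLock();
        }
      });
      workers[i].start();
    }
    for (Thread worker : workers) {
      try {
        worker.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while joining workers", e);
      }
    }
    return sketch.attempts; // safe to read: join() orders it after the workers' writes
  }

  public static void main(String[] args) {
    System.out.println("Completed calls: " + runThreads(4, 1000));
  }
}
```

The guarantee disappears as soon as a second caller enters the synchronized method without already holding the storage write lock, which is the kind of regression a manual skim cannot rule out.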

Note that I searched for 'synchronized ' (with a trailing space) to avoid matching synchronizedMap.
% grep -Rl 'synchronized '  src/main/java | xargs grep -lE '(.write|.consistentRead|.consistentFetchTasks)'
src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java
src/main/java/org/apache/aurora/scheduler/async/TaskScheduler.java
src/main/java/org/apache/aurora/scheduler/async/Preemptor.java
src/main/java/org/apache/aurora/scheduler/TaskVars.java
src/main/java/org/apache/aurora/scheduler/log/mesos/MesosLog.java

Of course, this doesn't reveal cases where a call to a dependency might cause the storage lock to be acquired, nor does it protect against the accidental introduction of new deadlocks, so AURORA-800 is still relevant.


- Kevin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26422/#review55803
-----------------------------------------------------------




Re: Review Request 26422: Drop synchronized from JobUpdateEventSubscriber

Posted by Kevin Sweeney <ke...@apache.org>.

> On Oct. 8, 2014, 9:19 a.m., Bill Farner wrote:
> > While our minds are on deadlock risks, it's a good idea to assess other potential vulnerabilities.
> > 
> > A quick filter to find other potential sources deserving a glance:
> >     $ grep -Rl synchronized src/main/java | xargs grep -l Storage
> >     src/main/java/org/apache/aurora/scheduler/async/GcExecutorLauncher.java
> >     src/main/java/org/apache/aurora/scheduler/async/Preemptor.java
> >     src/main/java/org/apache/aurora/scheduler/async/TaskScheduler.java
> >     src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java
> >     src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java
> >     src/main/java/org/apache/aurora/scheduler/TaskVars.java
> >     src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java

My proposal is to add runtime deadlock detection for these cases via CycleDetectingLockFactory. I have runtime evidence that this deadlock exists and would like to keep this change small in scope. Happy to add this as a followup item to AURORA-800.


- Kevin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26422/#review55803
-----------------------------------------------------------




Re: Review Request 26422: Drop synchronized from JobUpdateEventSubscriber

Posted by Bill Farner <wf...@apache.org>.

> On Oct. 8, 2014, 4:19 p.m., Bill Farner wrote:
> > While our minds are on deadlock risks, it's a good idea to assess other potential vulnerabilities.
> > 
> > A quick filter to find other potential sources deserving a glance:
> >     $ grep -Rl synchronized src/main/java | xargs grep -l Storage
> >     src/main/java/org/apache/aurora/scheduler/async/GcExecutorLauncher.java
> >     src/main/java/org/apache/aurora/scheduler/async/Preemptor.java
> >     src/main/java/org/apache/aurora/scheduler/async/TaskScheduler.java
> >     src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java
> >     src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java
> >     src/main/java/org/apache/aurora/scheduler/TaskVars.java
> >     src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
> 
> Kevin Sweeney wrote:
>     My proposal is to add runtime deadlock detection for these cases via CycleDetectingLockFactory. I have runtime evidence that this deadlock exists and would like to keep this change small in scope. Happy to add this as a followup item to AURORA-800.

That effort shouldn't stop us from doing our due diligence and skimming for other places where we're vulnerable.


- Bill


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26422/#review55803
-----------------------------------------------------------




Re: Review Request 26422: Drop synchronized from JobUpdateEventSubscriber

Posted by Bill Farner <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26422/#review55803
-----------------------------------------------------------


While our minds are on deadlock risks, it's a good idea to assess other potential vulnerabilities.

A quick filter to find other potential sources deserving a glance:
    $ grep -Rl synchronized src/main/java | xargs grep -l Storage
    src/main/java/org/apache/aurora/scheduler/async/GcExecutorLauncher.java
    src/main/java/org/apache/aurora/scheduler/async/Preemptor.java
    src/main/java/org/apache/aurora/scheduler/async/TaskScheduler.java
    src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java
    src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java
    src/main/java/org/apache/aurora/scheduler/TaskVars.java
    src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java

- Bill Farner




Re: Review Request 26422: Drop synchronized from JobUpdateEventSubscriber

Posted by Zameer Manji <zm...@twopensource.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26422/#review55700
-----------------------------------------------------------

Ship it!


Ship It!

- Zameer Manji

