Posted to reviews@aurora.apache.org by Mehrdad Nurolahzade <me...@nurolahzade.com> on 2017/01/04 17:16:53 UTC
Review Request 55179: AURORA-1820 Reduce storage write lock contention by adopting Double-Checked Locking pattern in TimedOutTaskHandler
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55179/
-----------------------------------------------------------
Review request for Aurora, Joshua Cohen and Stephan Erb.
Bugs: AURORA-1820
https://issues.apache.org/jira/browse/AURORA-1820
Repository: aurora
Description
-------
`TimedOutTaskHandler` acquires the storage write lock for every task, every time a task transitions to a transient state. After a default time-out period of 5 minutes, it verifies whether the task has transitioned out of the transient state.
The verification step takes place while holding the storage write lock. In over 99% of cases, the logic short-circuits and returns from `StateManagerImpl.updateTaskAndExternalState()` once it learns that the task has transitioned out of the transient state.
This patch reduces storage write lock contention by adopting the Double-Checked Locking pattern in `TimedOutTaskHandler.run()`.
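For readers unfamiliar with the pattern, here is a minimal sketch of the idea, not the patch itself. `Storage`, `Handler`, `Status`, `peek` and `timeOutIfStill` are hypothetical stand-ins rather than Aurora's API, and the status values are only illustrative. The handler first reads the task state without the write lock and returns if the task has already left the transient state; only otherwise does it take the write lock and check again before changing anything.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch only: every name below is hypothetical, not Aurora's actual API.
final class DoubleCheckedTimeoutSketch {
  enum Status { ASSIGNED, RUNNING, LOST }

  /** Stand-in for scheduler storage guarded by a read/write lock. */
  static final class Storage {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, Status> tasks = new ConcurrentHashMap<>();
    // Counts write lock acquisitions so the fast path can be observed.
    final AtomicInteger writeLockAcquisitions = new AtomicInteger();

    /** Setup helper for the demo; stores a task's status directly. */
    void put(String taskId, Status status) {
      tasks.put(taskId, status);
    }

    /** First check: reads the status under the read lock only. */
    Status peek(String taskId) {
      lock.readLock().lock();
      try {
        return tasks.get(taskId);
      } finally {
        lock.readLock().unlock();
      }
    }

    /** Second check: under the write lock, moves the task to LOST only if it is still in the expected status. */
    boolean timeOutIfStill(String taskId, Status expected) {
      lock.writeLock().lock();
      try {
        writeLockAcquisitions.incrementAndGet();
        return tasks.replace(taskId, expected, Status.LOST);
      } finally {
        lock.writeLock().unlock();
      }
    }
  }

  /** Runs once the time-out for a task in a transient state has elapsed. */
  static final class Handler implements Runnable {
    private final Storage storage;
    private final String taskId;
    private final Status transientStatus;

    Handler(Storage storage, String taskId, Status transientStatus) {
      this.storage = storage;
      this.taskId = taskId;
      this.transientStatus = transientStatus;
    }

    @Override
    public void run() {
      // Check 1, without the write lock: the common case returns here.
      if (storage.peek(taskId) != transientStatus) {
        return;
      }
      // Check 2, under the write lock: the task may have moved on in between.
      storage.timeOutIfStill(taskId, transientStatus);
    }
  }

  public static void main(String[] args) {
    Storage storage = new Storage();
    storage.put("task-1", Status.RUNNING);  // already left the transient state
    storage.put("task-2", Status.ASSIGNED); // still stuck
    new Handler(storage, "task-1", Status.ASSIGNED).run();
    new Handler(storage, "task-2", Status.ASSIGNED).run();
    // prints "RUNNING LOST writeLocks=1"
    System.out.println(storage.peek("task-1") + " " + storage.peek("task-2")
        + " writeLocks=" + storage.writeLockAcquisitions.get());
  }
}
```

The second check matters because the task can leave the transient state between the unlocked read and the lock acquisition; without it the handler could act on a stale observation.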
Diffs
-----
src/main/java/org/apache/aurora/scheduler/reconciliation/TaskTimeout.java 2dc9bc2c6916595270187f0f29d5bd8c5ba7e9ad
src/test/java/org/apache/aurora/scheduler/reconciliation/TaskTimeoutTest.java 1006ddb6caea015c2d4e014bd044f2933541c84f
Diff: https://reviews.apache.org/r/55179/diff/
Testing
-------
```
./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
...
*** OK (All tests passed) ***
mesos-master start/running, process 22759
+ RETCODE=0
+ restore_netrc
+ mv /home/vagrant/.netrc.bak /home/vagrant/.netrc
+ true
Connection to 127.0.0.1 closed.
real 25m36.144s
user 0m1.358s
sys 0m0.595s
```
Thanks,
Mehrdad Nurolahzade
Re: Review Request 55179: AURORA-1820 Reduce storage write lock contention by adopting Double-Checked Locking pattern in TimedOutTaskHandler
Posted by Joshua Cohen <jc...@apache.org>.
> On Jan. 4, 2017, 5:39 p.m., Stephan Erb wrote:
> > src/test/java/org/apache/aurora/scheduler/reconciliation/TaskTimeoutTest.java, line 130
> > <https://reviews.apache.org/r/55179/diff/1/?file=1596711#file1596711line130>
> >
> > Given the motivation of the patch, we should probably check that we did not try to acquire the storage lock.
+1
- Joshua
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55179/#review160510
-----------------------------------------------------------
Re: Review Request 55179: AURORA-1820 Reduce storage write lock contention by adopting Double-Checked Locking pattern in TimedOutTaskHandler
Posted by Stephan Erb <se...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55179/#review160510
-----------------------------------------------------------
Fix it, then Ship it!
LGTM.
src/test/java/org/apache/aurora/scheduler/reconciliation/TaskTimeoutTest.java (line 130)
<https://reviews.apache.org/r/55179/#comment231646>
Given the motivation of the patch, we should probably check that we did not try to acquire the storage lock.
- Stephan Erb
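One way the requested check could look, as a rough sketch under stated assumptions: it uses a hand-rolled fake rather than the project's own test setup, and `FakeStorage`, `peek`, `writeIfStill` and `onTimeout` are hypothetical names. The fake counts every attempt to enter the write path, so the test can assert that a task which has already left the transient state never touches it.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: hypothetical names, with a hand-rolled fake in place of real mocks.
final class NoWriteLockTestSketch {
  /** Fake storage that records every attempt to enter the write path. */
  static final class FakeStorage {
    final AtomicInteger writeAttempts = new AtomicInteger();
    private volatile String status;

    FakeStorage(String status) {
      this.status = status;
    }

    /** Read that does not touch the write path. */
    String peek() {
      return status;
    }

    /** Write path: re-checks the status before changing it. */
    synchronized boolean writeIfStill(String expected, String next) {
      writeAttempts.incrementAndGet();
      if (!expected.equals(status)) {
        return false;
      }
      status = next;
      return true;
    }
  }

  /** Handler logic under test: check first, write only if still transient. */
  static void onTimeout(FakeStorage storage, String transientStatus) {
    if (!transientStatus.equals(storage.peek())) {
      return;
    }
    storage.writeIfStill(transientStatus, "LOST");
  }

  static void check(boolean condition, String message) {
    if (!condition) {
      throw new AssertionError(message);
    }
  }

  public static void main(String[] args) {
    // Task already left the transient state: the write path must stay untouched.
    FakeStorage moved = new FakeStorage("RUNNING");
    onTimeout(moved, "ASSIGNED");
    check(moved.writeAttempts.get() == 0, "write path used although task had moved on");

    // Task still stuck: exactly one write, and the task is timed out.
    FakeStorage stuck = new FakeStorage("ASSIGNED");
    onTimeout(stuck, "ASSIGNED");
    check(stuck.writeAttempts.get() == 1, "expected exactly one write attempt");
    check("LOST".equals(stuck.peek()), "stuck task should have been timed out");
    System.out.println("ok");
  }
}
```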
Re: Review Request 55179: AURORA-1820 Reduce storage write lock contention by adopting Double-Checked Locking pattern in TimedOutTaskHandler
Posted by Joshua Cohen <jc...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55179/#review160515
-----------------------------------------------------------
Ship it!
LGTM pending additional testing per Stephan's comments.
- Joshua Cohen
Re: Review Request 55179: AURORA-1820 Reduce storage write lock contention by adopting Double-Checked Locking pattern in TimedOutTaskHandler
Posted by Aurora ReviewBot <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55179/#review160540
-----------------------------------------------------------
Ship it!
Master (21ad18e) is green with this patch.
./build-support/jenkins/build.sh
I will refresh this build result if you post a review containing "@ReviewBot retry"
- Aurora ReviewBot
Re: Review Request 55179: AURORA-1820 Reduce storage write lock contention by adopting Double-Checked Locking pattern in TimedOutTaskHandler
Posted by Mehrdad Nurolahzade <me...@nurolahzade.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55179/
-----------------------------------------------------------
(Updated Jan. 4, 2017, 1:23 p.m.)
Review request for Aurora, Joshua Cohen and Stephan Erb.
Changes
-------
Added expectations for storage calls
Bugs: AURORA-1820
https://issues.apache.org/jira/browse/AURORA-1820
Repository: aurora
Description
-------
`TimedOutTaskHandler` acquires the storage write lock for every task, every time a task transitions to a transient state. After a default time-out period of 5 minutes, it verifies whether the task has transitioned out of the transient state.
The verification step takes place while holding the storage write lock. In over 99% of cases, the logic short-circuits and returns from `StateManagerImpl.updateTaskAndExternalState()` once it learns that the task has transitioned out of the transient state.
This patch reduces storage write lock contention by adopting the Double-Checked Locking pattern in `TimedOutTaskHandler.run()`.
Diffs (updated)
-----
src/main/java/org/apache/aurora/scheduler/reconciliation/TaskTimeout.java 2dc9bc2c6916595270187f0f29d5bd8c5ba7e9ad
src/test/java/org/apache/aurora/scheduler/reconciliation/TaskTimeoutTest.java 1006ddb6caea015c2d4e014bd044f2933541c84f
Diff: https://reviews.apache.org/r/55179/diff/
Testing
-------
```
./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
...
*** OK (All tests passed) ***
mesos-master start/running, process 22759
+ RETCODE=0
+ restore_netrc
+ mv /home/vagrant/.netrc.bak /home/vagrant/.netrc
+ true
Connection to 127.0.0.1 closed.
real 25m36.144s
user 0m1.358s
sys 0m0.595s
```
Thanks,
Mehrdad Nurolahzade
Re: Review Request 55179: AURORA-1820 Reduce storage write lock contention by adopting Double-Checked Locking pattern in TimedOutTaskHandler
Posted by Aurora ReviewBot <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55179/#review160512
-----------------------------------------------------------
Ship it!
Master (21ad18e) is green with this patch.
./build-support/jenkins/build.sh
I will refresh this build result if you post a review containing "@ReviewBot retry"
- Aurora ReviewBot