You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@aurora.apache.org by Mehrdad Nurolahzade <me...@apache.org> on 2017/05/04 00:09:45 UTC

Review Request 58979: Expose stats on log storage lock wait time per caller

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58979/
-----------------------------------------------------------

Review request for Aurora, David McLaughlin, Stephan Erb, and Zameer Manji.


Repository: aurora


Description
-------

`LogStorage` exposes aggregated stats on its write lock wait time (`log_storage_write_lock_wait`). However, this does not help much with performance studies where a breakdown of wait time by caller is required.

This patch exposes stats on the writer processes trying to acquire `LogStorage` write lock. Metric collection has been put behind a feature flag (`enable_log_storage_lock_wait_tracking`) to ensure it does not depreciate scheduler performance (although the induced overhead should be minimal). It is disabled by default but can be enabled for performance studies or in test clusters.


Diffs
-----

  src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java 387350c7667a5fb8ee674ad0d3dd17529232b25b 
  src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 835f1604c0c5d913a87d570ee01d053bbbf92ecb 
  src/test/java/org/apache/aurora/scheduler/storage/log/LogStorageTest.java 0eb54fdaddfbc2af76fd83ffee18ce4c6b61cc48 


Diff: https://reviews.apache.org/r/58979/diff/1/


Testing
-------

- Manually under Vagrant
- end to end test script

```
$ curl -s localhost:8081/vars | egrep -e 'log_storage_write_lock_wait_for_.*_nanos_total '

log_storage_write_lock_wait_for_org.apache.aurora.scheduler.BatchWorker.processBatch_BatchWorker.java_207__nanos_total 81407978
log_storage_write_lock_wait_for_org.apache.aurora.scheduler.SchedulerLifecycle_4.accept_SchedulerLifecycle.java_226__nanos_total 2337
log_storage_write_lock_wait_for_org.apache.aurora.scheduler.mesos.MesosCallbackHandler_MesosCallbackHandlerImpl.handleRegistration_MesosCallbackHandler.java_182__nanos_total 3257
log_storage_write_lock_wait_for_org.apache.aurora.scheduler.mesos.MesosCallbackHandler_MesosCallbackHandlerImpl.lambda_handleOffers_3_MesosCallbackHandler.java_206__nanos_total 3628
log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.Storage_MutateWork_NoResult.apply_Storage.java_152__nanos_total 2101
log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.db.RowGarbageCollector.runOneIteration_RowGarbageCollector.java_83__nanos_total 79698582
log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.log.LogStorage.replay_LogStorage.java_469__nanos_total 172993
log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.log.LogStorage__EnhancerByGuice__924dd57b.CGLIB_start_6__generated___nanos_total 3323
log_storage_write_lock_wait_for_org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.systemResume_JobUpdateControllerImpl.java_302__nanos_total 1928

curl -s localhost:8081/vars | egrep -e 'log_storage_write_lock_wait_for_.*_events '

log_storage_write_lock_wait_for_org.apache.aurora.scheduler.BatchWorker.processBatch_BatchWorker.java_207__events 1
log_storage_write_lock_wait_for_org.apache.aurora.scheduler.SchedulerLifecycle_4.accept_SchedulerLifecycle.java_226__events 1
log_storage_write_lock_wait_for_org.apache.aurora.scheduler.mesos.MesosCallbackHandler_MesosCallbackHandlerImpl.handleRegistration_MesosCallbackHandler.java_182__events 1
log_storage_write_lock_wait_for_org.apache.aurora.scheduler.mesos.MesosCallbackHandler_MesosCallbackHandlerImpl.lambda_handleOffers_3_MesosCallbackHandler.java_206__events 1
log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.Storage_MutateWork_NoResult.apply_Storage.java_152__events 1
log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.db.RowGarbageCollector.runOneIteration_RowGarbageCollector.java_83__events 2
log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.log.LogStorage.replay_LogStorage.java_469__events 35
log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.log.LogStorage__EnhancerByGuice__924dd57b.CGLIB_start_6__generated___events 1
log_storage_write_lock_wait_for_org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.systemResume_JobUpdateControllerImpl.java_302__events 1
```


Thanks,

Mehrdad Nurolahzade


Re: Review Request 58979: Expose stats on log storage lock wait time per caller

Posted by Aurora ReviewBot <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58979/#review173915
-----------------------------------------------------------


Ship it!




Master (8a6e01c) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot


On May 4, 2017, 3:57 p.m., Mehrdad Nurolahzade wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58979/
> -----------------------------------------------------------
> 
> (Updated May 4, 2017, 3:57 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> `LogStorage` exposes aggregated stats on its write lock wait time (`log_storage_write_lock_wait`). However, this does not help much with performance studies where a breakdown of wait time by caller is required.
> 
> This patch exposes stats on the writer processes trying to acquire `LogStorage` write lock. Metric collection has been put behind a feature flag (`enable_log_storage_lock_wait_tracking`) to ensure it does not depreciate scheduler performance (although the induced overhead should be minimal). It is disabled by default but can be enabled for performance studies or in test clusters.
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java 387350c7667a5fb8ee674ad0d3dd17529232b25b 
>   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 835f1604c0c5d913a87d570ee01d053bbbf92ecb 
>   src/test/java/org/apache/aurora/scheduler/storage/log/LogStorageTest.java 0eb54fdaddfbc2af76fd83ffee18ce4c6b61cc48 
> 
> 
> Diff: https://reviews.apache.org/r/58979/diff/2/
> 
> 
> Testing
> -------
> 
> - Manually under Vagrant
> - end to end test script
> 
> ```
> $ curl -s localhost:8081/vars | egrep -e 'log_storage_write_lock_wait_for_.*_nanos_total '
> 
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.BatchWorker.processBatch_BatchWorker.java_207__nanos_total 81407978
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.SchedulerLifecycle_4.accept_SchedulerLifecycle.java_226__nanos_total 2337
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.mesos.MesosCallbackHandler_MesosCallbackHandlerImpl.handleRegistration_MesosCallbackHandler.java_182__nanos_total 3257
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.mesos.MesosCallbackHandler_MesosCallbackHandlerImpl.lambda_handleOffers_3_MesosCallbackHandler.java_206__nanos_total 3628
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.Storage_MutateWork_NoResult.apply_Storage.java_152__nanos_total 2101
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.db.RowGarbageCollector.runOneIteration_RowGarbageCollector.java_83__nanos_total 79698582
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.log.LogStorage.replay_LogStorage.java_469__nanos_total 172993
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.log.LogStorage__EnhancerByGuice__924dd57b.CGLIB_start_6__generated___nanos_total 3323
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.systemResume_JobUpdateControllerImpl.java_302__nanos_total 1928
> 
> curl -s localhost:8081/vars | egrep -e 'log_storage_write_lock_wait_for_.*_events '
> 
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.BatchWorker.processBatch_BatchWorker.java_207__events 1
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.SchedulerLifecycle_4.accept_SchedulerLifecycle.java_226__events 1
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.mesos.MesosCallbackHandler_MesosCallbackHandlerImpl.handleRegistration_MesosCallbackHandler.java_182__events 1
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.mesos.MesosCallbackHandler_MesosCallbackHandlerImpl.lambda_handleOffers_3_MesosCallbackHandler.java_206__events 1
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.Storage_MutateWork_NoResult.apply_Storage.java_152__events 1
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.db.RowGarbageCollector.runOneIteration_RowGarbageCollector.java_83__events 2
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.log.LogStorage.replay_LogStorage.java_469__events 35
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.log.LogStorage__EnhancerByGuice__924dd57b.CGLIB_start_6__generated___events 1
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.systemResume_JobUpdateControllerImpl.java_302__events 1
> ```
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>


Re: Review Request 58979: Expose stats on log storage lock wait time per caller

Posted by Mehrdad Nurolahzade <me...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58979/
-----------------------------------------------------------

(Updated May 4, 2017, 8:57 a.m.)


Review request for Aurora, David McLaughlin, Stephan Erb, and Zameer Manji.


Changes
-------

Minor refactoring suggested by Stephan


Repository: aurora


Description
-------

`LogStorage` exposes aggregated stats on its write lock wait time (`log_storage_write_lock_wait`). However, this does not help much with performance studies where a breakdown of wait time by caller is required.

This patch exposes stats on the writer processes trying to acquire `LogStorage` write lock. Metric collection has been put behind a feature flag (`enable_log_storage_lock_wait_tracking`) to ensure it does not depreciate scheduler performance (although the induced overhead should be minimal). It is disabled by default but can be enabled for performance studies or in test clusters.


Diffs (updated)
-----

  src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java 387350c7667a5fb8ee674ad0d3dd17529232b25b 
  src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 835f1604c0c5d913a87d570ee01d053bbbf92ecb 
  src/test/java/org/apache/aurora/scheduler/storage/log/LogStorageTest.java 0eb54fdaddfbc2af76fd83ffee18ce4c6b61cc48 


Diff: https://reviews.apache.org/r/58979/diff/2/

Changes: https://reviews.apache.org/r/58979/diff/1-2/


Testing
-------

- Manually under Vagrant
- end to end test script

```
$ curl -s localhost:8081/vars | egrep -e 'log_storage_write_lock_wait_for_.*_nanos_total '

log_storage_write_lock_wait_for_org.apache.aurora.scheduler.BatchWorker.processBatch_BatchWorker.java_207__nanos_total 81407978
log_storage_write_lock_wait_for_org.apache.aurora.scheduler.SchedulerLifecycle_4.accept_SchedulerLifecycle.java_226__nanos_total 2337
log_storage_write_lock_wait_for_org.apache.aurora.scheduler.mesos.MesosCallbackHandler_MesosCallbackHandlerImpl.handleRegistration_MesosCallbackHandler.java_182__nanos_total 3257
log_storage_write_lock_wait_for_org.apache.aurora.scheduler.mesos.MesosCallbackHandler_MesosCallbackHandlerImpl.lambda_handleOffers_3_MesosCallbackHandler.java_206__nanos_total 3628
log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.Storage_MutateWork_NoResult.apply_Storage.java_152__nanos_total 2101
log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.db.RowGarbageCollector.runOneIteration_RowGarbageCollector.java_83__nanos_total 79698582
log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.log.LogStorage.replay_LogStorage.java_469__nanos_total 172993
log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.log.LogStorage__EnhancerByGuice__924dd57b.CGLIB_start_6__generated___nanos_total 3323
log_storage_write_lock_wait_for_org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.systemResume_JobUpdateControllerImpl.java_302__nanos_total 1928

curl -s localhost:8081/vars | egrep -e 'log_storage_write_lock_wait_for_.*_events '

log_storage_write_lock_wait_for_org.apache.aurora.scheduler.BatchWorker.processBatch_BatchWorker.java_207__events 1
log_storage_write_lock_wait_for_org.apache.aurora.scheduler.SchedulerLifecycle_4.accept_SchedulerLifecycle.java_226__events 1
log_storage_write_lock_wait_for_org.apache.aurora.scheduler.mesos.MesosCallbackHandler_MesosCallbackHandlerImpl.handleRegistration_MesosCallbackHandler.java_182__events 1
log_storage_write_lock_wait_for_org.apache.aurora.scheduler.mesos.MesosCallbackHandler_MesosCallbackHandlerImpl.lambda_handleOffers_3_MesosCallbackHandler.java_206__events 1
log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.Storage_MutateWork_NoResult.apply_Storage.java_152__events 1
log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.db.RowGarbageCollector.runOneIteration_RowGarbageCollector.java_83__events 2
log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.log.LogStorage.replay_LogStorage.java_469__events 35
log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.log.LogStorage__EnhancerByGuice__924dd57b.CGLIB_start_6__generated___events 1
log_storage_write_lock_wait_for_org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.systemResume_JobUpdateControllerImpl.java_302__events 1
```


Thanks,

Mehrdad Nurolahzade


Re: Review Request 58979: Expose stats on log storage lock wait time per caller

Posted by Mehrdad Nurolahzade <me...@apache.org>.

> On May 4, 2017, 6:32 a.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java
> > Lines 210 (patched)
> > <https://reviews.apache.org/r/58979/diff/1/?file=1708018#file1708018line210>
> >
> >     Here you call it "nanos", in the stats above it is called "ns". Should be consistent I suppose.

I did consider that. However after looking at usage of `SlidingStats` constructor I realized `log_storage_write_lock_wait` is the only metric that uses `ns` and all others use `nanos`.


> On May 4, 2017, 6:32 a.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java
> > Lines 559 (patched)
> > <https://reviews.apache.org/r/58979/diff/1/?file=1708018#file1708018line559>
> >
> >     I am wondering why you get the frame here at the top, rather than moving `getCallerFrame()` into an `if (enableLogStorageLockWaitTracking)` where needed below. Did you fear a performance penalty doing this while holding the lock?

It ended up looking like that after a bit of refactoring I did. 
Now that you mention it I can see that it looks weird. Going to fix.


- Mehrdad


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58979/#review173897
-----------------------------------------------------------


On May 3, 2017, 5:09 p.m., Mehrdad Nurolahzade wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58979/
> -----------------------------------------------------------
> 
> (Updated May 3, 2017, 5:09 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> `LogStorage` exposes aggregated stats on its write lock wait time (`log_storage_write_lock_wait`). However, this does not help much with performance studies where a breakdown of wait time by caller is required.
> 
> This patch exposes stats on the writer processes trying to acquire `LogStorage` write lock. Metric collection has been put behind a feature flag (`enable_log_storage_lock_wait_tracking`) to ensure it does not depreciate scheduler performance (although the induced overhead should be minimal). It is disabled by default but can be enabled for performance studies or in test clusters.
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java 387350c7667a5fb8ee674ad0d3dd17529232b25b 
>   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 835f1604c0c5d913a87d570ee01d053bbbf92ecb 
>   src/test/java/org/apache/aurora/scheduler/storage/log/LogStorageTest.java 0eb54fdaddfbc2af76fd83ffee18ce4c6b61cc48 
> 
> 
> Diff: https://reviews.apache.org/r/58979/diff/1/
> 
> 
> Testing
> -------
> 
> - Manually under Vagrant
> - end to end test script
> 
> ```
> $ curl -s localhost:8081/vars | egrep -e 'log_storage_write_lock_wait_for_.*_nanos_total '
> 
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.BatchWorker.processBatch_BatchWorker.java_207__nanos_total 81407978
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.SchedulerLifecycle_4.accept_SchedulerLifecycle.java_226__nanos_total 2337
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.mesos.MesosCallbackHandler_MesosCallbackHandlerImpl.handleRegistration_MesosCallbackHandler.java_182__nanos_total 3257
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.mesos.MesosCallbackHandler_MesosCallbackHandlerImpl.lambda_handleOffers_3_MesosCallbackHandler.java_206__nanos_total 3628
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.Storage_MutateWork_NoResult.apply_Storage.java_152__nanos_total 2101
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.db.RowGarbageCollector.runOneIteration_RowGarbageCollector.java_83__nanos_total 79698582
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.log.LogStorage.replay_LogStorage.java_469__nanos_total 172993
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.log.LogStorage__EnhancerByGuice__924dd57b.CGLIB_start_6__generated___nanos_total 3323
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.systemResume_JobUpdateControllerImpl.java_302__nanos_total 1928
> 
> curl -s localhost:8081/vars | egrep -e 'log_storage_write_lock_wait_for_.*_events '
> 
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.BatchWorker.processBatch_BatchWorker.java_207__events 1
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.SchedulerLifecycle_4.accept_SchedulerLifecycle.java_226__events 1
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.mesos.MesosCallbackHandler_MesosCallbackHandlerImpl.handleRegistration_MesosCallbackHandler.java_182__events 1
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.mesos.MesosCallbackHandler_MesosCallbackHandlerImpl.lambda_handleOffers_3_MesosCallbackHandler.java_206__events 1
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.Storage_MutateWork_NoResult.apply_Storage.java_152__events 1
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.db.RowGarbageCollector.runOneIteration_RowGarbageCollector.java_83__events 2
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.log.LogStorage.replay_LogStorage.java_469__events 35
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.log.LogStorage__EnhancerByGuice__924dd57b.CGLIB_start_6__generated___events 1
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.systemResume_JobUpdateControllerImpl.java_302__events 1
> ```
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>


Re: Review Request 58979: Expose stats on log storage lock wait time per caller

Posted by Stephan Erb <se...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58979/#review173897
-----------------------------------------------------------


Fix it, then Ship it!




Looks good to me. Two minor things below


src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java
Lines 210 (patched)
<https://reviews.apache.org/r/58979/#comment246978>

    Here you call it "nanos", in the stats above it is called "ns". Should be consistent I suppose.



src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java
Lines 559 (patched)
<https://reviews.apache.org/r/58979/#comment246979>

    I am wondering why you get the frame here at the top, rather than moving `getCallerFrame()` into an `if (enableLogStorageLockWaitTracking)` where needed below. Did you fear a performance penalty doing this while holding the lock?


- Stephan Erb


On May 4, 2017, 2:09 a.m., Mehrdad Nurolahzade wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58979/
> -----------------------------------------------------------
> 
> (Updated May 4, 2017, 2:09 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> `LogStorage` exposes aggregated stats on its write lock wait time (`log_storage_write_lock_wait`). However, this does not help much with performance studies where a breakdown of wait time by caller is required.
> 
> This patch exposes stats on the writer processes trying to acquire `LogStorage` write lock. Metric collection has been put behind a feature flag (`enable_log_storage_lock_wait_tracking`) to ensure it does not depreciate scheduler performance (although the induced overhead should be minimal). It is disabled by default but can be enabled for performance studies or in test clusters.
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java 387350c7667a5fb8ee674ad0d3dd17529232b25b 
>   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 835f1604c0c5d913a87d570ee01d053bbbf92ecb 
>   src/test/java/org/apache/aurora/scheduler/storage/log/LogStorageTest.java 0eb54fdaddfbc2af76fd83ffee18ce4c6b61cc48 
> 
> 
> Diff: https://reviews.apache.org/r/58979/diff/1/
> 
> 
> Testing
> -------
> 
> - Manually under Vagrant
> - end to end test script
> 
> ```
> $ curl -s localhost:8081/vars | egrep -e 'log_storage_write_lock_wait_for_.*_nanos_total '
> 
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.BatchWorker.processBatch_BatchWorker.java_207__nanos_total 81407978
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.SchedulerLifecycle_4.accept_SchedulerLifecycle.java_226__nanos_total 2337
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.mesos.MesosCallbackHandler_MesosCallbackHandlerImpl.handleRegistration_MesosCallbackHandler.java_182__nanos_total 3257
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.mesos.MesosCallbackHandler_MesosCallbackHandlerImpl.lambda_handleOffers_3_MesosCallbackHandler.java_206__nanos_total 3628
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.Storage_MutateWork_NoResult.apply_Storage.java_152__nanos_total 2101
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.db.RowGarbageCollector.runOneIteration_RowGarbageCollector.java_83__nanos_total 79698582
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.log.LogStorage.replay_LogStorage.java_469__nanos_total 172993
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.log.LogStorage__EnhancerByGuice__924dd57b.CGLIB_start_6__generated___nanos_total 3323
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.systemResume_JobUpdateControllerImpl.java_302__nanos_total 1928
> 
> curl -s localhost:8081/vars | egrep -e 'log_storage_write_lock_wait_for_.*_events '
> 
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.BatchWorker.processBatch_BatchWorker.java_207__events 1
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.SchedulerLifecycle_4.accept_SchedulerLifecycle.java_226__events 1
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.mesos.MesosCallbackHandler_MesosCallbackHandlerImpl.handleRegistration_MesosCallbackHandler.java_182__events 1
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.mesos.MesosCallbackHandler_MesosCallbackHandlerImpl.lambda_handleOffers_3_MesosCallbackHandler.java_206__events 1
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.Storage_MutateWork_NoResult.apply_Storage.java_152__events 1
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.db.RowGarbageCollector.runOneIteration_RowGarbageCollector.java_83__events 2
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.log.LogStorage.replay_LogStorage.java_469__events 35
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.log.LogStorage__EnhancerByGuice__924dd57b.CGLIB_start_6__generated___events 1
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.systemResume_JobUpdateControllerImpl.java_302__events 1
> ```
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>


Re: Review Request 58979: Expose stats on log storage lock wait time per caller

Posted by Aurora ReviewBot <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58979/#review173854
-----------------------------------------------------------


Ship it!




Master (8a6e01c) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot


On May 4, 2017, 12:09 a.m., Mehrdad Nurolahzade wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58979/
> -----------------------------------------------------------
> 
> (Updated May 4, 2017, 12:09 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> `LogStorage` exposes aggregated stats on its write lock wait time (`log_storage_write_lock_wait`). However, this does not help much with performance studies where a breakdown of wait time by caller is required.
> 
> This patch exposes stats on the writer processes trying to acquire `LogStorage` write lock. Metric collection has been put behind a feature flag (`enable_log_storage_lock_wait_tracking`) to ensure it does not depreciate scheduler performance (although the induced overhead should be minimal). It is disabled by default but can be enabled for performance studies or in test clusters.
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java 387350c7667a5fb8ee674ad0d3dd17529232b25b 
>   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 835f1604c0c5d913a87d570ee01d053bbbf92ecb 
>   src/test/java/org/apache/aurora/scheduler/storage/log/LogStorageTest.java 0eb54fdaddfbc2af76fd83ffee18ce4c6b61cc48 
> 
> 
> Diff: https://reviews.apache.org/r/58979/diff/1/
> 
> 
> Testing
> -------
> 
> - Manually under Vagrant
> - end to end test script
> 
> ```
> $ curl -s localhost:8081/vars | egrep -e 'log_storage_write_lock_wait_for_.*_nanos_total '
> 
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.BatchWorker.processBatch_BatchWorker.java_207__nanos_total 81407978
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.SchedulerLifecycle_4.accept_SchedulerLifecycle.java_226__nanos_total 2337
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.mesos.MesosCallbackHandler_MesosCallbackHandlerImpl.handleRegistration_MesosCallbackHandler.java_182__nanos_total 3257
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.mesos.MesosCallbackHandler_MesosCallbackHandlerImpl.lambda_handleOffers_3_MesosCallbackHandler.java_206__nanos_total 3628
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.Storage_MutateWork_NoResult.apply_Storage.java_152__nanos_total 2101
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.db.RowGarbageCollector.runOneIteration_RowGarbageCollector.java_83__nanos_total 79698582
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.log.LogStorage.replay_LogStorage.java_469__nanos_total 172993
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.log.LogStorage__EnhancerByGuice__924dd57b.CGLIB_start_6__generated___nanos_total 3323
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.systemResume_JobUpdateControllerImpl.java_302__nanos_total 1928
> 
> curl -s localhost:8081/vars | egrep -e 'log_storage_write_lock_wait_for_.*_events '
> 
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.BatchWorker.processBatch_BatchWorker.java_207__events 1
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.SchedulerLifecycle_4.accept_SchedulerLifecycle.java_226__events 1
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.mesos.MesosCallbackHandler_MesosCallbackHandlerImpl.handleRegistration_MesosCallbackHandler.java_182__events 1
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.mesos.MesosCallbackHandler_MesosCallbackHandlerImpl.lambda_handleOffers_3_MesosCallbackHandler.java_206__events 1
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.Storage_MutateWork_NoResult.apply_Storage.java_152__events 1
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.db.RowGarbageCollector.runOneIteration_RowGarbageCollector.java_83__events 2
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.log.LogStorage.replay_LogStorage.java_469__events 35
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.storage.log.LogStorage__EnhancerByGuice__924dd57b.CGLIB_start_6__generated___events 1
> log_storage_write_lock_wait_for_org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.systemResume_JobUpdateControllerImpl.java_302__events 1
> ```
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>