You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@aurora.apache.org by Mehrdad Nurolahzade <me...@nurolahzade.com> on 2017/01/05 18:59:09 UTC
Review Request 55217: AURORA-1847 Eliminate sequential scan in
MemTaskStore.getJobKeys()
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55217/
-----------------------------------------------------------
Review request for Aurora, Joshua Cohen and Stephan Erb.
Bugs: AURORA-1847
https://issues.apache.org/jira/browse/AURORA-1847
Repository: aurora
Description
-------
If scheduler is configured to run with the `MemTaskStore` every hit on scheduler landing page (`/scheduler`) causes a call to `MemTaskStore.getJobKeys()` through `ReadOnlyScheduler.getRoleSummary()`.
The implementation of `MemTaskStore.getJobKeys()` is currently very inefficient as it requires a sequential scan of the task store and mapping to their respective job keys. In Twitter clusters this method is currently taking half a second per call (`mem_storage_get_job_keys`).
This patch eliminates the sequential scan and mapping to job key by simply returning an immutable copy of the key set of the existing secondary index `job`.
Diffs
-----
src/jmh/java/org/apache/aurora/benchmark/TaskStoreBenchmarks.java f2f00b92bf901c438e40bbc177e9f5a112be1bbc
src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java fc272ddb45be8b2f55f01c3d54f7fa9058202c0b
Diff: https://reviews.apache.org/r/55217/diff/
Testing
-------
Using the following modified version of existing `TaskStoreBenchmarks.MemFetchTasksBenchmark` which benchmarks `TaskStore.getJobKeys()`:
```java
public static class MemFetchTasksBenchmark extends AbstractFetchTasksBenchmark {
@Setup(Level.Trial)
@Override
public void setUp() {
storage = Guice.createInjector(
Modules.combine(
DbModule.testModuleWithWorkQueue(PLAIN, Optional.of(new InMemStoresModule(PLAIN))),
new AbstractModule() {
@Override
protected void configure() {
bind(StatsProvider.class).toInstance(new FakeStatsProvider());
bind(Clock.class).toInstance(new FakeClock());
}
}))
.getInstance(Storage.class);
}
@Setup(Level.Iteration)
public void setUpIteration() {
createTasks(numTasks);
}
@TearDown(Level.Iteration)
public void tearDownIteration() {
deleteTasks();
}
@Benchmark
public Iterable<IJobKey> run() {
return storage.read(store -> store.getTaskStore().getJobKeys());
}
}
```
Benchmark results BEFORE patch:
```
Benchmark (numTasks) Mode Cnt Score Error Units
TaskStoreBenchmarks.MemFetchTasksBenchmark.run 10000 thrpt 5 572.761 � 168.865 ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run 50000 thrpt 5 80.697 � 8.516 ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run 100000 thrpt 5 25.102 � 3.244 ops/s
```
Benchmark results AFTER patch:
```
Benchmark (numTasks) Mode Cnt Score Error Units
TaskStoreBenchmarks.MemFetchTasksBenchmark.run 10000 thrpt 5 327336.998 � 10207.402 ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run 50000 thrpt 5 320506.958 � 23430.527 ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run 100000 thrpt 5 287962.695 � 51917.245 ops/s
```
Thanks,
Mehrdad Nurolahzade
Re: Review Request 55217: AURORA-1847 Eliminate sequential scan in
MemTaskStore.getJobKeys()
Posted by Reza Motamedi <re...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55217/#review160830
-----------------------------------------------------------
Ship it!
Ship It!
- Reza Motamedi
On Jan. 5, 2017, 6:59 p.m., Mehrdad Nurolahzade wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55217/
> -----------------------------------------------------------
>
> (Updated Jan. 5, 2017, 6:59 p.m.)
>
>
> Review request for Aurora, Joshua Cohen and Stephan Erb.
>
>
> Bugs: AURORA-1847
> https://issues.apache.org/jira/browse/AURORA-1847
>
>
> Repository: aurora
>
>
> Description
> -------
>
> If scheduler is configured to run with the `MemTaskStore` every hit on scheduler landing page (`/scheduler`) causes a call to `MemTaskStore.getJobKeys()` through `ReadOnlyScheduler.getRoleSummary()`.
>
> The implementation of `MemTaskStore.getJobKeys()` is currently very inefficient as it requires a sequential scan of the task store and mapping to their respective job keys. In Twitter clusters this method is currently taking half a second per call (`mem_storage_get_job_keys`).
>
> This patch eliminates the sequential scan and mapping to job key by simply returning an immutable copy of the key set of the existing secondary index `job`.
>
>
> Diffs
> -----
>
> src/jmh/java/org/apache/aurora/benchmark/TaskStoreBenchmarks.java f2f00b92bf901c438e40bbc177e9f5a112be1bbc
> src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java fc272ddb45be8b2f55f01c3d54f7fa9058202c0b
>
> Diff: https://reviews.apache.org/r/55217/diff/
>
>
> Testing
> -------
>
> Using the following modified version of existing `TaskStoreBenchmarks.MemFetchTasksBenchmark` which benchmarks `TaskStore.getJobKeys()`:
> ```java
> public static class MemFetchTasksBenchmark extends AbstractFetchTasksBenchmark {
> @Setup(Level.Trial)
> @Override
> public void setUp() {
> storage = Guice.createInjector(
> Modules.combine(
> DbModule.testModuleWithWorkQueue(PLAIN, Optional.of(new InMemStoresModule(PLAIN))),
> new AbstractModule() {
> @Override
> protected void configure() {
> bind(StatsProvider.class).toInstance(new FakeStatsProvider());
> bind(Clock.class).toInstance(new FakeClock());
> }
> }))
> .getInstance(Storage.class);
>
> }
>
> @Setup(Level.Iteration)
> public void setUpIteration() {
> createTasks(numTasks);
> }
>
> @TearDown(Level.Iteration)
> public void tearDownIteration() {
> deleteTasks();
> }
>
> @Benchmark
> public Iterable<IJobKey> run() {
> return storage.read(store -> store.getTaskStore().getJobKeys());
> }
> }
> ```
>
> Benchmark results BEFORE patch:
> ```
> Benchmark (numTasks) Mode Cnt Score Error Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 10000 thrpt 5 572.761 � 168.865 ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 50000 thrpt 5 80.697 � 8.516 ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 100000 thrpt 5 25.102 � 3.244 ops/s
> ```
>
> Benchmark results AFTER patch:
> ```
> Benchmark (numTasks) Mode Cnt Score Error Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 10000 thrpt 5 327336.998 � 10207.402 ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 50000 thrpt 5 320506.958 � 23430.527 ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 100000 thrpt 5 287962.695 � 51917.245 ops/s
> ```
>
>
> Thanks,
>
> Mehrdad Nurolahzade
>
>
Re: Review Request 55217: AURORA-1847 Eliminate sequential scan in
MemTaskStore.getJobKeys()
Posted by Mehrdad Nurolahzade <me...@nurolahzade.com>.
> On Jan. 5, 2017, 11:17 a.m., Reza Motamedi wrote:
> > src/jmh/java/org/apache/aurora/benchmark/TaskStoreBenchmarks.java, line 93
> > <https://reviews.apache.org/r/55217/diff/1/?file=1597676#file1597676line93>
> >
> > newb question here... I thought that FakeClock does not advance by itself. I am wondering what it the benchamrks are coming from or, if it is just added to fix a dependency?
Correct, this is a fix for a missing dependency.
Not sure what requires an injection of `Clock` here, perhaps something related to stats. Anyways, JMH does not do timing based on Aurora's internal abstraction of `Clock`, therefore benchmarks are not affected by the choice of `Clock` abstraction.
- Mehrdad
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55217/#review160616
-----------------------------------------------------------
On Jan. 5, 2017, 10:59 a.m., Mehrdad Nurolahzade wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55217/
> -----------------------------------------------------------
>
> (Updated Jan. 5, 2017, 10:59 a.m.)
>
>
> Review request for Aurora, Joshua Cohen and Stephan Erb.
>
>
> Bugs: AURORA-1847
> https://issues.apache.org/jira/browse/AURORA-1847
>
>
> Repository: aurora
>
>
> Description
> -------
>
> If scheduler is configured to run with the `MemTaskStore` every hit on scheduler landing page (`/scheduler`) causes a call to `MemTaskStore.getJobKeys()` through `ReadOnlyScheduler.getRoleSummary()`.
>
> The implementation of `MemTaskStore.getJobKeys()` is currently very inefficient as it requires a sequential scan of the task store and mapping to their respective job keys. In Twitter clusters this method is currently taking half a second per call (`mem_storage_get_job_keys`).
>
> This patch eliminates the sequential scan and mapping to job key by simply returning an immutable copy of the key set of the existing secondary index `job`.
>
>
> Diffs
> -----
>
> src/jmh/java/org/apache/aurora/benchmark/TaskStoreBenchmarks.java f2f00b92bf901c438e40bbc177e9f5a112be1bbc
> src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java fc272ddb45be8b2f55f01c3d54f7fa9058202c0b
>
> Diff: https://reviews.apache.org/r/55217/diff/
>
>
> Testing
> -------
>
> Using the following modified version of existing `TaskStoreBenchmarks.MemFetchTasksBenchmark` which benchmarks `TaskStore.getJobKeys()`:
> ```java
> public static class MemFetchTasksBenchmark extends AbstractFetchTasksBenchmark {
> @Setup(Level.Trial)
> @Override
> public void setUp() {
> storage = Guice.createInjector(
> Modules.combine(
> DbModule.testModuleWithWorkQueue(PLAIN, Optional.of(new InMemStoresModule(PLAIN))),
> new AbstractModule() {
> @Override
> protected void configure() {
> bind(StatsProvider.class).toInstance(new FakeStatsProvider());
> bind(Clock.class).toInstance(new FakeClock());
> }
> }))
> .getInstance(Storage.class);
>
> }
>
> @Setup(Level.Iteration)
> public void setUpIteration() {
> createTasks(numTasks);
> }
>
> @TearDown(Level.Iteration)
> public void tearDownIteration() {
> deleteTasks();
> }
>
> @Benchmark
> public Iterable<IJobKey> run() {
> return storage.read(store -> store.getTaskStore().getJobKeys());
> }
> }
> ```
>
> Benchmark results BEFORE patch:
> ```
> Benchmark (numTasks) Mode Cnt Score Error Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 10000 thrpt 5 572.761 � 168.865 ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 50000 thrpt 5 80.697 � 8.516 ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 100000 thrpt 5 25.102 � 3.244 ops/s
> ```
>
> Benchmark results AFTER patch:
> ```
> Benchmark (numTasks) Mode Cnt Score Error Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 10000 thrpt 5 327336.998 � 10207.402 ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 50000 thrpt 5 320506.958 � 23430.527 ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 100000 thrpt 5 287962.695 � 51917.245 ops/s
> ```
>
>
> Thanks,
>
> Mehrdad Nurolahzade
>
>
Re: Review Request 55217: AURORA-1847 Eliminate sequential scan in
MemTaskStore.getJobKeys()
Posted by Reza Motamedi <re...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55217/#review160616
-----------------------------------------------------------
src/jmh/java/org/apache/aurora/benchmark/TaskStoreBenchmarks.java (line 93)
<https://reviews.apache.org/r/55217/#comment231751>
newb question here... I thought that FakeClock does not advance by itself. I am wondering what it the benchamrks are coming from or, if it is just added to fix a dependency?
- Reza Motamedi
On Jan. 5, 2017, 6:59 p.m., Mehrdad Nurolahzade wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55217/
> -----------------------------------------------------------
>
> (Updated Jan. 5, 2017, 6:59 p.m.)
>
>
> Review request for Aurora, Joshua Cohen and Stephan Erb.
>
>
> Bugs: AURORA-1847
> https://issues.apache.org/jira/browse/AURORA-1847
>
>
> Repository: aurora
>
>
> Description
> -------
>
> If scheduler is configured to run with the `MemTaskStore` every hit on scheduler landing page (`/scheduler`) causes a call to `MemTaskStore.getJobKeys()` through `ReadOnlyScheduler.getRoleSummary()`.
>
> The implementation of `MemTaskStore.getJobKeys()` is currently very inefficient as it requires a sequential scan of the task store and mapping to their respective job keys. In Twitter clusters this method is currently taking half a second per call (`mem_storage_get_job_keys`).
>
> This patch eliminates the sequential scan and mapping to job key by simply returning an immutable copy of the key set of the existing secondary index `job`.
>
>
> Diffs
> -----
>
> src/jmh/java/org/apache/aurora/benchmark/TaskStoreBenchmarks.java f2f00b92bf901c438e40bbc177e9f5a112be1bbc
> src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java fc272ddb45be8b2f55f01c3d54f7fa9058202c0b
>
> Diff: https://reviews.apache.org/r/55217/diff/
>
>
> Testing
> -------
>
> Using the following modified version of existing `TaskStoreBenchmarks.MemFetchTasksBenchmark` which benchmarks `TaskStore.getJobKeys()`:
> ```java
> public static class MemFetchTasksBenchmark extends AbstractFetchTasksBenchmark {
> @Setup(Level.Trial)
> @Override
> public void setUp() {
> storage = Guice.createInjector(
> Modules.combine(
> DbModule.testModuleWithWorkQueue(PLAIN, Optional.of(new InMemStoresModule(PLAIN))),
> new AbstractModule() {
> @Override
> protected void configure() {
> bind(StatsProvider.class).toInstance(new FakeStatsProvider());
> bind(Clock.class).toInstance(new FakeClock());
> }
> }))
> .getInstance(Storage.class);
>
> }
>
> @Setup(Level.Iteration)
> public void setUpIteration() {
> createTasks(numTasks);
> }
>
> @TearDown(Level.Iteration)
> public void tearDownIteration() {
> deleteTasks();
> }
>
> @Benchmark
> public Iterable<IJobKey> run() {
> return storage.read(store -> store.getTaskStore().getJobKeys());
> }
> }
> ```
>
> Benchmark results BEFORE patch:
> ```
> Benchmark (numTasks) Mode Cnt Score Error Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 10000 thrpt 5 572.761 � 168.865 ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 50000 thrpt 5 80.697 � 8.516 ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 100000 thrpt 5 25.102 � 3.244 ops/s
> ```
>
> Benchmark results AFTER patch:
> ```
> Benchmark (numTasks) Mode Cnt Score Error Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 10000 thrpt 5 327336.998 � 10207.402 ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 50000 thrpt 5 320506.958 � 23430.527 ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 100000 thrpt 5 287962.695 � 51917.245 ops/s
> ```
>
>
> Thanks,
>
> Mehrdad Nurolahzade
>
>
Re: Review Request 55217: AURORA-1847 Eliminate sequential scan in
MemTaskStore.getJobKeys()
Posted by Stephan Erb <se...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55217/#review160815
-----------------------------------------------------------
Ship it!
Ship It!
- Stephan Erb
On Jan. 5, 2017, 7:59 p.m., Mehrdad Nurolahzade wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55217/
> -----------------------------------------------------------
>
> (Updated Jan. 5, 2017, 7:59 p.m.)
>
>
> Review request for Aurora, Joshua Cohen and Stephan Erb.
>
>
> Bugs: AURORA-1847
> https://issues.apache.org/jira/browse/AURORA-1847
>
>
> Repository: aurora
>
>
> Description
> -------
>
> If scheduler is configured to run with the `MemTaskStore` every hit on scheduler landing page (`/scheduler`) causes a call to `MemTaskStore.getJobKeys()` through `ReadOnlyScheduler.getRoleSummary()`.
>
> The implementation of `MemTaskStore.getJobKeys()` is currently very inefficient as it requires a sequential scan of the task store and mapping to their respective job keys. In Twitter clusters this method is currently taking half a second per call (`mem_storage_get_job_keys`).
>
> This patch eliminates the sequential scan and mapping to job key by simply returning an immutable copy of the key set of the existing secondary index `job`.
>
>
> Diffs
> -----
>
> src/jmh/java/org/apache/aurora/benchmark/TaskStoreBenchmarks.java f2f00b92bf901c438e40bbc177e9f5a112be1bbc
> src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java fc272ddb45be8b2f55f01c3d54f7fa9058202c0b
>
> Diff: https://reviews.apache.org/r/55217/diff/
>
>
> Testing
> -------
>
> Using the following modified version of existing `TaskStoreBenchmarks.MemFetchTasksBenchmark` which benchmarks `TaskStore.getJobKeys()`:
> ```java
> public static class MemFetchTasksBenchmark extends AbstractFetchTasksBenchmark {
> @Setup(Level.Trial)
> @Override
> public void setUp() {
> storage = Guice.createInjector(
> Modules.combine(
> DbModule.testModuleWithWorkQueue(PLAIN, Optional.of(new InMemStoresModule(PLAIN))),
> new AbstractModule() {
> @Override
> protected void configure() {
> bind(StatsProvider.class).toInstance(new FakeStatsProvider());
> bind(Clock.class).toInstance(new FakeClock());
> }
> }))
> .getInstance(Storage.class);
>
> }
>
> @Setup(Level.Iteration)
> public void setUpIteration() {
> createTasks(numTasks);
> }
>
> @TearDown(Level.Iteration)
> public void tearDownIteration() {
> deleteTasks();
> }
>
> @Benchmark
> public Iterable<IJobKey> run() {
> return storage.read(store -> store.getTaskStore().getJobKeys());
> }
> }
> ```
>
> Benchmark results BEFORE patch:
> ```
> Benchmark (numTasks) Mode Cnt Score Error Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 10000 thrpt 5 572.761 � 168.865 ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 50000 thrpt 5 80.697 � 8.516 ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 100000 thrpt 5 25.102 � 3.244 ops/s
> ```
>
> Benchmark results AFTER patch:
> ```
> Benchmark (numTasks) Mode Cnt Score Error Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 10000 thrpt 5 327336.998 � 10207.402 ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 50000 thrpt 5 320506.958 � 23430.527 ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 100000 thrpt 5 287962.695 � 51917.245 ops/s
> ```
>
>
> Thanks,
>
> Mehrdad Nurolahzade
>
>
Re: Review Request 55217: AURORA-1847 Eliminate sequential scan in
MemTaskStore.getJobKeys()
Posted by Mehrdad Nurolahzade <me...@nurolahzade.com>.
> On Jan. 5, 2017, 11:38 a.m., Dmitriy Shirchenko wrote:
> > src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java, lines 117-126
> > <https://reviews.apache.org/r/55217/diff/1/?file=1597677#file1597677line117>
> >
> > :nit any reason why you didn't extract second index declration on 117:120
I initially did but PMD did not like it, so I let it be: `src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java:99: Perhaps 'hostIndex' could be replaced by a local variable.
:pmdMain FAILED`
- Mehrdad
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55217/#review160620
-----------------------------------------------------------
On Jan. 5, 2017, 10:59 a.m., Mehrdad Nurolahzade wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55217/
> -----------------------------------------------------------
>
> (Updated Jan. 5, 2017, 10:59 a.m.)
>
>
> Review request for Aurora, Joshua Cohen and Stephan Erb.
>
>
> Bugs: AURORA-1847
> https://issues.apache.org/jira/browse/AURORA-1847
>
>
> Repository: aurora
>
>
> Description
> -------
>
> If scheduler is configured to run with the `MemTaskStore` every hit on scheduler landing page (`/scheduler`) causes a call to `MemTaskStore.getJobKeys()` through `ReadOnlyScheduler.getRoleSummary()`.
>
> The implementation of `MemTaskStore.getJobKeys()` is currently very inefficient as it requires a sequential scan of the task store and mapping to their respective job keys. In Twitter clusters this method is currently taking half a second per call (`mem_storage_get_job_keys`).
>
> This patch eliminates the sequential scan and mapping to job key by simply returning an immutable copy of the key set of the existing secondary index `job`.
>
>
> Diffs
> -----
>
> src/jmh/java/org/apache/aurora/benchmark/TaskStoreBenchmarks.java f2f00b92bf901c438e40bbc177e9f5a112be1bbc
> src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java fc272ddb45be8b2f55f01c3d54f7fa9058202c0b
>
> Diff: https://reviews.apache.org/r/55217/diff/
>
>
> Testing
> -------
>
> Using the following modified version of existing `TaskStoreBenchmarks.MemFetchTasksBenchmark` which benchmarks `TaskStore.getJobKeys()`:
> ```java
> public static class MemFetchTasksBenchmark extends AbstractFetchTasksBenchmark {
> @Setup(Level.Trial)
> @Override
> public void setUp() {
> storage = Guice.createInjector(
> Modules.combine(
> DbModule.testModuleWithWorkQueue(PLAIN, Optional.of(new InMemStoresModule(PLAIN))),
> new AbstractModule() {
> @Override
> protected void configure() {
> bind(StatsProvider.class).toInstance(new FakeStatsProvider());
> bind(Clock.class).toInstance(new FakeClock());
> }
> }))
> .getInstance(Storage.class);
>
> }
>
> @Setup(Level.Iteration)
> public void setUpIteration() {
> createTasks(numTasks);
> }
>
> @TearDown(Level.Iteration)
> public void tearDownIteration() {
> deleteTasks();
> }
>
> @Benchmark
> public Iterable<IJobKey> run() {
> return storage.read(store -> store.getTaskStore().getJobKeys());
> }
> }
> ```
>
> Benchmark results BEFORE patch:
> ```
> Benchmark (numTasks) Mode Cnt Score Error Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 10000 thrpt 5 572.761 � 168.865 ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 50000 thrpt 5 80.697 � 8.516 ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 100000 thrpt 5 25.102 � 3.244 ops/s
> ```
>
> Benchmark results AFTER patch:
> ```
> Benchmark (numTasks) Mode Cnt Score Error Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 10000 thrpt 5 327336.998 � 10207.402 ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 50000 thrpt 5 320506.958 � 23430.527 ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 100000 thrpt 5 287962.695 � 51917.245 ops/s
> ```
>
>
> Thanks,
>
> Mehrdad Nurolahzade
>
>
Re: Review Request 55217: AURORA-1847 Eliminate sequential scan in
MemTaskStore.getJobKeys()
Posted by Dmitriy Shirchenko <ca...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55217/#review160620
-----------------------------------------------------------
src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java (lines 116 - 120)
<https://reviews.apache.org/r/55217/#comment231758>
:nit any reason why you didn't extract second index declration on 117:120
- Dmitriy Shirchenko
On Jan. 5, 2017, 6:59 p.m., Mehrdad Nurolahzade wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55217/
> -----------------------------------------------------------
>
> (Updated Jan. 5, 2017, 6:59 p.m.)
>
>
> Review request for Aurora, Joshua Cohen and Stephan Erb.
>
>
> Bugs: AURORA-1847
> https://issues.apache.org/jira/browse/AURORA-1847
>
>
> Repository: aurora
>
>
> Description
> -------
>
> If scheduler is configured to run with the `MemTaskStore` every hit on scheduler landing page (`/scheduler`) causes a call to `MemTaskStore.getJobKeys()` through `ReadOnlyScheduler.getRoleSummary()`.
>
> The implementation of `MemTaskStore.getJobKeys()` is currently very inefficient as it requires a sequential scan of the task store and mapping to their respective job keys. In Twitter clusters this method is currently taking half a second per call (`mem_storage_get_job_keys`).
>
> This patch eliminates the sequential scan and mapping to job key by simply returning an immutable copy of the key set of the existing secondary index `job`.
>
>
> Diffs
> -----
>
> src/jmh/java/org/apache/aurora/benchmark/TaskStoreBenchmarks.java f2f00b92bf901c438e40bbc177e9f5a112be1bbc
> src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java fc272ddb45be8b2f55f01c3d54f7fa9058202c0b
>
> Diff: https://reviews.apache.org/r/55217/diff/
>
>
> Testing
> -------
>
> Using the following modified version of existing `TaskStoreBenchmarks.MemFetchTasksBenchmark` which benchmarks `TaskStore.getJobKeys()`:
> ```java
> public static class MemFetchTasksBenchmark extends AbstractFetchTasksBenchmark {
> @Setup(Level.Trial)
> @Override
> public void setUp() {
> storage = Guice.createInjector(
> Modules.combine(
> DbModule.testModuleWithWorkQueue(PLAIN, Optional.of(new InMemStoresModule(PLAIN))),
> new AbstractModule() {
> @Override
> protected void configure() {
> bind(StatsProvider.class).toInstance(new FakeStatsProvider());
> bind(Clock.class).toInstance(new FakeClock());
> }
> }))
> .getInstance(Storage.class);
>
> }
>
> @Setup(Level.Iteration)
> public void setUpIteration() {
> createTasks(numTasks);
> }
>
> @TearDown(Level.Iteration)
> public void tearDownIteration() {
> deleteTasks();
> }
>
> @Benchmark
> public Iterable<IJobKey> run() {
> return storage.read(store -> store.getTaskStore().getJobKeys());
> }
> }
> ```
>
> Benchmark results BEFORE patch:
> ```
> Benchmark (numTasks) Mode Cnt Score Error Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 10000 thrpt 5 572.761 � 168.865 ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 50000 thrpt 5 80.697 � 8.516 ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 100000 thrpt 5 25.102 � 3.244 ops/s
> ```
>
> Benchmark results AFTER patch:
> ```
> Benchmark (numTasks) Mode Cnt Score Error Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 10000 thrpt 5 327336.998 � 10207.402 ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 50000 thrpt 5 320506.958 � 23430.527 ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 100000 thrpt 5 287962.695 � 51917.245 ops/s
> ```
>
>
> Thanks,
>
> Mehrdad Nurolahzade
>
>
Re: Review Request 55217: AURORA-1847 Eliminate sequential scan in
MemTaskStore.getJobKeys()
Posted by Aurora ReviewBot <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55217/#review160617
-----------------------------------------------------------
Master (d4ebb56) is green with this patch.
./build-support/jenkins/build.sh
However, it appears that it might lack test coverage.
I will refresh this build result if you post a review containing "@ReviewBot retry"
- Aurora ReviewBot
On Jan. 5, 2017, 6:59 p.m., Mehrdad Nurolahzade wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55217/
> -----------------------------------------------------------
>
> (Updated Jan. 5, 2017, 6:59 p.m.)
>
>
> Review request for Aurora, Joshua Cohen and Stephan Erb.
>
>
> Bugs: AURORA-1847
> https://issues.apache.org/jira/browse/AURORA-1847
>
>
> Repository: aurora
>
>
> Description
> -------
>
> If scheduler is configured to run with the `MemTaskStore` every hit on scheduler landing page (`/scheduler`) causes a call to `MemTaskStore.getJobKeys()` through `ReadOnlyScheduler.getRoleSummary()`.
>
> The implementation of `MemTaskStore.getJobKeys()` is currently very inefficient as it requires a sequential scan of the task store and mapping to their respective job keys. In Twitter clusters this method is currently taking half a second per call (`mem_storage_get_job_keys`).
>
> This patch eliminates the sequential scan and mapping to job key by simply returning an immutable copy of the key set of the existing secondary index `job`.
>
>
> Diffs
> -----
>
> src/jmh/java/org/apache/aurora/benchmark/TaskStoreBenchmarks.java f2f00b92bf901c438e40bbc177e9f5a112be1bbc
> src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java fc272ddb45be8b2f55f01c3d54f7fa9058202c0b
>
> Diff: https://reviews.apache.org/r/55217/diff/
>
>
> Testing
> -------
>
> Using the following modified version of existing `TaskStoreBenchmarks.MemFetchTasksBenchmark` which benchmarks `TaskStore.getJobKeys()`:
> ```java
> public static class MemFetchTasksBenchmark extends AbstractFetchTasksBenchmark {
> @Setup(Level.Trial)
> @Override
> public void setUp() {
> storage = Guice.createInjector(
> Modules.combine(
> DbModule.testModuleWithWorkQueue(PLAIN, Optional.of(new InMemStoresModule(PLAIN))),
> new AbstractModule() {
> @Override
> protected void configure() {
> bind(StatsProvider.class).toInstance(new FakeStatsProvider());
> bind(Clock.class).toInstance(new FakeClock());
> }
> }))
> .getInstance(Storage.class);
>
> }
>
> @Setup(Level.Iteration)
> public void setUpIteration() {
> createTasks(numTasks);
> }
>
> @TearDown(Level.Iteration)
> public void tearDownIteration() {
> deleteTasks();
> }
>
> @Benchmark
> public Iterable<IJobKey> run() {
> return storage.read(store -> store.getTaskStore().getJobKeys());
> }
> }
> ```
>
> Benchmark results BEFORE patch:
> ```
> Benchmark (numTasks) Mode Cnt Score Error Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 10000 thrpt 5 572.761 � 168.865 ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 50000 thrpt 5 80.697 � 8.516 ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 100000 thrpt 5 25.102 � 3.244 ops/s
> ```
>
> Benchmark results AFTER patch:
> ```
> Benchmark (numTasks) Mode Cnt Score Error Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 10000 thrpt 5 327336.998 � 10207.402 ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 50000 thrpt 5 320506.958 � 23430.527 ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 100000 thrpt 5 287962.695 � 51917.245 ops/s
> ```
>
>
> Thanks,
>
> Mehrdad Nurolahzade
>
>
Re: Review Request 55217: AURORA-1847 Eliminate sequential scan in
MemTaskStore.getJobKeys()
Posted by Joshua Cohen <jc...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55217/#review161318
-----------------------------------------------------------
Ship it!
Ship It!
- Joshua Cohen
On Jan. 5, 2017, 6:59 p.m., Mehrdad Nurolahzade wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55217/
> -----------------------------------------------------------
>
> (Updated Jan. 5, 2017, 6:59 p.m.)
>
>
> Review request for Aurora, Joshua Cohen and Stephan Erb.
>
>
> Bugs: AURORA-1847
> https://issues.apache.org/jira/browse/AURORA-1847
>
>
> Repository: aurora
>
>
> Description
> -------
>
> If scheduler is configured to run with the `MemTaskStore` every hit on scheduler landing page (`/scheduler`) causes a call to `MemTaskStore.getJobKeys()` through `ReadOnlyScheduler.getRoleSummary()`.
>
> The implementation of `MemTaskStore.getJobKeys()` is currently very inefficient as it requires a sequential scan of the task store and mapping to their respective job keys. In Twitter clusters this method is currently taking half a second per call (`mem_storage_get_job_keys`).
>
> This patch eliminates the sequential scan and mapping to job key by simply returning an immutable copy of the key set of the existing secondary index `job`.
>
>
> Diffs
> -----
>
> src/jmh/java/org/apache/aurora/benchmark/TaskStoreBenchmarks.java f2f00b92bf901c438e40bbc177e9f5a112be1bbc
> src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java fc272ddb45be8b2f55f01c3d54f7fa9058202c0b
>
> Diff: https://reviews.apache.org/r/55217/diff/
>
>
> Testing
> -------
>
> Using the following modified version of existing `TaskStoreBenchmarks.MemFetchTasksBenchmark` which benchmarks `TaskStore.getJobKeys()`:
> ```java
> public static class MemFetchTasksBenchmark extends AbstractFetchTasksBenchmark {
> @Setup(Level.Trial)
> @Override
> public void setUp() {
> storage = Guice.createInjector(
> Modules.combine(
> DbModule.testModuleWithWorkQueue(PLAIN, Optional.of(new InMemStoresModule(PLAIN))),
> new AbstractModule() {
> @Override
> protected void configure() {
> bind(StatsProvider.class).toInstance(new FakeStatsProvider());
> bind(Clock.class).toInstance(new FakeClock());
> }
> }))
> .getInstance(Storage.class);
>
> }
>
> @Setup(Level.Iteration)
> public void setUpIteration() {
> createTasks(numTasks);
> }
>
> @TearDown(Level.Iteration)
> public void tearDownIteration() {
> deleteTasks();
> }
>
> @Benchmark
> public Iterable<IJobKey> run() {
> return storage.read(store -> store.getTaskStore().getJobKeys());
> }
> }
> ```
>
> Benchmark results BEFORE patch:
> ```
> Benchmark (numTasks) Mode Cnt Score Error Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 10000 thrpt 5 572.761 � 168.865 ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 50000 thrpt 5 80.697 � 8.516 ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 100000 thrpt 5 25.102 � 3.244 ops/s
> ```
>
> Benchmark results AFTER patch:
> ```
> Benchmark (numTasks) Mode Cnt Score Error Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 10000 thrpt 5 327336.998 � 10207.402 ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 50000 thrpt 5 320506.958 � 23430.527 ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 100000 thrpt 5 287962.695 � 51917.245 ops/s
> ```
>
>
> Thanks,
>
> Mehrdad Nurolahzade
>
>