You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@aurora.apache.org by Mehrdad Nurolahzade <me...@nurolahzade.com> on 2017/01/05 18:59:09 UTC

Review Request 55217: AURORA-1847 Eliminate sequential scan in MemTaskStore.getJobKeys()

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55217/
-----------------------------------------------------------

Review request for Aurora, Joshua Cohen and Stephan Erb.


Bugs: AURORA-1847
    https://issues.apache.org/jira/browse/AURORA-1847


Repository: aurora


Description
-------

If scheduler is configured to run with the `MemTaskStore` every hit on scheduler landing page (`/scheduler`) causes a call to `MemTaskStore.getJobKeys()` through `ReadOnlyScheduler.getRoleSummary()`.

The implementation of `MemTaskStore.getJobKeys()` is currently very inefficient as it requires a sequential scan of the task store and mapping to their respective job keys. In Twitter clusters this method is currently taking half a second per call (`mem_storage_get_job_keys`). 

This patch eliminates the sequential scan and mapping to job key by simply returning an immutable copy of the key set of the existing secondary index `job`.


Diffs
-----

  src/jmh/java/org/apache/aurora/benchmark/TaskStoreBenchmarks.java f2f00b92bf901c438e40bbc177e9f5a112be1bbc 
  src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java fc272ddb45be8b2f55f01c3d54f7fa9058202c0b 

Diff: https://reviews.apache.org/r/55217/diff/


Testing
-------

Using the following modified version of existing `TaskStoreBenchmarks.MemFetchTasksBenchmark` which benchmarks `TaskStore.getJobKeys()`:
```java
public static class MemFetchTasksBenchmark extends AbstractFetchTasksBenchmark {
    @Setup(Level.Trial)
    @Override
    public void setUp() {
      storage = Guice.createInjector(
          Modules.combine(
              DbModule.testModuleWithWorkQueue(PLAIN, Optional.of(new InMemStoresModule(PLAIN))),
              new AbstractModule() {
                @Override
                protected void configure() {
                  bind(StatsProvider.class).toInstance(new FakeStatsProvider());
                  bind(Clock.class).toInstance(new FakeClock());
                }
              }))
          .getInstance(Storage.class);

    }

    @Setup(Level.Iteration)
    public void setUpIteration() {
      createTasks(numTasks);
    }

    @TearDown(Level.Iteration)
    public void tearDownIteration() {
      deleteTasks();
    }

    @Benchmark
    public Iterable<IJobKey> run() {
      return storage.read(store -> store.getTaskStore().getJobKeys());
    }
  }
```

Benchmark results BEFORE patch:
```
Benchmark                                       (numTasks)   Mode  Cnt    Score     Error  Units
TaskStoreBenchmarks.MemFetchTasksBenchmark.run       10000  thrpt    5  572.761 � 168.865  ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run       50000  thrpt    5   80.697 �   8.516  ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run      100000  thrpt    5   25.102 �   3.244  ops/s
```

Benchmark results AFTER patch:
```
Benchmark                                       (numTasks)   Mode  Cnt       Score       Error  Units
TaskStoreBenchmarks.MemFetchTasksBenchmark.run       10000  thrpt    5  327336.998 � 10207.402  ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run       50000  thrpt    5  320506.958 � 23430.527  ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run      100000  thrpt    5  287962.695 � 51917.245  ops/s
```


Thanks,

Mehrdad Nurolahzade

Re: Review Request 55217: AURORA-1847 Eliminate sequential scan in MemTaskStore.getJobKeys()

Posted by Reza Motamedi <re...@gmail.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55217/#review160830
-----------------------------------------------------------


Ship it!




Ship It!

- Reza Motamedi


On Jan. 5, 2017, 6:59 p.m., Mehrdad Nurolahzade wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55217/
> -----------------------------------------------------------
> 
> (Updated Jan. 5, 2017, 6:59 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Stephan Erb.
> 
> 
> Bugs: AURORA-1847
>     https://issues.apache.org/jira/browse/AURORA-1847
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> If scheduler is configured to run with the `MemTaskStore` every hit on scheduler landing page (`/scheduler`) causes a call to `MemTaskStore.getJobKeys()` through `ReadOnlyScheduler.getRoleSummary()`.
> 
> The implementation of `MemTaskStore.getJobKeys()` is currently very inefficient as it requires a sequential scan of the task store and mapping to their respective job keys. In Twitter clusters this method is currently taking half a second per call (`mem_storage_get_job_keys`). 
> 
> This patch eliminates the sequential scan and mapping to job key by simply returning an immutable copy of the key set of the existing secondary index `job`.
> 
> 
> Diffs
> -----
> 
>   src/jmh/java/org/apache/aurora/benchmark/TaskStoreBenchmarks.java f2f00b92bf901c438e40bbc177e9f5a112be1bbc 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java fc272ddb45be8b2f55f01c3d54f7fa9058202c0b 
> 
> Diff: https://reviews.apache.org/r/55217/diff/
> 
> 
> Testing
> -------
> 
> Using the following modified version of existing `TaskStoreBenchmarks.MemFetchTasksBenchmark` which benchmarks `TaskStore.getJobKeys()`:
> ```java
> public static class MemFetchTasksBenchmark extends AbstractFetchTasksBenchmark {
>     @Setup(Level.Trial)
>     @Override
>     public void setUp() {
>       storage = Guice.createInjector(
>           Modules.combine(
>               DbModule.testModuleWithWorkQueue(PLAIN, Optional.of(new InMemStoresModule(PLAIN))),
>               new AbstractModule() {
>                 @Override
>                 protected void configure() {
>                   bind(StatsProvider.class).toInstance(new FakeStatsProvider());
>                   bind(Clock.class).toInstance(new FakeClock());
>                 }
>               }))
>           .getInstance(Storage.class);
> 
>     }
> 
>     @Setup(Level.Iteration)
>     public void setUpIteration() {
>       createTasks(numTasks);
>     }
> 
>     @TearDown(Level.Iteration)
>     public void tearDownIteration() {
>       deleteTasks();
>     }
> 
>     @Benchmark
>     public Iterable<IJobKey> run() {
>       return storage.read(store -> store.getTaskStore().getJobKeys());
>     }
>   }
> ```
> 
> Benchmark results BEFORE patch:
> ```
> Benchmark                                       (numTasks)   Mode  Cnt    Score     Error  Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run       10000  thrpt    5  572.761 � 168.865  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run       50000  thrpt    5   80.697 �   8.516  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run      100000  thrpt    5   25.102 �   3.244  ops/s
> ```
> 
> Benchmark results AFTER patch:
> ```
> Benchmark                                       (numTasks)   Mode  Cnt       Score       Error  Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run       10000  thrpt    5  327336.998 � 10207.402  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run       50000  thrpt    5  320506.958 � 23430.527  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run      100000  thrpt    5  287962.695 � 51917.245  ops/s
> ```
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>

Re: Review Request 55217: AURORA-1847 Eliminate sequential scan in MemTaskStore.getJobKeys()

Posted by Mehrdad Nurolahzade <me...@nurolahzade.com>.


> On Jan. 5, 2017, 11:17 a.m., Reza Motamedi wrote:
> > src/jmh/java/org/apache/aurora/benchmark/TaskStoreBenchmarks.java, line 93
> > <https://reviews.apache.org/r/55217/diff/1/?file=1597676#file1597676line93>
> >
> >     newb question here... I thought that FakeClock does not advance by itself. I am wondering what it the benchamrks are coming from or, if it is just added to fix a dependency?

Correct, this is a fix for a missing dependency.

Not sure what requires an injection of `Clock` here, perhaps something related to stats. Anyways, JMH does not do timing based on Aurora's internal abstraction of `Clock`, therefore benchmarks are not affected by the choice of `Clock` abstraction.


- Mehrdad


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55217/#review160616
-----------------------------------------------------------


On Jan. 5, 2017, 10:59 a.m., Mehrdad Nurolahzade wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55217/
> -----------------------------------------------------------
> 
> (Updated Jan. 5, 2017, 10:59 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Stephan Erb.
> 
> 
> Bugs: AURORA-1847
>     https://issues.apache.org/jira/browse/AURORA-1847
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> If scheduler is configured to run with the `MemTaskStore` every hit on scheduler landing page (`/scheduler`) causes a call to `MemTaskStore.getJobKeys()` through `ReadOnlyScheduler.getRoleSummary()`.
> 
> The implementation of `MemTaskStore.getJobKeys()` is currently very inefficient as it requires a sequential scan of the task store and mapping to their respective job keys. In Twitter clusters this method is currently taking half a second per call (`mem_storage_get_job_keys`). 
> 
> This patch eliminates the sequential scan and mapping to job key by simply returning an immutable copy of the key set of the existing secondary index `job`.
> 
> 
> Diffs
> -----
> 
>   src/jmh/java/org/apache/aurora/benchmark/TaskStoreBenchmarks.java f2f00b92bf901c438e40bbc177e9f5a112be1bbc 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java fc272ddb45be8b2f55f01c3d54f7fa9058202c0b 
> 
> Diff: https://reviews.apache.org/r/55217/diff/
> 
> 
> Testing
> -------
> 
> Using the following modified version of existing `TaskStoreBenchmarks.MemFetchTasksBenchmark` which benchmarks `TaskStore.getJobKeys()`:
> ```java
> public static class MemFetchTasksBenchmark extends AbstractFetchTasksBenchmark {
>     @Setup(Level.Trial)
>     @Override
>     public void setUp() {
>       storage = Guice.createInjector(
>           Modules.combine(
>               DbModule.testModuleWithWorkQueue(PLAIN, Optional.of(new InMemStoresModule(PLAIN))),
>               new AbstractModule() {
>                 @Override
>                 protected void configure() {
>                   bind(StatsProvider.class).toInstance(new FakeStatsProvider());
>                   bind(Clock.class).toInstance(new FakeClock());
>                 }
>               }))
>           .getInstance(Storage.class);
> 
>     }
> 
>     @Setup(Level.Iteration)
>     public void setUpIteration() {
>       createTasks(numTasks);
>     }
> 
>     @TearDown(Level.Iteration)
>     public void tearDownIteration() {
>       deleteTasks();
>     }
> 
>     @Benchmark
>     public Iterable<IJobKey> run() {
>       return storage.read(store -> store.getTaskStore().getJobKeys());
>     }
>   }
> ```
> 
> Benchmark results BEFORE patch:
> ```
> Benchmark                                       (numTasks)   Mode  Cnt    Score     Error  Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run       10000  thrpt    5  572.761 � 168.865  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run       50000  thrpt    5   80.697 �   8.516  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run      100000  thrpt    5   25.102 �   3.244  ops/s
> ```
> 
> Benchmark results AFTER patch:
> ```
> Benchmark                                       (numTasks)   Mode  Cnt       Score       Error  Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run       10000  thrpt    5  327336.998 � 10207.402  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run       50000  thrpt    5  320506.958 � 23430.527  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run      100000  thrpt    5  287962.695 � 51917.245  ops/s
> ```
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>

Re: Review Request 55217: AURORA-1847 Eliminate sequential scan in MemTaskStore.getJobKeys()

Posted by Reza Motamedi <re...@gmail.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55217/#review160616
-----------------------------------------------------------




src/jmh/java/org/apache/aurora/benchmark/TaskStoreBenchmarks.java (line 93)
<https://reviews.apache.org/r/55217/#comment231751>

    newb question here... I thought that FakeClock does not advance by itself. I am wondering what it the benchamrks are coming from or, if it is just added to fix a dependency?


- Reza Motamedi


On Jan. 5, 2017, 6:59 p.m., Mehrdad Nurolahzade wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55217/
> -----------------------------------------------------------
> 
> (Updated Jan. 5, 2017, 6:59 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Stephan Erb.
> 
> 
> Bugs: AURORA-1847
>     https://issues.apache.org/jira/browse/AURORA-1847
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> If scheduler is configured to run with the `MemTaskStore` every hit on scheduler landing page (`/scheduler`) causes a call to `MemTaskStore.getJobKeys()` through `ReadOnlyScheduler.getRoleSummary()`.
> 
> The implementation of `MemTaskStore.getJobKeys()` is currently very inefficient as it requires a sequential scan of the task store and mapping to their respective job keys. In Twitter clusters this method is currently taking half a second per call (`mem_storage_get_job_keys`). 
> 
> This patch eliminates the sequential scan and mapping to job key by simply returning an immutable copy of the key set of the existing secondary index `job`.
> 
> 
> Diffs
> -----
> 
>   src/jmh/java/org/apache/aurora/benchmark/TaskStoreBenchmarks.java f2f00b92bf901c438e40bbc177e9f5a112be1bbc 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java fc272ddb45be8b2f55f01c3d54f7fa9058202c0b 
> 
> Diff: https://reviews.apache.org/r/55217/diff/
> 
> 
> Testing
> -------
> 
> Using the following modified version of existing `TaskStoreBenchmarks.MemFetchTasksBenchmark` which benchmarks `TaskStore.getJobKeys()`:
> ```java
> public static class MemFetchTasksBenchmark extends AbstractFetchTasksBenchmark {
>     @Setup(Level.Trial)
>     @Override
>     public void setUp() {
>       storage = Guice.createInjector(
>           Modules.combine(
>               DbModule.testModuleWithWorkQueue(PLAIN, Optional.of(new InMemStoresModule(PLAIN))),
>               new AbstractModule() {
>                 @Override
>                 protected void configure() {
>                   bind(StatsProvider.class).toInstance(new FakeStatsProvider());
>                   bind(Clock.class).toInstance(new FakeClock());
>                 }
>               }))
>           .getInstance(Storage.class);
> 
>     }
> 
>     @Setup(Level.Iteration)
>     public void setUpIteration() {
>       createTasks(numTasks);
>     }
> 
>     @TearDown(Level.Iteration)
>     public void tearDownIteration() {
>       deleteTasks();
>     }
> 
>     @Benchmark
>     public Iterable<IJobKey> run() {
>       return storage.read(store -> store.getTaskStore().getJobKeys());
>     }
>   }
> ```
> 
> Benchmark results BEFORE patch:
> ```
> Benchmark                                       (numTasks)   Mode  Cnt    Score     Error  Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run       10000  thrpt    5  572.761 � 168.865  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run       50000  thrpt    5   80.697 �   8.516  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run      100000  thrpt    5   25.102 �   3.244  ops/s
> ```
> 
> Benchmark results AFTER patch:
> ```
> Benchmark                                       (numTasks)   Mode  Cnt       Score       Error  Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run       10000  thrpt    5  327336.998 � 10207.402  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run       50000  thrpt    5  320506.958 � 23430.527  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run      100000  thrpt    5  287962.695 � 51917.245  ops/s
> ```
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>

Re: Review Request 55217: AURORA-1847 Eliminate sequential scan in MemTaskStore.getJobKeys()

Posted by Stephan Erb <se...@apache.org>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55217/#review160815
-----------------------------------------------------------


Ship it!




Ship It!

- Stephan Erb


On Jan. 5, 2017, 7:59 p.m., Mehrdad Nurolahzade wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55217/
> -----------------------------------------------------------
> 
> (Updated Jan. 5, 2017, 7:59 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Stephan Erb.
> 
> 
> Bugs: AURORA-1847
>     https://issues.apache.org/jira/browse/AURORA-1847
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> If scheduler is configured to run with the `MemTaskStore` every hit on scheduler landing page (`/scheduler`) causes a call to `MemTaskStore.getJobKeys()` through `ReadOnlyScheduler.getRoleSummary()`.
> 
> The implementation of `MemTaskStore.getJobKeys()` is currently very inefficient as it requires a sequential scan of the task store and mapping to their respective job keys. In Twitter clusters this method is currently taking half a second per call (`mem_storage_get_job_keys`). 
> 
> This patch eliminates the sequential scan and mapping to job key by simply returning an immutable copy of the key set of the existing secondary index `job`.
> 
> 
> Diffs
> -----
> 
>   src/jmh/java/org/apache/aurora/benchmark/TaskStoreBenchmarks.java f2f00b92bf901c438e40bbc177e9f5a112be1bbc 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java fc272ddb45be8b2f55f01c3d54f7fa9058202c0b 
> 
> Diff: https://reviews.apache.org/r/55217/diff/
> 
> 
> Testing
> -------
> 
> Using the following modified version of existing `TaskStoreBenchmarks.MemFetchTasksBenchmark` which benchmarks `TaskStore.getJobKeys()`:
> ```java
> public static class MemFetchTasksBenchmark extends AbstractFetchTasksBenchmark {
>     @Setup(Level.Trial)
>     @Override
>     public void setUp() {
>       storage = Guice.createInjector(
>           Modules.combine(
>               DbModule.testModuleWithWorkQueue(PLAIN, Optional.of(new InMemStoresModule(PLAIN))),
>               new AbstractModule() {
>                 @Override
>                 protected void configure() {
>                   bind(StatsProvider.class).toInstance(new FakeStatsProvider());
>                   bind(Clock.class).toInstance(new FakeClock());
>                 }
>               }))
>           .getInstance(Storage.class);
> 
>     }
> 
>     @Setup(Level.Iteration)
>     public void setUpIteration() {
>       createTasks(numTasks);
>     }
> 
>     @TearDown(Level.Iteration)
>     public void tearDownIteration() {
>       deleteTasks();
>     }
> 
>     @Benchmark
>     public Iterable<IJobKey> run() {
>       return storage.read(store -> store.getTaskStore().getJobKeys());
>     }
>   }
> ```
> 
> Benchmark results BEFORE patch:
> ```
> Benchmark                                       (numTasks)   Mode  Cnt    Score     Error  Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run       10000  thrpt    5  572.761 � 168.865  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run       50000  thrpt    5   80.697 �   8.516  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run      100000  thrpt    5   25.102 �   3.244  ops/s
> ```
> 
> Benchmark results AFTER patch:
> ```
> Benchmark                                       (numTasks)   Mode  Cnt       Score       Error  Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run       10000  thrpt    5  327336.998 � 10207.402  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run       50000  thrpt    5  320506.958 � 23430.527  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run      100000  thrpt    5  287962.695 � 51917.245  ops/s
> ```
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>

Re: Review Request 55217: AURORA-1847 Eliminate sequential scan in MemTaskStore.getJobKeys()

Posted by Mehrdad Nurolahzade <me...@nurolahzade.com>.


> On Jan. 5, 2017, 11:38 a.m., Dmitriy Shirchenko wrote:
> > src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java, lines 117-126
> > <https://reviews.apache.org/r/55217/diff/1/?file=1597677#file1597677line117>
> >
> >     :nit any reason why you didn't extract second index declration on 117:120

I initially did but PMD did not like it, so I let it be: `src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java:99:        Perhaps 'hostIndex' could be replaced by a local variable.
:pmdMain FAILED`


- Mehrdad


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55217/#review160620
-----------------------------------------------------------


On Jan. 5, 2017, 10:59 a.m., Mehrdad Nurolahzade wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55217/
> -----------------------------------------------------------
> 
> (Updated Jan. 5, 2017, 10:59 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Stephan Erb.
> 
> 
> Bugs: AURORA-1847
>     https://issues.apache.org/jira/browse/AURORA-1847
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> If scheduler is configured to run with the `MemTaskStore` every hit on scheduler landing page (`/scheduler`) causes a call to `MemTaskStore.getJobKeys()` through `ReadOnlyScheduler.getRoleSummary()`.
> 
> The implementation of `MemTaskStore.getJobKeys()` is currently very inefficient as it requires a sequential scan of the task store and mapping to their respective job keys. In Twitter clusters this method is currently taking half a second per call (`mem_storage_get_job_keys`). 
> 
> This patch eliminates the sequential scan and mapping to job key by simply returning an immutable copy of the key set of the existing secondary index `job`.
> 
> 
> Diffs
> -----
> 
>   src/jmh/java/org/apache/aurora/benchmark/TaskStoreBenchmarks.java f2f00b92bf901c438e40bbc177e9f5a112be1bbc 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java fc272ddb45be8b2f55f01c3d54f7fa9058202c0b 
> 
> Diff: https://reviews.apache.org/r/55217/diff/
> 
> 
> Testing
> -------
> 
> Using the following modified version of existing `TaskStoreBenchmarks.MemFetchTasksBenchmark` which benchmarks `TaskStore.getJobKeys()`:
> ```java
> public static class MemFetchTasksBenchmark extends AbstractFetchTasksBenchmark {
>     @Setup(Level.Trial)
>     @Override
>     public void setUp() {
>       storage = Guice.createInjector(
>           Modules.combine(
>               DbModule.testModuleWithWorkQueue(PLAIN, Optional.of(new InMemStoresModule(PLAIN))),
>               new AbstractModule() {
>                 @Override
>                 protected void configure() {
>                   bind(StatsProvider.class).toInstance(new FakeStatsProvider());
>                   bind(Clock.class).toInstance(new FakeClock());
>                 }
>               }))
>           .getInstance(Storage.class);
> 
>     }
> 
>     @Setup(Level.Iteration)
>     public void setUpIteration() {
>       createTasks(numTasks);
>     }
> 
>     @TearDown(Level.Iteration)
>     public void tearDownIteration() {
>       deleteTasks();
>     }
> 
>     @Benchmark
>     public Iterable<IJobKey> run() {
>       return storage.read(store -> store.getTaskStore().getJobKeys());
>     }
>   }
> ```
> 
> Benchmark results BEFORE patch:
> ```
> Benchmark                                       (numTasks)   Mode  Cnt    Score     Error  Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run       10000  thrpt    5  572.761 � 168.865  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run       50000  thrpt    5   80.697 �   8.516  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run      100000  thrpt    5   25.102 �   3.244  ops/s
> ```
> 
> Benchmark results AFTER patch:
> ```
> Benchmark                                       (numTasks)   Mode  Cnt       Score       Error  Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run       10000  thrpt    5  327336.998 � 10207.402  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run       50000  thrpt    5  320506.958 � 23430.527  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run      100000  thrpt    5  287962.695 � 51917.245  ops/s
> ```
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>

Re: Review Request 55217: AURORA-1847 Eliminate sequential scan in MemTaskStore.getJobKeys()

Posted by Dmitriy Shirchenko <ca...@gmail.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55217/#review160620
-----------------------------------------------------------




src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java (lines 116 - 120)
<https://reviews.apache.org/r/55217/#comment231758>

    :nit any reason why you didn't extract second index declration on 117:120


- Dmitriy Shirchenko


On Jan. 5, 2017, 6:59 p.m., Mehrdad Nurolahzade wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55217/
> -----------------------------------------------------------
> 
> (Updated Jan. 5, 2017, 6:59 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Stephan Erb.
> 
> 
> Bugs: AURORA-1847
>     https://issues.apache.org/jira/browse/AURORA-1847
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> If scheduler is configured to run with the `MemTaskStore` every hit on scheduler landing page (`/scheduler`) causes a call to `MemTaskStore.getJobKeys()` through `ReadOnlyScheduler.getRoleSummary()`.
> 
> The implementation of `MemTaskStore.getJobKeys()` is currently very inefficient as it requires a sequential scan of the task store and mapping to their respective job keys. In Twitter clusters this method is currently taking half a second per call (`mem_storage_get_job_keys`). 
> 
> This patch eliminates the sequential scan and mapping to job key by simply returning an immutable copy of the key set of the existing secondary index `job`.
> 
> 
> Diffs
> -----
> 
>   src/jmh/java/org/apache/aurora/benchmark/TaskStoreBenchmarks.java f2f00b92bf901c438e40bbc177e9f5a112be1bbc 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java fc272ddb45be8b2f55f01c3d54f7fa9058202c0b 
> 
> Diff: https://reviews.apache.org/r/55217/diff/
> 
> 
> Testing
> -------
> 
> Using the following modified version of existing `TaskStoreBenchmarks.MemFetchTasksBenchmark` which benchmarks `TaskStore.getJobKeys()`:
> ```java
> public static class MemFetchTasksBenchmark extends AbstractFetchTasksBenchmark {
>     @Setup(Level.Trial)
>     @Override
>     public void setUp() {
>       storage = Guice.createInjector(
>           Modules.combine(
>               DbModule.testModuleWithWorkQueue(PLAIN, Optional.of(new InMemStoresModule(PLAIN))),
>               new AbstractModule() {
>                 @Override
>                 protected void configure() {
>                   bind(StatsProvider.class).toInstance(new FakeStatsProvider());
>                   bind(Clock.class).toInstance(new FakeClock());
>                 }
>               }))
>           .getInstance(Storage.class);
> 
>     }
> 
>     @Setup(Level.Iteration)
>     public void setUpIteration() {
>       createTasks(numTasks);
>     }
> 
>     @TearDown(Level.Iteration)
>     public void tearDownIteration() {
>       deleteTasks();
>     }
> 
>     @Benchmark
>     public Iterable<IJobKey> run() {
>       return storage.read(store -> store.getTaskStore().getJobKeys());
>     }
>   }
> ```
> 
> Benchmark results BEFORE patch:
> ```
> Benchmark                                       (numTasks)   Mode  Cnt    Score     Error  Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run       10000  thrpt    5  572.761 � 168.865  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run       50000  thrpt    5   80.697 �   8.516  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run      100000  thrpt    5   25.102 �   3.244  ops/s
> ```
> 
> Benchmark results AFTER patch:
> ```
> Benchmark                                       (numTasks)   Mode  Cnt       Score       Error  Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run       10000  thrpt    5  327336.998 � 10207.402  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run       50000  thrpt    5  320506.958 � 23430.527  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run      100000  thrpt    5  287962.695 � 51917.245  ops/s
> ```
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>

Re: Review Request 55217: AURORA-1847 Eliminate sequential scan in MemTaskStore.getJobKeys()

Posted by Aurora ReviewBot <wf...@apache.org>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55217/#review160617
-----------------------------------------------------------



Master (d4ebb56) is green with this patch.
  ./build-support/jenkins/build.sh

However, it appears that it might lack test coverage.

I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot


On Jan. 5, 2017, 6:59 p.m., Mehrdad Nurolahzade wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55217/
> -----------------------------------------------------------
> 
> (Updated Jan. 5, 2017, 6:59 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Stephan Erb.
> 
> 
> Bugs: AURORA-1847
>     https://issues.apache.org/jira/browse/AURORA-1847
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> If scheduler is configured to run with the `MemTaskStore` every hit on scheduler landing page (`/scheduler`) causes a call to `MemTaskStore.getJobKeys()` through `ReadOnlyScheduler.getRoleSummary()`.
> 
> The implementation of `MemTaskStore.getJobKeys()` is currently very inefficient as it requires a sequential scan of the task store and mapping to their respective job keys. In Twitter clusters this method is currently taking half a second per call (`mem_storage_get_job_keys`). 
> 
> This patch eliminates the sequential scan and mapping to job key by simply returning an immutable copy of the key set of the existing secondary index `job`.
> 
> 
> Diffs
> -----
> 
>   src/jmh/java/org/apache/aurora/benchmark/TaskStoreBenchmarks.java f2f00b92bf901c438e40bbc177e9f5a112be1bbc 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java fc272ddb45be8b2f55f01c3d54f7fa9058202c0b 
> 
> Diff: https://reviews.apache.org/r/55217/diff/
> 
> 
> Testing
> -------
> 
> Using the following modified version of existing `TaskStoreBenchmarks.MemFetchTasksBenchmark` which benchmarks `TaskStore.getJobKeys()`:
> ```java
> public static class MemFetchTasksBenchmark extends AbstractFetchTasksBenchmark {
>     @Setup(Level.Trial)
>     @Override
>     public void setUp() {
>       storage = Guice.createInjector(
>           Modules.combine(
>               DbModule.testModuleWithWorkQueue(PLAIN, Optional.of(new InMemStoresModule(PLAIN))),
>               new AbstractModule() {
>                 @Override
>                 protected void configure() {
>                   bind(StatsProvider.class).toInstance(new FakeStatsProvider());
>                   bind(Clock.class).toInstance(new FakeClock());
>                 }
>               }))
>           .getInstance(Storage.class);
> 
>     }
> 
>     @Setup(Level.Iteration)
>     public void setUpIteration() {
>       createTasks(numTasks);
>     }
> 
>     @TearDown(Level.Iteration)
>     public void tearDownIteration() {
>       deleteTasks();
>     }
> 
>     @Benchmark
>     public Iterable<IJobKey> run() {
>       return storage.read(store -> store.getTaskStore().getJobKeys());
>     }
>   }
> ```
> 
> Benchmark results BEFORE patch:
> ```
> Benchmark                                       (numTasks)   Mode  Cnt    Score     Error  Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run       10000  thrpt    5  572.761 � 168.865  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run       50000  thrpt    5   80.697 �   8.516  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run      100000  thrpt    5   25.102 �   3.244  ops/s
> ```
> 
> Benchmark results AFTER patch:
> ```
> Benchmark                                       (numTasks)   Mode  Cnt       Score       Error  Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run       10000  thrpt    5  327336.998 � 10207.402  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run       50000  thrpt    5  320506.958 � 23430.527  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run      100000  thrpt    5  287962.695 � 51917.245  ops/s
> ```
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>

Re: Review Request 55217: AURORA-1847 Eliminate sequential scan in MemTaskStore.getJobKeys()

Posted by Joshua Cohen <jc...@apache.org>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55217/#review161318
-----------------------------------------------------------


Ship it!




Ship It!

- Joshua Cohen


On Jan. 5, 2017, 6:59 p.m., Mehrdad Nurolahzade wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55217/
> -----------------------------------------------------------
> 
> (Updated Jan. 5, 2017, 6:59 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Stephan Erb.
> 
> 
> Bugs: AURORA-1847
>     https://issues.apache.org/jira/browse/AURORA-1847
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> If scheduler is configured to run with the `MemTaskStore` every hit on scheduler landing page (`/scheduler`) causes a call to `MemTaskStore.getJobKeys()` through `ReadOnlyScheduler.getRoleSummary()`.
> 
> The implementation of `MemTaskStore.getJobKeys()` is currently very inefficient as it requires a sequential scan of the task store and mapping to their respective job keys. In Twitter clusters this method is currently taking half a second per call (`mem_storage_get_job_keys`). 
> 
> This patch eliminates the sequential scan and mapping to job key by simply returning an immutable copy of the key set of the existing secondary index `job`.
> 
> 
> Diffs
> -----
> 
>   src/jmh/java/org/apache/aurora/benchmark/TaskStoreBenchmarks.java f2f00b92bf901c438e40bbc177e9f5a112be1bbc 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java fc272ddb45be8b2f55f01c3d54f7fa9058202c0b 
> 
> Diff: https://reviews.apache.org/r/55217/diff/
> 
> 
> Testing
> -------
> 
> Using the following modified version of existing `TaskStoreBenchmarks.MemFetchTasksBenchmark` which benchmarks `TaskStore.getJobKeys()`:
> ```java
> public static class MemFetchTasksBenchmark extends AbstractFetchTasksBenchmark {
>     @Setup(Level.Trial)
>     @Override
>     public void setUp() {
>       storage = Guice.createInjector(
>           Modules.combine(
>               DbModule.testModuleWithWorkQueue(PLAIN, Optional.of(new InMemStoresModule(PLAIN))),
>               new AbstractModule() {
>                 @Override
>                 protected void configure() {
>                   bind(StatsProvider.class).toInstance(new FakeStatsProvider());
>                   bind(Clock.class).toInstance(new FakeClock());
>                 }
>               }))
>           .getInstance(Storage.class);
> 
>     }
> 
>     @Setup(Level.Iteration)
>     public void setUpIteration() {
>       createTasks(numTasks);
>     }
> 
>     @TearDown(Level.Iteration)
>     public void tearDownIteration() {
>       deleteTasks();
>     }
> 
>     @Benchmark
>     public Iterable<IJobKey> run() {
>       return storage.read(store -> store.getTaskStore().getJobKeys());
>     }
>   }
> ```
> 
> Benchmark results BEFORE patch:
> ```
> Benchmark                                       (numTasks)   Mode  Cnt    Score     Error  Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run       10000  thrpt    5  572.761 � 168.865  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run       50000  thrpt    5   80.697 �   8.516  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run      100000  thrpt    5   25.102 �   3.244  ops/s
> ```
> 
> Benchmark results AFTER patch:
> ```
> Benchmark                                       (numTasks)   Mode  Cnt       Score       Error  Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run       10000  thrpt    5  327336.998 � 10207.402  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run       50000  thrpt    5  320506.958 � 23430.527  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run      100000  thrpt    5  287962.695 � 51917.245  ops/s
> ```
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>