Posted to issues@aurora.apache.org by "Mehrdad Nurolahzade (JIRA)" <ji...@apache.org> on 2016/12/06 23:18:58 UTC
[jira] [Updated] (AURORA-1847) Eliminate sequential scan in MemTaskStore.getJobKeys()

     [ https://issues.apache.org/jira/browse/AURORA-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mehrdad Nurolahzade updated AURORA-1847:
----------------------------------------
    Description: 
The existing {{TaskStoreBenchmarks}} shows that {{DBTaskStore}} is between roughly 500 and 11,000 times faster than {{MemTaskStore}} for {{getJobKeys()}}, with the gap widening as the number of tasks grows:
{code}
Benchmark                                       (numTasks)   Mode  Cnt       Score       Error  Units
TaskStoreBenchmarks.DBFetchTasksBenchmark.run        10000  thrpt    5  320271.082 ± 30842.727  ops/s
TaskStoreBenchmarks.DBFetchTasksBenchmark.run        50000  thrpt    5  334805.551 ± 20435.139  ops/s
TaskStoreBenchmarks.DBFetchTasksBenchmark.run       100000  thrpt    5  317395.890 ± 45302.180  ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run       10000  thrpt    5     624.944 ±    54.038  ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run       50000  thrpt    5      91.335 ±     9.241  ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run      100000  thrpt    5      27.712 ±     8.128  ops/s
{code}

If the scheduler is configured to run with {{MemTaskStore}}, every hit on the scheduler page ({{/scheduler}}) causes a call to {{MemTaskStore.getJobKeys()}}.

The current implementation of this method is inefficient: it performs a sequential scan of the task store and then maps each task to its job key. Both the scan and the mapping can be eliminated by returning the key set of the existing secondary index {{job}}.
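
To illustrate the idea, here is a minimal sketch. It is not Aurora's actual code: the class {{JobKeyIndexSketch}}, its {{Task}} type, the method names, and the use of plain strings as job keys are all hypothetical stand-ins for {{MemTaskStore}} and its {{job}} secondary index. It assumes the index is keyed by job key and that empty entries are pruned when the last task of a job is removed.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified in-memory task store; not Aurora's MemTaskStore. */
final class JobKeyIndexSketch {
  /** Simplified task record: a task ID plus the key of the job that owns it. */
  static final class Task {
    final String id;
    final String jobKey;

    Task(String id, String jobKey) {
      this.id = id;
      this.jobKey = jobKey;
    }
  }

  // Primary storage: task ID -> task.
  private final Map<String, Task> tasksById = new HashMap<>();
  // Secondary index (assumed shape): job key -> IDs of the tasks in that job.
  private final Map<String, Set<String>> jobIndex = new HashMap<>();

  void save(Task task) {
    Task previous = tasksById.put(task.id, task);
    if (previous != null) {
      removeFromIndex(previous);
    }
    jobIndex.computeIfAbsent(task.jobKey, k -> new HashSet<>()).add(task.id);
  }

  void delete(String taskId) {
    Task removed = tasksById.remove(taskId);
    if (removed != null) {
      removeFromIndex(removed);
    }
  }

  // Prune empty entries so the index key set only contains jobs that still have tasks.
  private void removeFromIndex(Task task) {
    Set<String> ids = jobIndex.get(task.jobKey);
    if (ids != null) {
      ids.remove(task.id);
      if (ids.isEmpty()) {
        jobIndex.remove(task.jobKey);
      }
    }
  }

  /** Sequential scan: visits every task and maps it to its job key. */
  Set<String> getJobKeysByScan() {
    Set<String> keys = new HashSet<>();
    for (Task task : tasksById.values()) {
      keys.add(task.jobKey);
    }
    return keys;
  }

  /** Index lookup: copies the key set of the secondary index, one entry per job. */
  Set<String> getJobKeysFromIndex() {
    return new HashSet<>(jobIndex.keySet());
  }

  public static void main(String[] args) {
    JobKeyIndexSketch store = new JobKeyIndexSketch();
    store.save(new Task("t1", "www/prod/a"));
    store.save(new Task("t2", "www/prod/a"));
    store.save(new Task("t3", "www/prod/b"));
    // Both methods should report the same set of job keys.
    System.out.println(store.getJobKeysByScan().equals(store.getJobKeysFromIndex()));
  }
}
```

In this sketch the scan costs time proportional to the number of tasks, while the index-based version costs time proportional to the number of jobs. The two only agree if the index never retains a key for a job with no tasks, which is why the sketch prunes empty entries; whether the real {{job}} index behaves that way would need to be confirmed against the code.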

  was:
The existing {{TaskStoreBenchmarks}} shows that {{DBTaskStore}} is almost two orders of magnitude faster than {{MemTaskStore}} for {{getJobKeys()}}:
{code}
Benchmark                                       (numTasks)   Mode  Cnt      Score      Error  Units
TaskStoreBenchmarks.DBFetchTasksBenchmark.run        10000  thrpt    5  78430.531 ± 3255.027  ops/s
TaskStoreBenchmarks.DBFetchTasksBenchmark.run        50000  thrpt    5  50774.988 ± 8986.951  ops/s
TaskStoreBenchmarks.DBFetchTasksBenchmark.run       100000  thrpt    5   2480.074 ± 9833.122  ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run       10000  thrpt    5   1189.568 ±  108.146  ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run       50000  thrpt    5    124.990 ±   27.605  ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run      100000  thrpt    5     35.724 ±   15.101  ops/s
{code}

If the scheduler is configured to run with {{MemTaskStore}}, every hit on the scheduler page ({{/scheduler}}) causes a call to {{MemTaskStore.getJobKeys()}}.

The current implementation of this method is inefficient: it performs a sequential scan of the task store and then maps each task to its job key. Both the scan and the mapping can be eliminated by returning the key set of the existing secondary index {{job}}.


> Eliminate sequential scan in MemTaskStore.getJobKeys()
> ------------------------------------------------------
>
>                 Key: AURORA-1847
>                 URL: https://issues.apache.org/jira/browse/AURORA-1847
>             Project: Aurora
>          Issue Type: Story
>          Components: Efficiency, UI
>            Reporter: Mehrdad Nurolahzade
>            Priority: Minor
>              Labels: newbie
>
> The existing {{TaskStoreBenchmarks}} shows that {{DBTaskStore}} is between roughly 500 and 11,000 times faster than {{MemTaskStore}} for {{getJobKeys()}}, with the gap widening as the number of tasks grows:
> {code}
> Benchmark                                       (numTasks)   Mode  Cnt       Score       Error  Units
> TaskStoreBenchmarks.DBFetchTasksBenchmark.run        10000  thrpt    5  320271.082 ± 30842.727  ops/s
> TaskStoreBenchmarks.DBFetchTasksBenchmark.run        50000  thrpt    5  334805.551 ± 20435.139  ops/s
> TaskStoreBenchmarks.DBFetchTasksBenchmark.run       100000  thrpt    5  317395.890 ± 45302.180  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run       10000  thrpt    5     624.944 ±    54.038  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run       50000  thrpt    5      91.335 ±     9.241  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run      100000  thrpt    5      27.712 ±     8.128  ops/s
> {code}
> If the scheduler is configured to run with {{MemTaskStore}}, every hit on the scheduler page ({{/scheduler}}) causes a call to {{MemTaskStore.getJobKeys()}}.
> The current implementation of this method is inefficient: it performs a sequential scan of the task store and then maps each task to its job key. Both the scan and the mapping can be eliminated by returning the key set of the existing secondary index {{job}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)