Posted to issues@aurora.apache.org by "Mehrdad Nurolahzade (JIRA)" <ji...@apache.org> on 2016/12/06 23:18:58 UTC
[jira] [Updated] (AURORA-1847) Eliminate sequential scan in MemTaskStore.getJobKeys()
[ https://issues.apache.org/jira/browse/AURORA-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mehrdad Nurolahzade updated AURORA-1847:
----------------------------------------
Description:
The existing {{TaskStoreBenchmarks}} shows that {{DBTaskStore}} is roughly 500 to 11,000 times faster than {{MemTaskStore}} when it comes to {{getJobKeys()}}, with the gap widening as the number of tasks grows:
{code}
Benchmark                                       (numTasks)   Mode  Cnt       Score       Error  Units
TaskStoreBenchmarks.DBFetchTasksBenchmark.run        10000  thrpt    5  320271.082 ± 30842.727  ops/s
TaskStoreBenchmarks.DBFetchTasksBenchmark.run        50000  thrpt    5  334805.551 ± 20435.139  ops/s
TaskStoreBenchmarks.DBFetchTasksBenchmark.run       100000  thrpt    5  317395.890 ± 45302.180  ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run       10000  thrpt    5     624.944 ±    54.038  ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run       50000  thrpt    5      91.335 ±     9.241  ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run      100000  thrpt    5      27.712 ±     8.128  ops/s
{code}
If the scheduler is configured to run with {{MemTaskStore}}, every hit on the scheduler page ({{/scheduler}}) causes a call to {{MemTaskStore.getJobKeys()}}.
The current implementation of this method is very inefficient: it performs a sequential scan of the task store and then maps each task to its job key. Both the scan and the mapping can be eliminated by simply returning the key set of the existing secondary index {{job}}.
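The proposed change can be illustrated with a minimal sketch. This is not the Aurora source: {{JobKeyIndexSketch}}, {{Task}}, the string job keys, and all method names are hypothetical simplifications, assuming only that the store keeps a secondary index from job key to task IDs and removes a job's entry once its last task is deleted.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical, simplified stand-in for an in-memory task store (not Aurora code).
class JobKeyIndexSketch {
  // Hypothetical minimal task: an ID plus the key of the job it belongs to.
  static final class Task {
    final String id;
    final String jobKey;

    Task(String id, String jobKey) {
      this.id = id;
      this.jobKey = jobKey;
    }
  }

  // Primary storage: task ID -> task.
  private final Map<String, Task> tasks = new HashMap<>();
  // Secondary index "job": job key -> IDs of the tasks in that job.
  private final Map<String, Set<String>> jobIndex = new HashMap<>();

  void save(Task task) {
    Task previous = tasks.put(task.id, task);
    if (previous != null) {
      removeFromIndex(previous);
    }
    jobIndex.computeIfAbsent(task.jobKey, k -> new HashSet<>()).add(task.id);
  }

  void delete(String taskId) {
    Task removed = tasks.remove(taskId);
    if (removed != null) {
      removeFromIndex(removed);
    }
  }

  // Drops empty entries so the index key set only holds jobs that still have tasks.
  private void removeFromIndex(Task task) {
    Set<String> ids = jobIndex.get(task.jobKey);
    if (ids != null) {
      ids.remove(task.id);
      if (ids.isEmpty()) {
        jobIndex.remove(task.jobKey);
      }
    }
  }

  // Sequential scan: visits every task and maps it to its job key.
  Set<String> getJobKeysByScan() {
    return tasks.values().stream().map(t -> t.jobKey).collect(Collectors.toSet());
  }

  // Index-based: copies the key set of the secondary index, touching jobs only.
  Set<String> getJobKeysFromIndex() {
    return new HashSet<>(jobIndex.keySet());
  }
}
```

Under these assumptions the scan costs time proportional to the number of tasks, while the index-based version is proportional to the number of jobs, which is typically much smaller. The sketch returns a copy of the key set so that callers cannot observe later mutations of the index; whether a copy or a view is appropriate in the real store is a separate design decision.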
was:
The existing {{TaskStoreBenchmarks}} shows {{DBTaskStore}} is almost two orders of magnitude faster than {{MemTaskStore}} when it comes to {{getJobKeys()}}:
{code}
Benchmark                                       (numTasks)   Mode  Cnt       Score       Error  Units
TaskStoreBenchmarks.DBFetchTasksBenchmark.run        10000  thrpt    5   78430.531 ±  3255.027  ops/s
TaskStoreBenchmarks.DBFetchTasksBenchmark.run        50000  thrpt    5   50774.988 ±  8986.951  ops/s
TaskStoreBenchmarks.DBFetchTasksBenchmark.run       100000  thrpt    5    2480.074 ±  9833.122  ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run       10000  thrpt    5    1189.568 ±   108.146  ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run       50000  thrpt    5     124.990 ±    27.605  ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run      100000  thrpt    5      35.724 ±    15.101  ops/s
{code}
If the scheduler is configured to run with {{MemTaskStore}}, every hit on the scheduler page ({{/scheduler}}) causes a call to {{MemTaskStore.getJobKeys()}}.
The current implementation of this method is very inefficient: it performs a sequential scan of the task store and then maps each task to its job key. Both the scan and the mapping can be eliminated by simply returning the key set of the existing secondary index {{job}}.
> Eliminate sequential scan in MemTaskStore.getJobKeys()
> ------------------------------------------------------
>
> Key: AURORA-1847
> URL: https://issues.apache.org/jira/browse/AURORA-1847
> Project: Aurora
> Issue Type: Story
> Components: Efficiency, UI
> Reporter: Mehrdad Nurolahzade
> Priority: Minor
> Labels: newbie