Posted to notifications@asterixdb.apache.org by "Chen Luo (JIRA)" <ji...@apache.org> on 2019/04/05 22:12:00 UTC

[jira] [Updated] (ASTERIXDB-2541) Introduce GreedyScheduler

     [ https://issues.apache.org/jira/browse/ASTERIXDB-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chen Luo updated ASTERIXDB-2541:
--------------------------------
    Description: 
Our current AsynchronousScheduler tries to schedule all merge operations at the same time, without any control. This is not optimal in terms of minimizing the number of disk components, which directly impacts query performance.

Here we introduce GreedyScheduler to minimize the number of disk components over time. It keeps track of all merge operations of an LSM index and activates only the merge operation with the smallest number of remaining I/Os; intuitively, finishing the cheapest pending merge first reduces the component count as quickly as possible. It can be proven that if the number of components is the same for all merge operations, then GreedyScheduler is strictly optimal. Otherwise, it still serves as a good heuristic.

In order for GreedyScheduler to work, we need the following two changes (a sketch of the resulting scheduling policy follows the list):
* Keep track of the number of pages scanned by index cursors, so that we know how many pages are left;
* Introduce a mechanism to activate/deactivate merge operations.
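
Below is a minimal sketch of the intended greedy policy in Java. The type and method names here (MergeOperation, remainingPages(), setActive(), GreedySchedulerSketch) are hypothetical placeholders for illustration, not the actual Hyracks/AsterixDB I/O operation interfaces:

{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the greedy policy: among all scheduled merge
// operations of an LSM index, keep only the one with the fewest remaining
// page I/Os active and pause the rest until it finishes.
final class GreedySchedulerSketch {

    // Placeholder for a running merge; the real scheduler would read this
    // progress information from the merge cursor (change 1 above) and use
    // the activation mechanism (change 2 above) to pause/resume it.
    interface MergeOperation {
        long remainingPages();      // pages still to be read/written
        void setActive(boolean on); // resume or pause the merge
    }

    private final List<MergeOperation> merges = new ArrayList<>();

    synchronized void scheduled(MergeOperation op) {
        merges.add(op);
        rebalance();
    }

    synchronized void completed(MergeOperation op) {
        merges.remove(op);
        rebalance();
    }

    // Activate only the merge with the smallest number of remaining I/Os.
    private void rebalance() {
        merges.stream()
              .min(Comparator.comparingLong(MergeOperation::remainingPages))
              .ifPresent(best -> {
                  for (MergeOperation op : merges) {
                      op.setActive(op == best);
                  }
              });
    }
}
{code}

Whenever a merge is scheduled or completes, the scheduler re-selects the cheapest pending merge, so at most one merge per index performs I/O at any time.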

NOTE: GreedyScheduler should only be used at runtime (with a controlled data arrival process), where it can reduce the number of disk components on a best-effort basis. It CANNOT be used when benchmarking the system by writing as fast as possible, since large merges will be starved; the measured write throughput will be high but unsustainable.

> Introduce GreedyScheduler
> -------------------------
>
>                 Key: ASTERIXDB-2541
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2541
>             Project: Apache AsterixDB
>          Issue Type: Improvement
>          Components: STO - Storage
>            Reporter: Chen Luo
>            Assignee: Chen Luo
>            Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)