You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ignite.apache.org by "Sergey Kalashnikov (Jira)" <ji...@apache.org> on 2019/10/17 09:39:00 UTC

[jira] [Commented] (IGNITE-12295) Faster index eviction

    [ https://issues.apache.org/jira/browse/IGNITE-12295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953560#comment-16953560 ] 

Sergey Kalashnikov commented on IGNITE-12295:
---------------------------------------------

The following changes were made to demonstrate the approach.

1) A special cursor {{PurgeCursor}} is introduced to B+Tree to clean up tree items according to a provided filter.
The filter is passed a page IO object and item index. An existing {{TreeRowClosure}} is re-used for the filter.

For the purpose of the index partition clearing the filter passed to the above cursor does the following:
- reads the link of the page item;
- extracts partition id from the link value;
- determines if the partition is requested for clearing.
This code is made a part of {{H2Tree}} class (which represents one segment of the index), method {{purge}}.

The purge cursor may delete multiple items from the page at once. To reflect that, the new physical delta WAL record {{PurgeRecord}} is introduced.
The new record contains the indexes of all removed items and the new count of items that remains in the page.

2) A new {{GridQueryIndexing}} method to provide the caller with a collection of runnables and futures each responsible for clearing a single index.
{{List<IgniteBiTuple<Runnable, IgniteInternalFuture<Void>>> purgeIndexPartitions(CacheGroupContext grp, Set<Integer> parts)}}

This specific choice of API hides the details of enumerating tables and indexes of cache group from the caller and allows to move actual processing to some other context.

The implementation is done in {{IgniteH2Indexing}} class.

3) Changes in pipeline of {{PartitionsEvictManager}}.

An API entry point for starting index partition eviction mechanism on a cache group and listening for single result.
{{public IgniteInternalFuture<Void> purgePartitionsExclusively(CacheGroupContext grp, List<GridDhtLocalPartition> parts) throws IgniteCheckedException}}

It makes sure that all the given partitions are not reserved and are not being cleared at the moment.
The {{clearFuture}} futures of every partition is reinitialized to prevent further clashes with regular clearing requests.
After all partitions are available for purge, the index purge runnables provided by {{GridQueryIndexing}} are converted to tasks that are put through usual queues/pipeline of {{PartitionsEvictManager}}.

Once all index purge tasks are completed, the additional tasks are run:
- to clean up all the CacheMapEntries for affected partitions;
- to clean up H2RowCache.

5) Limitations/applicability.
- It is assumed that no concurrent changes are made to affected partitions, otherwise there is no guarantee that index is free of rows of such partitions after iteration is completed.
In current design of file-based rebalancing, the concurrent changes are not applied to the data storage.
- Only applicable to indexes that use Ignite's internal B+Tree. Cache groups with full text indexes cannot be cleared that way.
- Per-entry event notifications are missing, since we are not clearing actual data store partition.


> Faster index eviction
> ---------------------
>
>                 Key: IGNITE-12295
>                 URL: https://issues.apache.org/jira/browse/IGNITE-12295
>             Project: Ignite
>          Issue Type: Sub-task
>            Reporter: Sergey Kalashnikov
>            Assignee: Sergey Kalashnikov
>            Priority: Major
>
> For the file-based rebalancing approach, it seems feasible to avoid iterating the old partition data in order to clear the indexes.
> One can independently clear the shared index structures of all the rows referencing entries from moving partitions by deducing partition id from the links in the leaf pages.
> The proposed algorithm is simple and takes the set of integer partition ids as an input:
> 1. Iterate over leaf pages of the index and remove items attributed to any of indicated partitions, unless it is the only or the rightmost item on a page.
> 2. If the rightmost item (or the only item) on a page happens to belong to any of the indicated partitions, employ a regular remove algorithm (descending from the root) so that inner pages get correctly updated.
> Restart iteration from the leaf page where the removed item would be inserted (descend from the root to find it).
> The use of such algorithm can be justified (as having performance advantage) when the number of keys that'd be removed is bigger than the number of leaf pages in the index.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)