Posted to issues@ignite.apache.org by "Aleksey Plekhanov (Jira)" <ji...@apache.org> on 2019/10/03 13:29:01 UTC
[jira] [Commented] (IGNITE-6930) Optionally to do not write free list updates to WAL

    [ https://issues.apache.org/jira/browse/IGNITE-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943607#comment-16943607 ] 

Aleksey Plekhanov commented on IGNITE-6930:
-------------------------------------------

The patch is ready. To minimize WAL records, I've used the following approach:

A small on-heap page-list cache is allocated for each bucket. There are three types of free-list operations: putting a page to the tail of a bucket (after a row is inserted or removed), taking a page from the tail of a bucket (before a row is inserted), and removing a page from a bucket (before a row is removed). Each of these operations first looks into the bucket's page cache and only then works with page memory.

No WAL record is needed as long as a page only passes through the bucket's page cache. So a page can be put into the free list, move through the bucket, and leave the free list without producing any free-list WAL record at all.
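
The per-bucket cache described above can be sketched roughly as follows. This is a toy model, not the code from the patch: `Bucket`, `putTail`, `takeTail`, `remove`, the cache capacity and the list-based "WAL" are all hypothetical names chosen for illustration. Each operation checks the on-heap cache first and falls back to the page-memory list, which is the only place that produces a WAL record.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical model of one free-list bucket: an on-heap page cache in front of
// a "page memory" list whose every change must be logged. Not Ignite code.
public class BucketCacheSketch {
    // Hypothetical stand-in for the WAL: just collects record descriptions.
    static final List<String> wal = new ArrayList<>();

    static class Bucket {
        private final int cacheCapacity;                         // hypothetical cache size
        private final Deque<Long> onHeapCache = new ArrayDeque<>();
        private final Deque<Long> pageMemoryList = new ArrayDeque<>(); // persistent part

        Bucket(int cacheCapacity) { this.cacheCapacity = cacheCapacity; }

        // Put a page to the tail of the bucket (after a row insert/remove).
        void putTail(long pageId) {
            if (onHeapCache.size() < cacheCapacity) {
                onHeapCache.addLast(pageId);                     // cache only: no WAL record
            } else {
                pageMemoryList.addLast(pageId);
                wal.add("PAGES_LIST_ADD_PAGE " + pageId);        // page memory changed: log it
            }
        }

        // Take a page from the tail of the bucket (before a row insert); null if empty.
        Long takeTail() {
            Long pageId = onHeapCache.pollLast();
            if (pageId != null)
                return pageId;                                   // served from cache: no WAL record
            pageId = pageMemoryList.pollLast();
            if (pageId != null)
                wal.add("PAGES_LIST_REMOVE_PAGE " + pageId);
            return pageId;
        }

        // Remove a specific page from the bucket (before a row remove).
        boolean remove(long pageId) {
            if (onHeapCache.remove(pageId))
                return true;                                     // cache only: no WAL record
            if (pageMemoryList.remove(pageId)) {
                wal.add("PAGES_LIST_REMOVE_PAGE " + pageId);
                return true;
            }
            return false;
        }
    }

    public static void main(String[] args) {
        Bucket bucket = new Bucket(2);
        bucket.putTail(5L);                // page enters the free list
        Long taken = bucket.takeTail();    // and leaves it again before an insert
        System.out.println("taken=" + taken + ", WAL records=" + wal.size());
    }
}
```

Running `main` puts page 5 into the bucket and takes it back out; both steps are served by the cache, so the printed WAL record count is 0, which is the case described above. Only pages that overflow the cache touch page memory and get logged.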

The on-heap page cache is flushed to page memory before each checkpoint, which keeps the same recovery guarantees as now: physical records are restored from the WAL only up to the moment of the last unsuccessful checkpoint (if one was started), so only the final state of the buckets at the moment of the checkpoint is needed.
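
A minimal sketch of the flush-before-checkpoint step, assuming a single-threaded model in which no updates run concurrently with the flush (a real implementation has to guarantee that itself). `Bucket`, `flushToPageMemory` and `checkpoint` are hypothetical names, and the "checkpoint" here is just a copy of each bucket's page-memory list.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical model: flush every bucket's on-heap cache before snapshotting page memory.
public class CheckpointFlushSketch {
    static class Bucket {
        final Deque<Long> onHeapCache = new ArrayDeque<>();
        final Deque<Long> pageMemoryList = new ArrayDeque<>();

        // Move every cached page into the page-memory list, emptying the cache.
        void flushToPageMemory() {
            while (!onHeapCache.isEmpty())
                pageMemoryList.addLast(onHeapCache.pollFirst());
        }
    }

    // Hypothetical checkpoint: flush caches first, then copy page memory.
    static List<List<Long>> checkpoint(List<Bucket> buckets) {
        List<List<Long>> snapshot = new ArrayList<>();
        for (Bucket b : buckets) {
            b.flushToPageMemory();                          // otherwise cached pages would be missed
            snapshot.add(new ArrayList<>(b.pageMemoryList));
        }
        return snapshot;
    }

    public static void main(String[] args) {
        Bucket b = new Bucket();
        b.pageMemoryList.add(1L);
        b.onHeapCache.add(2L);
        b.onHeapCache.add(3L);
        System.out.println(checkpoint(List.of(b)));         // [[1, 2, 3]]
    }
}
```

Because the snapshot is taken only after the flush, it reflects the final bucket state at checkpoint time, which is the only state recovery needs according to the comment above.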

[~ivan.glukos], could you please have a look?

 

> Optionally to do not write free list updates to WAL
> ---------------------------------------------------
>
>                 Key: IGNITE-6930
>                 URL: https://issues.apache.org/jira/browse/IGNITE-6930
>             Project: Ignite
>          Issue Type: Task
>          Components: cache
>            Reporter: Vladimir Ozerov
>            Assignee: Aleksey Plekhanov
>            Priority: Major
>              Labels: IEP-8, performance
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When a cache entry is created, we need to update the free list. When an entry is updated, we need to update the free list(s) several times. Currently the free list is a persistent structure, so every update to it must be logged to be able to recover after a crash. This may incur significant overhead, especially for small entries.
> E.g. this is what the WAL for a single update looks like. "D" - updates with real data, "F" - free-list management:
> {code}
>  1. [D] DataRecord [writeEntries=[UnwrapDataEntry[k = key, v = [ BinaryObject [idHash=2053299190, hash=1986931360, typeId=-1580729813]], super = [DataEntry [cacheId=94416770, op=UPDATE, writeVer=GridCacheVersion [topVer=122147562, order=1510667560607, nodeOrder=1], partId=0, partCnt=4]]]], super=WALRecord [size=0, chainSize=0, pos=null, type=DATA_RECORD]]
>  2. [F] PagesListRemovePageRecord [rmvdPageId=0001000000000005, pageId=0001000000000006, grpId=94416770, super=PageDeltaRecord [grpId=94416770, pageId=0001000000000006, super=WALRecord [size=37, chainSize=0, pos=null, type=PAGES_LIST_REMOVE_PAGE]]]
>  3. [D] DataPageInsertRecord [super=PageDeltaRecord [grpId=94416770, pageId=0001000000000005, super=WALRecord [size=129, chainSize=0, pos=null, type=DATA_PAGE_INSERT_RECORD]]]
>  4. [F] PagesListAddPageRecord [dataPageId=0001000000000005, super=PageDeltaRecord [grpId=94416770, pageId=0001000000000008, super=WALRecord [size=37, chainSize=0, pos=null, type=PAGES_LIST_ADD_PAGE]]]
>  5. [F] DataPageSetFreeListPageRecord [freeListPage=281474976710664, super=PageDeltaRecord [grpId=94416770, pageId=0001000000000005, super=WALRecord [size=37, chainSize=0, pos=null, type=DATA_PAGE_SET_FREE_LIST_PAGE]]]
>  6. [D] ReplaceRecord [io=DataLeafIO[ver=1], idx=0, super=PageDeltaRecord [grpId=94416770, pageId=0001000000000004, super=WALRecord [size=47, chainSize=0, pos=null, type=BTREE_PAGE_REPLACE]]]
>  7. [F] DataPageRemoveRecord [itemId=0, super=PageDeltaRecord [grpId=94416770, pageId=0001000000000005, super=WALRecord [size=30, chainSize=0, pos=null, type=DATA_PAGE_REMOVE_RECORD]]]
>  8. [F] PagesListRemovePageRecord [rmvdPageId=0001000000000005, pageId=0001000000000008, grpId=94416770, super=PageDeltaRecord [grpId=94416770, pageId=0001000000000008, super=WALRecord [size=37, chainSize=0, pos=null, type=PAGES_LIST_REMOVE_PAGE]]]
>  9. [F] DataPageSetFreeListPageRecord [freeListPage=0, super=PageDeltaRecord [grpId=94416770, pageId=0001000000000005, super=WALRecord [size=37, chainSize=0, pos=null, type=DATA_PAGE_SET_FREE_LIST_PAGE]]]
> 10. [F] PagesListAddPageRecord [dataPageId=0001000000000005, super=PageDeltaRecord [grpId=94416770, pageId=0001000000000006, super=WALRecord [size=37, chainSize=0, pos=null, type=PAGES_LIST_ADD_PAGE]]]
> 11. [F] DataPageSetFreeListPageRecord [freeListPage=281474976710662, super=PageDeltaRecord [grpId=94416770, pageId=0001000000000005, super=WALRecord [size=37, chainSize=0, pos=null, type=DATA_PAGE_SET_FREE_LIST_PAGE]]]
> {code}
> If you sum up all the space required for the operation (the size in record 3 is shown incorrectly here), you will see that the data update requires ~300 bytes, and so does the free-list update!
> *Proposed solution*
> 1) Optionally do not write free list updates to WAL
> 2) In case of a node restart, start with empty free lists, so data inserts will have to allocate new pages
> 3) When an old data page is read, add it to the free list
> 4) Start a background thread that iterates over all old data pages and re-creates the free list, so that eventually all data pages are tracked.
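
The byte count in the quoted description can be checked by tallying the `size=` fields of the WAL dump per tag. The sizes below are copied from records 1-11; as the description notes, record 3's size is not accurate, and record 1 reports `size=0`, which is unlikely to be its real size, so only the [F] total is meaningful here. `WalSizeTally` and `tally` are illustrative names.

```java
import java.util.Map;
import java.util.TreeMap;

// Tallies the "size=" fields of the quoted WAL dump per tag.
public class WalSizeTally {
    // {tag, size} pairs copied from records 1-11 of the dump.
    static final String[][] RECORDS = {
        {"D", "0"},   // 1. DataRecord (size shown as 0)
        {"F", "37"},  // 2. PagesListRemovePageRecord
        {"D", "129"}, // 3. DataPageInsertRecord (size shown incorrectly)
        {"F", "37"},  // 4. PagesListAddPageRecord
        {"F", "37"},  // 5. DataPageSetFreeListPageRecord
        {"D", "47"},  // 6. ReplaceRecord
        {"F", "30"},  // 7. DataPageRemoveRecord
        {"F", "37"},  // 8. PagesListRemovePageRecord
        {"F", "37"},  // 9. DataPageSetFreeListPageRecord
        {"F", "37"},  // 10. PagesListAddPageRecord
        {"F", "37"},  // 11. DataPageSetFreeListPageRecord
    };

    // Sums record sizes per tag ("D" = data, "F" = free-list management).
    static Map<String, Integer> tally(String[][] records) {
        Map<String, Integer> totals = new TreeMap<>();
        for (String[] r : records)
            totals.merge(r[0], Integer.parseInt(r[1]), Integer::sum);
        return totals;
    }

    public static void main(String[] args) {
        System.out.println(tally(RECORDS)); // {D=176, F=289}
    }
}
```

The free-list records alone add up to 7 × 37 + 30 = 289 bytes, i.e. the ~300 bytes mentioned in the description.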

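The four steps of the proposed solution in the quoted description could be sketched like this. It is an illustration only: the class and method names are hypothetical, page IDs are plain `long`s, and a real free list would also sort pages into buckets by free space and skip full pages, which is left out here.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of the proposed non-logged free list; not Ignite code.
public class FreeListRebuildSketch {
    // Step 1: the free list lives only in memory, so nothing here is written to the WAL.
    final Set<Long> trackedPages = ConcurrentHashMap.newKeySet();
    // Hypothetical allocator handing out IDs for brand-new data pages.
    final AtomicLong nextNewPageId;

    // Step 2: after a restart the free list starts empty.
    FreeListRebuildSketch(long firstNewPageId) {
        nextNewPageId = new AtomicLong(firstNewPageId);
    }

    // Step 2: with no tracked page available, an insert must allocate a new page.
    long pageForInsert() {
        for (Long id : trackedPages)
            return id;                        // any tracked page (a real free list picks by free space)
        return nextNewPageId.getAndIncrement();
    }

    // Step 3: an old data page that gets read is added to the free list.
    void onDataPageRead(long pageId) {
        trackedPages.add(pageId);
    }

    // Step 4: a background thread walks all old data pages and re-adds them.
    Thread startBackgroundRebuild(long[] oldDataPageIds) {
        Thread t = new Thread(() -> {
            for (long id : oldDataPageIds)
                trackedPages.add(id);
        }, "free-list-rebuild");
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        FreeListRebuildSketch fl = new FreeListRebuildSketch(100);
        System.out.println(fl.pageForInsert());     // 100: nothing tracked after restart
        fl.onDataPageRead(3);                        // lazily tracked on read
        Thread rebuild = fl.startBackgroundRebuild(new long[] {1, 2, 3, 4});
        rebuild.join();
        System.out.println(fl.trackedPages.containsAll(Set.of(1L, 2L, 3L, 4L)));  // true
    }
}
```

A concurrent set is used because step 3 (reads by user threads) and step 4 (the rebuild thread) add pages at the same time; adding an already tracked page is a no-op, so the two paths do not need to coordinate.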


--
This message was sent by Atlassian Jira
(v8.3.4#803005)