Posted to user@ignite.apache.org by Zhenya Stanilovsky <ar...@mail.ru> on 2021/01/12 09:59:47 UTC

Re[2]: Questions related to check pointing



 
>Hi Zhenya,
> 
>Thanks for the pointers - I will look into them.
> 
>I have been doing some additional reading into this and discovered we are using a 4.0 NFS client, which seems to be the first 'no-no'; we will look at updating to use the 4.1 NFS client.
> 
>We have modified our default timer cadence for checkpointing from 3 minutes to 1 minute, which seems to be giving us better performance. We will continue to measure the impact that has.
> 
>Lastly, I'm planning to merge our two data regions into a single region to reduce 'too many dirty pages' checkpoints due to high write activity in a small region.
> 
>Would using larger page sizes (eg: 16kb) be useful with EFS?
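> 
>To make the above concrete, the changes amount to roughly the following (a minimal sketch in Java terms; we apply the equivalent settings from the C# client, and the names and sizes are illustrative):
> 
>    import org.apache.ignite.configuration.DataRegionConfiguration;
>    import org.apache.ignite.configuration.DataStorageConfiguration;
> 
>    DataStorageConfiguration storageCfg = new DataStorageConfiguration();
> 
>    // Checkpoint every 60 seconds instead of the 180-second default.
>    storageCfg.setCheckpointFrequency(60 * 1000L);
> 
>    // The open question above: a 16kb page size instead of the 4kb default.
>    storageCfg.setPageSize(16 * 1024);
> 
>    // A single merged persistent region instead of our current two.
>    storageCfg.setDefaultDataRegionConfiguration(new DataRegionConfiguration()
>        .setName("mergedRegion")
>        .setPersistenceEnabled(true)
>        .setMaxSize(4L * 1024 * 1024 * 1024));
> 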
Hi, Raymond.
I have no info about it; it would be helpful if you could share your research.
Thanks!
> 
>Raymond.  
>On Tue, Jan 12, 2021 at 8:27 PM Zhenya Stanilovsky < arzamas123@mail.ru > wrote:
>>Hope these are helpful too:
>>https://www.jeffgeerling.com/blog/2018/getting-best-performance-out-amazon-efs
>>https://docs.aws.amazon.com/efs/latest/ug/storage-classes.html
>>> 
>>>Hi Zhenya,
>>> 
>>>The matching checkpoint finished log is this:
>>> 
>>>2020-12-15 19:07:39,253 [106] INF [MutableCacheComputeServer]  Checkpoint finished [cpId=e2c31b43-44df-43f1-b162-6b6cefa24e28, pages=33421, markPos=FileWALPointer [idx=6339, fileOff=243287334, len=196573], walSegmentsCleared=0, walSegmentsCovered=[], markDuration=218ms, pagesWrite=1150ms, fsync=37104ms, total=38571ms]  
>>> 
>>>Regarding your comment that 3/4 of the pages in the whole data region need to be dirty to trigger this, can you confirm whether this is 3/4 of the maximum size of the data region, or of the currently used size? (eg: if Min is 1Gb, Max is 4Gb, and used is 2Gb, would 1.5Gb of dirty pages trigger this?)
>>> 
>>>Are data regions independently checkpointed, or are they checkpointed as a whole, so that a 'too many dirty pages' condition affects all data regions in terms of write blocking?
>>> 
>>>Can you comment on my query regarding whether we should set the Min and Max sizes of the data region to be the same? Ie: don't bother growing the data region memory use on demand, just allocate the maximum up front?
>>> 
>>>In terms of the checkpoint lock hold time metric, of the checkpoints citing 'too many dirty pages' there is one instance, apart from the one I provided earlier, that violates this limit, ie:
>>> 
>>>2020-12-17 18:56:39,086 [104] INF [MutableCacheComputeServer] Checkpoint started [checkpointId=e9ccf0ca-f813-4f91-ac93-5483350fdf66, startPtr=FileWALPointer [idx=7164, fileOff=389224517, len=196573], checkpointBeforeLockTime=276ms, checkpointLockWait=0ms, checkpointListenersExecuteTime=16ms, checkpointLockHoldTime=39ms, walCpRecordFsyncDuration=254ms, writeCheckpointEntryDuration=32ms, splitAndSortCpPagesDuration=276ms, pages=77774, reason=' too many dirty pages ']  
>>> 
>>>This is out of a population of 16 instances I can find. The remainder have lock times of 16-17ms.
>>> 
>>>Regarding writes of pages to the persistent store, does the checkpointing system parallelise writes across partitions to maximise throughput?
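>>> 
>>>For context, the only related knob I can see on our side is the checkpoint thread pool (a minimal Java sketch with an illustrative value; we would apply the equivalent from the C# client):
>>> 
>>>    import org.apache.ignite.configuration.DataStorageConfiguration;
>>> 
>>>    // Sketch: the checkpointer writes pages with a pool of threads (4 by default).
>>>    DataStorageConfiguration storageCfg = new DataStorageConfiguration();
>>>    storageCfg.setCheckpointThreads(8); // illustrative value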
>>> 
>>>Thanks,
>>>Raymond.
>>> 
>>>   
>>>On Thu, Dec 31, 2020 at 1:17 AM Zhenya Stanilovsky < arzamas123@mail.ru > wrote:
>>>>
>>>>All write operations will be blocked for this timeout: checkpointLockHoldTime=32ms (write lock holding). If you observe a huge amount of such messages: reason='too many dirty pages', maybe you need to store some data in non-persisted regions, for example, or reduce indexes (if you use them). And please attach the other part of the cp message, starting with: Checkpoint finished.
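>>>> 
>>>>If it helps, a non-persisted region is just a region with persistence disabled, for example (minimal sketch; the name and size are illustrative):
>>>> 
>>>>    import org.apache.ignite.configuration.DataRegionConfiguration;
>>>>    import org.apache.ignite.configuration.DataStorageConfiguration;
>>>> 
>>>>    // Sketch: an in-memory-only region; caches assigned to it are not
>>>>    // written by checkpoints, so they do not add dirty pages.
>>>>    DataRegionConfiguration inMemRegion = new DataRegionConfiguration()
>>>>        .setName("inMemoryRegion")          // illustrative name
>>>>        .setPersistenceEnabled(false)
>>>>        .setMaxSize(512L * 1024 * 1024);    // illustrative size
>>>> 
>>>>    DataStorageConfiguration storageCfg = new DataStorageConfiguration();
>>>>    storageCfg.setDataRegionConfigurations(inMemRegion);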
>>>>
>>>>
>>>> 
>>>>>In ( https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood ), there is a mention of a dirty pages limit that is a factor that can trigger check points.
>>>>> 
>>>>>I also found this issue:  http://apache-ignite-users.70518.x6.nabble.com/too-many-dirty-pages-td28572.html where "too many dirty pages" is a reason given for initiating a checkpoint.
>>>>> 
>>>>>After reviewing our logs I found this: (one example)
>>>>> 
>>>>>2020-12-15 19:07:00,999 [106] INF [MutableCacheComputeServer] Checkpoint started [checkpointId=e2c31b43-44df-43f1-b162-6b6cefa24e28, startPtr=FileWALPointer [idx=6339, fileOff=243287334, len=196573], checkpointBeforeLockTime=99ms, checkpointLockWait=0ms, checkpointListenersExecuteTime=16ms, checkpointLockHoldTime=32ms, walCpRecordFsyncDuration=113ms, writeCheckpointEntryDuration=27ms, splitAndSortCpPagesDuration=45ms, pages=33421, reason=' too many dirty pages ']   
>>>>> 
>>>>>Which suggests we may have the issue where writes are frozen until the check point is completed.
>>>>> 
>>>>>Looking at the AI 2.8.1 source code, the dirty page limit fraction appears to be 0.1 (10%), via this entry in GridCacheDatabaseSharedManager.java:
>>>>> 
>>>>>    /**
>>>>>     * Threshold to calculate limit for pages list on-heap caches.
>>>>>     * <p>
>>>>>     * Note: When a checkpoint is triggered, we need some amount of page memory to store pages list on-heap cache.
>>>>>     * If a checkpoint is triggered by "too many dirty pages" reason and pages list cache is rather big, we can get
>>>>>     * {@code IgniteOutOfMemoryException}. To prevent this, we can limit the total amount of cached page list buckets,
>>>>>     * assuming that checkpoint will be triggered if no more then 3/4 of pages will be marked as dirty (there will be
>>>>>     * at least 1/4 of clean pages) and each cached page list bucket can be stored to up to 2 pages (this value is not
>>>>>     * static, but depends on PagesCache.MAX_SIZE, so if PagesCache.MAX_SIZE > PagesListNodeIO#getCapacity it can take
>>>>>     * more than 2 pages). Also some amount of page memory needed to store page list metadata.
>>>>>     */
>>>>>    private static final double PAGE_LIST_CACHE_LIMIT_THRESHOLD = 0.1;
>>>>> 
>>>>>This raises two questions: 
>>>>> 
>>>>>1. The data region where most writes are occurring has 4Gb allocated to it, though it is permitted to start at a much lower level. 4Gb should be ~1,000,000 pages at the default 4kb page size, 10% of which would be 100,000 dirty pages.
>>>>> 
>>>>>The 'limit holder' is calculated like this:
>>>>> 
>>>>>    /**
>>>>>     * @return Holder for page list cache limit for given data region.
>>>>>     */
>>>>>    public AtomicLong pageListCacheLimitHolder(DataRegion dataRegion) {
>>>>>        if (dataRegion.config().isPersistenceEnabled()) {
>>>>>            return pageListCacheLimits.computeIfAbsent(dataRegion.config().getName(), name -> new AtomicLong(
>>>>>                (long)(((PageMemoryEx)dataRegion.pageMemory()).totalPages() * PAGE_LIST_CACHE_LIMIT_THRESHOLD)));
>>>>>        }
>>>>>        return null;
>>>>>    }
>>>>> 
>>>>>... but I am unsure if totalPages() is referring to the current size of the data region, or the size it is permitted to grow to. ie: Could the 'dirty page limit' be a sliding limit based on the growth of the data region? Is it better to set the initial and maximum sizes of data regions to be the same number?
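>>>>> 
>>>>>If the limit does track the current size, the workaround I have in mind is simply to pin the region size (a minimal Java sketch; the name is illustrative and we would apply the equivalent from the C# client):
>>>>> 
>>>>>    import org.apache.ignite.configuration.DataRegionConfiguration;
>>>>> 
>>>>>    // Sketch: set initial and maximum sizes to the same value so the
>>>>>    // region does not grow on demand.
>>>>>    long fourGb = 4L * 1024 * 1024 * 1024;
>>>>>    DataRegionConfiguration region = new DataRegionConfiguration()
>>>>>        .setName("processedDataRegion")   // illustrative name
>>>>>        .setPersistenceEnabled(true)
>>>>>        .setInitialSize(fourGb)
>>>>>        .setMaxSize(fourGb);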
>>>>> 
>>>>>2. We have two data regions, one supporting inbound arrival of data (with low numbers of writes), and one supporting storage of processed results from the arriving data (with many more writes). 
>>>>> 
>>>>>The block on writes due to the number of dirty pages appears to affect all data regions, not just the one which has violated the dirty page limit. Is that correct? If so, is this something that can be improved?
>>>>> 
>>>>>Thanks,
>>>>>Raymond.
>>>>>   
>>>>>On Wed, Dec 30, 2020 at 9:17 PM Raymond Wilson < raymond_wilson@trimble.com > wrote:
>>>>>>I'm working on getting automatic JVM thread stack dumping to occur when we detect long delays in put (PutIfAbsent) operations. Hopefully this will provide more information.
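>>>>>> 
>>>>>>The rough shape of the idea is below (a minimal Java sketch, assuming it runs inside the server JVM; the method name and threshold are illustrative):
>>>>>> 
>>>>>>    import java.lang.management.ManagementFactory;
>>>>>>    import java.lang.management.ThreadInfo;
>>>>>> 
>>>>>>    // Sketch: if a timed operation exceeded a threshold, dump all thread stacks.
>>>>>>    static void dumpThreadsIfSlow(long startNanos, long thresholdMs) {
>>>>>>        long elapsedMs = (System.nanoTime() - startNanos) / 1_000_000;
>>>>>>        if (elapsedMs < thresholdMs)
>>>>>>            return;
>>>>>>        for (ThreadInfo ti : ManagementFactory.getThreadMXBean().dumpAllThreads(true, true))
>>>>>>            System.err.print(ti); // ThreadInfo.toString() includes a (truncated) stack trace
>>>>>>    }
>>>>>> 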
>>>>>>On Wed, Dec 30, 2020 at 7:48 PM Zhenya Stanilovsky < arzamas123@mail.ru > wrote:
>>>>>>>
>>>>>>>Don't think so, checkpointing worked perfectly well even before this fix.
>>>>>>>Need additional info to start digging into your problem; can you share Ignite logs somewhere?
>>>>>>>   
>>>>>>>>I noticed an entry in the Ignite 2.9.1 changelog:
>>>>>>>>*  Improved checkpoint concurrent behaviour
>>>>>>>>I am having trouble finding the relevant Jira ticket for this in the 2.9.1 Jira area at  https://issues.apache.org/jira/browse/IGNITE-13876?jql=project%20%3D%20IGNITE%20AND%20fixVersion%20%3D%202.9.1%20and%20status%20%3D%20Resolved
>>>>>>>> 
>>>>>>>>Perhaps this change may improve the checkpointing issue we are seeing?
>>>>>>>> 
>>>>>>>>Raymond.
>>>>>>>>   
>>>>>>>>On Tue, Dec 29, 2020 at 8:35 PM Raymond Wilson < raymond_wilson@trimble.com > wrote:
>>>>>>>>>Hi Zhenya,
>>>>>>>>> 
>>>>>>>>>1. We currently use AWS EFS for primary storage, with provisioned IOPS to provide sufficient IO. Our Ignite cluster currently tops out at ~10% usage (with at least 5 nodes writing to it, including WAL and WAL archive), so we are not saturating the EFS interface. We use the default page size (experiments with larger page sizes showed instability when checkpointing due to free page starvation, so we reverted to the default size). 
>>>>>>>>> 
>>>>>>>>>2. Thanks for the detail, we will look for that in thread dumps when we can create them.
>>>>>>>>> 
>>>>>>>>>3. We are using the default CP buffer size, which is max(256Mb, DataRegionSize / 4) according to the Ignite documentation, so this should give more than enough checkpoint buffer space to cope with writes. As additional information, the cache which is displaying very slow writes is in a data region with relatively low write traffic. There is a primary (default) data region with large write traffic, and the vast majority of pages being written in a checkpoint will be for that default data region.
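>>>>>>>>> 
>>>>>>>>>For what it's worth, if we need to rule the buffer out completely we could also size it explicitly per region, e.g. (a minimal Java sketch; the name and value are illustrative):
>>>>>>>>> 
>>>>>>>>>    import org.apache.ignite.configuration.DataRegionConfiguration;
>>>>>>>>> 
>>>>>>>>>    // Sketch: an explicit checkpoint page buffer instead of the default sizing.
>>>>>>>>>    DataRegionConfiguration region = new DataRegionConfiguration()
>>>>>>>>>        .setName("default")                                   // illustrative
>>>>>>>>>        .setPersistenceEnabled(true)
>>>>>>>>>        .setMaxSize(4L * 1024 * 1024 * 1024)
>>>>>>>>>        .setCheckpointPageBufferSize(1024L * 1024 * 1024);    // 1Gb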
>>>>>>>>> 
>>>>>>>>>4. Yes, this is very surprising. Anecdotally from our logs it appears write traffic into the low write traffic cache is blocked during checkpoints.
>>>>>>>>> 
>>>>>>>>>Thanks,
>>>>>>>>>Raymond.
>>>>>>>>>    
>>>>>>>>>   
>>>>>>>>>On Tue, Dec 29, 2020 at 7:31 PM Zhenya Stanilovsky < arzamas123@mail.ru > wrote:
>>>>>>>>>>*  In addition to Ilya's reply, you can check the vendor's page for additional info; everything on that page is applicable to Ignite too [1]. Increasing the number of threads leads to concurrent IO usage, so if you have something like NVMe it's up to you, but in the case of SAS it would possibly be better to reduce this param.
>>>>>>>>>>*  The log will show you something like:
>>>>>>>>>>Parking thread=%Thread name% for timeout(ms)= %time% and the corresponding:
>>>>>>>>>>Unparking thread=
>>>>>>>>>>*  No additional logging of cp buffer usage is provided. The cp buffer needs to be more than 10% of the overall persistent DataRegions size.
>>>>>>>>>>*  90 seconds or longer: seems like a problem in IO or system tuning, it's a very bad score I'm afraid.
>>>>>>>>>>[1]  https://www.gridgain.com/docs/latest/perf-troubleshooting-guide/persistence-tuning
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 
>>>>>>>>>>>Hi,
>>>>>>>>>>> 
>>>>>>>>>>>We have been investigating some issues which appear to be related to checkpointing. We currently use AI 2.8.1 with the C# client.
>>>>>>>>>>> 
>>>>>>>>>>>I have been trying to gain clarity on how certain aspects of the Ignite configuration relate to the checkpointing process:
>>>>>>>>>>> 
>>>>>>>>>>>1. Number of check pointing threads. This defaults to 4, but I don't understand how it applies to the checkpointing process. Are more threads generally better (eg: because it makes the disk IO parallel across the threads), or does it only have a positive effect if you have many data storage regions? Or something else? If this could be clarified in the documentation (or a pointer to it which Google has not yet found), that would be good.
>>>>>>>>>>> 
>>>>>>>>>>>2. Checkpoint frequency. This defaults to 180 seconds. I was thinking that reducing this time would result in smaller, less disruptive checkpoints. Setting it to 60 seconds seems pretty safe, but is there a practical lower limit that should be used for use cases with new data constantly being added, eg: 5 seconds, 10 seconds?
>>>>>>>>>>> 
>>>>>>>>>>>3. Write exclusivity constraints during checkpointing. I understand that while a checkpoint is occurring, ongoing writes into the caches being checkpointed will be supported, and if those writes touch pages that are part of the checkpoint then those pages will be duplicated into the checkpoint buffer. If this buffer becomes full or stressed then Ignite will throttle, and perhaps block, writes until the checkpoint is complete. If this is the case then Ignite will emit logging (warning or informational?) that writes are being throttled.
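>>>>>>>>>>> 
>>>>>>>>>>>(As far as I understand, pages-write-throttling is off by default and would be enabled roughly as in this minimal Java sketch, if the checkpoint buffer turned out to be the bottleneck:)
>>>>>>>>>>> 
>>>>>>>>>>>    import org.apache.ignite.configuration.DataStorageConfiguration;
>>>>>>>>>>> 
>>>>>>>>>>>    // Sketch: slow writers down gradually instead of blocking them
>>>>>>>>>>>    // outright when the checkpoint buffer fills.
>>>>>>>>>>>    DataStorageConfiguration storageCfg = new DataStorageConfiguration();
>>>>>>>>>>>    storageCfg.setWriteThrottlingEnabled(true);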
>>>>>>>>>>> 
>>>>>>>>>>>We have cases where simple puts to caches (a few requests per second) are taking up to 90 seconds to execute when there is an active checkpoint occurring, where the checkpoint has been triggered by the checkpoint timer. When a checkpoint is not occurring the time to do this is usually in the milliseconds. The checkpoints themselves can take 90 seconds or longer, and are updating up to 30,000-40,000 pages across a pair of data storage regions, one with 4Gb in-memory space allocated (which should be 1,000,000 pages at the standard 4kb page size), and one small region with 128Mb. There is no 'throttling' logging being emitted that we can tell, so the checkpoint buffer (which should be 1Gb for the first data region and 256Mb for the second smaller region in this case) does not appear to be filling up during the checkpoint.
>>>>>>>>>>> 
>>>>>>>>>>>It seems like the checkpoint is affecting the put operations, but I don't understand why that may be given the documented checkpointing process, and the checkpoint itself (at least via Informational logging) is not advertising any restrictions.
>>>>>>>>>>> 
>>>>>>>>>>>Thanks,
>>>>>>>>>>>Raymond.
>>>>>>>>>>>  --
>>>>>>>>>>>
>>>>>>>>>>>Raymond Wilson
>>>>>>>>>>>Solution Architect, Civil Construction Software Systems (CCSS)
>>>>>>>>>>>  
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>  
>>>>>>>>> 
>>>>>>>>>  --
>>>>>>>>>
>>>>>>>>>Raymond Wilson
>>>>>>>>>Solution Architect, Civil Construction Software Systems (CCSS)
>>>>>>>>>11 Birmingham Drive |  Christchurch, New Zealand
>>>>>>>>>+64-21-2013317  Mobile
>>>>>>>>>raymond_wilson@trimble.com
>>>>>>>>>         
>>>>>>>>> 
>>>>>>>> 
>>>>>>>>  --
>>>>>>>>
>>>>>>>>Raymond Wilson
>>>>>>>>Solution Architect, Civil Construction Software Systems (CCSS)
>>>>>>>>11 Birmingham Drive |  Christchurch, New Zealand
>>>>>>>>+64-21-2013317  Mobile
>>>>>>>>raymond_wilson@trimble.com
>>>>>>>>         
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>  
>>>>>> 
>>>>>>  --
>>>>>>
>>>>>>Raymond Wilson
>>>>>>Solution Architect, Civil Construction Software Systems (CCSS)
>>>>>>11 Birmingham Drive |  Christchurch, New Zealand
>>>>>>+64-21-2013317  Mobile
>>>>>>raymond_wilson@trimble.com
>>>>>>         
>>>>>> 
>>>>> 
>>>>>  --
>>>>>
>>>>>Raymond Wilson
>>>>>Solution Architect, Civil Construction Software Systems (CCSS)
>>>>>11 Birmingham Drive |  Christchurch, New Zealand
>>>>>+64-21-2013317  Mobile
>>>>>raymond_wilson@trimble.com
>>>>>         
>>>>> 
>>>> 
>>>> 
>>>> 
>>>>  
>>> 
>>>  --
>>>
>>>Raymond Wilson
>>>Solution Architect, Civil Construction Software Systems (CCSS)
>>>11 Birmingham Drive |  Christchurch, New Zealand
>>>+64-21-2013317  Mobile
>>>raymond_wilson@trimble.com
>>>         
>>> 
>>>
>>>
>>>  
>> 
>> 
>> 
>>  
> 
>  --
>
>Raymond Wilson
>Solution Architect, Civil Construction Software Systems (CCSS)
>11 Birmingham Drive |  Christchurch, New Zealand
>raymond_wilson@trimble.com
>         
>