Posted to user@ignite.apache.org by Zhenya Stanilovsky <ar...@mail.ru> on 2020/12/30 11:48:49 UTC

Re[4]: Questions related to check pointing

The relevant code path starts here:
if (checkpointReadWriteLock.getReadHoldCount() > 1 || safeToUpdatePageMemories() || checkpointer.runner() == null)
    break;
else {
    CheckpointProgress pages = checkpointer.scheduleCheckpoint(0, "too many dirty pages");
and nearby you can see that:

maxDirtyPages = throttlingPlc != ThrottlingPolicy.DISABLED
    ? pool.pages() * 3L / 4
    : Math.min(pool.pages() * 2L / 3, cpPoolPages);
Thus, if ¾ of all pages in the DataRegion are dirty, this checkpoint will be triggered.
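
For illustration, here is roughly how those thresholds work out for a 4 GB region with the default 4 KB page size. This is only a back-of-the-envelope sketch; the exact count Ignite computes internally differs a little, and cpPoolPages depends on the checkpoint buffer size:

// Rough sketch of the dirty-page thresholds discussed above.
// Assumes the default 4 KB page size and ignores per-page overhead,
// so the numbers are approximate, not what Ignite computes internally.
public class DirtyPageThresholdSketch {
    public static void main(String[] args) {
        long regionSize = 4L * 1024 * 1024 * 1024;   // 4 GB data region
        long pageSize = 4L * 1024;                   // default page size
        long pages = regionSize / pageSize;          // 1,048,576 pages

        // Branch taken when a throttling policy is in effect (the default is
        // CHECKPOINT_BUFFER_ONLY, i.e. not DISABLED, as noted further down this thread):
        long threeQuarters = pages * 3L / 4;         // 786,432 dirty pages

        // Branch taken only when throttling is fully DISABLED; cpPoolPages
        // (the checkpoint pool size) would cap this further:
        long twoThirds = pages * 2L / 3;             // 699,050 dirty pages

        System.out.printf("pages=%d, 3/4=%d, 2/3=%d%n", pages, threeQuarters, twoThirds);
    }
}

So with a single 4 GB region, this trigger should not fire until somewhere around three quarters of a million pages are dirty.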
 
>In ( https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood ), there is a mention of a dirty pages limit that is a factor that can trigger check points.
> 
>I also found this issue:  http://apache-ignite-users.70518.x6.nabble.com/too-many-dirty-pages-td28572.html where "too many dirty pages" is a reason given for initiating a checkpoint.
> 
>After reviewing our logs I found this: (one example)
> 
>2020-12-15 19:07:00,999 [106] INF [MutableCacheComputeServer] Checkpoint started [checkpointId=e2c31b43-44df-43f1-b162-6b6cefa24e28, startPtr=FileWALPointer [idx=6339, fileOff=243287334, len=196573], checkpointBeforeLockTime=99ms, checkpointLockWait=0ms, checkpointListenersExecuteTime=16ms, checkpointLockHoldTime=32ms, walCpRecordFsyncDuration=113ms, writeCheckpointEntryDuration=27ms, splitAndSortCpPagesDuration=45ms, pages=33421, reason=' too many dirty pages ']   
> 
>Which suggests we may have the issue where writes are frozen until the check point is completed.
> 
>Looking at the AI 2.8.1 source code, the dirty page limit fraction appears to be 0.1 (10%), via this entry in GridCacheDatabaseSharedManager.java:
> 
>    /**
>     * Threshold to calculate limit for pages list on-heap caches.
>     * <p>
>     * Note: When a checkpoint is triggered, we need some amount of page memory to store pages list on-heap cache.
>     * If a checkpoint is triggered by "too many dirty pages" reason and pages list cache is rather big, we can get
>     * {@code IgniteOutOfMemoryException}. To prevent this, we can limit the total amount of cached page list buckets,
>     * assuming that checkpoint will be triggered if no more then 3/4 of pages will be marked as dirty (there will be
>     * at least 1/4 of clean pages) and each cached page list bucket can be stored to up to 2 pages (this value is not
>     * static, but depends on PagesCache.MAX_SIZE, so if PagesCache.MAX_SIZE > PagesListNodeIO#getCapacity it can take
>     * more than 2 pages). Also some amount of page memory needed to store page list metadata.
>     */
>     private static final double PAGE_LIST_CACHE_LIMIT_THRESHOLD = 0.1;
> 
>This raises two questions: 
> 
>1. The data region where most writes are occurring has 4Gb allocated to it, though it is permitted to start at a much lower level. 4Gb should be 1,000,000 pages, 10% of which should be 100,000 dirty pages.
> 
>The 'limit holder' is calculated like this:
> 
>    /**
>     * @return Holder for page list cache limit for given data region.
>     */
>    public AtomicLong pageListCacheLimitHolder(DataRegion dataRegion) {
>        if (dataRegion.config().isPersistenceEnabled()) {
>            return pageListCacheLimits.computeIfAbsent(dataRegion.config().getName(), name -> new AtomicLong(
>                (long)(((PageMemoryEx)dataRegion.pageMemory()).totalPages() * PAGE_LIST_CACHE_LIMIT_THRESHOLD)));
>        }
>
>        return null;
>    }
> 
>... but I am unsure if totalPages() is referring to the current size of the data region, or the size it is permitted to grow to. ie: Could the 'dirty page limit' be a sliding limit based on the growth of the data region? Is it better to set the initial and maximum sizes of data regions to be the same number?
> 
>2. We have two data regions, one supporting inbound arrival of data (with low numbers of writes), and one supporting storage of processed results from the arriving data (with many more writes). 
> 
>The block on writes due to the number of dirty pages appears to affect all data regions, not just the one which has violated the dirty page limit. Is that correct? If so, is this something that can be improved?
> 
>Thanks,
>Raymond.
>   
>On Wed, Dec 30, 2020 at 9:17 PM Raymond Wilson < raymond_wilson@trimble.com > wrote:
>>I'm working on getting automatic JVM thread stack dumping occurring if we detect long delays in put (PutIfAbsent) operations. Hopefully this will provide more information.  
>>On Wed, Dec 30, 2020 at 7:48 PM Zhenya Stanilovsky < arzamas123@mail.ru > wrote:
>>>
>>>I don't think so; checkpointing worked perfectly well before this fix.
>>>We need additional info to start digging into your problem. Can you share the Ignite logs somewhere?
>>>   
>>>>I noticed an entry in the Ignite 2.9.1 changelog:
>>>>*  Improved checkpoint concurrent behaviour
>>>>I am having trouble finding the relevant Jira ticket for this in the 2.9.1 Jira area at  https://issues.apache.org/jira/browse/IGNITE-13876?jql=project%20%3D%20IGNITE%20AND%20fixVersion%20%3D%202.9.1%20and%20status%20%3D%20Resolved
>>>> 
>>>>Perhaps this change may improve the checkpointing issue we are seeing?
>>>> 
>>>>Raymond.
>>>>   
>>>>On Tue, Dec 29, 2020 at 8:35 PM Raymond Wilson < raymond_wilson@trimble.com > wrote:
>>>>>Hi Zhenya,
>>>>> 
>>>>>1. We currently use AWS EFS for primary storage, with provisioned IOPS to provide sufficient IO. Our Ignite cluster currently tops out at ~10% usage (with at least 5 nodes writing to it, including WAL and WAL archive), so we are not saturating the EFS interface. We use the default page size (experiments with larger page sizes showed instability when checkpointing due to free page starvation, so we reverted to the default size). 
>>>>> 
>>>>>2. Thanks for the detail, we will look for that in thread dumps when we can create them.
>>>>> 
>>>>>3. We are using the default CP buffer size, which is max(256Mb, DataRegionSize / 4) according to the Ignite documentation, so this should have more than enough checkpoint buffer space to cope with writes. As additional information, the cache which is displaying very slow writes is in a data region with relatively slow write traffic. There is a primary (default) data region with large write traffic, and the vast majority of pages being written in a checkpoint will be for that default data region.
>>>>> 
>>>>>4. Yes, this is very surprising. Anecdotally from our logs it appears write traffic into the low write traffic cache is blocked during checkpoints.
>>>>> 
>>>>>Thanks,
>>>>>Raymond.
>>>>>    
>>>>>   
>>>>>On Tue, Dec 29, 2020 at 7:31 PM Zhenya Stanilovsky < arzamas123@mail.ru > wrote:
>>>>>>*  In addition to Ilya's reply, you can check the vendor's page for more info; everything on that page applies to Ignite too [1]. Increasing the thread count leads to concurrent IO usage, so if you have something like NVMe it's up to you, but with SAS it may be better to reduce this parameter.
>>>>>>*  The log will show you something like:
>>>>>>Parking thread=%Thread name% for timeout(ms)= %time% and, correspondingly:
>>>>>>Unparking thread=
>>>>>>*  No additional logging of checkpoint buffer usage is provided. The checkpoint buffer needs to be more than 10% of the overall persistent DataRegions size.
>>>>>>*  90 seconds or longer: that looks like an IO or system tuning problem, and a very poor result I'm afraid.
>>>>>>[1]  https://www.gridgain.com/docs/latest/perf-troubleshooting-guide/persistence-tuning
>>>>>>
>>>>>>
>>>>>> 
>>>>>>>Hi,
>>>>>>> 
>>>>>>>We have been investigating some issues which appear to be related to checkpointing. We currently use the IA 2.8.1 with the C# client.
>>>>>>> 
>>>>>>>I have been trying to gain clarity on how certain aspects of the Ignite configuration relate to the checkpointing process:
>>>>>>> 
>>>>>>>1. Number of check pointing threads. This defaults to 4, but I don't understand how it applies to the checkpointing process. Are more threads generally better (eg: because it makes the disk IO parallel across the threads), or does it only have a positive effect if you have many data storage regions? Or something else? If this could be clarified in the documentation (or a pointer to it which Google has not yet found), that would be good.
>>>>>>> 
>>>>>>>2. Checkpoint frequency. This is defaulted to 180 seconds. I was thinking that reducing this time would result in smaller less disruptive check points. Setting it to 60 seconds seems pretty safe, but is there a practical lower limit that should be used for use cases with new data constantly being added, eg: 5 seconds, 10 seconds?
>>>>>>> 
>>>>>>>3. Write exclusivity constraints during checkpointing. I understand that while a checkpoint is occurring ongoing writes will be supported into the caches being check pointed, and if those are writes to existing pages then those will be duplicated into the checkpoint buffer. If this buffer becomes full or stressed then Ignite will throttle, and perhaps block, writes until the checkpoint is complete. If this is the case then Ignite will emit logging (warning or informational?) that writes are being throttled.
>>>>>>> 
>>>>>>>We have cases where simple puts to caches (a few requests per second) are taking up to 90 seconds to execute when there is an active check point occurring, where the check point has been triggered by the checkpoint timer. When a checkpoint is not occurring the time to do this is usually in the milliseconds. The checkpoints themselves can take 90 seconds or longer, and are updating up to 30,000-40,000 pages, across a pair of data storage regions, one with 4Gb in-memory space allocated (which should be 1,000,000 pages at the standard 4kb page size), and one small region with 128Mb. There is no 'throttling' logging being emitted that we can tell, so the checkpoint buffer (which should be 1Gb for the first data region and 256 Mb for the second smaller region in this case) does not look like it can fill up during the checkpoint.
>>>>>>> 
>>>>>>>It seems like the checkpoint is affecting the put operations, but I don't understand why that may be given the documented checkpointing process, and the checkpoint itself (at least via Informational logging) is not advertising any restrictions.
>>>>>>> 
>>>>>>>Thanks,
>>>>>>>Raymond.
>>>>>>>  --
>>>>>>>
>>>>>>>Raymond Wilson
>>>>>>>Solution Architect, Civil Construction Software Systems (CCSS)
>>>>>>>  
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>  
>>>>> 
>>>>>  --
>>>>>
>>>>>Raymond Wilson
>>>>>Solution Architect, Civil Construction Software Systems (CCSS)
>>>>>11 Birmingham Drive |  Christchurch, New Zealand
>>>>>+64-21-2013317  Mobile
>>>>>raymond_wilson@trimble.com
>>>>>         
>>>>> 
>>>> 
>>>>  --
>>>>
>>>>Raymond Wilson
>>>>Solution Architect, Civil Construction Software Systems (CCSS)
>>>>11 Birmingham Drive |  Christchurch, New Zealand
>>>>+64-21-2013317  Mobile
>>>>raymond_wilson@trimble.com
>>>>         
>>>> 
>>> 
>>> 
>>> 
>>>  
>> 
>>  --
>>
>>Raymond Wilson
>>Solution Architect, Civil Construction Software Systems (CCSS)
>>11 Birmingham Drive |  Christchurch, New Zealand
>>+64-21-2013317  Mobile
>>raymond_wilson@trimble.com
>>         
>> 
> 
>  --
>
>Raymond Wilson
>Solution Architect, Civil Construction Software Systems (CCSS)
>11 Birmingham Drive |  Christchurch, New Zealand
>+64-21-2013317  Mobile
>raymond_wilson@trimble.com
>         
> 
 
 
 
 

Re: Re[4]: Questions related to check pointing

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

I think it's a sensible explanation.

Regards,
-- 
Ilya Kasnacheev


ср, 6 янв. 2021 г. в 14:32, Raymond Wilson <ra...@trimble.com>:

> I checked our code that creates the primary data region, and it does set
> the minimum and maximum to 4Gb, meaning there will be 1,000,000 pages in
> that region.
>
> The secondary data region is much smaller, and is set to min/max = 128 Mb
> of memory.
>
> The checkpoints with the "too many dirty pages" reason were quoting less
> than 100,000 dirty pages, so this must have been triggered on the size of
> the smaller data region.
>
> Both these data regions have persistence, and I think this may have been a
> sub-optimal way to set it up. My aim was to provide a dedicated channel for
> inbound data arriving to be queued that was not impacted by updates due to
> processing of that data. I think it may be better to change this
> arrangement to use a single data region to make the checkpointing process
> simpler and reduce cases where it decides there are too many dirty pages.
>
> On Mon, Jan 4, 2021 at 11:39 PM Ilya Kasnacheev <il...@gmail.com>
> wrote:
>
>> Hello!
>>
>> I guess it's pool.pages() * 3L / 4
>> Since, counter intuitively, the default ThrottlingPolicy is not
>> ThrottlingPolicy.DISABLED. It's CHECKPOINT_BUFFER_ONLY.
>>
>> Regards,
>>
>> --
>> Ilya Kasnacheev
>>
>>
>> чт, 31 дек. 2020 г. в 04:33, Raymond Wilson <ra...@trimble.com>:
>>
>>> Regards this section of code:
>>>
>>>             maxDirtyPages = throttlingPlc != ThrottlingPolicy.DISABLED
>>>                 ? pool.pages() * 3L / 4
>>>                 : Math.min(pool.pages() * 2L / 3, cpPoolPages);
>>>
>>> I think the correct ratio will be 2/3 of pages as we do not have a
>>> throttling policy defined, correct?.
>>>
>>> On Thu, Dec 31, 2020 at 12:49 AM Zhenya Stanilovsky <ar...@mail.ru>
>>> wrote:
>>>
>>>> Correct code is running from here:
>>>>
>>>> if (checkpointReadWriteLock.getReadHoldCount() > 1 || safeToUpdatePageMemories() || checkpointer.runner() == null)
>>>>     break;
>>>> else {
>>>>     CheckpointProgress pages = checkpointer.scheduleCheckpoint(0, "too many dirty pages");
>>>>
>>>> and near you can see that :
>>>> maxDirtyPages = throttlingPlc != ThrottlingPolicy.DISABLED    ? pool.pages() * 3L / 4    : Math.min(pool.pages() * 2L / 3, cpPoolPages);
>>>>
>>>> Thus if ¾ pages are dirty from whole DataRegion pages — will raise this
>>>> cp.
>>>>
>>>>
>>>> In (
>>>> https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood),
>>>> there is a mention of a dirty pages limit that is a factor that can trigger
>>>> check points.
>>>>
>>>> I also found this issue:
>>>> http://apache-ignite-users.70518.x6.nabble.com/too-many-dirty-pages-td28572.html
>>>> where "too many dirty pages" is a reason given for initiating a checkpoint.
>>>>
>>>> After reviewing our logs I found this: (one example)
>>>>
>>>> 2020-12-15 19:07:00,999 [106] INF [MutableCacheComputeServer]
>>>> Checkpoint started [checkpointId=e2c31b43-44df-43f1-b162-6b6cefa24e28,
>>>> startPtr=FileWALPointer [idx=6339, fileOff=243287334, len=196573],
>>>> checkpointBeforeLockTime=99ms, checkpointLockWait=0ms,
>>>> checkpointListenersExecuteTime=16ms, checkpointLockHoldTime=32ms,
>>>> walCpRecordFsyncDuration=113ms, writeCheckpointEntryDuration=27ms,
>>>> splitAndSortCpPagesDuration=45ms, pages=33421, reason='too many dirty
>>>> pages']
>>>>
>>>> Which suggests we may have the issue where writes are frozen until the
>>>> check point is completed.
>>>>
>>>> Looking at the AI 2.8.1 source code, the dirty page limit fraction
>>>> appears to be 0.1 (10%), via this entry
>>>> in GridCacheDatabaseSharedManager.java:
>>>>
>>>>     /**
>>>>      * Threshold to calculate limit for pages list on-heap caches.
>>>>      * <p>
>>>>
>>>>      * Note: When a checkpoint is triggered, we need some amount of page memory to store pages list on-heap cache.
>>>>
>>>>      * If a checkpoint is triggered by "too many dirty pages" reason and pages list cache is rather big, we can get
>>>>
>>>> * {@code IgniteOutOfMemoryException}. To prevent this, we can limit the total amount of cached page list buckets,
>>>>
>>>>      * assuming that checkpoint will be triggered if no more then 3/4 of pages will be marked as dirty (there will be
>>>>
>>>>      * at least 1/4 of clean pages) and each cached page list bucket can be stored to up to 2 pages (this value is not
>>>>
>>>>      * static, but depends on PagesCache.MAX_SIZE, so if PagesCache.MAX_SIZE > PagesListNodeIO#getCapacity it can take
>>>>
>>>>      * more than 2 pages). Also some amount of page memory needed to store page list metadata.
>>>>      */
>>>>     private static final double PAGE_LIST_CACHE_LIMIT_THRESHOLD = 0.1;
>>>>
>>>> This raises two questions:
>>>>
>>>> 1. The data region where most writes are occurring has 4Gb allocated to
>>>> it, though it is permitted to start at a much lower level. 4Gb should be
>>>> 1,000,000 pages, 10% of which should be 100,000 dirty pages.
>>>>
>>>> The 'limit holder' is calculated like this:
>>>>
>>>>     /**
>>>>      * @return Holder for page list cache limit for given data region.
>>>>      */
>>>>     public AtomicLong pageListCacheLimitHolder(DataRegion dataRegion) {
>>>>         if (dataRegion.config().isPersistenceEnabled()) {
>>>>             return pageListCacheLimits.computeIfAbsent(dataRegion.
>>>> config().getName(), name -> new AtomicLong(
>>>>                 (long)(((PageMemoryEx)dataRegion.pageMemory()).
>>>> totalPages() * PAGE_LIST_CACHE_LIMIT_THRESHOLD)));
>>>>         }
>>>>
>>>>         return null;
>>>>     }
>>>>
>>>> ... but I am unsure if totalPages() is referring to the current size of
>>>> the data region, or the size it is permitted to grow to. ie: Could the
>>>> 'dirty page limit' be a sliding limit based on the growth of the data
>>>> region? Is it better to set the initial and maximum sizes of data regions
>>>> to be the same number?
>>>>
>>>> 2. We have two data regions, one supporting inbound arrival of data
>>>> (with low numbers of writes), and one supporting storage of processed
>>>> results from the arriving data (with many more writes).
>>>>
>>>> The block on writes due to the number of dirty pages appears to affect
>>>> all data regions, not just the one which has violated the dirty page limit.
>>>> Is that correct? If so, is this something that can be improved?
>>>>
>>>> Thanks,
>>>> Raymond.
>>>>
>>>>
>>>> On Wed, Dec 30, 2020 at 9:17 PM Raymond Wilson <
>>>> raymond_wilson@trimble.com
>>>> <//...@trimble.com>>
>>>> wrote:
>>>>
>>>> I'm working on getting automatic JVM thread stack dumping occurring if
>>>> we detect long delays in put (PutIfAbsent) operations. Hopefully this will
>>>> provide more information.
>>>>
>>>> On Wed, Dec 30, 2020 at 7:48 PM Zhenya Stanilovsky <arzamas123@mail.ru
>>>> <//...@mail.ru>> wrote:
>>>>
>>>>
>>>> Don`t think so, checkpointing work perfectly well already before this
>>>> fix.
>>>> Need additional info for start digging your problem, can you share
>>>> ignite logs somewhere?
>>>>
>>>>
>>>>
>>>> I noticed an entry in the Ignite 2.9.1 changelog:
>>>>
>>>>    - Improved checkpoint concurrent behaviour
>>>>
>>>> I am having trouble finding the relevant Jira ticket for this in the
>>>> 2.9.1 Jira area at
>>>> https://issues.apache.org/jira/browse/IGNITE-13876?jql=project%20%3D%20IGNITE%20AND%20fixVersion%20%3D%202.9.1%20and%20status%20%3D%20Resolved
>>>>
>>>> Perhaps this change may improve the checkpointing issue we are seeing?
>>>>
>>>> Raymond.
>>>>
>>>>
>>>> On Tue, Dec 29, 2020 at 8:35 PM Raymond Wilson <
>>>> raymond_wilson@trimble.com
>>>> <ht...@trimble.com>>
>>>> wrote:
>>>>
>>>> Hi Zhenya,
>>>>
>>>> 1. We currently use AWS EFS for primary storage, with provisioned IOPS
>>>> to provide sufficient IO. Our Ignite cluster currently tops out at ~10%
>>>> usage (with at least 5 nodes writing to it, including WAL and WAL archive),
>>>> so we are not saturating the EFS interface. We use the default page size
>>>> (experiments with larger page sizes showed instability when checkpointing
>>>> due to free page starvation, so we reverted to the default size).
>>>>
>>>> 2. Thanks for the detail, we will look for that in thread dumps when we
>>>> can create them.
>>>>
>>>> 3. We are using the default CP buffer size, which is max(256Mb,
>>>> DataRagionSize / 4) according to the Ignite documentation, so this should
>>>> have more than enough checkpoint buffer space to cope with writes. As
>>>> additional information, the cache which is displaying very slow writes is
>>>> in a data region with relatively slow write traffic. There is a primary
>>>> (default) data region with large write traffic, and the vast majority of
>>>> pages being written in a checkpoint will be for that default data region.
>>>>
>>>> 4. Yes, this is very surprising. Anecdotally from our logs it appears
>>>> write traffic into the low write traffic cache is blocked during
>>>> checkpoints.
>>>>
>>>> Thanks,
>>>> Raymond.
>>>>
>>>>
>>>>
>>>> On Tue, Dec 29, 2020 at 7:31 PM Zhenya Stanilovsky <arzamas123@mail.ru
>>>> <ht...@mail.ru>> wrote:
>>>>
>>>>
>>>>    1. Additionally to Ilya reply you can check vendors page for
>>>>    additional info, all in this page are applicable for ignite too [1].
>>>>    Increasing threads number leads to concurrent io usage, thus if your have
>>>>    something like nvme — it`s up to you but in case of sas possibly better
>>>>    would be to reduce this param.
>>>>    2. Log will shows you something like :
>>>>
>>>>    Parking thread=%Thread name% for timeout(ms)= %time%
>>>>
>>>>    and appropriate :
>>>>
>>>>    Unparking thread=
>>>>
>>>>    3. No additional looging with cp buffer usage are provided. cp
>>>>    buffer need to be more than 10% of overall persistent  DataRegions size.
>>>>    4. 90 seconds or longer —  Seems like problems in io or system
>>>>    tuning, it`s very bad score i hope.
>>>>
>>>> [1]
>>>> https://www.gridgain.com/docs/latest/perf-troubleshooting-guide/persistence-tuning
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Hi,
>>>>
>>>> We have been investigating some issues which appear to be related to
>>>> checkpointing. We currently use the IA 2.8.1 with the C# client.
>>>>
>>>> I have been trying to gain clarity on how certain aspects of the Ignite
>>>> configuration relate to the checkpointing process:
>>>>
>>>> 1. Number of check pointing threads. This defaults to 4, but I don't
>>>> understand how it applies to the checkpointing process. Are more threads
>>>> generally better (eg: because it makes the disk IO parallel across the
>>>> threads), or does it only have a positive effect if you have many data
>>>> storage regions? Or something else? If this could be clarified in the
>>>> documentation (or a pointer to it which Google has not yet found), that
>>>> would be good.
>>>>
>>>> 2. Checkpoint frequency. This is defaulted to 180 seconds. I was
>>>> thinking that reducing this time would result in smaller
>>>> less disruptive check points. Setting it to 60 seconds seems pretty
>>>> safe, but is there a practical lower limit that should be used for use
>>>> cases with new data constantly being added, eg: 5 seconds, 10 seconds?
>>>>
>>>> 3. Write exclusivity constraints during checkpointing. I understand
>>>> that while a checkpoint is occurring ongoing writes will be supported into
>>>> the caches being check pointed, and if those are writes to existing pages
>>>> then those will be duplicated into the checkpoint buffer. If this buffer
>>>> becomes full or stressed then Ignite will throttle, and perhaps block,
>>>> writes until the checkpoint is complete. If this is the case then Ignite
>>>> will emit logging (warning or informational?) that writes are being
>>>> throttled.
>>>>
>>>> We have cases where simple puts to caches (a few requests per second)
>>>> are taking up to 90 seconds to execute when there is an active check point
>>>> occurring, where the check point has been triggered by the checkpoint
>>>> timer. When a checkpoint is not occurring the time to do this is usually in
>>>> the milliseconds. The checkpoints themselves can take 90 seconds or longer,
>>>> and are updating up to 30,000-40,000 pages, across a pair of data storage
>>>> regions, one with 4Gb in-memory space allocated (which should be 1,000,000
>>>> pages at the standard 4kb page size), and one small region with 128Mb.
>>>> There is no 'throttling' logging being emitted that we can tell, so the
>>>> checkpoint buffer (which should be 1Gb for the first data region and 256 Mb
>>>> for the second smaller region in this case) does not look like it can fill
>>>> up during the checkpoint.
>>>>
>>>> It seems like the checkpoint is affecting the put operations, but I
>>>> don't understand why that may be given the documented checkpointing
>>>> process, and the checkpoint itself (at least via Informational logging) is
>>>> not advertising any restrictions.
>>>>
>>>> Thanks,
>>>> Raymond.
>>>>
>>>> --
>>>> <http://www.trimble.com/>
>>>> Raymond Wilson
>>>> Solution Architect, Civil Construction Software Systems (CCSS)
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> <http://www.trimble.com/>
>>>> Raymond Wilson
>>>> Solution Architect, Civil Construction Software Systems (CCSS)
>>>> 11 Birmingham Drive | Christchurch, New Zealand
>>>> +64-21-2013317 Mobile
>>>> raymond_wilson@trimble.com
>>>> <ht...@trimble.com>
>>>>
>>>>
>>>>
>>>> <https://worksos.trimble.com/?utm_source=Trimble&utm_medium=emailsign&utm_campaign=Launch>
>>>>
>>>>
>>>>
>>>> --
>>>> <http://www.trimble.com/>
>>>> Raymond Wilson
>>>> Solution Architect, Civil Construction Software Systems (CCSS)
>>>> 11 Birmingham Drive | Christchurch, New Zealand
>>>> +64-21-2013317 Mobile
>>>> raymond_wilson@trimble.com
>>>> <ht...@trimble.com>
>>>>
>>>>
>>>>
>>>> <https://worksos.trimble.com/?utm_source=Trimble&utm_medium=emailsign&utm_campaign=Launch>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> <http://www.trimble.com/>
>>>> Raymond Wilson
>>>> Solution Architect, Civil Construction Software Systems (CCSS)
>>>> 11 Birmingham Drive | Christchurch, New Zealand
>>>> +64-21-2013317 Mobile
>>>> raymond_wilson@trimble.com
>>>> <//...@trimble.com>
>>>>
>>>>
>>>>
>>>> <https://worksos.trimble.com/?utm_source=Trimble&utm_medium=emailsign&utm_campaign=Launch>
>>>>
>>>>
>>>>
>>>> --
>>>> <http://www.trimble.com/>
>>>> Raymond Wilson
>>>> Solution Architect, Civil Construction Software Systems (CCSS)
>>>> 11 Birmingham Drive | Christchurch, New Zealand
>>>> +64-21-2013317 Mobile
>>>> raymond_wilson@trimble.com
>>>> <//...@trimble.com>
>>>>
>>>>
>>>>
>>>> <https://worksos.trimble.com/?utm_source=Trimble&utm_medium=emailsign&utm_campaign=Launch>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> <http://www.trimble.com/>
>>> Raymond Wilson
>>> Solution Architect, Civil Construction Software Systems (CCSS)
>>> 11 Birmingham Drive | Christchurch, New Zealand
>>> +64-21-2013317 Mobile
>>> raymond_wilson@trimble.com
>>>
>>>
>>> <https://worksos.trimble.com/?utm_source=Trimble&utm_medium=emailsign&utm_campaign=Launch>
>>>
>>
>
> --
> <http://www.trimble.com/>
> Raymond Wilson
> Solution Architect, Civil Construction Software Systems (CCSS)
> 11 Birmingham Drive | Christchurch, New Zealand
> raymond_wilson@trimble.com
>
>
> <https://worksos.trimble.com/?utm_source=Trimble&utm_medium=emailsign&utm_campaign=Launch>
>

Re: Re[4]: Questions related to check pointing

Posted by Raymond Wilson <ra...@trimble.com>.
I checked our code that creates the primary data region, and it does set
the minimum and maximum to 4Gb, meaning there will be 1,000,000 pages in
that region.

The secondary data region is much smaller, and is set to min/max = 128 Mb
of memory.

The checkpoints with the "too many dirty pages" reason were quoting less
than 100,000 dirty pages, so this must have been triggered on the size of
the smaller data region.

Both these data regions have persistence, and I think this may have been a
sub-optimal way to set it up. My aim was to provide a dedicated channel for
inbound data arriving to be queued that was not impacted by updates due to
processing of that data. I think it may be better to change this
arrangement to use a single data region to make the checkpointing process
simpler and reduce cases where it decides there are too many dirty pages.
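
For reference, here is a minimal Java sketch of what that single-region arrangement could look like (the region name and sizes are illustrative; we actually configure this through the C# client, which exposes equivalent DataStorageConfiguration/DataRegionConfiguration classes):

import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class SingleRegionConfigSketch {
    public static void main(String[] args) {
        long fourGb = 4L * 1024 * 1024 * 1024;

        // One persistent region with initial size == max size, so the region
        // never grows and any limit derived from its size stays constant.
        DataRegionConfiguration region = new DataRegionConfiguration()
            .setName("Default_Region")               // illustrative name
            .setPersistenceEnabled(true)
            .setInitialSize(fourGb)
            .setMaxSize(fourGb);

        DataStorageConfiguration storage = new DataStorageConfiguration()
            .setDefaultDataRegionConfiguration(region);

        IgniteConfiguration cfg = new IgniteConfiguration()
            .setDataStorageConfiguration(storage);

        Ignition.start(cfg);
    }
}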

On Mon, Jan 4, 2021 at 11:39 PM Ilya Kasnacheev <il...@gmail.com>
wrote:

> Hello!
>
> I guess it's pool.pages() * 3L / 4
> Since, counter intuitively, the default ThrottlingPolicy is not
> ThrottlingPolicy.DISABLED. It's CHECKPOINT_BUFFER_ONLY.
>
> Regards,
>
> --
> Ilya Kasnacheev
>
>
> чт, 31 дек. 2020 г. в 04:33, Raymond Wilson <ra...@trimble.com>:
>
>> Regards this section of code:
>>
>>             maxDirtyPages = throttlingPlc != ThrottlingPolicy.DISABLED
>>                 ? pool.pages() * 3L / 4
>>                 : Math.min(pool.pages() * 2L / 3, cpPoolPages);
>>
>> I think the correct ratio will be 2/3 of pages as we do not have a
>> throttling policy defined, correct?.
>>
>> On Thu, Dec 31, 2020 at 12:49 AM Zhenya Stanilovsky <ar...@mail.ru>
>> wrote:
>>
>>> Correct code is running from here:
>>>
>>> if (checkpointReadWriteLock.getReadHoldCount() > 1 || safeToUpdatePageMemories() || checkpointer.runner() == null)
>>>     break;
>>> else {
>>>     CheckpointProgress pages = checkpointer.scheduleCheckpoint(0, "too many dirty pages");
>>>
>>> and near you can see that :
>>> maxDirtyPages = throttlingPlc != ThrottlingPolicy.DISABLED    ? pool.pages() * 3L / 4    : Math.min(pool.pages() * 2L / 3, cpPoolPages);
>>>
>>> Thus if ¾ pages are dirty from whole DataRegion pages — will raise this
>>> cp.
>>>
>>>
>>> In (
>>> https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood),
>>> there is a mention of a dirty pages limit that is a factor that can trigger
>>> check points.
>>>
>>> I also found this issue:
>>> http://apache-ignite-users.70518.x6.nabble.com/too-many-dirty-pages-td28572.html
>>> where "too many dirty pages" is a reason given for initiating a checkpoint.
>>>
>>> After reviewing our logs I found this: (one example)
>>>
>>> 2020-12-15 19:07:00,999 [106] INF [MutableCacheComputeServer] Checkpoint
>>> started [checkpointId=e2c31b43-44df-43f1-b162-6b6cefa24e28,
>>> startPtr=FileWALPointer [idx=6339, fileOff=243287334, len=196573],
>>> checkpointBeforeLockTime=99ms, checkpointLockWait=0ms,
>>> checkpointListenersExecuteTime=16ms, checkpointLockHoldTime=32ms,
>>> walCpRecordFsyncDuration=113ms, writeCheckpointEntryDuration=27ms,
>>> splitAndSortCpPagesDuration=45ms, pages=33421, reason='too many dirty
>>> pages']
>>>
>>> Which suggests we may have the issue where writes are frozen until the
>>> check point is completed.
>>>
>>> Looking at the AI 2.8.1 source code, the dirty page limit fraction
>>> appears to be 0.1 (10%), via this entry
>>> in GridCacheDatabaseSharedManager.java:
>>>
>>>     /**
>>>      * Threshold to calculate limit for pages list on-heap caches.
>>>      * <p>
>>>
>>>      * Note: When a checkpoint is triggered, we need some amount of page memory to store pages list on-heap cache.
>>>
>>>      * If a checkpoint is triggered by "too many dirty pages" reason and pages list cache is rather big, we can get
>>>
>>> * {@code IgniteOutOfMemoryException}. To prevent this, we can limit the total amount of cached page list buckets,
>>>
>>>      * assuming that checkpoint will be triggered if no more then 3/4 of pages will be marked as dirty (there will be
>>>
>>>      * at least 1/4 of clean pages) and each cached page list bucket can be stored to up to 2 pages (this value is not
>>>
>>>      * static, but depends on PagesCache.MAX_SIZE, so if PagesCache.MAX_SIZE > PagesListNodeIO#getCapacity it can take
>>>
>>>      * more than 2 pages). Also some amount of page memory needed to store page list metadata.
>>>      */
>>>     private static final double PAGE_LIST_CACHE_LIMIT_THRESHOLD = 0.1;
>>>
>>> This raises two questions:
>>>
>>> 1. The data region where most writes are occurring has 4Gb allocated to
>>> it, though it is permitted to start at a much lower level. 4Gb should be
>>> 1,000,000 pages, 10% of which should be 100,000 dirty pages.
>>>
>>> The 'limit holder' is calculated like this:
>>>
>>>     /**
>>>      * @return Holder for page list cache limit for given data region.
>>>      */
>>>     public AtomicLong pageListCacheLimitHolder(DataRegion dataRegion) {
>>>         if (dataRegion.config().isPersistenceEnabled()) {
>>>             return pageListCacheLimits.computeIfAbsent(dataRegion.config
>>> ().getName(), name -> new AtomicLong(
>>>                 (long)(((PageMemoryEx)dataRegion.pageMemory()).
>>> totalPages() * PAGE_LIST_CACHE_LIMIT_THRESHOLD)));
>>>         }
>>>
>>>         return null;
>>>     }
>>>
>>> ... but I am unsure if totalPages() is referring to the current size of
>>> the data region, or the size it is permitted to grow to. ie: Could the
>>> 'dirty page limit' be a sliding limit based on the growth of the data
>>> region? Is it better to set the initial and maximum sizes of data regions
>>> to be the same number?
>>>
>>> 2. We have two data regions, one supporting inbound arrival of data
>>> (with low numbers of writes), and one supporting storage of processed
>>> results from the arriving data (with many more writes).
>>>
>>> The block on writes due to the number of dirty pages appears to affect
>>> all data regions, not just the one which has violated the dirty page limit.
>>> Is that correct? If so, is this something that can be improved?
>>>
>>> Thanks,
>>> Raymond.
>>>
>>>
>>> On Wed, Dec 30, 2020 at 9:17 PM Raymond Wilson <
>>> raymond_wilson@trimble.com
>>> <//...@trimble.com>>
>>> wrote:
>>>
>>> I'm working on getting automatic JVM thread stack dumping occurring if
>>> we detect long delays in put (PutIfAbsent) operations. Hopefully this will
>>> provide more information.
>>>
>>> On Wed, Dec 30, 2020 at 7:48 PM Zhenya Stanilovsky <arzamas123@mail.ru
>>> <//...@mail.ru>> wrote:
>>>
>>>
>>> Don`t think so, checkpointing work perfectly well already before this
>>> fix.
>>> Need additional info for start digging your problem, can you share
>>> ignite logs somewhere?
>>>
>>>
>>>
>>> I noticed an entry in the Ignite 2.9.1 changelog:
>>>
>>>    - Improved checkpoint concurrent behaviour
>>>
>>> I am having trouble finding the relevant Jira ticket for this in the
>>> 2.9.1 Jira area at
>>> https://issues.apache.org/jira/browse/IGNITE-13876?jql=project%20%3D%20IGNITE%20AND%20fixVersion%20%3D%202.9.1%20and%20status%20%3D%20Resolved
>>>
>>> Perhaps this change may improve the checkpointing issue we are seeing?
>>>
>>> Raymond.
>>>
>>>
>>> On Tue, Dec 29, 2020 at 8:35 PM Raymond Wilson <
>>> raymond_wilson@trimble.com
>>> <ht...@trimble.com>>
>>> wrote:
>>>
>>> Hi Zhenya,
>>>
>>> 1. We currently use AWS EFS for primary storage, with provisioned IOPS
>>> to provide sufficient IO. Our Ignite cluster currently tops out at ~10%
>>> usage (with at least 5 nodes writing to it, including WAL and WAL archive),
>>> so we are not saturating the EFS interface. We use the default page size
>>> (experiments with larger page sizes showed instability when checkpointing
>>> due to free page starvation, so we reverted to the default size).
>>>
>>> 2. Thanks for the detail, we will look for that in thread dumps when we
>>> can create them.
>>>
>>> 3. We are using the default CP buffer size, which is max(256Mb,
>>> DataRagionSize / 4) according to the Ignite documentation, so this should
>>> have more than enough checkpoint buffer space to cope with writes. As
>>> additional information, the cache which is displaying very slow writes is
>>> in a data region with relatively slow write traffic. There is a primary
>>> (default) data region with large write traffic, and the vast majority of
>>> pages being written in a checkpoint will be for that default data region.
>>>
>>> 4. Yes, this is very surprising. Anecdotally from our logs it appears
>>> write traffic into the low write traffic cache is blocked during
>>> checkpoints.
>>>
>>> Thanks,
>>> Raymond.
>>>
>>>
>>>
>>> On Tue, Dec 29, 2020 at 7:31 PM Zhenya Stanilovsky <arzamas123@mail.ru
>>> <ht...@mail.ru>> wrote:
>>>
>>>
>>>    1. Additionally to Ilya reply you can check vendors page for
>>>    additional info, all in this page are applicable for ignite too [1].
>>>    Increasing threads number leads to concurrent io usage, thus if your have
>>>    something like nvme — it`s up to you but in case of sas possibly better
>>>    would be to reduce this param.
>>>    2. Log will shows you something like :
>>>
>>>    Parking thread=%Thread name% for timeout(ms)= %time%
>>>
>>>    and appropriate :
>>>
>>>    Unparking thread=
>>>
>>>    3. No additional looging with cp buffer usage are provided. cp
>>>    buffer need to be more than 10% of overall persistent  DataRegions size.
>>>    4. 90 seconds or longer —  Seems like problems in io or system
>>>    tuning, it`s very bad score i hope.
>>>
>>> [1]
>>> https://www.gridgain.com/docs/latest/perf-troubleshooting-guide/persistence-tuning
>>>
>>>
>>>
>>>
>>>
>>> Hi,
>>>
>>> We have been investigating some issues which appear to be related to
>>> checkpointing. We currently use the IA 2.8.1 with the C# client.
>>>
>>> I have been trying to gain clarity on how certain aspects of the Ignite
>>> configuration relate to the checkpointing process:
>>>
>>> 1. Number of check pointing threads. This defaults to 4, but I don't
>>> understand how it applies to the checkpointing process. Are more threads
>>> generally better (eg: because it makes the disk IO parallel across the
>>> threads), or does it only have a positive effect if you have many data
>>> storage regions? Or something else? If this could be clarified in the
>>> documentation (or a pointer to it which Google has not yet found), that
>>> would be good.
>>>
>>> 2. Checkpoint frequency. This is defaulted to 180 seconds. I was
>>> thinking that reducing this time would result in smaller
>>> less disruptive check points. Setting it to 60 seconds seems pretty
>>> safe, but is there a practical lower limit that should be used for use
>>> cases with new data constantly being added, eg: 5 seconds, 10 seconds?
>>>
>>> 3. Write exclusivity constraints during checkpointing. I understand that
>>> while a checkpoint is occurring ongoing writes will be supported into the
>>> caches being check pointed, and if those are writes to existing pages then
>>> those will be duplicated into the checkpoint buffer. If this buffer becomes
>>> full or stressed then Ignite will throttle, and perhaps block, writes until
>>> the checkpoint is complete. If this is the case then Ignite will emit
>>> logging (warning or informational?) that writes are being throttled.
>>>
>>> We have cases where simple puts to caches (a few requests per second)
>>> are taking up to 90 seconds to execute when there is an active check point
>>> occurring, where the check point has been triggered by the checkpoint
>>> timer. When a checkpoint is not occurring the time to do this is usually in
>>> the milliseconds. The checkpoints themselves can take 90 seconds or longer,
>>> and are updating up to 30,000-40,000 pages, across a pair of data storage
>>> regions, one with 4Gb in-memory space allocated (which should be 1,000,000
>>> pages at the standard 4kb page size), and one small region with 128Mb.
>>> There is no 'throttling' logging being emitted that we can tell, so the
>>> checkpoint buffer (which should be 1Gb for the first data region and 256 Mb
>>> for the second smaller region in this case) does not look like it can fill
>>> up during the checkpoint.
>>>
>>> It seems like the checkpoint is affecting the put operations, but I
>>> don't understand why that may be given the documented checkpointing
>>> process, and the checkpoint itself (at least via Informational logging) is
>>> not advertising any restrictions.
>>>
>>> Thanks,
>>> Raymond.
>>>
>>> --
>>> <http://www.trimble.com/>
>>> Raymond Wilson
>>> Solution Architect, Civil Construction Software Systems (CCSS)
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> <http://www.trimble.com/>
>>> Raymond Wilson
>>> Solution Architect, Civil Construction Software Systems (CCSS)
>>> 11 Birmingham Drive | Christchurch, New Zealand
>>> +64-21-2013317 Mobile
>>> raymond_wilson@trimble.com
>>> <ht...@trimble.com>
>>>
>>>
>>>
>>> <https://worksos.trimble.com/?utm_source=Trimble&utm_medium=emailsign&utm_campaign=Launch>
>>>
>>>
>>>
>>> --
>>> <http://www.trimble.com/>
>>> Raymond Wilson
>>> Solution Architect, Civil Construction Software Systems (CCSS)
>>> 11 Birmingham Drive | Christchurch, New Zealand
>>> +64-21-2013317 Mobile
>>> raymond_wilson@trimble.com
>>> <ht...@trimble.com>
>>>
>>>
>>>
>>> <https://worksos.trimble.com/?utm_source=Trimble&utm_medium=emailsign&utm_campaign=Launch>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> <http://www.trimble.com/>
>>> Raymond Wilson
>>> Solution Architect, Civil Construction Software Systems (CCSS)
>>> 11 Birmingham Drive | Christchurch, New Zealand
>>> +64-21-2013317 Mobile
>>> raymond_wilson@trimble.com
>>> <//...@trimble.com>
>>>
>>>
>>>
>>> <https://worksos.trimble.com/?utm_source=Trimble&utm_medium=emailsign&utm_campaign=Launch>
>>>
>>>
>>>
>>> --
>>> <http://www.trimble.com/>
>>> Raymond Wilson
>>> Solution Architect, Civil Construction Software Systems (CCSS)
>>> 11 Birmingham Drive | Christchurch, New Zealand
>>> +64-21-2013317 Mobile
>>> raymond_wilson@trimble.com
>>> <//...@trimble.com>
>>>
>>>
>>>
>>> <https://worksos.trimble.com/?utm_source=Trimble&utm_medium=emailsign&utm_campaign=Launch>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>> --
>> <http://www.trimble.com/>
>> Raymond Wilson
>> Solution Architect, Civil Construction Software Systems (CCSS)
>> 11 Birmingham Drive | Christchurch, New Zealand
>> +64-21-2013317 Mobile
>> raymond_wilson@trimble.com
>>
>>
>> <https://worksos.trimble.com/?utm_source=Trimble&utm_medium=emailsign&utm_campaign=Launch>
>>
>

-- 
<http://www.trimble.com/>
Raymond Wilson
Solution Architect, Civil Construction Software Systems (CCSS)
11 Birmingham Drive | Christchurch, New Zealand
raymond_wilson@trimble.com

<https://worksos.trimble.com/?utm_source=Trimble&utm_medium=emailsign&utm_campaign=Launch>

Re: Re[4]: Questions related to check pointing

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

I guess it's pool.pages() * 3L / 4,
since, counterintuitively, the default ThrottlingPolicy is not
ThrottlingPolicy.DISABLED. It's CHECKPOINT_BUFFER_ONLY.
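
For completeness, here is a minimal sketch of the public setting that feeds into this. The mapping from writeThrottlingEnabled to the internal ThrottlingPolicy enum is my reading of the source rather than documented behaviour:

import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class WriteThrottlingSketch {
    public static void main(String[] args) {
        // writeThrottlingEnabled defaults to false, which as far as I can tell
        // resolves to ThrottlingPolicy.CHECKPOINT_BUFFER_ONLY (not DISABLED);
        // setting it to true switches to the speed-based throttling algorithm.
        DataStorageConfiguration storage = new DataStorageConfiguration()
            .setWriteThrottlingEnabled(true);

        IgniteConfiguration cfg = new IgniteConfiguration()
            .setDataStorageConfiguration(storage);

        System.out.println("writeThrottlingEnabled=" + storage.isWriteThrottlingEnabled());
        // cfg would then be passed to Ignition.start(cfg).
    }
}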

Regards,

-- 
Ilya Kasnacheev


чт, 31 дек. 2020 г. в 04:33, Raymond Wilson <ra...@trimble.com>:

> Regards this section of code:
>
>             maxDirtyPages = throttlingPlc != ThrottlingPolicy.DISABLED
>                 ? pool.pages() * 3L / 4
>                 : Math.min(pool.pages() * 2L / 3, cpPoolPages);
>
> I think the correct ratio will be 2/3 of pages as we do not have a
> throttling policy defined, correct?.
>
> On Thu, Dec 31, 2020 at 12:49 AM Zhenya Stanilovsky <ar...@mail.ru>
> wrote:
>
>> Correct code is running from here:
>>
>> if (checkpointReadWriteLock.getReadHoldCount() > 1 || safeToUpdatePageMemories() || checkpointer.runner() == null)
>>     break;
>> else {
>>     CheckpointProgress pages = checkpointer.scheduleCheckpoint(0, "too many dirty pages");
>>
>> and near you can see that :
>> maxDirtyPages = throttlingPlc != ThrottlingPolicy.DISABLED    ? pool.pages() * 3L / 4    : Math.min(pool.pages() * 2L / 3, cpPoolPages);
>>
>> Thus if ¾ pages are dirty from whole DataRegion pages — will raise this
>> cp.
>>
>>
>> In (
>> https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood),
>> there is a mention of a dirty pages limit that is a factor that can trigger
>> check points.
>>
>> I also found this issue:
>> http://apache-ignite-users.70518.x6.nabble.com/too-many-dirty-pages-td28572.html
>> where "too many dirty pages" is a reason given for initiating a checkpoint.
>>
>> After reviewing our logs I found this: (one example)
>>
>> 2020-12-15 19:07:00,999 [106] INF [MutableCacheComputeServer] Checkpoint
>> started [checkpointId=e2c31b43-44df-43f1-b162-6b6cefa24e28,
>> startPtr=FileWALPointer [idx=6339, fileOff=243287334, len=196573],
>> checkpointBeforeLockTime=99ms, checkpointLockWait=0ms,
>> checkpointListenersExecuteTime=16ms, checkpointLockHoldTime=32ms,
>> walCpRecordFsyncDuration=113ms, writeCheckpointEntryDuration=27ms,
>> splitAndSortCpPagesDuration=45ms, pages=33421, reason='too many dirty
>> pages']
>>
>> Which suggests we may have the issue where writes are frozen until the
>> check point is completed.
>>
>> Looking at the AI 2.8.1 source code, the dirty page limit fraction
>> appears to be 0.1 (10%), via this entry
>> in GridCacheDatabaseSharedManager.java:
>>
>>     /**
>>      * Threshold to calculate limit for pages list on-heap caches.
>>      * <p>
>>
>>      * Note: When a checkpoint is triggered, we need some amount of page memory to store pages list on-heap cache.
>>
>>      * If a checkpoint is triggered by "too many dirty pages" reason and pages list cache is rather big, we can get
>>
>> * {@code IgniteOutOfMemoryException}. To prevent this, we can limit the total amount of cached page list buckets,
>>
>>      * assuming that checkpoint will be triggered if no more then 3/4 of pages will be marked as dirty (there will be
>>
>>      * at least 1/4 of clean pages) and each cached page list bucket can be stored to up to 2 pages (this value is not
>>
>>      * static, but depends on PagesCache.MAX_SIZE, so if PagesCache.MAX_SIZE > PagesListNodeIO#getCapacity it can take
>>
>>      * more than 2 pages). Also some amount of page memory needed to store page list metadata.
>>      */
>>     private static final double PAGE_LIST_CACHE_LIMIT_THRESHOLD = 0.1;
>>
>> This raises two questions:
>>
>> 1. The data region where most writes are occurring has 4Gb allocated to
>> it, though it is permitted to start at a much lower level. 4Gb should be
>> 1,000,000 pages, 10% of which should be 100,000 dirty pages.
>>
>> The 'limit holder' is calculated like this:
>>
>>     /**
>>      * @return Holder for page list cache limit for given data region.
>>      */
>>     public AtomicLong pageListCacheLimitHolder(DataRegion dataRegion) {
>>         if (dataRegion.config().isPersistenceEnabled()) {
>>             return pageListCacheLimits.computeIfAbsent(dataRegion.config
>> ().getName(), name -> new AtomicLong(
>>                 (long)(((PageMemoryEx)dataRegion.pageMemory()).totalPages
>> () * PAGE_LIST_CACHE_LIMIT_THRESHOLD)));
>>         }
>>
>>         return null;
>>     }
>>
>> ... but I am unsure if totalPages() is referring to the current size of
>> the data region, or the size it is permitted to grow to. ie: Could the
>> 'dirty page limit' be a sliding limit based on the growth of the data
>> region? Is it better to set the initial and maximum sizes of data regions
>> to be the same number?
>>
>> 2. We have two data regions, one supporting inbound arrival of data (with
>> low numbers of writes), and one supporting storage of processed results
>> from the arriving data (with many more writes).
>>
>> The block on writes due to the number of dirty pages appears to affect
>> all data regions, not just the one which has violated the dirty page limit.
>> Is that correct? If so, is this something that can be improved?
>>
>> Thanks,
>> Raymond.
>>
>>
>> On Wed, Dec 30, 2020 at 9:17 PM Raymond Wilson <
>> raymond_wilson@trimble.com
>> <//...@trimble.com>> wrote:
>>
>> I'm working on getting automatic JVM thread stack dumping occurring if we
>> detect long delays in put (PutIfAbsent) operations. Hopefully this will
>> provide more information.
>>
>> On Wed, Dec 30, 2020 at 7:48 PM Zhenya Stanilovsky <arzamas123@mail.ru
>> <//...@mail.ru>> wrote:
>>
>>
>> Don`t think so, checkpointing work perfectly well already before this fix.
>> Need additional info for start digging your problem, can you share ignite
>> logs somewhere?
>>
>>
>>
>> I noticed an entry in the Ignite 2.9.1 changelog:
>>
>>    - Improved checkpoint concurrent behaviour
>>
>> I am having trouble finding the relevant Jira ticket for this in the
>> 2.9.1 Jira area at
>> https://issues.apache.org/jira/browse/IGNITE-13876?jql=project%20%3D%20IGNITE%20AND%20fixVersion%20%3D%202.9.1%20and%20status%20%3D%20Resolved
>>
>> Perhaps this change may improve the checkpointing issue we are seeing?
>>
>> Raymond.
>>
>>
>> On Tue, Dec 29, 2020 at 8:35 PM Raymond Wilson <
>> raymond_wilson@trimble.com
>> <ht...@trimble.com>>
>> wrote:
>>
>> Hi Zhenya,
>>
>> 1. We currently use AWS EFS for primary storage, with provisioned IOPS to
>> provide sufficient IO. Our Ignite cluster currently tops out at ~10% usage
>> (with at least 5 nodes writing to it, including WAL and WAL archive), so we
>> are not saturating the EFS interface. We use the default page size
>> (experiments with larger page sizes showed instability when checkpointing
>> due to free page starvation, so we reverted to the default size).
>>
>> 2. Thanks for the detail, we will look for that in thread dumps when we
>> can create them.
>>
>> 3. We are using the default CP buffer size, which is max(256Mb,
>> DataRagionSize / 4) according to the Ignite documentation, so this should
>> have more than enough checkpoint buffer space to cope with writes. As
>> additional information, the cache which is displaying very slow writes is
>> in a data region with relatively slow write traffic. There is a primary
>> (default) data region with large write traffic, and the vast majority of
>> pages being written in a checkpoint will be for that default data region.
>>
>> 4. Yes, this is very surprising. Anecdotally from our logs it appears
>> write traffic into the low write traffic cache is blocked during
>> checkpoints.
>>
>> Thanks,
>> Raymond.
>>
>>
>>
>> On Tue, Dec 29, 2020 at 7:31 PM Zhenya Stanilovsky <arzamas123@mail.ru
>> <ht...@mail.ru>> wrote:
>>
>>
>>    1. Additionally to Ilya reply you can check vendors page for
>>    additional info, all in this page are applicable for ignite too [1].
>>    Increasing threads number leads to concurrent io usage, thus if your have
>>    something like nvme — it`s up to you but in case of sas possibly better
>>    would be to reduce this param.
>>    2. Log will shows you something like :
>>
>>    Parking thread=%Thread name% for timeout(ms)= %time%
>>
>>    and appropriate :
>>
>>    Unparking thread=
>>
>>    3. No additional looging with cp buffer usage are provided. cp buffer
>>    need to be more than 10% of overall persistent  DataRegions size.
>>    4. 90 seconds or longer —  Seems like problems in io or system
>>    tuning, it`s very bad score i hope.
>>
>> [1]
>> https://www.gridgain.com/docs/latest/perf-troubleshooting-guide/persistence-tuning
>>
>>
>>
>>
>>
>> Hi,
>>
>> We have been investigating some issues which appear to be related to
>> checkpointing. We currently use the IA 2.8.1 with the C# client.
>>
>> I have been trying to gain clarity on how certain aspects of the Ignite
>> configuration relate to the checkpointing process:
>>
>> 1. Number of check pointing threads. This defaults to 4, but I don't
>> understand how it applies to the checkpointing process. Are more threads
>> generally better (eg: because it makes the disk IO parallel across the
>> threads), or does it only have a positive effect if you have many data
>> storage regions? Or something else? If this could be clarified in the
>> documentation (or a pointer to it which Google has not yet found), that
>> would be good.
>>
>> 2. Checkpoint frequency. This is defaulted to 180 seconds. I was thinking
>> that reducing this time would result in smaller less disruptive check
>> points. Setting it to 60 seconds seems pretty safe, but is there a
>> practical lower limit that should be used for use cases with new data
>> constantly being added, eg: 5 seconds, 10 seconds?
>>
>> 3. Write exclusivity constraints during checkpointing. I understand that
>> while a checkpoint is occurring ongoing writes will be supported into the
>> caches being check pointed, and if those are writes to existing pages then
>> those will be duplicated into the checkpoint buffer. If this buffer becomes
>> full or stressed then Ignite will throttle, and perhaps block, writes until
>> the checkpoint is complete. If this is the case then Ignite will emit
>> logging (warning or informational?) that writes are being throttled.
>>
>> We have cases where simple puts to caches (a few requests per second) are
>> taking up to 90 seconds to execute when there is an active check point
>> occurring, where the check point has been triggered by the checkpoint
>> timer. When a checkpoint is not occurring the time to do this is usually in
>> the milliseconds. The checkpoints themselves can take 90 seconds or longer,
>> and are updating up to 30,000-40,000 pages, across a pair of data storage
>> regions, one with 4Gb in-memory space allocated (which should be 1,000,000
>> pages at the standard 4kb page size), and one small region with 128Mb.
>> There is no 'throttling' logging being emitted that we can tell, so the
>> checkpoint buffer (which should be 1Gb for the first data region and 256 Mb
>> for the second smaller region in this case) does not look like it can fill
>> up during the checkpoint.
>>
>> It seems like the checkpoint is affecting the put operations, but I don't
>> understand why that may be given the documented checkpointing process, and
>> the checkpoint itself (at least via Informational logging) is not
>> advertising any restrictions.
>>
>> Thanks,
>> Raymond.
>>
>> --
>> <http://www.trimble.com/>
>> Raymond Wilson
>> Solution Architect, Civil Construction Software Systems (CCSS)
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>> <http://www.trimble.com/>
>> Raymond Wilson
>> Solution Architect, Civil Construction Software Systems (CCSS)
>> 11 Birmingham Drive | Christchurch, New Zealand
>> +64-21-2013317 Mobile
>> raymond_wilson@trimble.com
>> <ht...@trimble.com>
>>
>>
>>
>> <https://worksos.trimble.com/?utm_source=Trimble&utm_medium=emailsign&utm_campaign=Launch>
>>
>>
>>
>> --
>> <http://www.trimble.com/>
>> Raymond Wilson
>> Solution Architect, Civil Construction Software Systems (CCSS)
>> 11 Birmingham Drive | Christchurch, New Zealand
>> +64-21-2013317 Mobile
>> raymond_wilson@trimble.com
>> <ht...@trimble.com>
>>
>>
>>
>> <https://worksos.trimble.com/?utm_source=Trimble&utm_medium=emailsign&utm_campaign=Launch>
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>> <http://www.trimble.com/>
>> Raymond Wilson
>> Solution Architect, Civil Construction Software Systems (CCSS)
>> 11 Birmingham Drive | Christchurch, New Zealand
>> +64-21-2013317 Mobile
>> raymond_wilson@trimble.com
>> <//...@trimble.com>
>>
>>
>>
>> <https://worksos.trimble.com/?utm_source=Trimble&utm_medium=emailsign&utm_campaign=Launch>
>>
>>
>>
>> --
>> <http://www.trimble.com/>
>> Raymond Wilson
>> Solution Architect, Civil Construction Software Systems (CCSS)
>> 11 Birmingham Drive | Christchurch, New Zealand
>> +64-21-2013317 Mobile
>> raymond_wilson@trimble.com
>> <//...@trimble.com>
>>
>>
>>
>> <https://worksos.trimble.com/?utm_source=Trimble&utm_medium=emailsign&utm_campaign=Launch>
>>
>>
>>
>>
>>
>>
>
>
> --
> <http://www.trimble.com/>
> Raymond Wilson
> Solution Architect, Civil Construction Software Systems (CCSS)
> 11 Birmingham Drive | Christchurch, New Zealand
> +64-21-2013317 Mobile
> raymond_wilson@trimble.com
>
>
> <https://worksos.trimble.com/?utm_source=Trimble&utm_medium=emailsign&utm_campaign=Launch>
>

Re: Re[4]: Questions related to check pointing

Posted by Raymond Wilson <ra...@trimble.com>.
Regarding this section of code:

            maxDirtyPages = throttlingPlc != ThrottlingPolicy.DISABLED
                ? pool.pages() * 3L / 4
                : Math.min(pool.pages() * 2L / 3, cpPoolPages);

I think the correct ratio will be 2/3 of pages as we do not have a
throttling policy defined, correct?

On Thu, Dec 31, 2020 at 12:49 AM Zhenya Stanilovsky <ar...@mail.ru>
wrote:

> Correct code is running from here:
>
> if (checkpointReadWriteLock.getReadHoldCount() > 1 || safeToUpdatePageMemories() || checkpointer.runner() == null)
>     break;
> else {
>     CheckpointProgress pages = checkpointer.scheduleCheckpoint(0, "too many dirty pages");
>
> and near you can see that :
> maxDirtyPages = throttlingPlc != ThrottlingPolicy.DISABLED    ? pool.pages() * 3L / 4    : Math.min(pool.pages() * 2L / 3, cpPoolPages);
>
> Thus if ¾ pages are dirty from whole DataRegion pages — will raise this cp.
>
>
> In (
> https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood),
> there is a mention of a dirty pages limit that is a factor that can trigger
> check points.
>
> I also found this issue:
> http://apache-ignite-users.70518.x6.nabble.com/too-many-dirty-pages-td28572.html
> where "too many dirty pages" is a reason given for initiating a checkpoint.
>
> After reviewing our logs I found this: (one example)
>
> 2020-12-15 19:07:00,999 [106] INF [MutableCacheComputeServer] Checkpoint
> started [checkpointId=e2c31b43-44df-43f1-b162-6b6cefa24e28,
> startPtr=FileWALPointer [idx=6339, fileOff=243287334, len=196573],
> checkpointBeforeLockTime=99ms, checkpointLockWait=0ms,
> checkpointListenersExecuteTime=16ms, checkpointLockHoldTime=32ms,
> walCpRecordFsyncDuration=113ms, writeCheckpointEntryDuration=27ms,
> splitAndSortCpPagesDuration=45ms, pages=33421, reason='too many dirty
> pages']
>
> Which suggests we may have the issue where writes are frozen until the
> check point is completed.
>
> Looking at the AI 2.8.1 source code, the dirty page limit fraction appears
> to be 0.1 (10%), via this entry in GridCacheDatabaseSharedManager.java:
>
>     /**
>      * Threshold to calculate limit for pages list on-heap caches.
>      * <p>
>      * Note: When a checkpoint is triggered, we need some amount of page memory to store pages list on-heap cache.
>      * If a checkpoint is triggered by "too many dirty pages" reason and pages list cache is rather big, we can get
>      * {@code IgniteOutOfMemoryException}. To prevent this, we can limit the total amount of cached page list buckets,
>      * assuming that checkpoint will be triggered if no more then 3/4 of pages will be marked as dirty (there will be
>      * at least 1/4 of clean pages) and each cached page list bucket can be stored to up to 2 pages (this value is not
>      * static, but depends on PagesCache.MAX_SIZE, so if PagesCache.MAX_SIZE > PagesListNodeIO#getCapacity it can take
>      * more than 2 pages). Also some amount of page memory needed to store page list metadata.
>      */
>     private static final double PAGE_LIST_CACHE_LIMIT_THRESHOLD = 0.1;
>
> This raises two questions:
>
> 1. The data region where most writes are occurring has 4Gb allocated to
> it, though it is permitted to start at a much lower level. 4Gb should be
> 1,000,000 pages, 10% of which should be 100,000 dirty pages.
>
> The 'limit holder' is calculated like this:
>
>     /**
>      * @return Holder for page list cache limit for given data region.
>      */
>     public AtomicLong pageListCacheLimitHolder(DataRegion dataRegion) {
>         if (dataRegion.config().isPersistenceEnabled()) {
>             return pageListCacheLimits.computeIfAbsent(dataRegion.config().getName(), name -> new AtomicLong(
>                 (long)(((PageMemoryEx)dataRegion.pageMemory()).totalPages() * PAGE_LIST_CACHE_LIMIT_THRESHOLD)));
>         }
>
>         return null;
>     }
>
> ... but I am unsure if totalPages() is referring to the current size of
> the data region, or the size it is permitted to grow to. ie: Could the
> 'dirty page limit' be a sliding limit based on the growth of the data
> region? Is it better to set the initial and maximum sizes of data regions
> to be the same number?
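
As an aside, if the limit does track the live region size, one way to take that variable
out of play would be to pin the region so it cannot grow. A minimal sketch using the Java
configuration API, with a made-up region name and illustrative sizes (we use the C# client,
where the corresponding DataRegionConfiguration properties should be available too):

    import org.apache.ignite.configuration.DataRegionConfiguration;
    import org.apache.ignite.configuration.DataStorageConfiguration;

    DataRegionConfiguration resultsRegion = new DataRegionConfiguration()
        .setName("ProcessedResults")                // hypothetical region name
        .setPersistenceEnabled(true)
        .setInitialSize(4L * 1024 * 1024 * 1024)    // start at 4 GB...
        .setMaxSize(4L * 1024 * 1024 * 1024);       // ...and never grow beyond 4 GB

    DataStorageConfiguration storageCfg = new DataStorageConfiguration()
        .setDataRegionConfigurations(resultsRegion);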
>
> 2. We have two data regions, one supporting inbound arrival of data (with
> low numbers of writes), and one supporting storage of processed results
> from the arriving data (with many more writes).
>
> The block on writes due to the number of dirty pages appears to affect all
> data regions, not just the one which has violated the dirty page limit. Is
> that correct? If so, is this something that can be improved?
>
> Thanks,
> Raymond.
>
>
> On Wed, Dec 30, 2020 at 9:17 PM Raymond Wilson <raymond_wilson@trimble.com
> <//...@trimble.com>> wrote:
>
> I'm working on getting automatic JVM thread stack dumping occurring if we
> detect long delays in put (PutIfAbsent) operations. Hopefully this will
> provide more information.
>
> On Wed, Dec 30, 2020 at 7:48 PM Zhenya Stanilovsky <arzamas123@mail.ru
> <//...@mail.ru>> wrote:
>
>
> I don't think so; checkpointing already worked perfectly well before this fix.
> We need additional info to start digging into your problem. Can you share Ignite
> logs somewhere?
>
>
>
> I noticed an entry in the Ignite 2.9.1 changelog:
>
>    - Improved checkpoint concurrent behaviour
>
> I am having trouble finding the relevant Jira ticket for this in the 2.9.1
> Jira area at
> https://issues.apache.org/jira/browse/IGNITE-13876?jql=project%20%3D%20IGNITE%20AND%20fixVersion%20%3D%202.9.1%20and%20status%20%3D%20Resolved
>
> Perhaps this change may improve the checkpointing issue we are seeing?
>
> Raymond.
>
>
> On Tue, Dec 29, 2020 at 8:35 PM Raymond Wilson <raymond_wilson@trimble.com
> <ht...@trimble.com>>
> wrote:
>
> Hi Zhenya,
>
> 1. We currently use AWS EFS for primary storage, with provisioned IOPS to
> provide sufficient IO. Our Ignite cluster currently tops out at ~10% usage
> (with at least 5 nodes writing to it, including WAL and WAL archive), so we
> are not saturating the EFS interface. We use the default page size
> (experiments with larger page sizes showed instability when checkpointing
> due to free page starvation, so we reverted to the default size).
>
> 2. Thanks for the detail, we will look for that in thread dumps when we
> can create them.
>
> 3. We are using the default CP buffer size, which is max(256Mb,
> DataRegionSize / 4) according to the Ignite documentation, so this should
> have more than enough checkpoint buffer space to cope with writes. As
> additional information, the cache which is displaying very slow writes is
> in a data region with relatively slow write traffic. There is a primary
> (default) data region with large write traffic, and the vast majority of
> pages being written in a checkpoint will be for that default data region.
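
For completeness, the checkpoint page buffer can also be sized explicitly rather than left
at the default, so each region's buffer is a known quantity. A hedged sketch with
illustrative values and a made-up region name:

    import org.apache.ignite.configuration.DataRegionConfiguration;

    DataRegionConfiguration busyRegion = new DataRegionConfiguration()
        .setName("Default")                                  // hypothetical name
        .setPersistenceEnabled(true)
        .setMaxSize(4L * 1024 * 1024 * 1024)                 // 4 GB region
        .setCheckpointPageBufferSize(1024L * 1024 * 1024);   // pin the checkpoint buffer at 1 GB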
>
> 4. Yes, this is very surprising. Anecdotally from our logs it appears
> write traffic into the low write traffic cache is blocked during
> checkpoints.
>
> Thanks,
> Raymond.
>
>
>
> On Tue, Dec 29, 2020 at 7:31 PM Zhenya Stanilovsky <arzamas123@mail.ru
> <ht...@mail.ru>> wrote:
>
>
>    1. Additionally to Ilya's reply, you can check the vendor's page for
>    additional info; everything on that page is applicable to Ignite too [1].
>    Increasing the thread count leads to concurrent IO usage, so if you have
>    something like NVMe it is up to you, but with SAS it would possibly be
>    better to reduce this parameter.
>    2. The log will show you something like:
>
>    Parking thread=%Thread name% for timeout(ms)= %time%
>
>    and the corresponding:
>
>    Unparking thread=
>
>    3. No additional logging of checkpoint buffer usage is provided. The checkpoint
>    buffer needs to be more than 10% of the overall persistent DataRegions size.
>    4. 90 seconds or longer seems like a problem with IO or system tuning;
>    that is a very bad score, I think.
>
> [1]
> https://www.gridgain.com/docs/latest/perf-troubleshooting-guide/persistence-tuning
>
>
>
>
>
> Hi,
>
> We have been investigating some issues which appear to be related to
> checkpointing. We currently use AI 2.8.1 with the C# client.
>
> I have been trying to gain clarity on how certain aspects of the Ignite
> configuration relate to the checkpointing process:
>
> 1. Number of check pointing threads. This defaults to 4, but I don't
> understand how it applies to the checkpointing process. Are more threads
> generally better (eg: because it makes the disk IO parallel across the
> threads), or does it only have a positive effect if you have many data
> storage regions? Or something else? If this could be clarified in the
> documentation (or a pointer to it which Google has not yet found), that
> would be good.
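
For reference, the knob in question lives on the storage configuration; a minimal sketch
(the value 8 is purely illustrative, not a recommendation):

    import org.apache.ignite.configuration.DataStorageConfiguration;

    DataStorageConfiguration storageCfg = new DataStorageConfiguration()
        .setCheckpointThreads(8);   // default is 4; illustrative value only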
>
> 2. Checkpoint frequency. This is defaulted to 180 seconds. I was thinking
> that reducing this time would result in smaller less disruptive check
> points. Setting it to 60 seconds seems pretty safe, but is there a
> practical lower limit that should be used for use cases with new data
> constantly being added, eg: 5 seconds, 10 seconds?
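
Again for reference, the frequency is a single setting on the same storage configuration;
a sketch extending the storageCfg fragment just above, with an illustrative value:

    storageCfg.setCheckpointFrequency(60_000);   // milliseconds; 60 s is just an example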
>
> 3. Write exclusivity constraints during checkpointing. I understand that
> while a checkpoint is occurring ongoing writes will be supported into the
> caches being check pointed, and if those are writes to existing pages then
> those will be duplicated into the checkpoint buffer. If this buffer becomes
> full or stressed then Ignite will throttle, and perhaps block, writes until
> the checkpoint is complete. If this is the case then Ignite will emit
> logging (warning or informational?) that writes are being throttled.
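
On that point, as far as I understand, speed-based write throttling is off by default, and
when the checkpoint buffer fills up writer threads are parked until the checkpoint finishes;
gradual throttling has to be enabled explicitly. A sketch, again extending the storageCfg
fragment above:

    // Enable speed-based write throttling instead of hard parking of writer threads.
    storageCfg.setWriteThrottlingEnabled(true);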
>
> We have cases where simple puts to caches (a few requests per second) are
> taking up to 90 seconds to execute when there is an active check point
> occurring, where the check point has been triggered by the checkpoint
> timer. When a checkpoint is not occurring the time to do this is usually in
> the milliseconds. The checkpoints themselves can take 90 seconds or longer,
> and are updating up to 30,000-40,000 pages, across a pair of data storage
> regions, one with 4Gb in-memory space allocated (which should be 1,000,000
> pages at the standard 4kb page size), and one small region with 128Mb.
> There is no 'throttling' logging being emitted that we can tell, so the
> checkpoint buffer (which should be 1Gb for the first data region and 256 Mb
> for the second smaller region in this case) does not look like it can fill
> up during the checkpoint.
>
> It seems like the checkpoint is affecting the put operations, but I don't
> understand why that may be given the documented checkpointing process, and
> the checkpoint itself (at least via Informational logging) is not
> advertising any restrictions.
>
> Thanks,
> Raymond.
>


-- 
Raymond Wilson
Solution Architect, Civil Construction Software Systems (CCSS)
11 Birmingham Drive | Christchurch, New Zealand
+64-21-2013317 Mobile
raymond_wilson@trimble.com
