Posted to user@ignite.apache.org by Raymond Wilson <ra...@trimble.com> on 2020/04/08 11:22:53 UTC

Re: Out of memory with eviction failure on persisted cache

Evgenii,

Have you had a chance to look into the reproducer?

Thanks,
Raymond.

On Fri, Mar 6, 2020 at 2:51 PM Raymond Wilson <ra...@trimble.com>
wrote:

> Evgenii,
>
> I have created a reproducer that triggers the error with the buffer size
> set to 64 MB. The program.cs/csproj and log for the run that triggered the
> error are attached.
>
> Thanks,
> Raymond.
>
>
>
> On Fri, Mar 6, 2020 at 1:08 PM Raymond Wilson <ra...@trimble.com>
> wrote:
>
>> The reproducer is my development system, which is hard to share.
>>
>> I have increased the size of the buffer to 256 MB, and it copes with the
>> example data load, though I have not tried larger data sets.
>>
>> From an analytical perspective, is this error possible, or even expected,
>> when using a cache with a persistent data region defined?
>>
>> I'll see if I can make a small reproducer.
>>
>> On Fri, Mar 6, 2020 at 11:34 AM Evgenii Zhuravlev <
>> e.zhuravlev.wk@gmail.com> wrote:
>>
>>> Hi Raymond,
>>>
>>> I tried to reproduce it, but without success. Can you share the
>>> reproducer?
>>>
>>> Also, have you tried to load much more data with a 256 MB data region? I
>>> think it should work without issues.
>>>
>>> Thanks,
>>> Evgenii
>>>
>>> Wed, Mar 4, 2020 at 16:14, Raymond Wilson <ra...@trimble.com>:
>>>
>>>> Hi Evgenii,
>>>>
>>>> I am putting the elements individually using PutIfAbsent(). Each
>>>> element ranges from 2 KB to 35 KB in size.
>>>>
>>>> Actually, the process that writes the data does not write it directly
>>>> to the cache; it uses a compute function to send the payload to the
>>>> process that is doing the reading. The compute function applies
>>>> validation logic and uses PutIfAbsent() to write the data into the cache.
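>>>>
>>>> To illustrate, the write path is shaped roughly like this (a trimmed C#
>>>> sketch, not the actual code; the cache name, payload type and the
>>>> validation check are placeholders):
>>>>
>>>> using System;
>>>> using Apache.Ignite.Core;
>>>> using Apache.Ignite.Core.Compute;
>>>> using Apache.Ignite.Core.Resource;
>>>>
>>>> [Serializable]
>>>> public class StoreItemAction : IComputeAction
>>>> {
>>>>     [InstanceResource] private IIgnite _ignite; // injected on the target node
>>>>
>>>>     private readonly string _key;
>>>>     private readonly byte[] _payload;
>>>>
>>>>     public StoreItemAction(string key, byte[] payload)
>>>>     {
>>>>         _key = key;
>>>>         _payload = payload;
>>>>     }
>>>>
>>>>     public void Invoke()
>>>>     {
>>>>         if (_payload == null || _payload.Length == 0)
>>>>             return; // stand-in for the real validation logic
>>>>
>>>>         // Write only if the key is not already present.
>>>>         _ignite.GetCache<string, byte[]>("TAGFileBufferQueue")
>>>>                .PutIfAbsent(_key, _payload);
>>>>     }
>>>> }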
>>>>
>>>> Sorry for the confusion.
>>>>
>>>> Raymond.
>>>>
>>>>
>>>> On Thu, Mar 5, 2020 at 1:09 PM Evgenii Zhuravlev <
>>>> e.zhuravlev.wk@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> How are you loading the data? Do you use putAll or DataStreamer?
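>>>>>
>>>>> For reference, a streamer-based load looks roughly like this in C#
>>>>> (a minimal sketch; 'ignite', 'items' and the cache name are assumed):
>>>>>
>>>>> // IDataStreamer batches writes and streams them asynchronously;
>>>>> // AllowOverwrite defaults to false, so existing keys are kept,
>>>>> // similar in spirit to PutIfAbsent.
>>>>> using (var streamer =
>>>>>     ignite.GetDataStreamer<string, byte[]>("TAGFileBufferQueue"))
>>>>> {
>>>>>     foreach (var kv in items) // IEnumerable<KeyValuePair<string, byte[]>>
>>>>>         streamer.AddData(kv.Key, kv.Value);
>>>>> } // Dispose() flushes any buffered entries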
>>>>>
>>>>> Evgenii
>>>>>
>>>>> Wed, Mar 4, 2020 at 15:37, Raymond Wilson <raymond_wilson@trimble.com>:
>>>>>
>>>>>> To add some further detail:
>>>>>>
>>>>>> There are two processes interacting with the cache. One process is
>>>>>> writing data into the cache, while the second process is extracting
>>>>>> data from the cache using a continuous query. The process that is the
>>>>>> reader of the data is throwing the exception.
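>>>>>>
>>>>>> The reader side is wired up roughly like this (a simplified C# sketch,
>>>>>> not the actual code; the listener body and cache are placeholders):
>>>>>>
>>>>>> using System;
>>>>>> using System.Collections.Generic;
>>>>>> using Apache.Ignite.Core.Cache.Event;
>>>>>> using Apache.Ignite.Core.Cache.Query.Continuous;
>>>>>>
>>>>>> // Receives entries as the writer process adds them to the cache.
>>>>>> class QueueListener : ICacheEntryEventListener<string, byte[]>
>>>>>> {
>>>>>>     public void OnEvent(IEnumerable<ICacheEntryEvent<string, byte[]>> evts)
>>>>>>     {
>>>>>>         foreach (var e in evts)
>>>>>>             Console.WriteLine($"Received {e.Key} ({e.Value.Length} bytes)");
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> // 'cache' is an ICache<string, byte[]>; keep the handle alive to
>>>>>> // keep receiving events.
>>>>>> var handle = cache.QueryContinuous(
>>>>>>     new ContinuousQuery<string, byte[]>(new QueueListener()));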
>>>>>>
>>>>>> Increasing the cache size further to 256 MB resolves the problem for
>>>>>> this data set; however, we have data sets more than 100 times this
>>>>>> size which we will be processing.
>>>>>>
>>>>>> Thanks,
>>>>>> Raymond.
>>>>>>
>>>>>>
>>>>>> On Thu, Mar 5, 2020 at 12:10 PM Raymond Wilson <
>>>>>> raymond_wilson@trimble.com>
>>>>>> wrote:
>>>>>>
>>>>>> > I've been having a sporadic issue with the Ignite 2.7.5 JVM halting
>>>>>> > due to an out of memory error related to a cache with persistence
>>>>>> > enabled.
>>>>>> >
>>>>>> > I just upgraded to the C#/.NET Ignite 2.7.6 client to pick up support
>>>>>> > for C# affinity functions, and now have this issue appearing regularly
>>>>>> > while adding around 400 MB of data into the cache, which is configured
>>>>>> > to have 128 MB of memory (this was 64 MB, but I increased it to see if
>>>>>> > the failure would be resolved).
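>>>>>> >
>>>>>> > The region is configured along these lines (a C# sketch from memory;
>>>>>> > only the names and sizes are meaningful):
>>>>>> >
>>>>>> > using Apache.Ignite.Core;
>>>>>> > using Apache.Ignite.Core.Configuration;
>>>>>> >
>>>>>> > var cfg = new IgniteConfiguration
>>>>>> > {
>>>>>> >     DataStorageConfiguration = new DataStorageConfiguration
>>>>>> >     {
>>>>>> >         DataRegionConfigurations = new[]
>>>>>> >         {
>>>>>> >             new DataRegionConfiguration
>>>>>> >             {
>>>>>> >                 Name = "TAGFileBufferQueue",
>>>>>> >                 InitialSize = 128L * 1024 * 1024, // 128 MB
>>>>>> >                 MaxSize = 128L * 1024 * 1024,     // 128 MB
>>>>>> >                 PersistenceEnabled = true
>>>>>> >             }
>>>>>> >         }
>>>>>> >     }
>>>>>> > };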
>>>>>> >
>>>>>> > The error I get is:
>>>>>> >
>>>>>> > 2020-03-05 11:58:57,568 [542] ERR [MutableCacheComputeServer] JVM
>>>>>> > will be halted immediately due to the failure: [failureCtx=FailureContext
>>>>>> > [type=CRITICAL_ERROR, err=class o.a.i.i.mem.IgniteOutOfMemoryException:
>>>>>> > Failed to find a page for eviction [segmentCapacity=1700, loaded=676,
>>>>>> > maxDirtyPages=507, dirtyPages=675, cpPages=0, pinnedInSegment=2,
>>>>>> > failedToPrepare=675]
>>>>>> > Out of memory in data region [name=TAGFileBufferQueue, initSize=128.0 MiB,
>>>>>> > maxSize=128.0 MiB, persistenceEnabled=true] Try the following:
>>>>>> >   ^-- Increase maximum off-heap memory size
>>>>>> >       (DataRegionConfiguration.maxSize)
>>>>>> >   ^-- Enable Ignite persistence
>>>>>> >       (DataRegionConfiguration.persistenceEnabled)
>>>>>> >   ^-- Enable eviction or expiration policies]]
>>>>>> >
>>>>>> > I'm not running an eviction policy, as I thought this was not
>>>>>> > required for caches with persistence enabled.
>>>>>> >
>>>>>> > I'm surprised by this behaviour, as I expected the persistence
>>>>>> > mechanism to handle it. The error relating to failure to find a page
>>>>>> > for eviction suggests the persistence mechanism has fallen behind. If
>>>>>> > this is the case, this seems like an unfriendly failure mode.
>>>>>> >
>>>>>> > Thanks,
>>>>>> > Raymond.
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>>
>>>>>

Re: Out of memory with eviction failure on persisted cache

Posted by Evgenii Zhuravlev <e....@gmail.com>.
Raymond,

I've seen this behaviour before; it occurs during massive data loading into
a cluster with a small data region. It's not reproducible with normally
sized data regions, which I think is why this issue has not been fixed yet.
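
As a workaround, giving the region enough headroom for the load avoids it.
Something along these lines (a C# sketch; the sizes are only an example to
scale to your data set):

using Apache.Ignite.Core.Configuration;

var region = new DataRegionConfiguration
{
    Name = "TAGFileBufferQueue",
    InitialSize = 512L * 1024 * 1024,  // start well above the dirty-page working set
    MaxSize = 2L * 1024 * 1024 * 1024, // 2 GB; persistence still keeps all data on disk
    PersistenceEnabled = true
};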

Best Regards,
Evgenii

Wed, Apr 8, 2020 at 04:23, Raymond Wilson <ra...@trimble.com>:

> Evgenii,
>
> Have you had a chance to look into the reproducer?
>
> Thanks,
> Raymond.
>
> On Fri, Mar 6, 2020 at 2:51 PM Raymond Wilson <ra...@trimble.com>
> wrote:
>
>> Evgenii,
>>
>> I have created a reproducer that triggers the error with the buffer size
>> set to 64 MB. The program.cs/csproj and log for the run that triggered the
>> error are attached.
>>
>> Thanks,
>> Raymond.
>>
>>
>>
>> On Fri, Mar 6, 2020 at 1:08 PM Raymond Wilson <ra...@trimble.com>
>> wrote:
>>
>>> The reproducer is my development system, which is hard to share.
>>>
>>> I have increased the size of the buffer to 256 MB, and it copes with the
>>> example data load, though I have not tried larger data sets.
>>>
>>> From an analytical perspective, is this error possible, or even expected,
>>> when using a cache with a persistent data region defined?
>>>
>>> I'll see if I can make a small reproducer.
>>>
>>> On Fri, Mar 6, 2020 at 11:34 AM Evgenii Zhuravlev <
>>> e.zhuravlev.wk@gmail.com> wrote:
>>>
>>>> Hi Raymond,
>>>>
>>>> I tried to reproduce it, but without success. Can you share the
>>>> reproducer?
>>>>
>>>> Also, have you tried to load much more data with a 256 MB data region? I
>>>> think it should work without issues.
>>>>
>>>> Thanks,
>>>> Evgenii
>>>>
>>>> Wed, Mar 4, 2020 at 16:14, Raymond Wilson <raymond_wilson@trimble.com>:
>>>>
>>>>> Hi Evgenii,
>>>>>
>>>>> I am putting the elements individually using PutIfAbsent(). Each
>>>>> element ranges from 2 KB to 35 KB in size.
>>>>>
>>>>> Actually, the process that writes the data does not write it directly
>>>>> to the cache; it uses a compute function to send the payload to the
>>>>> process that is doing the reading. The compute function applies
>>>>> validation logic and uses PutIfAbsent() to write the data into the cache.
>>>>>
>>>>> Sorry for the confusion.
>>>>>
>>>>> Raymond.
>>>>>
>>>>>
>>>>> On Thu, Mar 5, 2020 at 1:09 PM Evgenii Zhuravlev <
>>>>> e.zhuravlev.wk@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> How are you loading the data? Do you use putAll or DataStreamer?
>>>>>>
>>>>>> Evgenii
>>>>>>
>>>>>> Wed, Mar 4, 2020 at 15:37, Raymond Wilson <raymond_wilson@trimble.com>:
>>>>>>
>>>>>>> To add some further detail:
>>>>>>>
>>>>>>> There are two processes interacting with the cache. One process is
>>>>>>> writing data into the cache, while the second process is extracting
>>>>>>> data from the cache using a continuous query. The process that is the
>>>>>>> reader of the data is throwing the exception.
>>>>>>>
>>>>>>> Increasing the cache size further to 256 MB resolves the problem for
>>>>>>> this data set; however, we have data sets more than 100 times this
>>>>>>> size which we will be processing.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Raymond.
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Mar 5, 2020 at 12:10 PM Raymond Wilson <
>>>>>>> raymond_wilson@trimble.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> > I've been having a sporadic issue with the Ignite 2.7.5 JVM halting
>>>>>>> > due to an out of memory error related to a cache with persistence
>>>>>>> > enabled.
>>>>>>> >
>>>>>>> > I just upgraded to the C#/.NET Ignite 2.7.6 client to pick up support
>>>>>>> > for C# affinity functions, and now have this issue appearing regularly
>>>>>>> > while adding around 400 MB of data into the cache, which is configured
>>>>>>> > to have 128 MB of memory (this was 64 MB, but I increased it to see if
>>>>>>> > the failure would be resolved).
>>>>>>> >
>>>>>>> > The error I get is:
>>>>>>> >
>>>>>>> > 2020-03-05 11:58:57,568 [542] ERR [MutableCacheComputeServer] JVM
>>>>>>> > will be halted immediately due to the failure: [failureCtx=FailureContext
>>>>>>> > [type=CRITICAL_ERROR, err=class o.a.i.i.mem.IgniteOutOfMemoryException:
>>>>>>> > Failed to find a page for eviction [segmentCapacity=1700, loaded=676,
>>>>>>> > maxDirtyPages=507, dirtyPages=675, cpPages=0, pinnedInSegment=2,
>>>>>>> > failedToPrepare=675]
>>>>>>> > Out of memory in data region [name=TAGFileBufferQueue, initSize=128.0 MiB,
>>>>>>> > maxSize=128.0 MiB, persistenceEnabled=true] Try the following:
>>>>>>> >   ^-- Increase maximum off-heap memory size
>>>>>>> >       (DataRegionConfiguration.maxSize)
>>>>>>> >   ^-- Enable Ignite persistence
>>>>>>> >       (DataRegionConfiguration.persistenceEnabled)
>>>>>>> >   ^-- Enable eviction or expiration policies]]
>>>>>>> >
>>>>>>> > I'm not running an eviction policy, as I thought this was not
>>>>>>> > required for caches with persistence enabled.
>>>>>>> >
>>>>>>> > I'm surprised by this behaviour, as I expected the persistence
>>>>>>> > mechanism to handle it. The error relating to failure to find a page
>>>>>>> > for eviction suggests the persistence mechanism has fallen behind. If
>>>>>>> > this is the case, this seems like an unfriendly failure mode.
>>>>>>> >
>>>>>>> > Thanks,
>>>>>>> > Raymond.
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>>
>>>>>>
