Posted to user@ignite.apache.org by Shiva Kumar <sh...@gmail.com> on 2019/09/17 14:22:07 UTC

Re: nodes are restarting when i try to drop a table created with persistence enabled

Hi dmagda,

I am trying to drop a table that has around 10 million records, and I am
seeing "*Out of memory in data region*" error messages in the Ignite logs,
after which the Ignite node (an Ignite pod on Kubernetes) restarts.
I have configured 3GB for the default data region, 7GB for the JVM and 15GB
in total for the Ignite container, and I have enabled native persistence.
Earlier I was under the impression that the restarts were caused by
"*SYSTEM_WORKER_BLOCKED*" errors, but now I realize that
SYSTEM_WORKER_BLOCKED is in the ignored failure types list, and the actual
cause is a "*CRITICAL_ERROR*" due to "*Out of memory in data region*".

These are the error messages in the logs:

[2019-09-17T08:25:35,054][ERROR][sys-#773][] JVM will be halted
immediately due to the failure: [failureCtx=FailureContext
[type=CRITICAL_ERROR, err=class o.a.i.i.mem.IgniteOutOfMemoryException:
Failed to find a page for eviction [segmentCapacity=971652, loaded=381157,
maxDirtyPages=285868, dirtyPages=381157, cpPages=0, pinnedInSegment=3,
failedToPrepare=381155]
Out of memory in data region [name=Default_Region, initSize=500.0 MiB,
maxSize=3.0 GiB, persistenceEnabled=true] Try the following:
  ^-- Increase maximum off-heap memory size (DataRegionConfiguration.maxSize)
  ^-- Enable Ignite persistence (DataRegionConfiguration.persistenceEnabled)
  ^-- Enable eviction or expiration policies]]

Could you please help me understand why the *DROP TABLE* operation causes
"*Out of memory in data region*", and how I can avoid it?

We have a use case where an application inserts records into many tables in
Ignite simultaneously for a given time period, and other applications query
that period's data and update a dashboard. We need to delete the records
inserted in the previous time period before inserting new records.

Even during a *DELETE FROM table* operation, I have seen:

Critical system error detected. Will be handled accordingly to
configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false,
timeout=0, super=AbstractFailureHandler
[ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]],
failureCtx=FailureContext [type=CRITICAL_ERROR, err=class
o.a.i.IgniteException: Checkpoint read lock acquisition has been
timed out.]] class org.apache.ignite.IgniteException: Checkpoint read
lock acquisition has been timed out.

On Mon, Apr 29, 2019 at 12:17 PM Denis Magda <dm...@apache.org> wrote:

> Hi Shiva,
>
> That was designed to prevent global cluster performance degradation or
> other outages. Have you tried to apply my recommendation of turning off the
> failure handler for these system threads?
>
> -
> Denis
>
>
> On Sun, Apr 28, 2019 at 10:28 AM shivakumar <sh...@gmail.com>
> wrote:
>
>> Hi Denis,
>>
>> Is there any specific reason for the blocking of the critical thread, like
>> the CPU being full or the heap being full?
>> We are hitting this issue again and again.
>> Is there any other way to drop tables/caches?
>> This looks like a critical issue.
>>
>> regards,
>> shiva
>>
>>
>>
>> --
>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>>
>

Re: nodes are restarting when i try to drop a table created with persistence enabled

Posted by maheshkr76private <ma...@gmail.com>.

Shivakumar's system configuration and mine could be different, but I feel we
are seeing the same issue here.

Deleting tables via a single thick client causes other thick clients to run
out of memory. This OOM issue was reported here:
http://apache-ignite-users.70518.x6.nabble.com/Ignite-2-7-0-Ignite-client-memory-leak-td28938.html
That thread has the server and client configs and a client JVM heap dump
attached. Please go through it.

Steps to reproduce:
- Take Ignite 2.7.6.
- Allocate about 1GB of heap (-Xmx1g) to each of the thick clients, and
  connect them to an Ignite cluster.
- Let the Ignite cluster have about 500 dummy tables. Keep deleting them.
Eventually, you will see the thick clients failing with OOM.
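The reproduction loop above boils down to creating many SQL tables and then dropping them one after another. The sketch below only generates those statements; the table names (PERSON0, PERSON1, ...), the two-column schema and the count of 500 are illustrative assumptions modeled on the SQL_PUBLIC_PERSON1000 cache name in the exception quoted further down, not the reporter's actual DDL. Running them from one thick client while other thick clients stay connected is not shown, since that needs ignite-core on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

public class DummyTableDdl {

    // Builds CREATE TABLE statements for n dummy tables PERSON0..PERSON(n-1).
    // Table names and the (id, name) columns are hypothetical placeholders.
    static List<String> createStatements(int n) {
        List<String> stmts = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            stmts.add("CREATE TABLE IF NOT EXISTS PERSON" + i
                + " (id INT PRIMARY KEY, name VARCHAR)");
        }
        return stmts;
    }

    // Builds the matching DROP TABLE statements, in the same order.
    static List<String> dropStatements(int n) {
        List<String> stmts = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            stmts.add("DROP TABLE IF EXISTS PERSON" + i);
        }
        return stmts;
    }

    public static void main(String[] args) {
        int n = 500; // about the table count used in the reproduction steps
        List<String> creates = createStatements(n);
        List<String> drops = dropStatements(n);
        // Print one pair as a sample; the repro would execute every statement,
        // repeatedly, from a single thick client while other clients stay connected.
        System.out.println(creates.get(0));
        System.out.println(drops.get(0));
        System.out.println(creates.size() + " create / " + drops.size() + " drop statements");
    }
}
```

From a thick client, each string could then be executed with something like cache.query(new SqlFieldsQuery(sql)).getAll() on any cache handle; that part is ignite-core API and is left out of the sketch.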

Now, coming to your questions:
1. Does the same occur if IgniteCache.destroy() is called instead of DROP
TABLE?
All the caches we destroy are SQL caches, so we use DROP TABLE.
IgniteCache.destroy() throws an exception:
Exception in thread "main" class org.apache.ignite.IgniteException: Only
cache created with cache API may be removed with direct call to destroyCache
[cacheName=SQL_PUBLIC_PERSON1000]

2. Does the same occur if SQL is not enabled for a cache?
We did not check this, as it is not a use case we have. We primarily use
SQL caches.

3. It would be nice to see the IgniteConfiguration and CacheConfiguration
that cause the problems.
They are attached to the other thread linked above.

4. Need to figure out why almost all pages are dirty. It might be a clue.
This probably applies to the scenario Shivakumar sent. In my case, all the
data is in memory: we have about 100GB of data in memory, and the data
regions together are about 128GB.
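Regarding answer 1 above: the exception names the cache SQL_PUBLIC_PERSON1000. For a table created with CREATE TABLE, Ignite by default names the backing cache SQL_<SCHEMA>_<TABLE> (unless CACHE_NAME is set in the WITH clause), and, as the exception says, such caches have to be removed with DROP TABLE rather than destroyCache. The small sketch below, with hypothetical class and method names, turns a default SQL cache name back into the matching DROP TABLE statement; it assumes the default naming and that the schema is known.

```java
public class SqlCacheNames {

    // Prefix of the default cache name for a table created with CREATE TABLE,
    // assuming no CACHE_NAME override was given in the WITH clause.
    static final String PREFIX = "SQL_";

    // Maps a default SQL cache name in a known schema back to a DROP TABLE statement.
    // The schema is passed explicitly because schema and table names may both
    // contain underscores, which makes splitting the cache name ambiguous.
    static String dropTableFor(String cacheName, String schema) {
        String expected = PREFIX + schema + "_";
        if (!cacheName.startsWith(expected) || cacheName.length() == expected.length()) {
            throw new IllegalArgumentException(
                "Not a default SQL cache name for schema " + schema + ": " + cacheName);
        }
        String table = cacheName.substring(expected.length());
        return "DROP TABLE IF EXISTS " + schema + "." + table;
    }

    public static void main(String[] args) {
        // Cache name taken from the destroyCache exception in this thread.
        // prints: DROP TABLE IF EXISTS PUBLIC.PERSON1000
        System.out.println(dropTableFor("SQL_PUBLIC_PERSON1000", "PUBLIC"));
    }
}
```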

I don't want to confuse this thread, as Shivakumar's scenario could be
different. I don't mind discussing this on the original thread I opened
(the memory leak thread linked above).
Bottom line: deleting tables from one thick client causes other thick
clients to go OOM. This can be seen on 2.7.6 too.


Re: nodes are restarting when i try to drop a table created with persistence enabled

Posted by maheshkr76private <ma...@gmail.com>.

Hello, please ignore the comment below on this topic.

>>>
https://issues.apache.org/jira/browse/IGNITE-12255
Upon reviewing 12255, the description of this issue shows an exception
occurring on the thick client side.
However, the logs that I attached show a null pointer exception on ALL the
server nodes, leading to a complete cluster crash.
Isn't the issue I am reporting here different from 12255?




Re: nodes are restarting when i try to drop a table created with persistence enabled

Posted by Ivan Pavlukhin <vo...@gmail.com>.

Hi,

The stack trace and exception message have some valuable details:
org.apache.ignite.internal.mem.IgniteOutOfMemoryException: Failed to
find a page for eviction [segmentCapacity=126515, loaded=49628,
maxDirtyPages=37221, dirtyPages=49627, cpPages=0, pinnedInSegment=1,
failedToPrepare=49628]

I see the following:
1. Not all data fits in the data region memory.
2. The exception occurs when the underlying cache is destroyed
(IgniteCacheOffheapManagerImpl.stopCache/removeCacheData call in the stack
trace).
3. No page for replacement to disk was found (loaded=49628,
failedToPrepare=49628). Almost all pages are dirty (dirtyPages=49627).
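The counters in point 3 can be checked mechanically. Below is a minimal, stdlib-only sketch (class and method names are hypothetical) that parses the bracketed key=value list from an IgniteOutOfMemoryException message and reports the dirty-page ratio, whether dirtyPages exceeds the logged maxDirtyPages, and the share of loaded pages that failed to prepare. It only does arithmetic on the logged numbers and makes no claim about how Ignite computes them internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PageCounters {

    private static final Pattern KV = Pattern.compile("(\\w+)=(\\d+)");

    // Extracts every numeric key=value pair, e.g. "loaded=49628", from the message text.
    static Map<String, Long> parse(String message) {
        Map<String, Long> counters = new LinkedHashMap<>();
        Matcher m = KV.matcher(message);
        while (m.find()) {
            counters.put(m.group(1), Long.parseLong(m.group(2)));
        }
        return counters;
    }

    // Fraction of loaded pages that are dirty (throws NullPointerException if a key is missing).
    static double dirtyRatio(Map<String, Long> c) {
        return c.get("dirtyPages").doubleValue() / c.get("loaded");
    }

    // True if more pages are dirty than the logged maxDirtyPages threshold.
    static boolean overDirtyThreshold(Map<String, Long> c) {
        return c.get("dirtyPages") > c.get("maxDirtyPages");
    }

    public static void main(String[] args) {
        String[] logs = {
            // Counters from the stack trace quoted above
            "[segmentCapacity=126515, loaded=49628, maxDirtyPages=37221, dirtyPages=49627, cpPages=0, pinnedInSegment=1, failedToPrepare=49628]",
            // Counters from Shiva's original log
            "[segmentCapacity=971652, loaded=381157, maxDirtyPages=285868, dirtyPages=381157, cpPages=0, pinnedInSegment=3, failedToPrepare=381155]"
        };
        for (String log : logs) {
            Map<String, Long> c = parse(log);
            System.out.printf("dirty/loaded=%.4f overMaxDirty=%b failedToPrepare/loaded=%.4f%n",
                dirtyRatio(c),
                overDirtyThreshold(c),
                c.get("failedToPrepare").doubleValue() / c.get("loaded"));
        }
    }
}
```

For both logs, dirtyPages is above maxDirtyPages and failedToPrepare covers (nearly) every loaded page. In both samples maxDirtyPages also works out to three quarters of loaded (49628 x 3/4 = 37221; 381157 x 3/4 = 285867.75, logged as 285868); that is an observation from these two logs, not a documented formula.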

Answering the following questions could help:
1. Does the same occur if IgniteCache.destroy() is called instead of DROP TABLE?
2. Does the same occur if SQL is not enabled for a cache?
3. It would be nice to see the IgniteConfiguration and CacheConfiguration
that cause the problems.
4. Need to figure out why almost all pages are dirty. It might be a clue.

Re: nodes are restarting when i try to drop a table created with persistence enabled

Posted by Mahesh Renduchintala <ma...@aline-consulting.com>.

We noticed the same on 2.7.6 as well. Deleting tables continuously from a thick client causes out of memory exceptions in other thick clients.
The fix regarding grid partition message exchanges that went into 2.7.6 does not seem to work.

Re: nodes are restarting when i try to drop a table created with persistence enabled

Posted by Denis Magda <dm...@apache.org>.

Ivan, Igor, Andrey, as SQL experts:

Does this sound like a known limitation or issue? If not, what do we need
to reproduce the scenario? Heap dumps?

-
Denis


On Thu, Sep 26, 2019 at 2:12 AM Shiva Kumar <sh...@gmail.com>
wrote:

> Hi dmagda,
>
> When I insert many records (~10 or 20 million) into the same table and try
> to drop the table or delete records from it, nodes restart; the restarts
> happen in the middle of the drop or delete operation.
> According to the logs, the cause of the restarts looks like OOM in the data
> region.
>
> regards,
> shiva

Re: nodes are restarting when i try to drop a table created with persistence enabled

Posted by Shiva Kumar <sh...@gmail.com>.

Hi dmagda,

When I insert many records (~10 or 20 million) into the same table and try
to drop the table or delete records from it, nodes restart; the restarts
happen in the middle of the drop or delete operation.
According to the logs, the cause of the restarts looks like OOM in the data
region.

regards,
shiva

On Wed, Sep 25, 2019 at 1:12 PM Denis Mekhanikov <dm...@gmail.com>
wrote:

> I think the issue is that Ignite can't recover from
> IgniteOutOfMemory, even by removing data.
> Shiva, did IgniteOutOfMemory occur for the first time when you did the
> DROP TABLE, or before that?
>
> Denis

Re: nodes are restarting when i try to drop a table created with persistence enabled

Posted by Denis Mekhanikov <dm...@gmail.com>.

I think the issue is that Ignite can't recover from
IgniteOutOfMemory, even by removing data.
Shiva, did IgniteOutOfMemory occur for the first time when you did the
DROP TABLE, or before that?

Denis

On Wed, Sep 25, 2019 at 02:30, Denis Magda <dm...@apache.org> wrote:
>
> Shiva,
>
> Does this issue still exist? Ignite Dev, how do we debug this sort of thing?
>
> -
> Denis

Re: nodes are restarting when i try to drop a table created with persistence enabled

Posted by Denis Magda <dm...@apache.org>.

Shiva,

Does this issue still exist? Ignite Dev, how do we debug this sort of thing?

-
Denis


