Posted to user@ignite.apache.org by shivakumar <sh...@gmail.com> on 2019/04/16 04:13:30 UTC

nodes are restarting when i try to drop a table created with persistence enabled

Hi all,
I created a table over a JDBC connection with native persistence enabled in
partitioned mode, and I have 2 Ignite nodes (version 2.7.0) running in a
Kubernetes environment. I then ingested 1,500,000 records, and when I try to
drop the table, both pods restart one after the other.
Please find the attached thread dump logs.
After this, the drop statement is unsuccessful:

0: jdbc:ignite:thin://ignite-service.cign.svc> !tables
+-----------+-------------+------------+------------+---------+
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS |
+-----------+-------------+------------+------------+---------+
|           | PUBLIC      | DEVICE     | TABLE      |         |
|           | PUBLIC      | DIMENSIONS | TABLE      |         |
|           | PUBLIC      | CELL       | TABLE      |         |
+-----------+-------------+------------+------------+---------+
0: jdbc:ignite:thin://ignite-service.cign.svc> DROP TABLE IF EXISTS
PUBLIC.DEVICE;
Error: Statement is closed. (state=,code=0)
java.sql.SQLException: Statement is closed.
        at org.apache.ignite.internal.jdbc.thin.JdbcThinStatement.ensureNotClosed(JdbcThinStatement.java:862)
        at org.apache.ignite.internal.jdbc.thin.JdbcThinStatement.getWarnings(JdbcThinStatement.java:454)
        at sqlline.Commands.execute(Commands.java:849)
        at sqlline.Commands.sql(Commands.java:733)
        at sqlline.SqlLine.dispatch(SqlLine.java:795)
        at sqlline.SqlLine.begin(SqlLine.java:668)
        at sqlline.SqlLine.start(SqlLine.java:373)
        at sqlline.SqlLine.main(SqlLine.java:265)
0: jdbc:ignite:thin://ignite-service.cign.svc> !quit
Closing: org.apache.ignite.internal.jdbc.thin.JdbcThinConnection
[root@vm-10-99-26-135 bin]# ./sqlline.sh --verbose=true -u
"jdbc:ignite:thin://ignite-service.cign.svc.cluster.local:10800;user=ignite;password=ignite;"
issuing: !connect
jdbc:ignite:thin://ignite-service.cign.svc.cluster.local:10800;user=ignite;password=ignite;
'' '' org.apache.ignite.IgniteJdbcThinDriver
Connecting to
jdbc:ignite:thin://ignite-service.cign.svc.cluster.local:10800;user=ignite;password=ignite;
Connected to: Apache Ignite (version 2.7.0#19700101-sha1:00000000)
Driver: Apache Ignite Thin JDBC Driver (version
2.7.0#20181130-sha1:256ae401)
Autocommit status: true
Transaction isolation: TRANSACTION_REPEATABLE_READ
sqlline version 1.3.0
0: jdbc:ignite:thin://ignite-service.cign.svc> !tables
+-----------+-------------+------------+------------+---------+
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS |
+-----------+-------------+------------+------------+---------+
|           | PUBLIC      | DEVICE     | TABLE      |         |
|           | PUBLIC      | DIMENSIONS | TABLE      |         |
|           | PUBLIC      | CELL       | TABLE      |         |
+-----------+-------------+------------+------------+---------+
0: jdbc:ignite:thin://ignite-service.cign.svc> select count(*) from DEVICE;
+--------------------------------+
|            COUNT(*)            |
+--------------------------------+
| 1500000                        |
+--------------------------------+
1 row selected (5.665 seconds)
0: jdbc:ignite:thin://ignite-service.cign.svc>

ignite_thread_dump.txt
<http://apache-ignite-users.70518.x6.nabble.com/file/t2244/ignite_thread_dump.txt>   
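
(For reference, the same DROP can also be issued programmatically through the
thin JDBC driver; below is a minimal sketch, where the connection URL simply
mirrors the sqlline command above.)

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class DropDeviceTable {
    public static void main(String[] args) throws Exception {
        // Thin JDBC URL; host and credentials mirror the sqlline example above.
        String url = "jdbc:ignite:thin://ignite-service.cign.svc.cluster.local:10800"
            + ";user=ignite;password=ignite";

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement()) {
            // The same DDL that fails from sqlline once the nodes run out of
            // data region memory.
            stmt.executeUpdate("DROP TABLE IF EXISTS PUBLIC.DEVICE");
        }
    }
}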


shiva





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: nodes are restarting when i try to drop a table created with persistence enabled

Posted by maheshkr76private <ma...@gmail.com>.
Shivakumar's system configuration and mine could be different, but I feel we
are seeing the same issue here.

Deleting tables via a single thick client causes other thick clients to go
out of memory. This OOM issue was reported in the thread below:
http://apache-ignite-users.70518.x6.nabble.com/Ignite-2-7-0-Ignite-client-memory-leak-td28938.html
That thread has the server and client configs and a client JVM heap dump
attached. Please go through it.


Reproducibility of this problem:
- Take Ignite 2.7.6.
- Allocate about -Xmx 1GB for each of the thick clients and connect them to
an Ignite cluster.
- Let the Ignite cluster have about 500 dummy tables. Keep deleting them.
Eventually, you will see the thick clients failing with OOM.

Now coming to your questions
1. Does the same occur if IgniteCache.destroy() is called instead of DROP
TABLE?
All the caches we destroy are SQL caches, so we use DROP TABLE.
IgniteCache.destroy() gives an exception:
Exception in thread "main" class org.apache.ignite.IgniteException: Only
cache created with cache API may be removed with direct call to destroyCache
[cacheName=SQL_PUBLIC_PERSON1000]
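
(For illustration: a SQL-created cache has to be dropped through DDL rather
than destroyCache(). A minimal thick-client sketch, with the config path and
table name made up, could look like this.)

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.SqlFieldsQuery;

public class DropSqlTable {
    public static void main(String[] args) {
        // Hypothetical thick-client config; adjust to your environment.
        try (Ignite ignite = Ignition.start("client-config.xml")) {
            // Any existing cache can serve as the entry point for DDL statements.
            IgniteCache<?, ?> cache = ignite.cache("SQL_PUBLIC_PERSON1000");

            // DROP TABLE removes the SQL-created cache; destroyCache() throws here.
            cache.query(new SqlFieldsQuery("DROP TABLE IF EXISTS PUBLIC.PERSON1000")).getAll();
        }
    }
}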


2. Does the same occur if SQL is not enabled for a cache?
We did not check this, and it is not a use case that we have. We primarily
use SQL caches.

3. It would be nice to see IgniteConfiguration and CacheConfiguration
causing problems.
Attached in a different thread, specified above. 

4. Need to figure out why almost all pages are dirty. It might be a clue.
This is probably the scenario Shivakumar described. In my case, all the data
is in memory; we have about 100 GB in memory and the data regions together
are about 128 GB.


I don't want to confuse this thread, as Shivakumar's scenario could be
different; I don't mind discussing this on the original thread I opened
(specified above, the memory-leak thread).
The bottom line is: deleting tables from one thick client is causing other
thick clients to go OOM. This can be seen on 2.7.6 too.




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: nodes are restarting when i try to drop a table created with persistence enabled

Posted by maheshkr76private <ma...@gmail.com>.
Hello, please ignore the comment below on this topic:

>>>
https://issues.apache.org/jira/browse/IGNITE-12255
Upon reviewing 12255, the description of this issue shows an exception
occurring on the thick client side.
However, the logs that I attached show a null pointer exception on ALL the
server nodes, leading to a complete cluster crash.
Isn't the issue I am reporting here different from 12255?



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: nodes are restarting when i try to drop a table created with persistence enabled

Posted by Ivan Pavlukhin <vo...@gmail.com>.
Hi,

The stack trace and exception message have some valuable details:
org.apache.ignite.internal.mem.IgniteOutOfMemoryException: Failed to
find a page for eviction [segmentCapacity=126515, loaded=49628,
maxDirtyPages=37221, dirtyPages=49627, cpPages=0, pinnedInSegment=1,
failedToPrepare=49628]

I see the following:
1. Not all data fits data region memory.
2. Exception occurs when underlying cache is destroyed
(IgniteCacheOffheapManagerImpl.stopCache/removeCacheData call in stack
trace).
3. Page for replacement to disk was not found (loaded=49628,
failedToPrepare=49628). Almost all pages are dirty (dirtyPages=49627).

Answering several questions can help:
1. Does the same occur if IgniteCache.destroy() is called instead of DROP TABLE?
2. Does the same occur if SQL is not enabled for a cache?
3. It would be nice to see IgniteConfiguration and CacheConfiguration
causing problems.
4. Need to figure out why almost all pages are dirty. It might be a clue.
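
(A small sketch that may help with question 4: data region metrics expose the
dirty page count, assuming metrics are enabled on the region via
DataRegionConfiguration.setMetricsEnabled(true); the config path is
illustrative.)

import org.apache.ignite.DataRegionMetrics;
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;

public class DirtyPagesReport {
    public static void main(String[] args) {
        // Start (or connect to) a node and print per-region page counters.
        try (Ignite ignite = Ignition.start("server-config.xml")) {
            for (DataRegionMetrics m : ignite.dataRegionMetrics()) {
                System.out.printf("region=%s allocatedPages=%d dirtyPages=%d%n",
                    m.getName(), m.getTotalAllocatedPages(), m.getDirtyPages());
            }
        }
    }
}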

Re: nodes are restarting when i try to drop a table created with persistence enabled

Posted by Mahesh Renduchintala <ma...@aline-consulting.com>.
We noted the same on 2.7.6 as well. Deleting tables continuously from a thick client causes out-of-memory exceptions in other thick clients.
The fix regarding grid partition message exchanges that went into 2.7.6 does not seem to work.

Re: nodes are restarting when i try to drop a table created with persistence enabled

Posted by Denis Magda <dm...@apache.org>.
Ivan, Igor, Andrey, as SQL experts,

Does this sound like a known limitation or issue? If not, what do we need
to reproduce the scenario - heap dumps?

-
Denis


On Thu, Sep 26, 2019 at 2:12 AM Shiva Kumar <sh...@gmail.com>
wrote:

> Hi dmagda,
>
> When I insert many records (~ 10 or 20 million) to the same table and try
> to drop table or delete records from the table, nodes are restarting, the
> restarts happens In the middle of drop or delete operation.
> According to the logs the cause for restart looks like OOM in the data
> region.
>
> regards,
> shiva
>
> On Wed, Sep 25, 2019 at 1:12 PM Denis Mekhanikov <dm...@gmail.com>
> wrote:
>
>> I think, the issue is that Ignite can't recover from
>> IgniteOutOfMemory, even by removing data.
>> Shiva, did IgniteOutOfMemory occur for the first time when you did the
>> DROP TABLE, or before that?
>>
>> Denis
>>
>> ср, 25 сент. 2019 г. в 02:30, Denis Magda <dm...@apache.org>:
>> >
>> > Shiva,
>> >
>> > Does this issue still exist? Ignite Dev how do we debug this sort of
>> thing?
>> >
>> > -
>> > Denis
>> >
>> >
>> > On Tue, Sep 17, 2019 at 7:22 AM Shiva Kumar <sh...@gmail.com>
>> wrote:
>> >>
>> >> Hi dmagda,
>> >>
>> >> I am trying to drop the table which has around 10 million records and
>> I am seeing "Out of memory in data region" error messages in Ignite logs
>> and ignite node [Ignite pod on kubernetes] is restarting.
>> >> I have configured 3GB for default data region, 7GB for JVM and total
>> 15GB for Ignite container and enabled native persistence.
>> >> Earlier I was in an impression that restart was caused by
>> "SYSTEM_WORKER_BLOCKED" errors but now I am realized that
>> "SYSTEM_WORKER_BLOCKED" is added to ignore failure list and the actual
>> cause is " CRITICAL_ERROR " due to  "Out of memory in data region"
>> >>
>> >> This is the error messages in logs:
>> >>
>> >> ""[2019-09-17T08:25:35,054][ERROR][sys-#773][] JVM will be halted
>> immediately due to the failure: [failureCtx=FailureContext
>> [type=CRITICAL_ERROR, err=class o.a.i.i.mem.IgniteOutOfMemoryException:
>> Failed to find a page for eviction [segmentCapacity=971652, loaded=381157,
>> maxDirtyPages=285868, dirtyPages=381157, cpPages=0, pinnedInSegment=3,
>> failedToPrepare=381155]
>> >> Out of memory in data region [name=Default_Region, initSize=500.0 MiB,
>> maxSize=3.0 GiB, persistenceEnabled=true] Try the following:
>> >>   ^-- Increase maximum off-heap memory size
>> (DataRegionConfiguration.maxSize)
>> >>   ^-- Enable Ignite persistence
>> (DataRegionConfiguration.persistenceEnabled)
>> >>   ^-- Enable eviction or expiration policies]]
>> >>
>> >> Could you please help me on why drop table operation causing  "Out of
>> memory in data region"? and how I can avoid it?
>> >>
>> >> We have a use case where application inserts records to many tables in
>> Ignite simultaneously for some time period and other applications run a
>> query on that time period data and update the dashboard. we need to delete
>> the records inserted in the previous time period before inserting new
>> records.
>> >>
>> >> even during delete from table operation, I have seen:
>> >>
>> >> "Critical system error detected. Will be handled accordingly to
>> configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false,
>> timeout=0, super=AbstractFailureHandler
>> [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], failureCtx=FailureContext
>> [type=CRITICAL_ERROR, err=class o.a.i.IgniteException: Checkpoint read lock
>> acquisition has been timed out.]] class org.apache.ignite.IgniteException:
>> Checkpoint read lock acquisition has been timed out.|
>> >>
>> >>
>> >>
>> >> On Mon, Apr 29, 2019 at 12:17 PM Denis Magda <dm...@apache.org>
>> wrote:
>> >>>
>> >>> Hi Shiva,
>> >>>
>> >>> That was designed to prevent global cluster performance degradation
>> or other outages. Have you tried to apply my recommendation of turning of
>> the failure handler for this system threads?
>> >>>
>> >>> -
>> >>> Denis
>> >>>
>> >>>
>> >>> On Sun, Apr 28, 2019 at 10:28 AM shivakumar <sh...@gmail.com>
>> wrote:
>> >>>>
>> >>>> HI Denis,
>> >>>>
>> >>>> is there any specific reason for the blocking of critical thread,
>> like CPU
>> >>>> is full or Heap is full ?
>> >>>> We are again and again hitting this issue.
>> >>>> is there any other way to drop tables/cache ?
>> >>>> This looks like a critical issue.
>> >>>>
>> >>>> regards,
>> >>>> shiva
>> >>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>>
>

Re: nodes are restarting when i try to drop a table created with persistence enabled

Posted by Shiva Kumar <sh...@gmail.com>.
Hi dmagda,

When I insert many records (~10 or 20 million) into the same table and try
to drop the table or delete records from it, the nodes restart; the
restarts happen in the middle of the drop or delete operation.
According to the logs, the cause of the restart looks like OOM in the data
region.

regards,
shiva

On Wed, Sep 25, 2019 at 1:12 PM Denis Mekhanikov <dm...@gmail.com>
wrote:

> I think, the issue is that Ignite can't recover from
> IgniteOutOfMemory, even by removing data.
> Shiva, did IgniteOutOfMemory occur for the first time when you did the
> DROP TABLE, or before that?
>
> Denis
>
> ср, 25 сент. 2019 г. в 02:30, Denis Magda <dm...@apache.org>:
> >
> > Shiva,
> >
> > Does this issue still exist? Ignite Dev how do we debug this sort of
> thing?
> >
> > -
> > Denis
> >
> >
> > On Tue, Sep 17, 2019 at 7:22 AM Shiva Kumar <sh...@gmail.com>
> wrote:
> >>
> >> Hi dmagda,
> >>
> >> I am trying to drop the table which has around 10 million records and I
> am seeing "Out of memory in data region" error messages in Ignite logs and
> ignite node [Ignite pod on kubernetes] is restarting.
> >> I have configured 3GB for default data region, 7GB for JVM and total
> 15GB for Ignite container and enabled native persistence.
> >> Earlier I was in an impression that restart was caused by
> "SYSTEM_WORKER_BLOCKED" errors but now I am realized that
> "SYSTEM_WORKER_BLOCKED" is added to ignore failure list and the actual
> cause is " CRITICAL_ERROR " due to  "Out of memory in data region"
> >>
> >> This is the error messages in logs:
> >>
> >> ""[2019-09-17T08:25:35,054][ERROR][sys-#773][] JVM will be halted
> immediately due to the failure: [failureCtx=FailureContext
> [type=CRITICAL_ERROR, err=class o.a.i.i.mem.IgniteOutOfMemoryException:
> Failed to find a page for eviction [segmentCapacity=971652, loaded=381157,
> maxDirtyPages=285868, dirtyPages=381157, cpPages=0, pinnedInSegment=3,
> failedToPrepare=381155]
> >> Out of memory in data region [name=Default_Region, initSize=500.0 MiB,
> maxSize=3.0 GiB, persistenceEnabled=true] Try the following:
> >>   ^-- Increase maximum off-heap memory size
> (DataRegionConfiguration.maxSize)
> >>   ^-- Enable Ignite persistence
> (DataRegionConfiguration.persistenceEnabled)
> >>   ^-- Enable eviction or expiration policies]]
> >>
> >> Could you please help me on why drop table operation causing  "Out of
> memory in data region"? and how I can avoid it?
> >>
> >> We have a use case where application inserts records to many tables in
> Ignite simultaneously for some time period and other applications run a
> query on that time period data and update the dashboard. we need to delete
> the records inserted in the previous time period before inserting new
> records.
> >>
> >> even during delete from table operation, I have seen:
> >>
> >> "Critical system error detected. Will be handled accordingly to
> configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false,
> timeout=0, super=AbstractFailureHandler
> [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], failureCtx=FailureContext
> [type=CRITICAL_ERROR, err=class o.a.i.IgniteException: Checkpoint read lock
> acquisition has been timed out.]] class org.apache.ignite.IgniteException:
> Checkpoint read lock acquisition has been timed out.|
> >>
> >>
> >>
> >> On Mon, Apr 29, 2019 at 12:17 PM Denis Magda <dm...@apache.org> wrote:
> >>>
> >>> Hi Shiva,
> >>>
> >>> That was designed to prevent global cluster performance degradation or
> other outages. Have you tried to apply my recommendation of turning of the
> failure handler for this system threads?
> >>>
> >>> -
> >>> Denis
> >>>
> >>>
> >>> On Sun, Apr 28, 2019 at 10:28 AM shivakumar <sh...@gmail.com>
> wrote:
> >>>>
> >>>> HI Denis,
> >>>>
> >>>> is there any specific reason for the blocking of critical thread,
> like CPU
> >>>> is full or Heap is full ?
> >>>> We are again and again hitting this issue.
> >>>> is there any other way to drop tables/cache ?
> >>>> This looks like a critical issue.
> >>>>
> >>>> regards,
> >>>> shiva
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>

Re: nodes are restarting when i try to drop a table created with persistence enabled

Posted by Denis Mekhanikov <dm...@gmail.com>.
I think the issue is that Ignite can't recover from
IgniteOutOfMemory, even by removing data.
Shiva, did IgniteOutOfMemory occur for the first time when you did the
DROP TABLE, or before that?

Denis

ср, 25 сент. 2019 г. в 02:30, Denis Magda <dm...@apache.org>:
>
> Shiva,
>
> Does this issue still exist? Ignite Dev how do we debug this sort of thing?
>
> -
> Denis
>
>
> On Tue, Sep 17, 2019 at 7:22 AM Shiva Kumar <sh...@gmail.com> wrote:
>>
>> Hi dmagda,
>>
>> I am trying to drop the table which has around 10 million records and I am seeing "Out of memory in data region" error messages in Ignite logs and ignite node [Ignite pod on kubernetes] is restarting.
>> I have configured 3GB for default data region, 7GB for JVM and total 15GB for Ignite container and enabled native persistence.
>> Earlier I was in an impression that restart was caused by "SYSTEM_WORKER_BLOCKED" errors but now I am realized that  "SYSTEM_WORKER_BLOCKED" is added to ignore failure list and the actual cause is " CRITICAL_ERROR " due to  "Out of memory in data region"
>>
>> This is the error messages in logs:
>>
>> ""[2019-09-17T08:25:35,054][ERROR][sys-#773][] JVM will be halted immediately due to the failure: [failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.mem.IgniteOutOfMemoryException: Failed to find a page for eviction [segmentCapacity=971652, loaded=381157, maxDirtyPages=285868, dirtyPages=381157, cpPages=0, pinnedInSegment=3, failedToPrepare=381155]
>> Out of memory in data region [name=Default_Region, initSize=500.0 MiB, maxSize=3.0 GiB, persistenceEnabled=true] Try the following:
>>   ^-- Increase maximum off-heap memory size (DataRegionConfiguration.maxSize)
>>   ^-- Enable Ignite persistence (DataRegionConfiguration.persistenceEnabled)
>>   ^-- Enable eviction or expiration policies]]
>>
>> Could you please help me on why drop table operation causing  "Out of memory in data region"? and how I can avoid it?
>>
>> We have a use case where application inserts records to many tables in Ignite simultaneously for some time period and other applications run a query on that time period data and update the dashboard. we need to delete the records inserted in the previous time period before inserting new records.
>>
>> even during delete from table operation, I have seen:
>>
>> "Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.IgniteException: Checkpoint read lock acquisition has been timed out.]] class org.apache.ignite.IgniteException: Checkpoint read lock acquisition has been timed out.|
>>
>>
>>
>> On Mon, Apr 29, 2019 at 12:17 PM Denis Magda <dm...@apache.org> wrote:
>>>
>>> Hi Shiva,
>>>
>>> That was designed to prevent global cluster performance degradation or other outages. Have you tried to apply my recommendation of turning of the failure handler for this system threads?
>>>
>>> -
>>> Denis
>>>
>>>
>>> On Sun, Apr 28, 2019 at 10:28 AM shivakumar <sh...@gmail.com> wrote:
>>>>
>>>> HI Denis,
>>>>
>>>> is there any specific reason for the blocking of critical thread, like CPU
>>>> is full or Heap is full ?
>>>> We are again and again hitting this issue.
>>>> is there any other way to drop tables/cache ?
>>>> This looks like a critical issue.
>>>>
>>>> regards,
>>>> shiva
>>>>
>>>>
>>>>
>>>> --
>>>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: nodes are restarting when i try to drop a table created with persistence enabled

Posted by Denis Magda <dm...@apache.org>.
Shiva,

Does this issue still exist? Ignite Dev, how do we debug this sort of thing?

-
Denis


On Tue, Sep 17, 2019 at 7:22 AM Shiva Kumar <sh...@gmail.com>
wrote:

> Hi dmagda,
>
> I am trying to drop the table which has around 10 million records and I am
> seeing "*Out of memory in data region*" error messages in Ignite logs and
> ignite node [Ignite pod on kubernetes] is restarting.
> I have configured 3GB for default data region, 7GB for JVM and total 15GB
> for Ignite container and enabled native persistence.
> Earlier I was in an impression that restart was caused by "
> *SYSTEM_WORKER_BLOCKED*" errors but now I am realized that  "
> *SYSTEM_WORKER_BLOCKED*" is added to ignore failure list and the actual
> cause is " *CRITICAL_ERROR* " due to  "*Out of memory in data region"*
>
> This is the error messages in logs:
>
> ""[2019-09-17T08:25:35,054][ERROR][sys-#773][] *JVM will be halted
> immediately due to the failure: [failureCtx=FailureContext
> [type=CRITICAL_ERROR, err=class o.a.i.i.mem.IgniteOutOfMemoryException:
> Failed to find a page for eviction* [segmentCapacity=971652,
> loaded=381157, maxDirtyPages=285868, dirtyPages=381157, cpPages=0,
> pinnedInSegment=3, failedToPrepare=381155]
> *Out of memory in data region* [name=Default_Region, initSize=500.0 MiB,
> maxSize=3.0 GiB, persistenceEnabled=true] Try the following:
>   ^-- Increase maximum off-heap memory size
> (DataRegionConfiguration.maxSize)
>   ^-- Enable Ignite persistence
> (DataRegionConfiguration.persistenceEnabled)
>   ^-- Enable eviction or expiration policies]]
>
> Could you please help me on why *drop table operation* causing  "*Out of
> memory in data region"*? and how I can avoid it?
>
> We have a use case where application inserts records to many tables in
> Ignite simultaneously for some time period and other applications run a
> query on that time period data and update the dashboard. we need to delete
> the records inserted in the previous time period before inserting new
> records.
>
> even during *delete from table* operation, I have seen:
>
> "Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], failureCtx=FailureContext [*type=CRITICAL_ERROR*, err=class o.a.i.IgniteException: *Checkpoint read lock acquisition has been timed out*.]] class org.apache.ignite.IgniteException: Checkpoint read lock acquisition has been timed out.|
>
>
>
> On Mon, Apr 29, 2019 at 12:17 PM Denis Magda <dm...@apache.org> wrote:
>
>> Hi Shiva,
>>
>> That was designed to prevent global cluster performance degradation or
>> other outages. Have you tried to apply my recommendation of turning of the
>> failure handler for this system threads?
>>
>> -
>> Denis
>>
>>
>> On Sun, Apr 28, 2019 at 10:28 AM shivakumar <sh...@gmail.com>
>> wrote:
>>
>>> HI Denis,
>>>
>>> is there any specific reason for the blocking of critical thread, like
>>> CPU
>>> is full or Heap is full ?
>>> We are again and again hitting this issue.
>>> is there any other way to drop tables/cache ?
>>> This looks like a critical issue.
>>>
>>> regards,
>>> shiva
>>>
>>>
>>>
>>> --
>>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>>>
>>

Re: nodes are restarting when i try to drop a table created with persistence enabled

Posted by Shiva Kumar <sh...@gmail.com>.
Hi dmagda,

I am trying to drop a table which has around 10 million records, and I am
seeing "*Out of memory in data region*" error messages in the Ignite logs;
the Ignite node [Ignite pod on Kubernetes] is restarting.
I have configured 3 GB for the default data region, 7 GB for the JVM, and
15 GB in total for the Ignite container, with native persistence enabled.
Earlier I was under the impression that the restart was caused by
"*SYSTEM_WORKER_BLOCKED*" errors, but now I realize that
"*SYSTEM_WORKER_BLOCKED*" is added to the ignored failures list and the
actual cause is "*CRITICAL_ERROR*" due to "*Out of memory in data region*".

This is the error messages in logs:

""[2019-09-17T08:25:35,054][ERROR][sys-#773][] *JVM will be halted
immediately due to the failure: [failureCtx=FailureContext
[type=CRITICAL_ERROR, err=class o.a.i.i.mem.IgniteOutOfMemoryException:
Failed to find a page for eviction* [segmentCapacity=971652, loaded=381157,
maxDirtyPages=285868, dirtyPages=381157, cpPages=0, pinnedInSegment=3,
failedToPrepare=381155]
*Out of memory in data region* [name=Default_Region, initSize=500.0 MiB,
maxSize=3.0 GiB, persistenceEnabled=true] Try the following:
  ^-- Increase maximum off-heap memory size
(DataRegionConfiguration.maxSize)
  ^-- Enable Ignite persistence (DataRegionConfiguration.persistenceEnabled)
  ^-- Enable eviction or expiration policies]]
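
(The first two suggestions above map to DataRegionConfiguration; a minimal
sketch with purely illustrative sizes, not a recommendation for this
workload:)

import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class DataRegionSetup {
    public static IgniteConfiguration config() {
        DataRegionConfiguration region = new DataRegionConfiguration();
        region.setName("Default_Region");
        region.setInitialSize(500L * 1024 * 1024);   // 500 MiB, as in the log above
        region.setMaxSize(6L * 1024 * 1024 * 1024);  // illustrative: larger than the 3 GiB that overflowed
        region.setPersistenceEnabled(true);          // keep native persistence on

        DataStorageConfiguration storage = new DataStorageConfiguration();
        storage.setDefaultDataRegionConfiguration(region);

        return new IgniteConfiguration().setDataStorageConfiguration(storage);
    }
}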

Could you please help me understand why the *drop table operation* is
causing "*Out of memory in data region*", and how I can avoid it?

We have a use case where an application inserts records into many tables in
Ignite simultaneously for some time period, and other applications run
queries on that time period's data and update a dashboard. We need to delete
the records inserted in the previous time period before inserting new
records.

Even during a *delete from table* operation, I have seen:

"Critical system error detected. Will be handled accordingly to
configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false,
timeout=0, super=AbstractFailureHandler
[ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]],
failureCtx=FailureContext [*type=CRITICAL_ERROR*, err=class
o.a.i.IgniteException: *Checkpoint read lock acquisition has been
timed out*.]] class org.apache.ignite.IgniteException: Checkpoint read
lock acquisition has been timed out.|



On Mon, Apr 29, 2019 at 12:17 PM Denis Magda <dm...@apache.org> wrote:

> Hi Shiva,
>
> That was designed to prevent global cluster performance degradation or
> other outages. Have you tried to apply my recommendation of turning of the
> failure handler for this system threads?
>
> -
> Denis
>
>
> On Sun, Apr 28, 2019 at 10:28 AM shivakumar <sh...@gmail.com>
> wrote:
>
>> HI Denis,
>>
>> is there any specific reason for the blocking of critical thread, like CPU
>> is full or Heap is full ?
>> We are again and again hitting this issue.
>> is there any other way to drop tables/cache ?
>> This looks like a critical issue.
>>
>> regards,
>> shiva
>>
>>
>>
>> --
>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>>
>

Re: nodes are restarting when i try to drop a table created with persistence enabled

Posted by Denis Magda <dm...@apache.org>.
Hi Shiva,

That was designed to prevent global cluster performance degradation or
other outages. Have you tried to apply my recommendation of turning off the
failure handler for these system threads?

-
Denis


On Sun, Apr 28, 2019 at 10:28 AM shivakumar <sh...@gmail.com>
wrote:

> HI Denis,
>
> is there any specific reason for the blocking of critical thread, like CPU
> is full or Heap is full ?
> We are again and again hitting this issue.
> is there any other way to drop tables/cache ?
> This looks like a critical issue.
>
> regards,
> shiva
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>

Re: nodes are restarting when i try to drop a table created with persistence enabled

Posted by shivakumar <sh...@gmail.com>.
Hi Denis,

Is there any specific reason for the blocking of the critical thread, such
as the CPU or the heap being full?
We are hitting this issue again and again.
Is there any other way to drop tables/caches?
This looks like a critical issue.

regards,
shiva 



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: nodes are restarting when i try to drop a table created with persistence enabled

Posted by Denis Magda <dm...@apache.org>.
It seems that the system worker has been blocked on your end for more
than 30 seconds, and this caused the shutdown due to a watchdog:

[2019-04-12T10:52:27,451][ERROR][tcp-disco-msg-worker-#2][G] Blocked
system-critical thread has been detected. This can lead to
cluster-wide undefined behaviour [threadName=db-checkpoint-thread,
blockedFor=32s]
[2019-04-12T10:52:27,451][WARN ][tcp-disco-msg-worker-#2][G] Thread
[name="db-checkpoint-thread-#61", id=115, state=WAITING, blockCnt=39,
waitCnt=309]
    Lock [object=java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@92173e8,
ownerName=null, ownerId=-1]

[2019-04-12T10:52:27,451][ERROR][tcp-disco-msg-worker-#2][] Critical
system error detected. Will be handled accordingly to configured
handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
super=AbstractFailureHandler
[ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]],
failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class
o.a.i.IgniteException: GridWorker [name=db-checkpoint-thread,
igniteInstanceName=null, finished=false, heartbeatTs=1555066315438]]]


Try to tune this watchdog or disable it. Here is what the docs say:

https://apacheignite.readme.io/docs/critical-failures-handling#section-critical-workers-health-check

Ignite has an internal mechanism for verifying that critical workers
are operational. Each worker is regularly checked whether it's alive
and is updating its heartbeat timestamp. If either of the conditions
is not observed for the configured period of time, the worker is
regarded as blocked and Ignite will output that information to the log
file. The period of inactivity is specified by the
IgniteConfiguration.systemWorkerBlockedTimeout property (in
milliseconds; the default value equals the failure detection timeout
<https://apacheignite.readme.io/docs/tcpip-discovery#section-failure-detection-timeout>).
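
(A minimal sketch of that tuning in code, with an illustrative timeout;
setIgnoredFailureTypes is assumed to be available in your Ignite version, and
the same properties can also be set in the Spring XML config:)

import java.util.Collections;

import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.failure.FailureType;
import org.apache.ignite.failure.StopNodeOrHaltFailureHandler;

public class WatchdogTuning {
    public static IgniteConfiguration config() {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Illustrative: give critical workers (e.g. db-checkpoint-thread) more
        // time before they are considered blocked; the default equals the
        // failure detection timeout.
        cfg.setSystemWorkerBlockedTimeout(60_000L);

        // Keep the default handler but ignore SYSTEM_WORKER_BLOCKED so a slow
        // checkpoint thread does not halt the node.
        StopNodeOrHaltFailureHandler hnd = new StopNodeOrHaltFailureHandler(false, 0);
        hnd.setIgnoredFailureTypes(Collections.singleton(FailureType.SYSTEM_WORKER_BLOCKED));
        cfg.setFailureHandler(hnd);

        return cfg;
    }
}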


This behavior will be revisited in Ignite soon:
http://apache-ignite-developers.2346864.n4.nabble.com/GridDhtInvalidPartitionException-takes-the-cluster-down-td41459.html


-
Denis


On Mon, Apr 15, 2019 at 9:13 PM shivakumar <sh...@gmail.com> wrote:

> Hi all,
> I created a table with JDBC connection with native persistence enabled in
> partitioned mode and i have 2 ignite nodes (2.7.0 version) running in
> kubernetes environment, then i ingested 1500000 records, when i try to drop
> the table both the pods are restarting one after the other.
> Please find the attached thread dump logs
> and after this drop statement is unsuccessful
>
> 0: jdbc:ignite:thin://ignite-service.cign.svc> !tables
>
> +-----------+-------------+------------+------------+---------+
> | TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS |
> +-----------+-------------+------------+------------+---------+
> |           | PUBLIC      | DEVICE     | TABLE      |         |
> |           | PUBLIC      | DIMENSIONS | TABLE      |         |
> |           | PUBLIC      | CELL       | TABLE      |         |
> +-----------+-------------+------------+------------+---------+
> 0: jdbc:ignite:thin://ignite-service.cign.svc> DROP TABLE IF EXISTS
> PUBLIC.DEVICE;
> Error: Statement is closed. (state=,code=0)
> java.sql.SQLException: Statement is closed.
>         at org.apache.ignite.internal.jdbc.thin.JdbcThinStatement.ensureNotClosed(JdbcThinStatement.java:862)
>         at org.apache.ignite.internal.jdbc.thin.JdbcThinStatement.getWarnings(JdbcThinStatement.java:454)
>         at sqlline.Commands.execute(Commands.java:849)
>         at sqlline.Commands.sql(Commands.java:733)
>         at sqlline.SqlLine.dispatch(SqlLine.java:795)
>         at sqlline.SqlLine.begin(SqlLine.java:668)
>         at sqlline.SqlLine.start(SqlLine.java:373)
>         at sqlline.SqlLine.main(SqlLine.java:265)
> 0: jdbc:ignite:thin://ignite-service.cign.svc> !quit
> Closing: org.apache.ignite.internal.jdbc.thin.JdbcThinConnection
> [root@vm-10-99-26-135 bin]# ./sqlline.sh --verbose=true -u
>
> "jdbc:ignite:thin://ignite-service.cign.svc.cluster.local:10800;user=ignite;password=ignite;"
> issuing: !connect
>
> jdbc:ignite:thin://ignite-service.cign.svc.cluster.local:10800;user=ignite;password=ignite;
> '' '' org.apache.ignite.IgniteJdbcThinDriver
> Connecting to
>
> jdbc:ignite:thin://ignite-service.cign.svc.cluster.local:10800;user=ignite;password=ignite;
> Connected to: Apache Ignite (version 2.7.0#19700101-sha1:00000000)
> Driver: Apache Ignite Thin JDBC Driver (version
> 2.7.0#20181130-sha1:256ae401)
> Autocommit status: true
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> sqlline version 1.3.0
> 0: jdbc:ignite:thin://ignite-service.cign.svc> !tables
>
> +-----------+-------------+------------+------------+---------+
> | TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS |
> +-----------+-------------+------------+------------+---------+
> |           | PUBLIC      | DEVICE     | TABLE      |         |
> |           | PUBLIC      | DIMENSIONS | TABLE      |         |
> |           | PUBLIC      | CELL       | TABLE      |         |
> +-----------+-------------+------------+------------+---------+
> 0: jdbc:ignite:thin://ignite-service.cign.svc> select count(*) from DEVICE;
> +--------------------------------+
> |            COUNT(*)            |
> +--------------------------------+
> | 1500000                        |
> +--------------------------------+
> 1 row selected (5.665 seconds)
> 0: jdbc:ignite:thin://ignite-service.cign.svc>
>
> ignite_thread_dump.txt
> <
> http://apache-ignite-users.70518.x6.nabble.com/file/t2244/ignite_thread_dump.txt>
>
>
>
> shiva
>
>
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>