Posted to user@ignite.apache.org by praveeng <pr...@gmail.com> on 2019/03/22 12:10:05 UTC

Ignite node is down due to full RAM usage

Hi,

Ignite version: 1.8
One of the Ignite nodes in our 3-node cluster went down because its RAM was
fully used.

At that time I observed the following logs on this node:

[00:32:02,119][INFO ][grid-timeout-worker-#7%CasinoApacheIgniteServices%][IgniteKernal%CasinoApacheIgniteServices] Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=9f8df386, name=CasinoApacheIgniteServices, uptime=23:21:45:744]
    ^-- H/N/C [hosts=8, nodes=8, CPUs=44]
    ^-- CPU [cur=8.33%, avg=1.6%, GC=0%]
    ^-- Heap [used=3886MB, free=36.65%, comm=6134MB]
    ^-- Non heap [used=78MB, free=85.96%, comm=529MB]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=16, qSize=0]
    ^-- Outbound messages queue [size=0]

[00:33:24,674][WARN ][exchange-worker-#23%CasinoApacheIgniteServices%][GridCachePartitionExchangeManager] Failed to wait for partition map exchange [topVer=AffinityTopologyVersion [topVer=84, minorTopVer=0], node=9f8df386-2886-451f-b1ff-53713878d432]. Dumping pending objects that might be the cause:
[00:33:24,674][WARN ][exchange-worker-#23%CasinoApacheIgniteServices%][GridCachePartitionExchangeManager] Failed to wait for partition map exchange [topVer=AffinityTopologyVersion [topVer=84, minorTopVer=0], node=9f8df386-2886-451f-b1ff-53713878d432]. Dumping pending objects that might be the cause:


SAR stats for memory usage on this date: 

-- mar 6
12:00:01 AM  kbmemfree  kbmemused  %memused  kbbuffers  kbcached  kbcommit  %commit  kbactive  kbinact  kbdirty
12:10:01 PM     170120   16090232     98.95          0   3393384   8222696    45.02   9887268  2088504       60
01:50:01 PM     168176   16092176     98.97          0   2120848   8224724    45.03  10804712  1596792       48
03:10:01 PM     199128   16061224     98.78          0    991832   8224904    45.04  11384652  1241284      436
04:10:01 PM     153060   16107292     99.06          0    229984   8224880    45.04  11255628  1627600      208
04:20:01 PM     165580   16094772     98.98          0     78572   8224828    45.03  11338592  1560944       52
04:30:01 PM     153508   16106844     99.06          0     29740   8224872    45.03  11436544  1579468       44
04:40:01 PM     162184   16098168     99.00          0     33152   8224892    45.04  11606584  1580388       24
11:10:01 PM     370956   15889396     97.72          0     74816   8225312    45.04  11927676  1610828       36
11:20:01 PM     348576   15911776     97.86          0     69012   8225272    45.04  11929820  1602748       48
11:30:01 PM     359132   15901220     97.79          0     27060   8225308    45.04  11912656  1577848       36
11:40:01 PM     340252   15920100     97.91          0     24908   8225272    45.04  11910516  1577668       32
11:50:01 PM     308340   15952012     98.10          0     39208   8242284    45.13  11914564  1589208       48
Average:        253568   16006784     98.44          0   2317289   8226063    45.04  10368276  1955525      142
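
A rough cross-check of the SAR figures, as a sketch only: kbmemfree +
kbmemused adds up to the same total (16260352 kB) in the rows checked, so
kbmemused here appears to include buffers and page cache. Subtracting
kbbuffers and kbcached gives an estimate of the memory that is not
reclaimable cache. The class and method names below are made up for
illustration and are not part of sar or Ignite.

```java
// Hypothetical helper: estimates memory in use excluding buffers and page
// cache, from the sar columns shown in this thread.
class SarMemoryCheck {
    /** Returns kbmemused minus kbbuffers and kbcached, in kB. */
    static long nonCacheUsedKb(long kbMemUsed, long kbBuffers, long kbCached) {
        return kbMemUsed - kbBuffers - kbCached;
    }

    public static void main(String[] args) {
        // Values copied from the 12:10 PM and 11:50 PM rows of the sar output.
        long noon = nonCacheUsedKb(16090232L, 0L, 3393384L);
        long night = nonCacheUsedKb(15952012L, 0L, 39208L);

        System.out.println("12:10 PM non-cache used kB: " + noon);
        System.out.println("11:50 PM non-cache used kB: " + night);
        System.out.println("Growth over the day in kB:  " + (night - noon));
    }
}
```

For these two rows the estimate is 12696848 kB at 12:10 PM and 15912804 kB at
11:50 PM, a growth of 3215956 kB, while kbcached shrinks from 3393384 kB to
39208 kB. Under the assumption above, %memused staying near 98% hides the
fact that page cache was being replaced by memory that cannot be reclaimed.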

Please find the attached file for the cache configuration.

  ignite-clb-cache-config_dev.xml
<http://apache-ignite-users.70518.x6.nabble.com/file/t1753/ignite-clb-cache-config_dev.xml>  

Please also find attached the memory snapshot captured with the AppDynamics
tool.
memorySnapshot.JPG
<http://apache-ignite-users.70518.x6.nabble.com/file/t1753/memorySnapshot.JPG>  

Following is my analysis.
When data is evicted from on-heap to off-heap, there is not much space left
off-heap.
As a result, off-heap memory fills up and the application becomes slow and
unresponsive.

The data in off-heap is also not being expired, so there is not much free
memory in RAM.
After I restarted the application on this node, RAM usage dropped to 25%,
and it is now at 45%.

Could you please check and suggest a fix?

Thanks,
Praveen



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Ignite node is down due to full RAM usage

Posted by praveeng <pr...@gmail.com>.
Hi,

As we can't upgrade our Java version to 1.8, we can't use the latest Ignite
version.
If this were a heap memory issue, I would have seen an OOM error in the
logs, and a heap dump might have been generated automatically.
It could instead be because the data in off-heap is not expired and the RAM
is completely used.

Thanks,
Praveen




Re: Ignite node is down due to full RAM usage

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

Unfortunately, I would not expect anyone to debug your 1.8 cluster, since
most people have upgraded to 2.x.

Next time this happens, can you capture a heap dump from the problematic
node? A dominator graph and per-class histogram may help tremendously.
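
One way to capture such a dump, as a minimal sketch: from outside the
process, the JDK's `jmap -dump:live,format=b,file=<file> <pid>` writes a heap
dump and `jmap -histo:live <pid>` prints a per-class histogram. From inside
the JVM, the HotSpot diagnostic MXBean can do the same. The class name
`HeapDumper` and the default file name are made up for illustration, and the
code assumes a HotSpot-based JVM; it dumps the JVM it runs in, so it would
have to be invoked from within the Ignite node's process.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of Ignite: dumps the heap of the current JVM.
class HeapDumper {
    /**
     * Writes an .hprof heap dump of the current JVM to the given path.
     * With liveOnly=true only reachable objects are dumped.
     * Fails with IOException if the file already exists.
     */
    static void dumpHeap(String path, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
            ManagementFactory.getPlatformMBeanServer(),
            "com.sun.management:type=HotSpotDiagnostic",
            HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(path, liveOnly);
    }

    public static void main(String[] args) throws IOException {
        // Default file name is an arbitrary choice for this sketch.
        String path = args.length > 0 ? args[0] : "ignite-node.hprof";
        dumpHeap(path, true);
        System.out.println("Heap dump written to " + new File(path).getAbsolutePath());
    }
}
```

The resulting .hprof file can then be opened in a heap analyzer to get the
dominator view and the per-class histogram.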

Regards,
-- 
Ilya Kasnacheev

