Posted to user@ignite.apache.org by Amit Pundir <am...@gmail.com> on 2017/11/08 19:33:49 UTC

Out of memory in client node freezes complete cluster

Hi,
I am using Ignite 2.0. I have observed that if an out-of-memory (OOM) error
occurs on any Ignite client node, the entire cluster becomes unresponsive.

A few details about my caches/operations:
1. Atomicity mode: transactional
2. Locking: pessimistic, with repeatable read


Is this expected? If so, what options are there to ensure cluster
availability, other than restarting the nodes and allocating enough memory
to every node to avoid OOM at all costs?


Thanks



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Out of memory in client node freezes complete cluster

Posted by Denis Magda <dm...@apache.org>.
Hello Amit,

There are plans to make the cluster heal itself, by kicking out unstable nodes or unblocking pending transactions, when an abnormal situation happens:
https://cwiki.apache.org/confluence/display/IGNITE/IEP-5+Cluster+reaction+if+node+detects+an+extraordinary+situations

I have created a ticket for your particular problem:
https://issues.apache.org/jira/browse/IGNITE-6953

Please attach the logs to help with reproducing the issue.

Anyway, for now I would find out why the OOM happens: find the root cause and fix it.

—
Denis

> On Nov 14, 2017, at 4:01 AM, Ilya Kasnacheev <il...@gmail.com> wrote:
> 2017-11-11 19:12 GMT+03:00 Amit Pundir <am...@gmail.com>:


Re: Out of memory in client node freezes complete cluster

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

My recommendation here is to always leave some extra RAM and heap so that a
hot spot won't cause an OOM. You could also use less RAM-intensive algorithms.
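One way to act on the "leave some extra heap" advice is to check headroom before starting a RAM-heavy operation. The following is a minimal sketch using only the standard `java.lang.management` API; the class name `HeapHeadroom` and the 80% threshold are hypothetical choices, not Ignite settings. It only looks at the Java heap: the off-heap memory that Ignite 2.x uses for cache data is not covered.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper: checks Java heap headroom against a chosen threshold.
final class HeapHeadroom {
    // Hypothetical threshold: treat the heap as "tight" above 80% of its maximum.
    static final double MAX_USED_FRACTION = 0.8;

    /** Fraction of the maximum heap in use, or -1 if the JVM reports no defined maximum. */
    static double usedFraction(MemoryUsage heap) {
        long max = heap.getMax();
        if (max <= 0) {
            return -1;
        }
        return (double) heap.getUsed() / max;
    }

    /** True when usage is below the threshold; conservatively false if the maximum is undefined. */
    static boolean hasHeadroom(MemoryUsage heap, double maxUsedFraction) {
        double used = usedFraction(heap);
        return used >= 0 && used < maxUsedFraction;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        double used = usedFraction(heap);
        if (used < 0) {
            System.out.println("heap maximum is undefined; cannot compute headroom");
        } else if (!hasHeadroom(heap, MAX_USED_FRACTION)) {
            System.out.printf("heap %.0f%% used: consider deferring the RAM-heavy operation%n", used * 100);
        } else {
            System.out.printf("heap %.0f%% used: headroom OK%n", used * 100);
        }
    }
}
```

A check like this cannot prevent an OOM on its own (allocation can spike between the check and the work), so it complements sizing the heap generously rather than replacing it.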

Without stack traces and logs it's hard to say more, but an OOM may not be a
recoverable error in Ignite.

Regards,

-- 
Ilya Kasnacheev

2017-11-11 19:12 GMT+03:00 Amit Pundir <am...@gmail.com>:

Re: Out of memory in client node freezes complete cluster

Posted by Amit Pundir <am...@gmail.com>.
Hi Ilya,
Thanks for the response.

I have been following the release notes for every release (2.1, 2.2, 2.3). I
haven't seen any fixes for this (or a similar-sounding) issue. Since I am
using Ignite in a very critical application, I would like to use a stable
version that meets my requirements. I don't have a use case for disk
persistence, so I haven't upgraded.

If there is an open transaction in the grid and an OOM happens on one of the
client nodes, would it stall the entire cluster? I have tried to allocate
enough memory to the cluster, but there is a chance of creating hot spots, with
some nodes getting a higher share of cache occupancy.
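The hot-spot concern can be made measurable. The following is a minimal sketch, assuming you have already collected a cache size (entry count or bytes) per server node by some means not shown here. The class name `HotSpotCheck`, the node names and the 1.5 skew factor are all hypothetical. It flags nodes whose share of the total exceeds the even share (1 / number of nodes) by more than the skew factor; those are the nodes that would need extra memory headroom.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: flags nodes holding a disproportionate share of cache data.
final class HotSpotCheck {
    /**
     * Returns nodes whose share of the total size exceeds the even share
     * by more than skewFactor (e.g. 1.5 means 50% above an even split).
     */
    static List<String> hotNodes(Map<String, Long> sizeByNode, double skewFactor) {
        List<String> hot = new ArrayList<>();
        long total = 0;
        for (long size : sizeByNode.values()) {
            total += size;
        }
        if (sizeByNode.isEmpty() || total == 0) {
            return hot; // nothing stored, so nothing can be a hot spot
        }
        double evenShare = 1.0 / sizeByNode.size();
        for (Map.Entry<String, Long> e : sizeByNode.entrySet()) {
            double share = (double) e.getValue() / total;
            if (share > evenShare * skewFactor) {
                hot.add(e.getKey());
            }
        }
        return hot;
    }

    public static void main(String[] args) {
        // Hypothetical per-node sizes; TreeMap keeps the output order stable.
        Map<String, Long> sizes = new TreeMap<>();
        sizes.put("server-1", 1_000L);
        sizes.put("server-2", 1_100L);
        sizes.put("server-3", 2_900L);
        System.out.println(hotNodes(sizes, 1.5)); // prints [server-3]
    }
}
```

With these numbers the even share is 1/3, so the cut-off is 0.5 of the total; `server-3` holds 2,900 of 5,000 (0.58) and is the only node flagged.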

I'll share the logs soon.


Thanks




Re: Out of memory in client node freezes complete cluster

Posted by "ilya.kasnacheev" <il...@gmail.com>.
Hello!

I would recommend using 2.2 or 2.3 and not 2.0.

Having said that, it makes sense to avoid OOM, because in many places the
behavior is undefined once you hit an OOM. It should not be hard to avoid.
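Because the state after an OOM is undefined, one option is to make the JVM terminate instead of continuing in a broken state. The HotSpot flag `-XX:+ExitOnOutOfMemoryError` (JDK 8u92 and later) does this for the whole process. The following is a rough in-process alternative using only the JDK: a default uncaught-exception handler that halts when an `OutOfMemoryError` appears in the cause chain. The class name `FailFastOnOom` is hypothetical, and it is an assumption, not something confirmed in this thread, that a client which exits is easier for the cluster to handle than one that stays connected in a broken state. An OOM swallowed by a thread pool or a `catch (Throwable)` block never reaches this handler, so the JVM flag is the more reliable of the two.

```java
// Hypothetical helper: halt the JVM when an uncaught OutOfMemoryError occurs.
final class FailFastOnOom {
    /** Walks the cause chain looking for an OutOfMemoryError. */
    static boolean causedByOom(Throwable t) {
        // Cap the walk depth to guard against cyclic cause chains.
        for (int depth = 0; t != null && depth < 32; depth++, t = t.getCause()) {
            if (t instanceof OutOfMemoryError) {
                return true;
            }
        }
        return false;
    }

    /** Installs a JVM-wide handler that halts the process on an uncaught OOM. */
    static void install(int exitCode) {
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
            if (causedByOom(error)) {
                // halt() skips shutdown hooks, which might themselves need memory.
                Runtime.getRuntime().halt(exitCode);
            }
            // Any other error: print it, roughly like the JVM's default handler.
            System.err.println("Exception in thread \"" + thread.getName() + "\"");
            error.printStackTrace();
        });
    }

    public static void main(String[] args) {
        install(1);
        System.out.println(causedByOom(new RuntimeException(new OutOfMemoryError("Java heap space")))); // prints true
        System.out.println(causedByOom(new IllegalStateException("unrelated"))); // prints false
    }
}
```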

It should not cause the cluster to hang, but without logs from the server nodes
it's hard to understand what went wrong. Could you provide the server node logs
from slightly before and after the client failure?

Regards,


