Posted to dev@ignite.apache.org by Alexey Kukushkin <ku...@gmail.com> on 2022/01/12 06:09:42 UTC

[DISCUSSION] IgniteOutOfMemoryException may not be a critical failure

Hi Igniters,

Currently Ignite treats the "not enough data region capacity" case as a
critical failure and does not allow configuring any of the default critical
failure handlers to ignore that error.

In our company, several teams use Apache Ignite, and none of them wants to
apply a default "stop server" or "restart server" handler when encountering
this problem. We would rather report the problem to DevOps and the end
users.

We developed a custom failure handler to deal with the problem, but the
solution is really clumsy. More importantly, we think treating this problem
as a critical failure is not what most users would want.

What do you think about enhancing Ignite not to treat the "not enough data
region capacity" case as a critical failure?

We opened IGNITE-16272 <https://issues.apache.org/jira/browse/IGNITE-16272> for
this discussion with the description below:

The Problem
Ignite raises the IgniteOutOfMemoryException
<https://github.com/apache/ignite/blob/2.11.1/modules/core/src/main/java/org/apache/ignite/internal/mem/IgniteOutOfMemoryException.java>
if a data region size is exceeded when trying to add more data to a cache.
Ignite treats the IgniteOutOfMemoryException as a critical failure. With
the default failure handler, this shuts down the Ignite server.

However, reaching the data region capacity does not seem to be a problem so
critical that it requires a server shutdown or restart. For example, in our
application we just want to report this problem back to the users and
notify DevOps without applying the critical failure handler. To achieve
that, we had to define a custom FailureHandler that detects and ignores the
IgniteOutOfMemoryException and all the exceptions caused by the
IgniteOutOfMemoryException, allowing the final exception to reach the
application. This solution is clumsy and unreliable since it depends on the
internal IgniteOutOfMemoryException definition and has to search a complex
secondary exception structure to find the IgniteOutOfMemoryException among
the suppressed exceptions and causes.
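
The search described above could look roughly like the sketch below. This
is not our production handler: ThrowableSearch and contains() are
hypothetical names, and the class is matched by name only so that the
sketch compiles without Ignite on the classpath (which also means
subclasses are not matched).

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Set;

/** Hypothetical helper, not part of Ignite: searches an exception graph by class name. */
final class ThrowableSearch {
    private ThrowableSearch() {}

    /**
     * Returns true if the root throwable, any of its causes, or any suppressed
     * exception (searched transitively) has exactly the given class name.
     */
    static boolean contains(Throwable root, String className) {
        // Identity-based set guards against cycles in the cause/suppressed graph.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Throwable> pending = new ArrayDeque<>();
        if (root != null)
            pending.push(root);

        while (!pending.isEmpty()) {
            Throwable t = pending.pop();
            if (!seen.add(t))
                continue; // already visited
            if (t.getClass().getName().equals(className))
                return true;
            if (t.getCause() != null)
                pending.push(t.getCause());
            for (Throwable s : t.getSuppressed())
                pending.push(s);
        }
        return false;
    }
}
```

A custom FailureHandler would presumably call something like
ThrowableSearch.contains(failureCtx.error(),
"org.apache.ignite.internal.mem.IgniteOutOfMemoryException") inside
onFailure() and return false when it matches, so the node is not stopped.
Hard-coding the internal class name is exactly the fragility described
above.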

Ignite's out-of-the-box failure handlers have the ignoredFailureTypes
property that allows filtering out some kinds of failures. However, the
IgniteOutOfMemoryException is not among the FailureType
<https://github.com/apache/ignite/blob/2.11.1/modules/core/src/main/java/org/apache/ignite/failure/FailureType.java>
values that can be ignored.

The Proposal

   1. Does anyone really want to treat the "data region capacity exceeded"
   problem as a critical failure and stop or restart the server?
      - Consider never treating this condition as a critical failure. This
      change is not backward compatible.
      - Or add another item to the FailureType enumeration to optionally
      allow the users not to have that treated as a critical failure. This is
      backward-compatible.
   2. Make the IgniteOutOfMemoryException a public API (it is currently in
   an internal package).
   3. Consider renaming IgniteOutOfMemoryException (for example, to
   something like NotEnoughStorageException). The current name resembles
   Java's OutOfMemoryError, which is really critical and usually
   unrecoverable, although the IgniteOutOfMemoryException is not that
   critical.
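
To illustrate the backward-compatible option in item 1, here is a minimal
self-contained sketch. Everything in it is a stand-in: SketchFailureType
and SketchFailureHandler only model the idea, and the
DATA_REGION_CAPACITY_EXCEEDED constant is hypothetical and does not exist
in Ignite 2.11.1.

```java
import java.util.EnumSet;
import java.util.Set;

/** Stand-in for Ignite's FailureType; only a few constants are modeled. */
enum SketchFailureType {
    SEGMENTATION,
    CRITICAL_ERROR,
    /** Hypothetical new item from the proposal; not a real Ignite constant. */
    DATA_REGION_CAPACITY_EXCEEDED
}

/** Stand-in for a failure handler that supports an ignored-types filter. */
final class SketchFailureHandler {
    private final Set<SketchFailureType> ignored;

    SketchFailureHandler(Set<SketchFailureType> ignored) {
        // Defensive copy; noneOf + addAll also works for an empty input set.
        this.ignored = EnumSet.noneOf(SketchFailureType.class);
        this.ignored.addAll(ignored);
    }

    /** Returns true if the node should be stopped, false if the failure is ignored. */
    boolean onFailure(SketchFailureType type) {
        return !ignored.contains(type);
    }
}
```

With an empty ignored set the handler still stops the node on
DATA_REGION_CAPACITY_EXCEEDED, so existing deployments keep the current
behavior. Users who prefer to report the problem instead would add the new
type to the ignored set.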

--
Best regards,
Alexey

Re: [DISCUSSION] IgniteOutOfMemoryException may not be a critical failure

Posted by Valentin Kulichenko <va...@gmail.com>.
I tend to agree that providing a proper exception to the client is enough
in this case, and there is no need to stop server nodes. However, I believe
that's how it used to work before we added failure handlers. So there was
probably a reason for the current implementation? Does anyone know?

-Val
