You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@ignite.apache.org by Alexander Polovtcev <al...@gmail.com> on 2022/04/08 15:09:04 UTC

Re: [DISCUSSION] Error handling in Ignite 3

Ivan, thanks for the prepared document! I've left a couple questions.

On Wed, Mar 23, 2022 at 10:49 AM Ivan Bessonov <be...@gmail.com>
wrote:

> Hi everyone,
>
> I'd like to continue the discussion. I created IEP with the attempt to
> summarize
> all the information above, you can find it here [1]. What do you think?
>
> [1]
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-84%3A+Error+handling
>
> ср, 21 апр. 2021 г. в 20:46, Alexei Scherbakov <
> alexey.scherbakoff@gmail.com
> >:
>
> > Alexei,
> >
> > I think it's ok to do error conversion if it makes sense, but better to
> > preserve the root cause whenever possible.
> > Another way to solve the described scenario is to introduce something
> like
> > checked IgniteRetryAgainException, which forces the user to retry or
> ignore
> > it.
> > It's difficult to foresee exactly at this point what is the best solution
> > without knowing the exact scenario.
> >
> > Andrey,
> >
> > I've just realized your proposal to add IGN to each string code. This is
> ok
> > to me, for example IGN-TBL-0001
> >
> > ср, 21 апр. 2021 г. в 17:49, Alexey Goncharuk <
> alexey.goncharuk@gmail.com
> > >:
> >
> > > Aleksei,
> > >
> > > > The method should always report root cause, in your example it will
> be
> > > > B-xxxx, no matter which module API is called
> > >
> > > I may be wrong, but I doubt this will be usable for an end-user. Let's
> > > imagine that the same root exception was raised in different contexts
> > > resulting in two outcomes. The first one is safe to retry (say, the
> root
> > > cause led to a transaction prepare failure), but the second outcome may
> > be
> > > a state in which no matter how many retries will be attempted, the
> > > operation will never succeed. Only the upper-level layer can tell the
> > > difference and return a proper message to the user, so I would say that
> > > some error conversion/wrapping will be required no matter what.
> > >
> > > --AG
> > >
> > > пт, 16 апр. 2021 г. в 16:31, Alexei Scherbakov <
> > > alexey.scherbakoff@gmail.com
> > > >:
> > >
> > > > чт, 15 апр. 2021 г. в 18:21, Andrey Mashenkov <
> > > andrey.mashenkov@gmail.com
> > > > >:
> > > >
> > > > > Hi Alexey,
> > > > > I like the idea.
> > > > >
> > > > > 1.
> > > > >
> > > > > >   TBL-0001 is a *string representation* of the error. It is built
> > > from
> > > > 2
> > > > > > byte scope id (mapped to name TBL) and 2 byte number (0001). Both
> > > > > > internally packed in int. No any kind of parsing will be
> necessary
> > to
> > > > > read
> > > > > > scope/category.
> > > > >
> > > > > I think Alexey mean if it will be possible to make smth like that
> > > > >
> > > > > catch (IgniteException e) {
> > > > >     if (e.getScope() == "TBL" && e.getCode() == 1234)
> > > > >         continue; // E.g. retry my TX
> > > > > }
> > > > >
> > > > > It looks useful to me.
> > > > >
> > > >
> > > > I have in mind something like this:
> > > >
> > > > public class IgniteException extends RuntimeException {
> > > >     private int errorCode;
> > > >
> > > >     public IgniteException(ErrorScope scope, int code, String
> message,
> > > > Throwable cause) {
> > > >         super(message, cause);
> > > >         this.errorCode = make(scope, code);
> > > >     }
> > > >
> > > >     public boolean matches(ErrorScope scope, int code) {
> > > >         return errorCode == make(scope, code);
> > > >     }
> > > >
> > > >     private int make(ErrorScope scope, int code) {
> > > >         return ((scope.ordinal() << 16) | code);
> > > >     }
> > > >
> > > >     public ErrorScope scope() {
> > > >         return ErrorScope.values()[errorCode >> 16];
> > > >     }
> > > >
> > > >     public int code() {
> > > >         return 0xFFFF & errorCode;
> > > >     }
> > > >
> > > >     public static void main(String[] args) {
> > > >         IgniteException e = new IgniteException(ErrorScope.RAFT, 1,
> > > "test",
> > > > null);
> > > >
> > > >         System.out.println(e.matches(ErrorScope.RAFT, 2));
> > > >         System.out.println(e.scope());
> > > >         System.out.println(e.code());
> > > >
> > > >         try {
> > > >             throw e;
> > > >         }
> > > >         catch (IgniteException ee) {
> > > >             System.out.println(ee.matches(ErrorScope.RAFT, 1));
> > > >         }
> > > >     }
> > > > }
> > > >
> > > >
> > > > >
> > > > > 2. How you see a cross-module exception throwing?
> > > > > Assume, user call -> module A, which recursively call -> module B,
> > > which
> > > > > fails.
> > > > > So, module A component calls a module B component and got an
> > Exception
> > > > with
> > > > > "B-1234" exception.
> > > > > Module A do not expect any exception here and doesn't take care of
> > > > "B-xxx"
> > > > > error codes, but only "A-yyyy.
> > > > > Should it rethrow exception with "A-unknown" (maybe "UNK-0001")
> code
> > > > > or reuse "B-xxxx" code with the own message, pointing original
> > > exception
> > > > as
> > > > > a cause for both cases?
> > > > >
> > > > > The first approach may looks confusing, while the second one
> produces
> > > too
> > > > > many "UNK-" codes.
> > > > > What code should get the user in that case?
> > > > >
> > > > >
> > > > >
> > > > The method should always report root cause, in your example it will
> be
> > > > B-xxxx, no matter which module API is called.
> > > >
> > > >
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Apr 15, 2021 at 5:36 PM Alexei Scherbakov <
> > > > > alexey.scherbakoff@gmail.com> wrote:
> > > > >
> > > > > > чт, 15 апр. 2021 г. в 14:26, Ilya Kasnacheev <
> > > > ilya.kasnacheev@gmail.com
> > > > > >:
> > > > > >
> > > > > > > Hello!
> > > > > > >
> > > > > > > > All public methods throw only unchecked
> > > > > > > org.apache.ignite.lang.IgniteException containing
> aforementioned
> > > > > fields.
> > > > > > > > Each public method must have a section in the javadoc with a
> > list
> > > > of
> > > > > > all
> > > > > > > possible error codes for this method.
> > > > > > >
> > > > > > > I don't think this is feasible at all.
> > > > > > > Imagine javadoc for cache.get() method featuring three pages of
> > > > > possible
> > > > > > > error codes thrown by this method.
> > > > > > >
> > > > > >
> > > > > > Of course there is no need to write 3 pages of error codes, this
> > > makes
> > > > no
> > > > > > sense.
> > > > > > I think we can use error ranges here, or, probably, document most
> > > > > important
> > > > > > error scenarios.
> > > > > > The point here is to try to document errors as much as possible.
> > > > > >
> > > > > >
> > > > > > > Also, updated every two weeks to account for changes in
> > > > > > > underlying mechanisms.
> > > > > > >
> > > > > > > There is still a confusion between "error code for any error in
> > > logs"
> > > > > and
> > > > > > > "error code for any pair of method & exception". Which one are
> we
> > > > > > > discussing really?
> > > > > > >
> > > > > > > This is impossible to track or test, but is susceptible for
> > common
> > > > > > > error-hiding antipattern where all exceptions are caught in
> > > > cache.get()
> > > > > > and
> > > > > > > discarded, and instead a brand new CH-0001 "error in
> cache.get()"
> > > is
> > > > > > thrown
> > > > > > > to the user.
> > > > > > >
> > > > > > > Which is certainly not something that anybody wants.
> > > > > > >
> > > > > >
> > > > > > Certainly not. We are talking here about root cause - what is
> > exactly
> > > > the
> > > > > > reason for method call failure.
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > Regards,
> > > > > > > --
> > > > > > > Ilya Kasnacheev
> > > > > > >
> > > > > > >
> > > > > > > чт, 15 апр. 2021 г. в 13:03, Vladislav Pyatkov <
> > > vldpyatkov@gmail.com
> > > > >:
> > > > > > >
> > > > > > > > Hi Alexei,
> > > > > > > >
> > > > > > > > > Each public method *must *have a section in the javadoc
> with
> > a
> > > > list
> > > > > > of
> > > > > > > > all possible error codes for this method.
> > > > > > > >
> > > > > > > > I consider it is redundant, because any public exception can
> be
> > > > > thrown
> > > > > > > from
> > > > > > > > public API.
> > > > > > > > If it not happens today, it may change tomorrow: one will be
> > > > removed,
> > > > > > > > another one will be added.
> > > > > > > >
> > > > > > > > >Nested exceptions are not forbidden to use. They can provide
> > > > > > additional
> > > > > > > > details on the error for debug purposes
> > > > > > > >
> > > > > > > > I see another issue, which is in the Ignite 2.x, but not
> attend
> > > > here.
> > > > > > We
> > > > > > > > can have a deep nested exception and use it for handling.
> > > > > > > > In the result, all time when we are handling an exception we
> > use
> > > > > > > > pattern like this:
> > > > > > > > try{
> > > > > > > > ...
> > > > > > > > }
> > > > > > > > catch (Exception e) {
> > > > > > > >     if (X.hasCause(e, TimeoutException.class))
> > > > > > > >         throw e;
> > > > > > > >
> > > > > > > >     if (X.hasCause(e, ConnectException.class,
> > > EOFException.class))
> > > > > > > >         continue;
> > > > > > > >
> > > > > > > >     if (X.hasCause(e, InterruptedException.class))
> > > > > > > >         return false;
> > > > > > > > }
> > > > > > > >
> > > > > > > > If we have a pant to make only one exception to the client
> > side,
> > > we
> > > > > can
> > > > > > > > also do it for an internal exception.
> > > > > > > >
> > > > > > > > On Wed, Apr 14, 2021 at 11:42 AM Alexei Scherbakov <
> > > > > > > > alexey.scherbakoff@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Alexey,
> > > > > > > > >
> > > > > > > > > ср, 14 апр. 2021 г. в 01:52, Alexey Kukushkin <
> > > > > > > kukushkinalexey@gmail.com
> > > > > > > > >:
> > > > > > > > >
> > > > > > > > > > Just some points looking questionable to me, although I
> > > realize
> > > > > the
> > > > > > > > error
> > > > > > > > > > handling style may be very opinionated:
> > > > > > > > > >
> > > > > > > > > >    - Would it make sense splitting the proposed composite
> > > error
> > > > > > code
> > > > > > > > > >    (TBL-0001) into separate numeric code (0001) and
> > > > > scope/category
> > > > > > > > > ("TBL")
> > > > > > > > > > to
> > > > > > > > > >    avoid parsing the code when an error handler needs to
> > > > analyze
> > > > > > only
> > > > > > > > the
> > > > > > > > > >    category or the code?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >   TBL-0001 is a *string representation* of the error. It is
> > > built
> > > > > > from
> > > > > > > 2
> > > > > > > > > byte scope id (mapped to name TBL) and 2 byte number
> (0001).
> > > Both
> > > > > > > > > internally packed in int. No any kind of parsing will be
> > > > necessary
> > > > > to
> > > > > > > > read
> > > > > > > > > scope/category.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >    - "*The cause - short string description of an issue,
> > > > readable
> > > > > > by
> > > > > > > > > > user.*".
> > > > > > > > > >    This terminology sounds confusing to me since that
> > "cause"
> > > > > > sounds
> > > > > > > > like
> > > > > > > > > > Java
> > > > > > > > > >    Throwable's Message to me and the "Cause" is a lower
> > level
> > > > > > > > exception.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > The string describes the cause of error, so the name. I'm
> ok
> > to
> > > > > > rename
> > > > > > > it
> > > > > > > > > to a message. It will be stored in IgniteException.message
> > > field
> > > > > > > anyway.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >    - "*The action - steps for a user to resolve
> error...*".
> > > The
> > > > > > > action
> > > > > > > > is
> > > > > > > > > >    very important but do we want to make it part of the
> > > > > > > > IgniteException?
> > > > > > > > > I
> > > > > > > > > > do
> > > > > > > > > >    not think the recovery action text should be part of
> the
> > > > > > > exception.
> > > > > > > > > >    IgniteException may include a URL pointing to the
> > > > > corresponding
> > > > > > > > > >    documentation - this is discussable.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > This will not be the part of the exception. A user should
> > visit
> > > > the
> > > > > > > > > documentation page and read the action section by
> > corresponding
> > > > > error
> > > > > > > > code.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >    - "*All public methods throw only unchecked
> > > > IgniteException*"
> > > > > -
> > > > > > I
> > > > > > > > > assume
> > > > > > > > > >    we still respect JCache specification and prefer using
> > > > > standard
> > > > > > > Java
> > > > > > > > > >    exception to signal about invalid parameters.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Using standard java exceptions whenever possible makes
> sense.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >    - Why we do not follow the "classic" structured
> > exception
> > > > > > handling
> > > > > > > > > >    practices in Ignite:
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Ignite 3 will be multi language, and other languages use
> > other
> > > > > error
> > > > > > > > > processing models. SQL for example uses error codes.
> > > > > > > > > The single exception approach simplifies and unifies error
> > > > handling
> > > > > > > > across
> > > > > > > > > platforms for me.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >       - Why do we not allow using checked exceptions? It
> > > seems
> > > > to
> > > > > > me
> > > > > > > > > >       sometimes forcing the user to handle an error
> serves
> > > as a
> > > > > > hint
> > > > > > > > and
> > > > > > > > > > thus
> > > > > > > > > >       improves usability. For example, handling an
> > > > > > > > optimistic/pessimistic
> > > > > > > > > >       transaction conflict/deadlock. Or handling a
> timeout
> > > for
> > > > > > > > operations
> > > > > > > > > > with
> > > > > > > > > >       timeouts.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > A valid point. Checked exceptions must be used for whose
> > > methods,
> > > > > > where
> > > > > > > > > error handling is enforced, for example tx optimistic
> > failure.
> > > > > > > > > Such errors will also have corresponding error codes.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >       - Why single public IgniteException and no
> exception
> > > > > > hierarchy?
> > > > > > > > > Java
> > > > > > > > > >       is optimized for structured exception handling
> > instead
> > > of
> > > > > > using
> > > > > > > > > > IF-ELSE to
> > > > > > > > > >       analyze the codes.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Exception hierarchy is not required when using error codes
> > and
> > > > > > > applicable
> > > > > > > > > only to java API, so I would avoid spending efforts on it.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >       - Why no nested exceptions? Sometimes an error
> > handler
> > > is
> > > > > > > > > interested
> > > > > > > > > >       only in high level exceptions (like Invalid
> > > > Configuration)
> > > > > > and
> > > > > > > > > > sometimes
> > > > > > > > > >       details are needed (like specific configuration
> > parser
> > > > > > > > exceptions).
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Nested exceptions are not forbidden to use. They can
> provide
> > > > > > additional
> > > > > > > > > details on the error for debug purposes, but not strictly
> > > > required,
> > > > > > > > because
> > > > > > > > > error code + message should provide enough information to
> the
> > > > user.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >    - For async methods returning a Future we may have a
> > > > universal
> > > > > > > rule
> > > > > > > > on
> > > > > > > > > >    how to handle exceptions. For example, we may specify
> > that
> > > > any
> > > > > > > async
> > > > > > > > > > method
> > > > > > > > > >    can throw only invalid argument exceptions. All other
> > > errors
> > > > > are
> > > > > > > > > > reported
> > > > > > > > > >    via the exceptionally(IgniteException -> {}) callback
> > even
> > > > if
> > > > > > the
> > > > > > > > > async
> > > > > > > > > >    method was executed synchronously.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > This is ok to me.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > вт, 13 апр. 2021 г. в 12:08, Alexei Scherbakov <
> > > > > > > > > > alexey.scherbakoff@gmail.com
> > > > > > > > > > >:
> > > > > > > > > >
> > > > > > > > > > > Igniters,
> > > > > > > > > > >
> > > > > > > > > > > I would like to start the discussion about error
> handling
> > > in
> > > > > > > Ignite 3
> > > > > > > > > and
> > > > > > > > > > > how we can improve it compared to Ignite 2.
> > > > > > > > > > >
> > > > > > > > > > > The error handling in Ignite 2 was not very good
> because
> > of
> > > > > > generic
> > > > > > > > > > > CacheException thrown on almost any occasion, having
> > deeply
> > > > > > nested
> > > > > > > > root
> > > > > > > > > > > cause and often containing no useful information on
> > further
> > > > > steps
> > > > > > > to
> > > > > > > > > fix
> > > > > > > > > > > the issue.
> > > > > > > > > > >
> > > > > > > > > > > I aim to fix it by introducing some rules on error
> > > handling.
> > > > > > > > > > >
> > > > > > > > > > > *Public exception structure.*
> > > > > > > > > > >
> > > > > > > > > > > A public exception must have an error code, a cause,
> and
> > an
> > > > > > action.
> > > > > > > > > > >
> > > > > > > > > > > * The code - the combination of 2 byte scope id and 2
> > byte
> > > > > error
> > > > > > > > number
> > > > > > > > > > > within the module. This allows up to 2^16 errors for
> each
> > > > > scope,
> > > > > > > > which
> > > > > > > > > > > should be enough. The error code string representation
> > can
> > > > look
> > > > > > > like
> > > > > > > > > > > RFT-0001 or TBL-0001
> > > > > > > > > > > * The cause - short string description of an issue,
> > > readable
> > > > by
> > > > > > > user.
> > > > > > > > > > This
> > > > > > > > > > > can have dynamic parameters depending on the error type
> > for
> > > > > > better
> > > > > > > > user
> > > > > > > > > > > experience, like "Can't write a snapshot, no space left
> > on
> > > > > device
> > > > > > > > {0}"
> > > > > > > > > > > * The action - steps for a user to resolve error
> > situation
> > > > > > > described
> > > > > > > > in
> > > > > > > > > > the
> > > > > > > > > > > documentation in the corresponding error section, for
> > > example
> > > > > > > "Clean
> > > > > > > > up
> > > > > > > > > > > disk space and retry the operation".
> > > > > > > > > > >
> > > > > > > > > > > Common errors should have their own scope, for example
> > > > IGN-0001
> > > > > > > > > > >
> > > > > > > > > > > All public methods throw only unchecked
> > > > > > > > > > > org.apache.ignite.lang.IgniteException containing
> > > > > aforementioned
> > > > > > > > > fields.
> > > > > > > > > > > Each public method must have a section in the javadoc
> > with
> > > a
> > > > > list
> > > > > > > of
> > > > > > > > > all
> > > > > > > > > > > possible error codes for this method.
> > > > > > > > > > >
> > > > > > > > > > > A good example with similar structure can be found here
> > [1]
> > > > > > > > > > >
> > > > > > > > > > > *Async timeouts.*
> > > > > > > > > > >
> > > > > > > > > > > Because almost all API methods in Ignite 3 are async,
> > they
> > > > all
> > > > > > will
> > > > > > > > > have
> > > > > > > > > > a
> > > > > > > > > > > configurable default timeout and can complete with
> > timeout
> > > > > error
> > > > > > > if a
> > > > > > > > > > > computation is not finished in time, for example if a
> > > > response
> > > > > > has
> > > > > > > > not
> > > > > > > > > > been
> > > > > > > > > > > yet received.
> > > > > > > > > > > I suggest to complete the async op future with
> > > > TimeoutException
> > > > > > in
> > > > > > > > this
> > > > > > > > > > > case to make it on par with synchronous execution using
> > > > > > future.get,
> > > > > > > > > which
> > > > > > > > > > > will throw java.util.concurrent.TimeoutException on
> > > timeout.
> > > > > > > > > > > For reference, see
> > > > > > java.util.concurrent.CompletableFuture#orTimeout
> > > > > > > > > > > No special error code should be used for this scenario.
> > > > > > > > > > >
> > > > > > > > > > > *Internal exceptions hierarchy.*
> > > > > > > > > > >
> > > > > > > > > > > All internal exceptions should extend
> > > > > > > > > > > org.apache.ignite.internal.lang.IgniteInternalException
> > for
> > > > > > checked
> > > > > > > > > > > exceptions and
> > > > > > > > > > >
> > > > org.apache.ignite.internal.lang.IgniteInternalCheckedException
> > > > > > for
> > > > > > > > > > > unchecked exceptions.
> > > > > > > > > > >
> > > > > > > > > > > Thoughts ?
> > > > > > > > > > >
> > > > > > > > > > > [1]
> > > > > > > >
> > > https://docs.oracle.com/cd/B10501_01/server.920/a96525/preface.htm
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > >
> > > > > > > > > > > Best regards,
> > > > > > > > > > > Alexei Scherbakov
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Best regards,
> > > > > > > > > > Alexey
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > >
> > > > > > > > > Best regards,
> > > > > > > > > Alexei Scherbakov
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Vladislav Pyatkov
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > >
> > > > > > Best regards,
> > > > > > Alexei Scherbakov
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best regards,
> > > > > Andrey V. Mashenkov
> > > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Best regards,
> > > > Alexei Scherbakov
> > > >
> > >
> >
> >
> > --
> >
> > Best regards,
> > Alexei Scherbakov
> >
>
>
> --
> Sincerely yours,
> Ivan Bessonov
>


-- 
With regards,
Aleksandr Polovtcev