You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@ignite.apache.org by Zhenya Stanilovsky <ar...@mail.ru.INVALID> on 2022/03/23 09:54:00 UTC

Re[2]: [DISCUSSION] Error handling in Ignite 3

Ivan, thanks for this effort, as for me - looks good.


 
>Hi everyone,
>
>I'd like to continue the discussion. I created IEP with the attempt to
>summarize
>all the information above, you can find it here [1]. What do you think?
>
>[1]
>https://cwiki.apache.org/confluence/display/IGNITE/IEP-84%3A+Error+handling
>
>ср, 21 апр. 2021 г. в 20:46, Alexei Scherbakov < alexey.scherbakoff@gmail.com
>>:
>
>> Alexei,
>>
>> I think it's ok to do error conversion if it makes sense, but better to
>> preserve the root cause whenever possible.
>> Another way to solve the described scenario is to introduce something like
>> checked IgniteRetryAgainException, which forces the user to retry or ignore
>> it.
>> It's difficult to foresee exactly at this point what is the best solution
>> without knowing the exact scenario.
>>
>> Andrey,
>>
>> I've just realized your proposal to add IGN to each string code. This is ok
>> to me, for example IGN-TBL-0001
>>
>> ср, 21 апр. 2021 г. в 17:49, Alexey Goncharuk < alexey.goncharuk@gmail.com
>> >:
>>
>> > Aleksei,
>> >
>> > > The method should always report root cause, in your example it will be
>> > > B-xxxx, no matter which module API is called
>> >
>> > I may be wrong, but I doubt this will be usable for an end-user. Let's
>> > imagine that the same root exception was raised in different contexts
>> > resulting in two outcomes. The first one is safe to retry (say, the root
>> > cause led to a transaction prepare failure), but the second outcome may
>> be
>> > a state in which no matter how many retries will be attempted, the
>> > operation will never succeed. Only the upper-level layer can tell the
>> > difference and return a proper message to the user, so I would say that
>> > some error conversion/wrapping will be required no matter what.
>> >
>> > --AG
>> >
>> > пт, 16 апр. 2021 г. в 16:31, Alexei Scherbakov <
>> >  alexey.scherbakoff@gmail.com
>> > >:
>> >
>> > > чт, 15 апр. 2021 г. в 18:21, Andrey Mashenkov <
>> >  andrey.mashenkov@gmail.com
>> > > >:
>> > >
>> > > > Hi Alexey,
>> > > > I like the idea.
>> > > >
>> > > > 1.
>> > > >
>> > > > > TBL-0001 is a *string representation* of the error. It is built
>> > from
>> > > 2
>> > > > > byte scope id (mapped to name TBL) and 2 byte number (0001). Both
>> > > > > internally packed in int. No any kind of parsing will be necessary
>> to
>> > > > read
>> > > > > scope/category.
>> > > >
>> > > > I think Alexey mean if it will be possible to make smth like that
>> > > >
>> > > > catch (IgniteException e) {
>> > > > if (e.getScope() == "TBL" && e.getCode() == 1234)
>> > > > continue; // E.g. retry my TX
>> > > > }
>> > > >
>> > > > It looks useful to me.
>> > > >
>> > >
>> > > I have in mind something like this:
>> > >
>> > > public class IgniteException extends RuntimeException {
>> > > private int errorCode;
>> > >
>> > > public IgniteException(ErrorScope scope, int code, String message,
>> > > Throwable cause) {
>> > > super(message, cause);
>> > > this.errorCode = make(scope, code);
>> > > }
>> > >
>> > > public boolean matches(ErrorScope scope, int code) {
>> > > return errorCode == make(scope, code);
>> > > }
>> > >
>> > > private int make(ErrorScope scope, int code) {
>> > > return ((scope.ordinal() << 16) | code);
>> > > }
>> > >
>> > > public ErrorScope scope() {
>> > > return ErrorScope.values()[errorCode >> 16];
>> > > }
>> > >
>> > > public int code() {
>> > > return 0xFFFF & errorCode;
>> > > }
>> > >
>> > > public static void main(String[] args) {
>> > > IgniteException e = new IgniteException(ErrorScope.RAFT, 1,
>> > "test",
>> > > null);
>> > >
>> > > System.out.println(e.matches(ErrorScope.RAFT, 2));
>> > > System.out.println(e.scope());
>> > > System.out.println(e.code());
>> > >
>> > > try {
>> > > throw e;
>> > > }
>> > > catch (IgniteException ee) {
>> > > System.out.println(ee.matches(ErrorScope.RAFT, 1));
>> > > }
>> > > }
>> > > }
>> > >
>> > >
>> > > >
>> > > > 2. How you see a cross-module exception throwing?
>> > > > Assume, user call -> module A, which recursively call -> module B,
>> > which
>> > > > fails.
>> > > > So, module A component calls a module B component and got an
>> Exception
>> > > with
>> > > > "B-1234" exception.
>> > > > Module A do not expect any exception here and doesn't take care of
>> > > "B-xxx"
>> > > > error codes, but only "A-yyyy.
>> > > > Should it rethrow exception with "A-unknown" (maybe "UNK-0001") code
>> > > > or reuse "B-xxxx" code with the own message, pointing original
>> > exception
>> > > as
>> > > > a cause for both cases?
>> > > >
>> > > > The first approach may looks confusing, while the second one produces
>> > too
>> > > > many "UNK-" codes.
>> > > > What code should get the user in that case?
>> > > >
>> > > >
>> > > >
>> > > The method should always report root cause, in your example it will be
>> > > B-xxxx, no matter which module API is called.
>> > >
>> > >
>> > > >
>> > > >
>> > > >
>> > > > On Thu, Apr 15, 2021 at 5:36 PM Alexei Scherbakov <
>> > > >  alexey.scherbakoff@gmail.com > wrote:
>> > > >
>> > > > > чт, 15 апр. 2021 г. в 14:26, Ilya Kasnacheev <
>> > >  ilya.kasnacheev@gmail.com
>> > > > >:
>> > > > >
>> > > > > > Hello!
>> > > > > >
>> > > > > > > All public methods throw only unchecked
>> > > > > > org.apache.ignite.lang.IgniteException containing aforementioned
>> > > > fields.
>> > > > > > > Each public method must have a section in the javadoc with a
>> list
>> > > of
>> > > > > all
>> > > > > > possible error codes for this method.
>> > > > > >
>> > > > > > I don't think this is feasible at all.
>> > > > > > Imagine javadoc for cache.get() method featuring three pages of
>> > > > possible
>> > > > > > error codes thrown by this method.
>> > > > > >
>> > > > >
>> > > > > Of course there is no need to write 3 pages of error codes, this
>> > makes
>> > > no
>> > > > > sense.
>> > > > > I think we can use error ranges here, or, probably, document most
>> > > > important
>> > > > > error scenarios.
>> > > > > The point here is to try to document errors as much as possible.
>> > > > >
>> > > > >
>> > > > > > Also, updated every two weeks to account for changes in
>> > > > > > underlying mechanisms.
>> > > > > >
>> > > > > > There is still a confusion between "error code for any error in
>> > logs"
>> > > > and
>> > > > > > "error code for any pair of method & exception". Which one are we
>> > > > > > discussing really?
>> > > > > >
>> > > > > > This is impossible to track or test, but is susceptible for
>> common
>> > > > > > error-hiding antipattern where all exceptions are caught in
>> > > cache.get()
>> > > > > and
>> > > > > > discarded, and instead a brand new CH-0001 "error in cache.get()"
>> > is
>> > > > > thrown
>> > > > > > to the user.
>> > > > > >
>> > > > > > Which is certainly not something that anybody wants.
>> > > > > >
>> > > > >
>> > > > > Certainly not. We are talking here about root cause - what is
>> exactly
>> > > the
>> > > > > reason for method call failure.
>> > > > >
>> > > > >
>> > > > > >
>> > > > > > Regards,
>> > > > > > --
>> > > > > > Ilya Kasnacheev
>> > > > > >
>> > > > > >
>> > > > > > чт, 15 апр. 2021 г. в 13:03, Vladislav Pyatkov <
>> >  vldpyatkov@gmail.com
>> > > >:
>> > > > > >
>> > > > > > > Hi Alexei,
>> > > > > > >
>> > > > > > > > Each public method *must *have a section in the javadoc with
>> a
>> > > list
>> > > > > of
>> > > > > > > all possible error codes for this method.
>> > > > > > >
>> > > > > > > I consider it is redundant, because any public exception can be
>> > > > thrown
>> > > > > > from
>> > > > > > > public API.
>> > > > > > > If it not happens today, it may change tomorrow: one will be
>> > > removed,
>> > > > > > > another one will be added.
>> > > > > > >
>> > > > > > > >Nested exceptions are not forbidden to use. They can provide
>> > > > > additional
>> > > > > > > details on the error for debug purposes
>> > > > > > >
>> > > > > > > I see another issue, which is in the Ignite 2.x, but not attend
>> > > here.
>> > > > > We
>> > > > > > > can have a deep nested exception and use it for handling.
>> > > > > > > In the result, all time when we are handling an exception we
>> use
>> > > > > > > pattern like this:
>> > > > > > > try{
>> > > > > > > ...
>> > > > > > > }
>> > > > > > > catch (Exception e) {
>> > > > > > > if (X.hasCause(e, TimeoutException.class))
>> > > > > > > throw e;
>> > > > > > >
>> > > > > > > if (X.hasCause(e, ConnectException.class,
>> > EOFException.class))
>> > > > > > > continue;
>> > > > > > >
>> > > > > > > if (X.hasCause(e, InterruptedException.class))
>> > > > > > > return false;
>> > > > > > > }
>> > > > > > >
>> > > > > > > If we have a pant to make only one exception to the client
>> side,
>> > we
>> > > > can
>> > > > > > > also do it for an internal exception.
>> > > > > > >
>> > > > > > > On Wed, Apr 14, 2021 at 11:42 AM Alexei Scherbakov <
>> > > > > > >  alexey.scherbakoff@gmail.com > wrote:
>> > > > > > >
>> > > > > > > > Alexey,
>> > > > > > > >
>> > > > > > > > ср, 14 апр. 2021 г. в 01:52, Alexey Kukushkin <
>> > > > > >  kukushkinalexey@gmail.com
>> > > > > > > >:
>> > > > > > > >
>> > > > > > > > > Just some points looking questionable to me, although I
>> > realize
>> > > > the
>> > > > > > > error
>> > > > > > > > > handling style may be very opinionated:
>> > > > > > > > >
>> > > > > > > > > - Would it make sense splitting the proposed composite
>> > error
>> > > > > code
>> > > > > > > > > (TBL-0001) into separate numeric code (0001) and
>> > > > scope/category
>> > > > > > > > ("TBL")
>> > > > > > > > > to
>> > > > > > > > > avoid parsing the code when an error handler needs to
>> > > analyze
>> > > > > only
>> > > > > > > the
>> > > > > > > > > category or the code?
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > > TBL-0001 is a *string representation* of the error. It is
>> > built
>> > > > > from
>> > > > > > 2
>> > > > > > > > byte scope id (mapped to name TBL) and 2 byte number (0001).
>> > Both
>> > > > > > > > internally packed in int. No any kind of parsing will be
>> > > necessary
>> > > > to
>> > > > > > > read
>> > > > > > > > scope/category.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > > - "*The cause - short string description of an issue,
>> > > readable
>> > > > > by
>> > > > > > > > > user.*".
>> > > > > > > > > This terminology sounds confusing to me since that
>> "cause"
>> > > > > sounds
>> > > > > > > like
>> > > > > > > > > Java
>> > > > > > > > > Throwable's Message to me and the "Cause" is a lower
>> level
>> > > > > > > exception.
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > > The string describes the cause of error, so the name. I'm ok
>> to
>> > > > > rename
>> > > > > > it
>> > > > > > > > to a message. It will be stored in IgniteException.message
>> > field
>> > > > > > anyway.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > > - "*The action - steps for a user to resolve error...*".
>> > The
>> > > > > > action
>> > > > > > > is
>> > > > > > > > > very important but do we want to make it part of the
>> > > > > > > IgniteException?
>> > > > > > > > I
>> > > > > > > > > do
>> > > > > > > > > not think the recovery action text should be part of the
>> > > > > > exception.
>> > > > > > > > > IgniteException may include a URL pointing to the
>> > > > corresponding
>> > > > > > > > > documentation - this is discussable.
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > > This will not be the part of the exception. A user should
>> visit
>> > > the
>> > > > > > > > documentation page and read the action section by
>> corresponding
>> > > > error
>> > > > > > > code.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > > - "*All public methods throw only unchecked
>> > > IgniteException*"
>> > > > -
>> > > > > I
>> > > > > > > > assume
>> > > > > > > > > we still respect JCache specification and prefer using
>> > > > standard
>> > > > > > Java
>> > > > > > > > > exception to signal about invalid parameters.
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > > Using standard java exceptions whenever possible makes sense.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > > - Why we do not follow the "classic" structured
>> exception
>> > > > > handling
>> > > > > > > > > practices in Ignite:
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > > Ignite 3 will be multi language, and other languages use
>> other
>> > > > error
>> > > > > > > > processing models. SQL for example uses error codes.
>> > > > > > > > The single exception approach simplifies and unifies error
>> > > handling
>> > > > > > > across
>> > > > > > > > platforms for me.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > > - Why do we not allow using checked exceptions? It
>> > seems
>> > > to
>> > > > > me
>> > > > > > > > > sometimes forcing the user to handle an error serves
>> > as a
>> > > > > hint
>> > > > > > > and
>> > > > > > > > > thus
>> > > > > > > > > improves usability. For example, handling an
>> > > > > > > optimistic/pessimistic
>> > > > > > > > > transaction conflict/deadlock. Or handling a timeout
>> > for
>> > > > > > > operations
>> > > > > > > > > with
>> > > > > > > > > timeouts.
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > > A valid point. Checked exceptions must be used for whose
>> > methods,
>> > > > > where
>> > > > > > > > error handling is enforced, for example tx optimistic
>> failure.
>> > > > > > > > Such errors will also have corresponding error codes.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > > - Why single public IgniteException and no exception
>> > > > > hierarchy?
>> > > > > > > > Java
>> > > > > > > > > is optimized for structured exception handling
>> instead
>> > of
>> > > > > using
>> > > > > > > > > IF-ELSE to
>> > > > > > > > > analyze the codes.
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > > Exception hierarchy is not required when using error codes
>> and
>> > > > > > applicable
>> > > > > > > > only to java API, so I would avoid spending efforts on it.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > > - Why no nested exceptions? Sometimes an error
>> handler
>> > is
>> > > > > > > > interested
>> > > > > > > > > only in high level exceptions (like Invalid
>> > > Configuration)
>> > > > > and
>> > > > > > > > > sometimes
>> > > > > > > > > details are needed (like specific configuration
>> parser
>> > > > > > > exceptions).
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > > Nested exceptions are not forbidden to use. They can provide
>> > > > > additional
>> > > > > > > > details on the error for debug purposes, but not strictly
>> > > required,
>> > > > > > > because
>> > > > > > > > error code + message should provide enough information to the
>> > > user.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > > - For async methods returning a Future we may have a
>> > > universal
>> > > > > > rule
>> > > > > > > on
>> > > > > > > > > how to handle exceptions. For example, we may specify
>> that
>> > > any
>> > > > > > async
>> > > > > > > > > method
>> > > > > > > > > can throw only invalid argument exceptions. All other
>> > errors
>> > > > are
>> > > > > > > > > reported
>> > > > > > > > > via the exceptionally(IgniteException -> {}) callback
>> even
>> > > if
>> > > > > the
>> > > > > > > > async
>> > > > > > > > > method was executed synchronously.
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > > This is ok to me.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > вт, 13 апр. 2021 г. в 12:08, Alexei Scherbakov <
>> > > > > > > > >  alexey.scherbakoff@gmail.com
>> > > > > > > > > >:
>> > > > > > > > >
>> > > > > > > > > > Igniters,
>> > > > > > > > > >
>> > > > > > > > > > I would like to start the discussion about error handling
>> > in
>> > > > > > Ignite 3
>> > > > > > > > and
>> > > > > > > > > > how we can improve it compared to Ignite 2.
>> > > > > > > > > >
>> > > > > > > > > > The error handling in Ignite 2 was not very good because
>> of
>> > > > > generic
>> > > > > > > > > > CacheException thrown on almost any occasion, having
>> deeply
>> > > > > nested
>> > > > > > > root
>> > > > > > > > > > cause and often containing no useful information on
>> further
>> > > > steps
>> > > > > > to
>> > > > > > > > fix
>> > > > > > > > > > the issue.
>> > > > > > > > > >
>> > > > > > > > > > I aim to fix it by introducing some rules on error
>> > handling.
>> > > > > > > > > >
>> > > > > > > > > > *Public exception structure.*
>> > > > > > > > > >
>> > > > > > > > > > A public exception must have an error code, a cause, and
>> an
>> > > > > action.
>> > > > > > > > > >
>> > > > > > > > > > * The code - the combination of 2 byte scope id and 2
>> byte
>> > > > error
>> > > > > > > number
>> > > > > > > > > > within the module. This allows up to 2^16 errors for each
>> > > > scope,
>> > > > > > > which
>> > > > > > > > > > should be enough. The error code string representation
>> can
>> > > look
>> > > > > > like
>> > > > > > > > > > RFT-0001 or TBL-0001
>> > > > > > > > > > * The cause - short string description of an issue,
>> > readable
>> > > by
>> > > > > > user.
>> > > > > > > > > This
>> > > > > > > > > > can have dynamic parameters depending on the error type
>> for
>> > > > > better
>> > > > > > > user
>> > > > > > > > > > experience, like "Can't write a snapshot, no space left
>> on
>> > > > device
>> > > > > > > {0}"
>> > > > > > > > > > * The action - steps for a user to resolve error
>> situation
>> > > > > > described
>> > > > > > > in
>> > > > > > > > > the
>> > > > > > > > > > documentation in the corresponding error section, for
>> > example
>> > > > > > "Clean
>> > > > > > > up
>> > > > > > > > > > disk space and retry the operation".
>> > > > > > > > > >
>> > > > > > > > > > Common errors should have their own scope, for example
>> > > IGN-0001
>> > > > > > > > > >
>> > > > > > > > > > All public methods throw only unchecked
>> > > > > > > > > > org.apache.ignite.lang.IgniteException containing
>> > > > aforementioned
>> > > > > > > > fields.
>> > > > > > > > > > Each public method must have a section in the javadoc
>> with
>> > a
>> > > > list
>> > > > > > of
>> > > > > > > > all
>> > > > > > > > > > possible error codes for this method.
>> > > > > > > > > >
>> > > > > > > > > > A good example with similar structure can be found here
>> [1]
>> > > > > > > > > >
>> > > > > > > > > > *Async timeouts.*
>> > > > > > > > > >
>> > > > > > > > > > Because almost all API methods in Ignite 3 are async,
>> they
>> > > all
>> > > > > will
>> > > > > > > > have
>> > > > > > > > > a
>> > > > > > > > > > configurable default timeout and can complete with
>> timeout
>> > > > error
>> > > > > > if a
>> > > > > > > > > > computation is not finished in time, for example if a
>> > > response
>> > > > > has
>> > > > > > > not
>> > > > > > > > > been
>> > > > > > > > > > yet received.
>> > > > > > > > > > I suggest to complete the async op future with
>> > > TimeoutException
>> > > > > in
>> > > > > > > this
>> > > > > > > > > > case to make it on par with synchronous execution using
>> > > > > future.get,
>> > > > > > > > which
>> > > > > > > > > > will throw java.util.concurrent.TimeoutException on
>> > timeout.
>> > > > > > > > > > For reference, see
>> > > > > java.util.concurrent.CompletableFuture#orTimeout
>> > > > > > > > > > No special error code should be used for this scenario.
>> > > > > > > > > >
>> > > > > > > > > > *Internal exceptions hierarchy.*
>> > > > > > > > > >
>> > > > > > > > > > All internal exceptions should extend
>> > > > > > > > > > org.apache.ignite.internal.lang.IgniteInternalException
>> for
>> > > > > checked
>> > > > > > > > > > exceptions and
>> > > > > > > > > >
>> > > org.apache.ignite.internal.lang.IgniteInternalCheckedException
>> > > > > for
>> > > > > > > > > > unchecked exceptions.
>> > > > > > > > > >
>> > > > > > > > > > Thoughts ?
>> > > > > > > > > >
>> > > > > > > > > > [1]
>> > > > > > >
>> >  https://docs.oracle.com/cd/B10501_01/server.920/a96525/preface.htm
>> > > > > > > > > >
>> > > > > > > > > > --
>> > > > > > > > > >
>> > > > > > > > > > Best regards,
>> > > > > > > > > > Alexei Scherbakov
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > --
>> > > > > > > > > Best regards,
>> > > > > > > > > Alexey
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > --
>> > > > > > > >
>> > > > > > > > Best regards,
>> > > > > > > > Alexei Scherbakov
>> > > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > --
>> > > > > > > Vladislav Pyatkov
>> > > > > > >
>> > > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > >
>> > > > > Best regards,
>> > > > > Alexei Scherbakov
>> > > > >
>> > > >
>> > > >
>> > > > --
>> > > > Best regards,
>> > > > Andrey V. Mashenkov
>> > > >
>> > >
>> > >
>> > > --
>> > >
>> > > Best regards,
>> > > Alexei Scherbakov
>> > >
>> >
>>
>>
>> --
>>
>> Best regards,
>> Alexei Scherbakov
>>
>
>
>--
>Sincerely yours,
>Ivan Bessonov