You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@ignite.apache.org by Alexei Scherbakov <al...@gmail.com> on 2021/04/13 09:08:05 UTC

[DISCUSSION] Error handling in Ignite 3

Igniters,

I would like to start the discussion about error handling in Ignite 3 and
how we can improve it compared to Ignite 2.

The error handling in Ignite 2 was not very good because of generic
CacheException thrown on almost any occasion, having deeply nested root
cause and often containing no useful information on further steps to fix
the issue.

I aim to fix it by introducing some rules on error handling.

*Public exception structure.*

A public exception must have an error code, a cause, and an action.

* The code - the combination of 2 byte scope id and 2 byte error number
within the module. This allows up to 2^16 errors for each scope, which
should be enough. The error code string representation can look like
RFT-0001 or TBL-0001
* The cause - short string description of an issue, readable by user. This
can have dynamic parameters depending on the error type for better user
experience, like "Can't write a snapshot, no space left on device {0}"
* The action - steps for a user to resolve error situation described in the
documentation in the corresponding error section, for example "Clean up
disk space and retry the operation".

Common errors should have their own scope, for example IGN-0001

All public methods throw only unchecked
org.apache.ignite.lang.IgniteException containing aforementioned fields.
Each public method must have a section in the javadoc with a list of all
possible error codes for this method.

A good example with similar structure can be found here [1]

*Async timeouts.*

Because almost all API methods in Ignite 3 are async, they all will have a
configurable default timeout and can complete with timeout error if a
computation is not finished in time, for example if a response has not been
yet received.
I suggest to complete the async op future with TimeoutException in this
case to make it on par with synchronous execution using future.get, which
will throw java.util.concurrent.TimeoutException on timeout.
For reference, see java.util.concurrent.CompletableFuture#orTimeout
No special error code should be used for this scenario.

*Internal exceptions hierarchy.*

All internal exceptions should extend
org.apache.ignite.internal.lang.IgniteInternalException for checked
exceptions and
org.apache.ignite.internal.lang.IgniteInternalCheckedException for
unchecked exceptions.

Thoughts ?

[1] https://docs.oracle.com/cd/B10501_01/server.920/a96525/preface.htm

-- 

Best regards,
Alexei Scherbakov

Re[2]: [DISCUSSION] Error handling in Ignite 3

Posted by Zhenya Stanilovsky <ar...@mail.ru.INVALID>.
Ivan, thanks for this effort, as for me - looks good.


 
>Hi everyone,
>
>I'd like to continue the discussion. I created IEP with the attempt to
>summarize
>all the information above, you can find it here [1]. What do you think?
>
>[1]
>https://cwiki.apache.org/confluence/display/IGNITE/IEP-84%3A+Error+handling
>
>ср, 21 апр. 2021 г. в 20:46, Alexei Scherbakov < alexey.scherbakoff@gmail.com
>>:
>
>> Alexei,
>>
>> I think it's ok to do error conversion if it makes sense, but better to
>> preserve the root cause whenever possible.
>> Another way to solve the described scenario is to introduce something like
>> checked IgniteRetryAgainException, which forces the user to retry or ignore
>> it.
>> It's difficult to foresee exactly at this point what is the best solution
>> without knowing the exact scenario.
>>
>> Andrey,
>>
>> I've just realized your proposal to add IGN to each string code. This is ok
>> to me, for example IGN-TBL-0001
>>
>> ср, 21 апр. 2021 г. в 17:49, Alexey Goncharuk < alexey.goncharuk@gmail.com
>> >:
>>
>> > Aleksei,
>> >
>> > > The method should always report root cause, in your example it will be
>> > > B-xxxx, no matter which module API is called
>> >
>> > I may be wrong, but I doubt this will be usable for an end-user. Let's
>> > imagine that the same root exception was raised in different contexts
>> > resulting in two outcomes. The first one is safe to retry (say, the root
>> > cause led to a transaction prepare failure), but the second outcome may
>> be
>> > a state in which no matter how many retries will be attempted, the
>> > operation will never succeed. Only the upper-level layer can tell the
>> > difference and return a proper message to the user, so I would say that
>> > some error conversion/wrapping will be required no matter what.
>> >
>> > --AG
>> >
>> > пт, 16 апр. 2021 г. в 16:31, Alexei Scherbakov <
>> >  alexey.scherbakoff@gmail.com
>> > >:
>> >
>> > > чт, 15 апр. 2021 г. в 18:21, Andrey Mashenkov <
>> >  andrey.mashenkov@gmail.com
>> > > >:
>> > >
>> > > > Hi Alexey,
>> > > > I like the idea.
>> > > >
>> > > > 1.
>> > > >
>> > > > > TBL-0001 is a *string representation* of the error. It is built
>> > from
>> > > 2
>> > > > > byte scope id (mapped to name TBL) and 2 byte number (0001). Both
>> > > > > internally packed in int. No any kind of parsing will be necessary
>> to
>> > > > read
>> > > > > scope/category.
>> > > >
>> > > > I think Alexey mean if it will be possible to make smth like that
>> > > >
>> > > > catch (IgniteException e) {
>> > > > if (e.getScope() == "TBL" && e.getCode() == 1234)
>> > > > continue; // E.g. retry my TX
>> > > > }
>> > > >
>> > > > It looks useful to me.
>> > > >
>> > >
>> > > I have in mind something like this:
>> > >
>> > > public class IgniteException extends RuntimeException {
>> > > private int errorCode;
>> > >
>> > > public IgniteException(ErrorScope scope, int code, String message,
>> > > Throwable cause) {
>> > > super(message, cause);
>> > > this.errorCode = make(scope, code);
>> > > }
>> > >
>> > > public boolean matches(ErrorScope scope, int code) {
>> > > return errorCode == make(scope, code);
>> > > }
>> > >
>> > > private int make(ErrorScope scope, int code) {
>> > > return ((scope.ordinal() << 16) | code);
>> > > }
>> > >
>> > > public ErrorScope scope() {
>> > > return ErrorScope.values()[errorCode >> 16];
>> > > }
>> > >
>> > > public int code() {
>> > > return 0xFFFF & errorCode;
>> > > }
>> > >
>> > > public static void main(String[] args) {
>> > > IgniteException e = new IgniteException(ErrorScope.RAFT, 1,
>> > "test",
>> > > null);
>> > >
>> > > System.out.println(e.matches(ErrorScope.RAFT, 2));
>> > > System.out.println(e.scope());
>> > > System.out.println(e.code());
>> > >
>> > > try {
>> > > throw e;
>> > > }
>> > > catch (IgniteException ee) {
>> > > System.out.println(ee.matches(ErrorScope.RAFT, 1));
>> > > }
>> > > }
>> > > }
>> > >
>> > >
>> > > >
>> > > > 2. How you see a cross-module exception throwing?
>> > > > Assume, user call -> module A, which recursively call -> module B,
>> > which
>> > > > fails.
>> > > > So, module A component calls a module B component and got an
>> Exception
>> > > with
>> > > > "B-1234" exception.
>> > > > Module A do not expect any exception here and doesn't take care of
>> > > "B-xxx"
>> > > > error codes, but only "A-yyyy.
>> > > > Should it rethrow exception with "A-unknown" (maybe "UNK-0001") code
>> > > > or reuse "B-xxxx" code with the own message, pointing original
>> > exception
>> > > as
>> > > > a cause for both cases?
>> > > >
>> > > > The first approach may looks confusing, while the second one produces
>> > too
>> > > > many "UNK-" codes.
>> > > > What code should get the user in that case?
>> > > >
>> > > >
>> > > >
>> > > The method should always report root cause, in your example it will be
>> > > B-xxxx, no matter which module API is called.
>> > >
>> > >
>> > > >
>> > > >
>> > > >
>> > > > On Thu, Apr 15, 2021 at 5:36 PM Alexei Scherbakov <
>> > > >  alexey.scherbakoff@gmail.com > wrote:
>> > > >
>> > > > > чт, 15 апр. 2021 г. в 14:26, Ilya Kasnacheev <
>> > >  ilya.kasnacheev@gmail.com
>> > > > >:
>> > > > >
>> > > > > > Hello!
>> > > > > >
>> > > > > > > All public methods throw only unchecked
>> > > > > > org.apache.ignite.lang.IgniteException containing aforementioned
>> > > > fields.
>> > > > > > > Each public method must have a section in the javadoc with a
>> list
>> > > of
>> > > > > all
>> > > > > > possible error codes for this method.
>> > > > > >
>> > > > > > I don't think this is feasible at all.
>> > > > > > Imagine javadoc for cache.get() method featuring three pages of
>> > > > possible
>> > > > > > error codes thrown by this method.
>> > > > > >
>> > > > >
>> > > > > Of course there is no need to write 3 pages of error codes, this
>> > makes
>> > > no
>> > > > > sense.
>> > > > > I think we can use error ranges here, or, probably, document most
>> > > > important
>> > > > > error scenarios.
>> > > > > The point here is to try to document errors as much as possible.
>> > > > >
>> > > > >
>> > > > > > Also, updated every two weeks to account for changes in
>> > > > > > underlying mechanisms.
>> > > > > >
>> > > > > > There is still a confusion between "error code for any error in
>> > logs"
>> > > > and
>> > > > > > "error code for any pair of method & exception". Which one are we
>> > > > > > discussing really?
>> > > > > >
>> > > > > > This is impossible to track or test, but is susceptible for
>> common
>> > > > > > error-hiding antipattern where all exceptions are caught in
>> > > cache.get()
>> > > > > and
>> > > > > > discarded, and instead a brand new CH-0001 "error in cache.get()"
>> > is
>> > > > > thrown
>> > > > > > to the user.
>> > > > > >
>> > > > > > Which is certainly not something that anybody wants.
>> > > > > >
>> > > > >
>> > > > > Certainly not. We are talking here about root cause - what is
>> exactly
>> > > the
>> > > > > reason for method call failure.
>> > > > >
>> > > > >
>> > > > > >
>> > > > > > Regards,
>> > > > > > --
>> > > > > > Ilya Kasnacheev
>> > > > > >
>> > > > > >
>> > > > > > чт, 15 апр. 2021 г. в 13:03, Vladislav Pyatkov <
>> >  vldpyatkov@gmail.com
>> > > >:
>> > > > > >
>> > > > > > > Hi Alexei,
>> > > > > > >
>> > > > > > > > Each public method *must *have a section in the javadoc with
>> a
>> > > list
>> > > > > of
>> > > > > > > all possible error codes for this method.
>> > > > > > >
>> > > > > > > I consider it is redundant, because any public exception can be
>> > > > thrown
>> > > > > > from
>> > > > > > > public API.
>> > > > > > > If it not happens today, it may change tomorrow: one will be
>> > > removed,
>> > > > > > > another one will be added.
>> > > > > > >
>> > > > > > > >Nested exceptions are not forbidden to use. They can provide
>> > > > > additional
>> > > > > > > details on the error for debug purposes
>> > > > > > >
>> > > > > > > I see another issue, which is in the Ignite 2.x, but not attend
>> > > here.
>> > > > > We
>> > > > > > > can have a deep nested exception and use it for handling.
>> > > > > > > In the result, all time when we are handling an exception we
>> use
>> > > > > > > pattern like this:
>> > > > > > > try{
>> > > > > > > ...
>> > > > > > > }
>> > > > > > > catch (Exception e) {
>> > > > > > > if (X.hasCause(e, TimeoutException.class))
>> > > > > > > throw e;
>> > > > > > >
>> > > > > > > if (X.hasCause(e, ConnectException.class,
>> > EOFException.class))
>> > > > > > > continue;
>> > > > > > >
>> > > > > > > if (X.hasCause(e, InterruptedException.class))
>> > > > > > > return false;
>> > > > > > > }
>> > > > > > >
>> > > > > > > If we have a pant to make only one exception to the client
>> side,
>> > we
>> > > > can
>> > > > > > > also do it for an internal exception.
>> > > > > > >
>> > > > > > > On Wed, Apr 14, 2021 at 11:42 AM Alexei Scherbakov <
>> > > > > > >  alexey.scherbakoff@gmail.com > wrote:
>> > > > > > >
>> > > > > > > > Alexey,
>> > > > > > > >
>> > > > > > > > ср, 14 апр. 2021 г. в 01:52, Alexey Kukushkin <
>> > > > > >  kukushkinalexey@gmail.com
>> > > > > > > >:
>> > > > > > > >
>> > > > > > > > > Just some points looking questionable to me, although I
>> > realize
>> > > > the
>> > > > > > > error
>> > > > > > > > > handling style may be very opinionated:
>> > > > > > > > >
>> > > > > > > > > - Would it make sense splitting the proposed composite
>> > error
>> > > > > code
>> > > > > > > > > (TBL-0001) into separate numeric code (0001) and
>> > > > scope/category
>> > > > > > > > ("TBL")
>> > > > > > > > > to
>> > > > > > > > > avoid parsing the code when an error handler needs to
>> > > analyze
>> > > > > only
>> > > > > > > the
>> > > > > > > > > category or the code?
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > > TBL-0001 is a *string representation* of the error. It is
>> > built
>> > > > > from
>> > > > > > 2
>> > > > > > > > byte scope id (mapped to name TBL) and 2 byte number (0001).
>> > Both
>> > > > > > > > internally packed in int. No any kind of parsing will be
>> > > necessary
>> > > > to
>> > > > > > > read
>> > > > > > > > scope/category.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > > - "*The cause - short string description of an issue,
>> > > readable
>> > > > > by
>> > > > > > > > > user.*".
>> > > > > > > > > This terminology sounds confusing to me since that
>> "cause"
>> > > > > sounds
>> > > > > > > like
>> > > > > > > > > Java
>> > > > > > > > > Throwable's Message to me and the "Cause" is a lower
>> level
>> > > > > > > exception.
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > > The string describes the cause of error, so the name. I'm ok
>> to
>> > > > > rename
>> > > > > > it
>> > > > > > > > to a message. It will be stored in IgniteException.message
>> > field
>> > > > > > anyway.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > > - "*The action - steps for a user to resolve error...*".
>> > The
>> > > > > > action
>> > > > > > > is
>> > > > > > > > > very important but do we want to make it part of the
>> > > > > > > IgniteException?
>> > > > > > > > I
>> > > > > > > > > do
>> > > > > > > > > not think the recovery action text should be part of the
>> > > > > > exception.
>> > > > > > > > > IgniteException may include a URL pointing to the
>> > > > corresponding
>> > > > > > > > > documentation - this is discussable.
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > > This will not be the part of the exception. A user should
>> visit
>> > > the
>> > > > > > > > documentation page and read the action section by
>> corresponding
>> > > > error
>> > > > > > > code.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > > - "*All public methods throw only unchecked
>> > > IgniteException*"
>> > > > -
>> > > > > I
>> > > > > > > > assume
>> > > > > > > > > we still respect JCache specification and prefer using
>> > > > standard
>> > > > > > Java
>> > > > > > > > > exception to signal about invalid parameters.
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > > Using standard java exceptions whenever possible makes sense.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > > - Why we do not follow the "classic" structured
>> exception
>> > > > > handling
>> > > > > > > > > practices in Ignite:
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > > Ignite 3 will be multi language, and other languages use
>> other
>> > > > error
>> > > > > > > > processing models. SQL for example uses error codes.
>> > > > > > > > The single exception approach simplifies and unifies error
>> > > handling
>> > > > > > > across
>> > > > > > > > platforms for me.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > > - Why do we not allow using checked exceptions? It
>> > seems
>> > > to
>> > > > > me
>> > > > > > > > > sometimes forcing the user to handle an error serves
>> > as a
>> > > > > hint
>> > > > > > > and
>> > > > > > > > > thus
>> > > > > > > > > improves usability. For example, handling an
>> > > > > > > optimistic/pessimistic
>> > > > > > > > > transaction conflict/deadlock. Or handling a timeout
>> > for
>> > > > > > > operations
>> > > > > > > > > with
>> > > > > > > > > timeouts.
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > > A valid point. Checked exceptions must be used for whose
>> > methods,
>> > > > > where
>> > > > > > > > error handling is enforced, for example tx optimistic
>> failure.
>> > > > > > > > Such errors will also have corresponding error codes.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > > - Why single public IgniteException and no exception
>> > > > > hierarchy?
>> > > > > > > > Java
>> > > > > > > > > is optimized for structured exception handling
>> instead
>> > of
>> > > > > using
>> > > > > > > > > IF-ELSE to
>> > > > > > > > > analyze the codes.
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > > Exception hierarchy is not required when using error codes
>> and
>> > > > > > applicable
>> > > > > > > > only to java API, so I would avoid spending efforts on it.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > > - Why no nested exceptions? Sometimes an error
>> handler
>> > is
>> > > > > > > > interested
>> > > > > > > > > only in high level exceptions (like Invalid
>> > > Configuration)
>> > > > > and
>> > > > > > > > > sometimes
>> > > > > > > > > details are needed (like specific configuration
>> parser
>> > > > > > > exceptions).
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > > Nested exceptions are not forbidden to use. They can provide
>> > > > > additional
>> > > > > > > > details on the error for debug purposes, but not strictly
>> > > required,
>> > > > > > > because
>> > > > > > > > error code + message should provide enough information to the
>> > > user.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > > - For async methods returning a Future we may have a
>> > > universal
>> > > > > > rule
>> > > > > > > on
>> > > > > > > > > how to handle exceptions. For example, we may specify
>> that
>> > > any
>> > > > > > async
>> > > > > > > > > method
>> > > > > > > > > can throw only invalid argument exceptions. All other
>> > errors
>> > > > are
>> > > > > > > > > reported
>> > > > > > > > > via the exceptionally(IgniteException -> {}) callback
>> even
>> > > if
>> > > > > the
>> > > > > > > > async
>> > > > > > > > > method was executed synchronously.
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > > This is ok to me.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > вт, 13 апр. 2021 г. в 12:08, Alexei Scherbakov <
>> > > > > > > > >  alexey.scherbakoff@gmail.com
>> > > > > > > > > >:
>> > > > > > > > >
>> > > > > > > > > > Igniters,
>> > > > > > > > > >
>> > > > > > > > > > I would like to start the discussion about error handling
>> > in
>> > > > > > Ignite 3
>> > > > > > > > and
>> > > > > > > > > > how we can improve it compared to Ignite 2.
>> > > > > > > > > >
>> > > > > > > > > > The error handling in Ignite 2 was not very good because
>> of
>> > > > > generic
>> > > > > > > > > > CacheException thrown on almost any occasion, having
>> deeply
>> > > > > nested
>> > > > > > > root
>> > > > > > > > > > cause and often containing no useful information on
>> further
>> > > > steps
>> > > > > > to
>> > > > > > > > fix
>> > > > > > > > > > the issue.
>> > > > > > > > > >
>> > > > > > > > > > I aim to fix it by introducing some rules on error
>> > handling.
>> > > > > > > > > >
>> > > > > > > > > > *Public exception structure.*
>> > > > > > > > > >
>> > > > > > > > > > A public exception must have an error code, a cause, and
>> an
>> > > > > action.
>> > > > > > > > > >
>> > > > > > > > > > * The code - the combination of 2 byte scope id and 2
>> byte
>> > > > error
>> > > > > > > number
>> > > > > > > > > > within the module. This allows up to 2^16 errors for each
>> > > > scope,
>> > > > > > > which
>> > > > > > > > > > should be enough. The error code string representation
>> can
>> > > look
>> > > > > > like
>> > > > > > > > > > RFT-0001 or TBL-0001
>> > > > > > > > > > * The cause - short string description of an issue,
>> > readable
>> > > by
>> > > > > > user.
>> > > > > > > > > This
>> > > > > > > > > > can have dynamic parameters depending on the error type
>> for
>> > > > > better
>> > > > > > > user
>> > > > > > > > > > experience, like "Can't write a snapshot, no space left
>> on
>> > > > device
>> > > > > > > {0}"
>> > > > > > > > > > * The action - steps for a user to resolve error
>> situation
>> > > > > > described
>> > > > > > > in
>> > > > > > > > > the
>> > > > > > > > > > documentation in the corresponding error section, for
>> > example
>> > > > > > "Clean
>> > > > > > > up
>> > > > > > > > > > disk space and retry the operation".
>> > > > > > > > > >
>> > > > > > > > > > Common errors should have their own scope, for example
>> > > IGN-0001
>> > > > > > > > > >
>> > > > > > > > > > All public methods throw only unchecked
>> > > > > > > > > > org.apache.ignite.lang.IgniteException containing
>> > > > aforementioned
>> > > > > > > > fields.
>> > > > > > > > > > Each public method must have a section in the javadoc
>> with
>> > a
>> > > > list
>> > > > > > of
>> > > > > > > > all
>> > > > > > > > > > possible error codes for this method.
>> > > > > > > > > >
>> > > > > > > > > > A good example with similar structure can be found here
>> [1]
>> > > > > > > > > >
>> > > > > > > > > > *Async timeouts.*
>> > > > > > > > > >
>> > > > > > > > > > Because almost all API methods in Ignite 3 are async,
>> they
>> > > all
>> > > > > will
>> > > > > > > > have
>> > > > > > > > > a
>> > > > > > > > > > configurable default timeout and can complete with
>> timeout
>> > > > error
>> > > > > > if a
>> > > > > > > > > > computation is not finished in time, for example if a
>> > > response
>> > > > > has
>> > > > > > > not
>> > > > > > > > > been
>> > > > > > > > > > yet received.
>> > > > > > > > > > I suggest to complete the async op future with
>> > > TimeoutException
>> > > > > in
>> > > > > > > this
>> > > > > > > > > > case to make it on par with synchronous execution using
>> > > > > future.get,
>> > > > > > > > which
>> > > > > > > > > > will throw java.util.concurrent.TimeoutException on
>> > timeout.
>> > > > > > > > > > For reference, see
>> > > > > java.util.concurrent.CompletableFuture#orTimeout
>> > > > > > > > > > No special error code should be used for this scenario.
>> > > > > > > > > >
>> > > > > > > > > > *Internal exceptions hierarchy.*
>> > > > > > > > > >
>> > > > > > > > > > All internal exceptions should extend
>> > > > > > > > > > org.apache.ignite.internal.lang.IgniteInternalException
>> for
>> > > > > checked
>> > > > > > > > > > exceptions and
>> > > > > > > > > >
>> > > org.apache.ignite.internal.lang.IgniteInternalCheckedException
>> > > > > for
>> > > > > > > > > > unchecked exceptions.
>> > > > > > > > > >
>> > > > > > > > > > Thoughts ?
>> > > > > > > > > >
>> > > > > > > > > > [1]
>> > > > > > >
>> >  https://docs.oracle.com/cd/B10501_01/server.920/a96525/preface.htm
>> > > > > > > > > >
>> > > > > > > > > > --
>> > > > > > > > > >
>> > > > > > > > > > Best regards,
>> > > > > > > > > > Alexei Scherbakov
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > --
>> > > > > > > > > Best regards,
>> > > > > > > > > Alexey
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > --
>> > > > > > > >
>> > > > > > > > Best regards,
>> > > > > > > > Alexei Scherbakov
>> > > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > --
>> > > > > > > Vladislav Pyatkov
>> > > > > > >
>> > > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > >
>> > > > > Best regards,
>> > > > > Alexei Scherbakov
>> > > > >
>> > > >
>> > > >
>> > > > --
>> > > > Best regards,
>> > > > Andrey V. Mashenkov
>> > > >
>> > >
>> > >
>> > > --
>> > >
>> > > Best regards,
>> > > Alexei Scherbakov
>> > >
>> >
>>
>>
>> --
>>
>> Best regards,
>> Alexei Scherbakov
>>
>
>
>--
>Sincerely yours,
>Ivan Bessonov 
 
 
 
 

Re: [DISCUSSION] Error handling in Ignite 3

Posted by Alexander Polovtcev <al...@gmail.com>.
Ivan, thanks for the prepared document! I've left a couple questions.

On Wed, Mar 23, 2022 at 10:49 AM Ivan Bessonov <be...@gmail.com>
wrote:

> Hi everyone,
>
> I'd like to continue the discussion. I created IEP with the attempt to
> summarize
> all the information above, you can find it here [1]. What do you think?
>
> [1]
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-84%3A+Error+handling
>
> ср, 21 апр. 2021 г. в 20:46, Alexei Scherbakov <
> alexey.scherbakoff@gmail.com
> >:
>
> > Alexei,
> >
> > I think it's ok to do error conversion if it makes sense, but better to
> > preserve the root cause whenever possible.
> > Another way to solve the described scenario is to introduce something
> like
> > checked IgniteRetryAgainException, which forces the user to retry or
> ignore
> > it.
> > It's difficult to foresee exactly at this point what is the best solution
> > without knowing the exact scenario.
> >
> > Andrey,
> >
> > I've just realized your proposal to add IGN to each string code. This is
> ok
> > to me, for example IGN-TBL-0001
> >
> > ср, 21 апр. 2021 г. в 17:49, Alexey Goncharuk <
> alexey.goncharuk@gmail.com
> > >:
> >
> > > Aleksei,
> > >
> > > > The method should always report root cause, in your example it will
> be
> > > > B-xxxx, no matter which module API is called
> > >
> > > I may be wrong, but I doubt this will be usable for an end-user. Let's
> > > imagine that the same root exception was raised in different contexts
> > > resulting in two outcomes. The first one is safe to retry (say, the
> root
> > > cause led to a transaction prepare failure), but the second outcome may
> > be
> > > a state in which no matter how many retries will be attempted, the
> > > operation will never succeed. Only the upper-level layer can tell the
> > > difference and return a proper message to the user, so I would say that
> > > some error conversion/wrapping will be required no matter what.
> > >
> > > --AG
> > >
> > > пт, 16 апр. 2021 г. в 16:31, Alexei Scherbakov <
> > > alexey.scherbakoff@gmail.com
> > > >:
> > >
> > > > чт, 15 апр. 2021 г. в 18:21, Andrey Mashenkov <
> > > andrey.mashenkov@gmail.com
> > > > >:
> > > >
> > > > > Hi Alexey,
> > > > > I like the idea.
> > > > >
> > > > > 1.
> > > > >
> > > > > >   TBL-0001 is a *string representation* of the error. It is built
> > > from
> > > > 2
> > > > > > byte scope id (mapped to name TBL) and 2 byte number (0001). Both
> > > > > > internally packed in int. No any kind of parsing will be
> necessary
> > to
> > > > > read
> > > > > > scope/category.
> > > > >
> > > > > I think Alexey mean if it will be possible to make smth like that
> > > > >
> > > > > catch (IgniteException e) {
> > > > >     if (e.getScope() == "TBL" && e.getCode() == 1234)
> > > > >         continue; // E.g. retry my TX
> > > > > }
> > > > >
> > > > > It looks useful to me.
> > > > >
> > > >
> > > > I have in mind something like this:
> > > >
> > > > public class IgniteException extends RuntimeException {
> > > >     private int errorCode;
> > > >
> > > >     public IgniteException(ErrorScope scope, int code, String
> message,
> > > > Throwable cause) {
> > > >         super(message, cause);
> > > >         this.errorCode = make(scope, code);
> > > >     }
> > > >
> > > >     public boolean matches(ErrorScope scope, int code) {
> > > >         return errorCode == make(scope, code);
> > > >     }
> > > >
> > > >     private int make(ErrorScope scope, int code) {
> > > >         return ((scope.ordinal() << 16) | code);
> > > >     }
> > > >
> > > >     public ErrorScope scope() {
> > > >         return ErrorScope.values()[errorCode >> 16];
> > > >     }
> > > >
> > > >     public int code() {
> > > >         return 0xFFFF & errorCode;
> > > >     }
> > > >
> > > >     public static void main(String[] args) {
> > > >         IgniteException e = new IgniteException(ErrorScope.RAFT, 1,
> > > "test",
> > > > null);
> > > >
> > > >         System.out.println(e.matches(ErrorScope.RAFT, 2));
> > > >         System.out.println(e.scope());
> > > >         System.out.println(e.code());
> > > >
> > > >         try {
> > > >             throw e;
> > > >         }
> > > >         catch (IgniteException ee) {
> > > >             System.out.println(ee.matches(ErrorScope.RAFT, 1));
> > > >         }
> > > >     }
> > > > }
> > > >
> > > >
> > > > >
> > > > > 2. How you see a cross-module exception throwing?
> > > > > Assume, user call -> module A, which recursively call -> module B,
> > > which
> > > > > fails.
> > > > > So, module A component calls a module B component and got an
> > Exception
> > > > with
> > > > > "B-1234" exception.
> > > > > Module A do not expect any exception here and doesn't take care of
> > > > "B-xxx"
> > > > > error codes, but only "A-yyyy.
> > > > > Should it rethrow exception with "A-unknown" (maybe "UNK-0001")
> code
> > > > > or reuse "B-xxxx" code with the own message, pointing original
> > > exception
> > > > as
> > > > > a cause for both cases?
> > > > >
> > > > > The first approach may looks confusing, while the second one
> produces
> > > too
> > > > > many "UNK-" codes.
> > > > > What code should get the user in that case?
> > > > >
> > > > >
> > > > >
> > > > The method should always report root cause, in your example it will
> be
> > > > B-xxxx, no matter which module API is called.
> > > >
> > > >
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Apr 15, 2021 at 5:36 PM Alexei Scherbakov <
> > > > > alexey.scherbakoff@gmail.com> wrote:
> > > > >
> > > > > > чт, 15 апр. 2021 г. в 14:26, Ilya Kasnacheev <
> > > > ilya.kasnacheev@gmail.com
> > > > > >:
> > > > > >
> > > > > > > Hello!
> > > > > > >
> > > > > > > > All public methods throw only unchecked
> > > > > > > org.apache.ignite.lang.IgniteException containing
> aforementioned
> > > > > fields.
> > > > > > > > Each public method must have a section in the javadoc with a
> > list
> > > > of
> > > > > > all
> > > > > > > possible error codes for this method.
> > > > > > >
> > > > > > > I don't think this is feasible at all.
> > > > > > > Imagine javadoc for cache.get() method featuring three pages of
> > > > > possible
> > > > > > > error codes thrown by this method.
> > > > > > >
> > > > > >
> > > > > > Of course there is no need to write 3 pages of error codes, this
> > > makes
> > > > no
> > > > > > sense.
> > > > > > I think we can use error ranges here, or, probably, document most
> > > > > important
> > > > > > error scenarios.
> > > > > > The point here is to try to document errors as much as possible.
> > > > > >
> > > > > >
> > > > > > > Also, updated every two weeks to account for changes in
> > > > > > > underlying mechanisms.
> > > > > > >
> > > > > > > There is still a confusion between "error code for any error in
> > > logs"
> > > > > and
> > > > > > > "error code for any pair of method & exception". Which one are
> we
> > > > > > > discussing really?
> > > > > > >
> > > > > > > This is impossible to track or test, but is susceptible for
> > common
> > > > > > > error-hiding antipattern where all exceptions are caught in
> > > > cache.get()
> > > > > > and
> > > > > > > discarded, and instead a brand new CH-0001 "error in
> cache.get()"
> > > is
> > > > > > thrown
> > > > > > > to the user.
> > > > > > >
> > > > > > > Which is certainly not something that anybody wants.
> > > > > > >
> > > > > >
> > > > > > Certainly not. We are talking here about root cause - what is
> > exactly
> > > > the
> > > > > > reason for method call failure.
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > Regards,
> > > > > > > --
> > > > > > > Ilya Kasnacheev
> > > > > > >
> > > > > > >
> > > > > > > чт, 15 апр. 2021 г. в 13:03, Vladislav Pyatkov <
> > > vldpyatkov@gmail.com
> > > > >:
> > > > > > >
> > > > > > > > Hi Alexei,
> > > > > > > >
> > > > > > > > > Each public method *must *have a section in the javadoc
> with
> > a
> > > > list
> > > > > > of
> > > > > > > > all possible error codes for this method.
> > > > > > > >
> > > > > > > > I consider it is redundant, because any public exception can
> be
> > > > > thrown
> > > > > > > from
> > > > > > > > public API.
> > > > > > > > If it not happens today, it may change tomorrow: one will be
> > > > removed,
> > > > > > > > another one will be added.
> > > > > > > >
> > > > > > > > >Nested exceptions are not forbidden to use. They can provide
> > > > > > additional
> > > > > > > > details on the error for debug purposes
> > > > > > > >
> > > > > > > > I see another issue, which is in the Ignite 2.x, but not
> attend
> > > > here.
> > > > > > We
> > > > > > > > can have a deep nested exception and use it for handling.
> > > > > > > > In the result, all time when we are handling an exception we
> > use
> > > > > > > > pattern like this:
> > > > > > > > try{
> > > > > > > > ...
> > > > > > > > }
> > > > > > > > catch (Exception e) {
> > > > > > > >     if (X.hasCause(e, TimeoutException.class))
> > > > > > > >         throw e;
> > > > > > > >
> > > > > > > >     if (X.hasCause(e, ConnectException.class,
> > > EOFException.class))
> > > > > > > >         continue;
> > > > > > > >
> > > > > > > >     if (X.hasCause(e, InterruptedException.class))
> > > > > > > >         return false;
> > > > > > > > }
> > > > > > > >
> > > > > > > > If we have a pant to make only one exception to the client
> > side,
> > > we
> > > > > can
> > > > > > > > also do it for an internal exception.
> > > > > > > >
> > > > > > > > On Wed, Apr 14, 2021 at 11:42 AM Alexei Scherbakov <
> > > > > > > > alexey.scherbakoff@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Alexey,
> > > > > > > > >
> > > > > > > > > ср, 14 апр. 2021 г. в 01:52, Alexey Kukushkin <
> > > > > > > kukushkinalexey@gmail.com
> > > > > > > > >:
> > > > > > > > >
> > > > > > > > > > Just some points looking questionable to me, although I
> > > realize
> > > > > the
> > > > > > > > error
> > > > > > > > > > handling style may be very opinionated:
> > > > > > > > > >
> > > > > > > > > >    - Would it make sense splitting the proposed composite
> > > error
> > > > > > code
> > > > > > > > > >    (TBL-0001) into separate numeric code (0001) and
> > > > > scope/category
> > > > > > > > > ("TBL")
> > > > > > > > > > to
> > > > > > > > > >    avoid parsing the code when an error handler needs to
> > > > analyze
> > > > > > only
> > > > > > > > the
> > > > > > > > > >    category or the code?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >   TBL-0001 is a *string representation* of the error. It is
> > > built
> > > > > > from
> > > > > > > 2
> > > > > > > > > byte scope id (mapped to name TBL) and 2 byte number
> (0001).
> > > Both
> > > > > > > > > internally packed in int. No any kind of parsing will be
> > > > necessary
> > > > > to
> > > > > > > > read
> > > > > > > > > scope/category.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >    - "*The cause - short string description of an issue,
> > > > readable
> > > > > > by
> > > > > > > > > > user.*".
> > > > > > > > > >    This terminology sounds confusing to me since that
> > "cause"
> > > > > > sounds
> > > > > > > > like
> > > > > > > > > > Java
> > > > > > > > > >    Throwable's Message to me and the "Cause" is a lower
> > level
> > > > > > > > exception.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > The string describes the cause of error, so the name. I'm
> ok
> > to
> > > > > > rename
> > > > > > > it
> > > > > > > > > to a message. It will be stored in IgniteException.message
> > > field
> > > > > > > anyway.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >    - "*The action - steps for a user to resolve
> error...*".
> > > The
> > > > > > > action
> > > > > > > > is
> > > > > > > > > >    very important but do we want to make it part of the
> > > > > > > > IgniteException?
> > > > > > > > > I
> > > > > > > > > > do
> > > > > > > > > >    not think the recovery action text should be part of
> the
> > > > > > > exception.
> > > > > > > > > >    IgniteException may include a URL pointing to the
> > > > > corresponding
> > > > > > > > > >    documentation - this is discussable.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > This will not be the part of the exception. A user should
> > visit
> > > > the
> > > > > > > > > documentation page and read the action section by
> > corresponding
> > > > > error
> > > > > > > > code.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >    - "*All public methods throw only unchecked
> > > > IgniteException*"
> > > > > -
> > > > > > I
> > > > > > > > > assume
> > > > > > > > > >    we still respect JCache specification and prefer using
> > > > > standard
> > > > > > > Java
> > > > > > > > > >    exception to signal about invalid parameters.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Using standard java exceptions whenever possible makes
> sense.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >    - Why we do not follow the "classic" structured
> > exception
> > > > > > handling
> > > > > > > > > >    practices in Ignite:
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Ignite 3 will be multi language, and other languages use
> > other
> > > > > error
> > > > > > > > > processing models. SQL for example uses error codes.
> > > > > > > > > The single exception approach simplifies and unifies error
> > > > handling
> > > > > > > > across
> > > > > > > > > platforms for me.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >       - Why do we not allow using checked exceptions? It
> > > seems
> > > > to
> > > > > > me
> > > > > > > > > >       sometimes forcing the user to handle an error
> serves
> > > as a
> > > > > > hint
> > > > > > > > and
> > > > > > > > > > thus
> > > > > > > > > >       improves usability. For example, handling an
> > > > > > > > optimistic/pessimistic
> > > > > > > > > >       transaction conflict/deadlock. Or handling a
> timeout
> > > for
> > > > > > > > operations
> > > > > > > > > > with
> > > > > > > > > >       timeouts.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > A valid point. Checked exceptions must be used for whose
> > > methods,
> > > > > > where
> > > > > > > > > error handling is enforced, for example tx optimistic
> > failure.
> > > > > > > > > Such errors will also have corresponding error codes.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >       - Why single public IgniteException and no
> exception
> > > > > > hierarchy?
> > > > > > > > > Java
> > > > > > > > > >       is optimized for structured exception handling
> > instead
> > > of
> > > > > > using
> > > > > > > > > > IF-ELSE to
> > > > > > > > > >       analyze the codes.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Exception hierarchy is not required when using error codes
> > and
> > > > > > > applicable
> > > > > > > > > only to java API, so I would avoid spending efforts on it.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >       - Why no nested exceptions? Sometimes an error
> > handler
> > > is
> > > > > > > > > interested
> > > > > > > > > >       only in high level exceptions (like Invalid
> > > > Configuration)
> > > > > > and
> > > > > > > > > > sometimes
> > > > > > > > > >       details are needed (like specific configuration
> > parser
> > > > > > > > exceptions).
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Nested exceptions are not forbidden to use. They can
> provide
> > > > > > additional
> > > > > > > > > details on the error for debug purposes, but not strictly
> > > > required,
> > > > > > > > because
> > > > > > > > > error code + message should provide enough information to
> the
> > > > user.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >    - For async methods returning a Future we may have a
> > > > universal
> > > > > > > rule
> > > > > > > > on
> > > > > > > > > >    how to handle exceptions. For example, we may specify
> > that
> > > > any
> > > > > > > async
> > > > > > > > > > method
> > > > > > > > > >    can throw only invalid argument exceptions. All other
> > > errors
> > > > > are
> > > > > > > > > > reported
> > > > > > > > > >    via the exceptionally(IgniteException -> {}) callback
> > even
> > > > if
> > > > > > the
> > > > > > > > > async
> > > > > > > > > >    method was executed synchronously.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > This is ok to me.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > вт, 13 апр. 2021 г. в 12:08, Alexei Scherbakov <
> > > > > > > > > > alexey.scherbakoff@gmail.com
> > > > > > > > > > >:
> > > > > > > > > >
> > > > > > > > > > > Igniters,
> > > > > > > > > > >
> > > > > > > > > > > I would like to start the discussion about error
> handling
> > > in
> > > > > > > Ignite 3
> > > > > > > > > and
> > > > > > > > > > > how we can improve it compared to Ignite 2.
> > > > > > > > > > >
> > > > > > > > > > > The error handling in Ignite 2 was not very good
> because
> > of
> > > > > > generic
> > > > > > > > > > > CacheException thrown on almost any occasion, having
> > deeply
> > > > > > nested
> > > > > > > > root
> > > > > > > > > > > cause and often containing no useful information on
> > further
> > > > > steps
> > > > > > > to
> > > > > > > > > fix
> > > > > > > > > > > the issue.
> > > > > > > > > > >
> > > > > > > > > > > I aim to fix it by introducing some rules on error
> > > handling.
> > > > > > > > > > >
> > > > > > > > > > > *Public exception structure.*
> > > > > > > > > > >
> > > > > > > > > > > A public exception must have an error code, a cause,
> and
> > an
> > > > > > action.
> > > > > > > > > > >
> > > > > > > > > > > * The code - the combination of 2 byte scope id and 2
> > byte
> > > > > error
> > > > > > > > number
> > > > > > > > > > > within the module. This allows up to 2^16 errors for
> each
> > > > > scope,
> > > > > > > > which
> > > > > > > > > > > should be enough. The error code string representation
> > can
> > > > look
> > > > > > > like
> > > > > > > > > > > RFT-0001 or TBL-0001
> > > > > > > > > > > * The cause - short string description of an issue,
> > > readable
> > > > by
> > > > > > > user.
> > > > > > > > > > This
> > > > > > > > > > > can have dynamic parameters depending on the error type
> > for
> > > > > > better
> > > > > > > > user
> > > > > > > > > > > experience, like "Can't write a snapshot, no space left
> > on
> > > > > device
> > > > > > > > {0}"
> > > > > > > > > > > * The action - steps for a user to resolve error
> > situation
> > > > > > > described
> > > > > > > > in
> > > > > > > > > > the
> > > > > > > > > > > documentation in the corresponding error section, for
> > > example
> > > > > > > "Clean
> > > > > > > > up
> > > > > > > > > > > disk space and retry the operation".
> > > > > > > > > > >
> > > > > > > > > > > Common errors should have their own scope, for example
> > > > IGN-0001
> > > > > > > > > > >
> > > > > > > > > > > All public methods throw only unchecked
> > > > > > > > > > > org.apache.ignite.lang.IgniteException containing
> > > > > aforementioned
> > > > > > > > > fields.
> > > > > > > > > > > Each public method must have a section in the javadoc
> > with
> > > a
> > > > > list
> > > > > > > of
> > > > > > > > > all
> > > > > > > > > > > possible error codes for this method.
> > > > > > > > > > >
> > > > > > > > > > > A good example with similar structure can be found here
> > [1]
> > > > > > > > > > >
> > > > > > > > > > > *Async timeouts.*
> > > > > > > > > > >
> > > > > > > > > > > Because almost all API methods in Ignite 3 are async,
> > they
> > > > all
> > > > > > will
> > > > > > > > > have
> > > > > > > > > > a
> > > > > > > > > > > configurable default timeout and can complete with
> > timeout
> > > > > error
> > > > > > > if a
> > > > > > > > > > > computation is not finished in time, for example if a
> > > > response
> > > > > > has
> > > > > > > > not
> > > > > > > > > > been
> > > > > > > > > > > yet received.
> > > > > > > > > > > I suggest to complete the async op future with
> > > > TimeoutException
> > > > > > in
> > > > > > > > this
> > > > > > > > > > > case to make it on par with synchronous execution using
> > > > > > future.get,
> > > > > > > > > which
> > > > > > > > > > > will throw java.util.concurrent.TimeoutException on
> > > timeout.
> > > > > > > > > > > For reference, see
> > > > > > java.util.concurrent.CompletableFuture#orTimeout
> > > > > > > > > > > No special error code should be used for this scenario.
> > > > > > > > > > >
> > > > > > > > > > > *Internal exceptions hierarchy.*
> > > > > > > > > > >
> > > > > > > > > > > All internal exceptions should extend
> > > > > > > > > > > org.apache.ignite.internal.lang.IgniteInternalException
> > for
> > > > > > checked
> > > > > > > > > > > exceptions and
> > > > > > > > > > >
> > > > org.apache.ignite.internal.lang.IgniteInternalCheckedException
> > > > > > for
> > > > > > > > > > > unchecked exceptions.
> > > > > > > > > > >
> > > > > > > > > > > Thoughts ?
> > > > > > > > > > >
> > > > > > > > > > > [1]
> > > > > > > >
> > > https://docs.oracle.com/cd/B10501_01/server.920/a96525/preface.htm
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > >
> > > > > > > > > > > Best regards,
> > > > > > > > > > > Alexei Scherbakov
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Best regards,
> > > > > > > > > > Alexey
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > >
> > > > > > > > > Best regards,
> > > > > > > > > Alexei Scherbakov
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Vladislav Pyatkov
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > >
> > > > > > Best regards,
> > > > > > Alexei Scherbakov
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best regards,
> > > > > Andrey V. Mashenkov
> > > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Best regards,
> > > > Alexei Scherbakov
> > > >
> > >
> >
> >
> > --
> >
> > Best regards,
> > Alexei Scherbakov
> >
>
>
> --
> Sincerely yours,
> Ivan Bessonov
>


-- 
With regards,
Aleksandr Polovtcev

Re: [DISCUSSION] Error handling in Ignite 3

Posted by Ivan Bessonov <be...@gmail.com>.
Hi everyone,

I'd like to continue the discussion. I created IEP with the attempt to
summarize
all the information above, you can find it here [1]. What do you think?

[1]
https://cwiki.apache.org/confluence/display/IGNITE/IEP-84%3A+Error+handling

ср, 21 апр. 2021 г. в 20:46, Alexei Scherbakov <alexey.scherbakoff@gmail.com
>:

> Alexei,
>
> I think it's ok to do error conversion if it makes sense, but better to
> preserve the root cause whenever possible.
> Another way to solve the described scenario is to introduce something like
> checked IgniteRetryAgainException, which forces the user to retry or ignore
> it.
> It's difficult to foresee exactly at this point what is the best solution
> without knowing the exact scenario.
>
> Andrey,
>
> I've just realized your proposal to add IGN to each string code. This is ok
> to me, for example IGN-TBL-0001
>
> ср, 21 апр. 2021 г. в 17:49, Alexey Goncharuk <alexey.goncharuk@gmail.com
> >:
>
> > Aleksei,
> >
> > > The method should always report root cause, in your example it will be
> > > B-xxxx, no matter which module API is called
> >
> > I may be wrong, but I doubt this will be usable for an end-user. Let's
> > imagine that the same root exception was raised in different contexts
> > resulting in two outcomes. The first one is safe to retry (say, the root
> > cause led to a transaction prepare failure), but the second outcome may
> be
> > a state in which no matter how many retries will be attempted, the
> > operation will never succeed. Only the upper-level layer can tell the
> > difference and return a proper message to the user, so I would say that
> > some error conversion/wrapping will be required no matter what.
> >
> > --AG
> >
> > пт, 16 апр. 2021 г. в 16:31, Alexei Scherbakov <
> > alexey.scherbakoff@gmail.com
> > >:
> >
> > > чт, 15 апр. 2021 г. в 18:21, Andrey Mashenkov <
> > andrey.mashenkov@gmail.com
> > > >:
> > >
> > > > Hi Alexey,
> > > > I like the idea.
> > > >
> > > > 1.
> > > >
> > > > >   TBL-0001 is a *string representation* of the error. It is built
> > from
> > > 2
> > > > > byte scope id (mapped to name TBL) and 2 byte number (0001). Both
> > > > > internally packed in int. No any kind of parsing will be necessary
> to
> > > > read
> > > > > scope/category.
> > > >
> > > > I think Alexey mean if it will be possible to make smth like that
> > > >
> > > > catch (IgniteException e) {
> > > >     if (e.getScope() == "TBL" && e.getCode() == 1234)
> > > >         continue; // E.g. retry my TX
> > > > }
> > > >
> > > > It looks useful to me.
> > > >
> > >
> > > I have in mind something like this:
> > >
> > > public class IgniteException extends RuntimeException {
> > >     private int errorCode;
> > >
> > >     public IgniteException(ErrorScope scope, int code, String message,
> > > Throwable cause) {
> > >         super(message, cause);
> > >         this.errorCode = make(scope, code);
> > >     }
> > >
> > >     public boolean matches(ErrorScope scope, int code) {
> > >         return errorCode == make(scope, code);
> > >     }
> > >
> > >     private int make(ErrorScope scope, int code) {
> > >         return ((scope.ordinal() << 16) | code);
> > >     }
> > >
> > >     public ErrorScope scope() {
> > >         return ErrorScope.values()[errorCode >> 16];
> > >     }
> > >
> > >     public int code() {
> > >         return 0xFFFF & errorCode;
> > >     }
> > >
> > >     public static void main(String[] args) {
> > >         IgniteException e = new IgniteException(ErrorScope.RAFT, 1,
> > "test",
> > > null);
> > >
> > >         System.out.println(e.matches(ErrorScope.RAFT, 2));
> > >         System.out.println(e.scope());
> > >         System.out.println(e.code());
> > >
> > >         try {
> > >             throw e;
> > >         }
> > >         catch (IgniteException ee) {
> > >             System.out.println(ee.matches(ErrorScope.RAFT, 1));
> > >         }
> > >     }
> > > }
> > >
> > >
> > > >
> > > > 2. How you see a cross-module exception throwing?
> > > > Assume, user call -> module A, which recursively call -> module B,
> > which
> > > > fails.
> > > > So, module A component calls a module B component and got an
> Exception
> > > with
> > > > "B-1234" exception.
> > > > Module A do not expect any exception here and doesn't take care of
> > > "B-xxx"
> > > > error codes, but only "A-yyyy.
> > > > Should it rethrow exception with "A-unknown" (maybe "UNK-0001") code
> > > > or reuse "B-xxxx" code with the own message, pointing original
> > exception
> > > as
> > > > a cause for both cases?
> > > >
> > > > The first approach may looks confusing, while the second one produces
> > too
> > > > many "UNK-" codes.
> > > > What code should get the user in that case?
> > > >
> > > >
> > > >
> > > The method should always report root cause, in your example it will be
> > > B-xxxx, no matter which module API is called.
> > >
> > >
> > > >
> > > >
> > > >
> > > > On Thu, Apr 15, 2021 at 5:36 PM Alexei Scherbakov <
> > > > alexey.scherbakoff@gmail.com> wrote:
> > > >
> > > > > чт, 15 апр. 2021 г. в 14:26, Ilya Kasnacheev <
> > > ilya.kasnacheev@gmail.com
> > > > >:
> > > > >
> > > > > > Hello!
> > > > > >
> > > > > > > All public methods throw only unchecked
> > > > > > org.apache.ignite.lang.IgniteException containing aforementioned
> > > > fields.
> > > > > > > Each public method must have a section in the javadoc with a
> list
> > > of
> > > > > all
> > > > > > possible error codes for this method.
> > > > > >
> > > > > > I don't think this is feasible at all.
> > > > > > Imagine javadoc for cache.get() method featuring three pages of
> > > > possible
> > > > > > error codes thrown by this method.
> > > > > >
> > > > >
> > > > > Of course there is no need to write 3 pages of error codes, this
> > makes
> > > no
> > > > > sense.
> > > > > I think we can use error ranges here, or, probably, document most
> > > > important
> > > > > error scenarios.
> > > > > The point here is to try to document errors as much as possible.
> > > > >
> > > > >
> > > > > > Also, updated every two weeks to account for changes in
> > > > > > underlying mechanisms.
> > > > > >
> > > > > > There is still a confusion between "error code for any error in
> > logs"
> > > > and
> > > > > > "error code for any pair of method & exception". Which one are we
> > > > > > discussing really?
> > > > > >
> > > > > > This is impossible to track or test, but is susceptible for
> common
> > > > > > error-hiding antipattern where all exceptions are caught in
> > > cache.get()
> > > > > and
> > > > > > discarded, and instead a brand new CH-0001 "error in cache.get()"
> > is
> > > > > thrown
> > > > > > to the user.
> > > > > >
> > > > > > Which is certainly not something that anybody wants.
> > > > > >
> > > > >
> > > > > Certainly not. We are talking here about root cause - what is
> exactly
> > > the
> > > > > reason for method call failure.
> > > > >
> > > > >
> > > > > >
> > > > > > Regards,
> > > > > > --
> > > > > > Ilya Kasnacheev
> > > > > >
> > > > > >
> > > > > > чт, 15 апр. 2021 г. в 13:03, Vladislav Pyatkov <
> > vldpyatkov@gmail.com
> > > >:
> > > > > >
> > > > > > > Hi Alexei,
> > > > > > >
> > > > > > > > Each public method *must *have a section in the javadoc with
> a
> > > list
> > > > > of
> > > > > > > all possible error codes for this method.
> > > > > > >
> > > > > > > I consider it is redundant, because any public exception can be
> > > > thrown
> > > > > > from
> > > > > > > public API.
> > > > > > > If it not happens today, it may change tomorrow: one will be
> > > removed,
> > > > > > > another one will be added.
> > > > > > >
> > > > > > > >Nested exceptions are not forbidden to use. They can provide
> > > > > additional
> > > > > > > details on the error for debug purposes
> > > > > > >
> > > > > > > I see another issue, which is in the Ignite 2.x, but not attend
> > > here.
> > > > > We
> > > > > > > can have a deep nested exception and use it for handling.
> > > > > > > In the result, all time when we are handling an exception we
> use
> > > > > > > pattern like this:
> > > > > > > try{
> > > > > > > ...
> > > > > > > }
> > > > > > > catch (Exception e) {
> > > > > > >     if (X.hasCause(e, TimeoutException.class))
> > > > > > >         throw e;
> > > > > > >
> > > > > > >     if (X.hasCause(e, ConnectException.class,
> > EOFException.class))
> > > > > > >         continue;
> > > > > > >
> > > > > > >     if (X.hasCause(e, InterruptedException.class))
> > > > > > >         return false;
> > > > > > > }
> > > > > > >
> > > > > > > If we have a pant to make only one exception to the client
> side,
> > we
> > > > can
> > > > > > > also do it for an internal exception.
> > > > > > >
> > > > > > > On Wed, Apr 14, 2021 at 11:42 AM Alexei Scherbakov <
> > > > > > > alexey.scherbakoff@gmail.com> wrote:
> > > > > > >
> > > > > > > > Alexey,
> > > > > > > >
> > > > > > > > ср, 14 апр. 2021 г. в 01:52, Alexey Kukushkin <
> > > > > > kukushkinalexey@gmail.com
> > > > > > > >:
> > > > > > > >
> > > > > > > > > Just some points looking questionable to me, although I
> > realize
> > > > the
> > > > > > > error
> > > > > > > > > handling style may be very opinionated:
> > > > > > > > >
> > > > > > > > >    - Would it make sense splitting the proposed composite
> > error
> > > > > code
> > > > > > > > >    (TBL-0001) into separate numeric code (0001) and
> > > > scope/category
> > > > > > > > ("TBL")
> > > > > > > > > to
> > > > > > > > >    avoid parsing the code when an error handler needs to
> > > analyze
> > > > > only
> > > > > > > the
> > > > > > > > >    category or the code?
> > > > > > > > >
> > > > > > > >
> > > > > > > >   TBL-0001 is a *string representation* of the error. It is
> > built
> > > > > from
> > > > > > 2
> > > > > > > > byte scope id (mapped to name TBL) and 2 byte number (0001).
> > Both
> > > > > > > > internally packed in int. No any kind of parsing will be
> > > necessary
> > > > to
> > > > > > > read
> > > > > > > > scope/category.
> > > > > > > >
> > > > > > > >
> > > > > > > > >    - "*The cause - short string description of an issue,
> > > readable
> > > > > by
> > > > > > > > > user.*".
> > > > > > > > >    This terminology sounds confusing to me since that
> "cause"
> > > > > sounds
> > > > > > > like
> > > > > > > > > Java
> > > > > > > > >    Throwable's Message to me and the "Cause" is a lower
> level
> > > > > > > exception.
> > > > > > > > >
> > > > > > > >
> > > > > > > > The string describes the cause of error, so the name. I'm ok
> to
> > > > > rename
> > > > > > it
> > > > > > > > to a message. It will be stored in IgniteException.message
> > field
> > > > > > anyway.
> > > > > > > >
> > > > > > > >
> > > > > > > > >    - "*The action - steps for a user to resolve error...*".
> > The
> > > > > > action
> > > > > > > is
> > > > > > > > >    very important but do we want to make it part of the
> > > > > > > IgniteException?
> > > > > > > > I
> > > > > > > > > do
> > > > > > > > >    not think the recovery action text should be part of the
> > > > > > exception.
> > > > > > > > >    IgniteException may include a URL pointing to the
> > > > corresponding
> > > > > > > > >    documentation - this is discussable.
> > > > > > > > >
> > > > > > > >
> > > > > > > > This will not be the part of the exception. A user should
> visit
> > > the
> > > > > > > > documentation page and read the action section by
> corresponding
> > > > error
> > > > > > > code.
> > > > > > > >
> > > > > > > >
> > > > > > > > >    - "*All public methods throw only unchecked
> > > IgniteException*"
> > > > -
> > > > > I
> > > > > > > > assume
> > > > > > > > >    we still respect JCache specification and prefer using
> > > > standard
> > > > > > Java
> > > > > > > > >    exception to signal about invalid parameters.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Using standard java exceptions whenever possible makes sense.
> > > > > > > >
> > > > > > > >
> > > > > > > > >    - Why we do not follow the "classic" structured
> exception
> > > > > handling
> > > > > > > > >    practices in Ignite:
> > > > > > > > >
> > > > > > > >
> > > > > > > > Ignite 3 will be multi language, and other languages use
> other
> > > > error
> > > > > > > > processing models. SQL for example uses error codes.
> > > > > > > > The single exception approach simplifies and unifies error
> > > handling
> > > > > > > across
> > > > > > > > platforms for me.
> > > > > > > >
> > > > > > > >
> > > > > > > > >       - Why do we not allow using checked exceptions? It
> > seems
> > > to
> > > > > me
> > > > > > > > >       sometimes forcing the user to handle an error serves
> > as a
> > > > > hint
> > > > > > > and
> > > > > > > > > thus
> > > > > > > > >       improves usability. For example, handling an
> > > > > > > optimistic/pessimistic
> > > > > > > > >       transaction conflict/deadlock. Or handling a timeout
> > for
> > > > > > > operations
> > > > > > > > > with
> > > > > > > > >       timeouts.
> > > > > > > > >
> > > > > > > >
> > > > > > > > A valid point. Checked exceptions must be used for whose
> > methods,
> > > > > where
> > > > > > > > error handling is enforced, for example tx optimistic
> failure.
> > > > > > > > Such errors will also have corresponding error codes.
> > > > > > > >
> > > > > > > >
> > > > > > > > >       - Why single public IgniteException and no exception
> > > > > hierarchy?
> > > > > > > > Java
> > > > > > > > >       is optimized for structured exception handling
> instead
> > of
> > > > > using
> > > > > > > > > IF-ELSE to
> > > > > > > > >       analyze the codes.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Exception hierarchy is not required when using error codes
> and
> > > > > > applicable
> > > > > > > > only to java API, so I would avoid spending efforts on it.
> > > > > > > >
> > > > > > > >
> > > > > > > > >       - Why no nested exceptions? Sometimes an error
> handler
> > is
> > > > > > > > interested
> > > > > > > > >       only in high level exceptions (like Invalid
> > > Configuration)
> > > > > and
> > > > > > > > > sometimes
> > > > > > > > >       details are needed (like specific configuration
> parser
> > > > > > > exceptions).
> > > > > > > > >
> > > > > > > >
> > > > > > > > Nested exceptions are not forbidden to use. They can provide
> > > > > additional
> > > > > > > > details on the error for debug purposes, but not strictly
> > > required,
> > > > > > > because
> > > > > > > > error code + message should provide enough information to the
> > > user.
> > > > > > > >
> > > > > > > >
> > > > > > > > >    - For async methods returning a Future we may have a
> > > universal
> > > > > > rule
> > > > > > > on
> > > > > > > > >    how to handle exceptions. For example, we may specify
> that
> > > any
> > > > > > async
> > > > > > > > > method
> > > > > > > > >    can throw only invalid argument exceptions. All other
> > errors
> > > > are
> > > > > > > > > reported
> > > > > > > > >    via the exceptionally(IgniteException -> {}) callback
> even
> > > if
> > > > > the
> > > > > > > > async
> > > > > > > > >    method was executed synchronously.
> > > > > > > > >
> > > > > > > >
> > > > > > > > This is ok to me.
> > > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > вт, 13 апр. 2021 г. в 12:08, Alexei Scherbakov <
> > > > > > > > > alexey.scherbakoff@gmail.com
> > > > > > > > > >:
> > > > > > > > >
> > > > > > > > > > Igniters,
> > > > > > > > > >
> > > > > > > > > > I would like to start the discussion about error handling
> > in
> > > > > > Ignite 3
> > > > > > > > and
> > > > > > > > > > how we can improve it compared to Ignite 2.
> > > > > > > > > >
> > > > > > > > > > The error handling in Ignite 2 was not very good because
> of
> > > > > generic
> > > > > > > > > > CacheException thrown on almost any occasion, having
> deeply
> > > > > nested
> > > > > > > root
> > > > > > > > > > cause and often containing no useful information on
> further
> > > > steps
> > > > > > to
> > > > > > > > fix
> > > > > > > > > > the issue.
> > > > > > > > > >
> > > > > > > > > > I aim to fix it by introducing some rules on error
> > handling.
> > > > > > > > > >
> > > > > > > > > > *Public exception structure.*
> > > > > > > > > >
> > > > > > > > > > A public exception must have an error code, a cause, and
> an
> > > > > action.
> > > > > > > > > >
> > > > > > > > > > * The code - the combination of 2 byte scope id and 2
> byte
> > > > error
> > > > > > > number
> > > > > > > > > > within the module. This allows up to 2^16 errors for each
> > > > scope,
> > > > > > > which
> > > > > > > > > > should be enough. The error code string representation
> can
> > > look
> > > > > > like
> > > > > > > > > > RFT-0001 or TBL-0001
> > > > > > > > > > * The cause - short string description of an issue,
> > readable
> > > by
> > > > > > user.
> > > > > > > > > This
> > > > > > > > > > can have dynamic parameters depending on the error type
> for
> > > > > better
> > > > > > > user
> > > > > > > > > > experience, like "Can't write a snapshot, no space left
> on
> > > > device
> > > > > > > {0}"
> > > > > > > > > > * The action - steps for a user to resolve error
> situation
> > > > > > described
> > > > > > > in
> > > > > > > > > the
> > > > > > > > > > documentation in the corresponding error section, for
> > example
> > > > > > "Clean
> > > > > > > up
> > > > > > > > > > disk space and retry the operation".
> > > > > > > > > >
> > > > > > > > > > Common errors should have their own scope, for example
> > > IGN-0001
> > > > > > > > > >
> > > > > > > > > > All public methods throw only unchecked
> > > > > > > > > > org.apache.ignite.lang.IgniteException containing
> > > > aforementioned
> > > > > > > > fields.
> > > > > > > > > > Each public method must have a section in the javadoc
> with
> > a
> > > > list
> > > > > > of
> > > > > > > > all
> > > > > > > > > > possible error codes for this method.
> > > > > > > > > >
> > > > > > > > > > A good example with similar structure can be found here
> [1]
> > > > > > > > > >
> > > > > > > > > > *Async timeouts.*
> > > > > > > > > >
> > > > > > > > > > Because almost all API methods in Ignite 3 are async,
> they
> > > all
> > > > > will
> > > > > > > > have
> > > > > > > > > a
> > > > > > > > > > configurable default timeout and can complete with
> timeout
> > > > error
> > > > > > if a
> > > > > > > > > > computation is not finished in time, for example if a
> > > response
> > > > > has
> > > > > > > not
> > > > > > > > > been
> > > > > > > > > > yet received.
> > > > > > > > > > I suggest to complete the async op future with
> > > TimeoutException
> > > > > in
> > > > > > > this
> > > > > > > > > > case to make it on par with synchronous execution using
> > > > > future.get,
> > > > > > > > which
> > > > > > > > > > will throw java.util.concurrent.TimeoutException on
> > timeout.
> > > > > > > > > > For reference, see
> > > > > java.util.concurrent.CompletableFuture#orTimeout
> > > > > > > > > > No special error code should be used for this scenario.
> > > > > > > > > >
> > > > > > > > > > *Internal exceptions hierarchy.*
> > > > > > > > > >
> > > > > > > > > > All internal exceptions should extend
> > > > > > > > > > org.apache.ignite.internal.lang.IgniteInternalException
> for
> > > > > checked
> > > > > > > > > > exceptions and
> > > > > > > > > >
> > > org.apache.ignite.internal.lang.IgniteInternalCheckedException
> > > > > for
> > > > > > > > > > unchecked exceptions.
> > > > > > > > > >
> > > > > > > > > > Thoughts ?
> > > > > > > > > >
> > > > > > > > > > [1]
> > > > > > >
> > https://docs.oracle.com/cd/B10501_01/server.920/a96525/preface.htm
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > >
> > > > > > > > > > Best regards,
> > > > > > > > > > Alexei Scherbakov
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Best regards,
> > > > > > > > > Alexey
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > >
> > > > > > > > Best regards,
> > > > > > > > Alexei Scherbakov
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Vladislav Pyatkov
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Best regards,
> > > > > Alexei Scherbakov
> > > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > > Andrey V. Mashenkov
> > > >
> > >
> > >
> > > --
> > >
> > > Best regards,
> > > Alexei Scherbakov
> > >
> >
>
>
> --
>
> Best regards,
> Alexei Scherbakov
>


-- 
Sincerely yours,
Ivan Bessonov

Re: [DISCUSSION] Error handling in Ignite 3

Posted by Alexei Scherbakov <al...@gmail.com>.
Alexei,

I think it's ok to do error conversion if it makes sense, but better to
preserve the root cause whenever possible.
Another way to solve the described scenario is to introduce something like
checked IgniteRetryAgainException, which forces the user to retry or ignore
it.
It's difficult to foresee exactly at this point what is the best solution
without knowing the exact scenario.

Andrey,

I've just realized your proposal to add IGN to each string code. This is ok
to me, for example IGN-TBL-0001

ср, 21 апр. 2021 г. в 17:49, Alexey Goncharuk <al...@gmail.com>:

> Aleksei,
>
> > The method should always report root cause, in your example it will be
> > B-xxxx, no matter which module API is called
>
> I may be wrong, but I doubt this will be usable for an end-user. Let's
> imagine that the same root exception was raised in different contexts
> resulting in two outcomes. The first one is safe to retry (say, the root
> cause led to a transaction prepare failure), but the second outcome may be
> a state in which no matter how many retries will be attempted, the
> operation will never succeed. Only the upper-level layer can tell the
> difference and return a proper message to the user, so I would say that
> some error conversion/wrapping will be required no matter what.
>
> --AG
>
> пт, 16 апр. 2021 г. в 16:31, Alexei Scherbakov <
> alexey.scherbakoff@gmail.com
> >:
>
> > чт, 15 апр. 2021 г. в 18:21, Andrey Mashenkov <
> andrey.mashenkov@gmail.com
> > >:
> >
> > > Hi Alexey,
> > > I like the idea.
> > >
> > > 1.
> > >
> > > >   TBL-0001 is a *string representation* of the error. It is built
> from
> > 2
> > > > byte scope id (mapped to name TBL) and 2 byte number (0001). Both
> > > > internally packed in int. No any kind of parsing will be necessary to
> > > read
> > > > scope/category.
> > >
> > > I think Alexey mean if it will be possible to make smth like that
> > >
> > > catch (IgniteException e) {
> > >     if (e.getScope() == "TBL" && e.getCode() == 1234)
> > >         continue; // E.g. retry my TX
> > > }
> > >
> > > It looks useful to me.
> > >
> >
> > I have in mind something like this:
> >
> > public class IgniteException extends RuntimeException {
> >     private int errorCode;
> >
> >     public IgniteException(ErrorScope scope, int code, String message,
> > Throwable cause) {
> >         super(message, cause);
> >         this.errorCode = make(scope, code);
> >     }
> >
> >     public boolean matches(ErrorScope scope, int code) {
> >         return errorCode == make(scope, code);
> >     }
> >
> >     private int make(ErrorScope scope, int code) {
> >         return ((scope.ordinal() << 16) | code);
> >     }
> >
> >     public ErrorScope scope() {
> >         return ErrorScope.values()[errorCode >> 16];
> >     }
> >
> >     public int code() {
> >         return 0xFFFF & errorCode;
> >     }
> >
> >     public static void main(String[] args) {
> >         IgniteException e = new IgniteException(ErrorScope.RAFT, 1,
> "test",
> > null);
> >
> >         System.out.println(e.matches(ErrorScope.RAFT, 2));
> >         System.out.println(e.scope());
> >         System.out.println(e.code());
> >
> >         try {
> >             throw e;
> >         }
> >         catch (IgniteException ee) {
> >             System.out.println(ee.matches(ErrorScope.RAFT, 1));
> >         }
> >     }
> > }
> >
> >
> > >
> > > 2. How you see a cross-module exception throwing?
> > > Assume, user call -> module A, which recursively call -> module B,
> which
> > > fails.
> > > So, module A component calls a module B component and got an Exception
> > with
> > > "B-1234" exception.
> > > Module A do not expect any exception here and doesn't take care of
> > "B-xxx"
> > > error codes, but only "A-yyyy.
> > > Should it rethrow exception with "A-unknown" (maybe "UNK-0001") code
> > > or reuse "B-xxxx" code with the own message, pointing original
> exception
> > as
> > > a cause for both cases?
> > >
> > > The first approach may looks confusing, while the second one produces
> too
> > > many "UNK-" codes.
> > > What code should get the user in that case?
> > >
> > >
> > >
> > The method should always report root cause, in your example it will be
> > B-xxxx, no matter which module API is called.
> >
> >
> > >
> > >
> > >
> > > On Thu, Apr 15, 2021 at 5:36 PM Alexei Scherbakov <
> > > alexey.scherbakoff@gmail.com> wrote:
> > >
> > > > чт, 15 апр. 2021 г. в 14:26, Ilya Kasnacheev <
> > ilya.kasnacheev@gmail.com
> > > >:
> > > >
> > > > > Hello!
> > > > >
> > > > > > All public methods throw only unchecked
> > > > > org.apache.ignite.lang.IgniteException containing aforementioned
> > > fields.
> > > > > > Each public method must have a section in the javadoc with a list
> > of
> > > > all
> > > > > possible error codes for this method.
> > > > >
> > > > > I don't think this is feasible at all.
> > > > > Imagine javadoc for cache.get() method featuring three pages of
> > > possible
> > > > > error codes thrown by this method.
> > > > >
> > > >
> > > > Of course there is no need to write 3 pages of error codes, this
> makes
> > no
> > > > sense.
> > > > I think we can use error ranges here, or, probably, document most
> > > important
> > > > error scenarios.
> > > > The point here is to try to document errors as much as possible.
> > > >
> > > >
> > > > > Also, updated every two weeks to account for changes in
> > > > > underlying mechanisms.
> > > > >
> > > > > There is still a confusion between "error code for any error in
> logs"
> > > and
> > > > > "error code for any pair of method & exception". Which one are we
> > > > > discussing really?
> > > > >
> > > > > This is impossible to track or test, but is susceptible for common
> > > > > error-hiding antipattern where all exceptions are caught in
> > cache.get()
> > > > and
> > > > > discarded, and instead a brand new CH-0001 "error in cache.get()"
> is
> > > > thrown
> > > > > to the user.
> > > > >
> > > > > Which is certainly not something that anybody wants.
> > > > >
> > > >
> > > > Certainly not. We are talking here about root cause - what is exactly
> > the
> > > > reason for method call failure.
> > > >
> > > >
> > > > >
> > > > > Regards,
> > > > > --
> > > > > Ilya Kasnacheev
> > > > >
> > > > >
> > > > > чт, 15 апр. 2021 г. в 13:03, Vladislav Pyatkov <
> vldpyatkov@gmail.com
> > >:
> > > > >
> > > > > > Hi Alexei,
> > > > > >
> > > > > > > Each public method *must *have a section in the javadoc with a
> > list
> > > > of
> > > > > > all possible error codes for this method.
> > > > > >
> > > > > > I consider it is redundant, because any public exception can be
> > > thrown
> > > > > from
> > > > > > public API.
> > > > > > If it not happens today, it may change tomorrow: one will be
> > removed,
> > > > > > another one will be added.
> > > > > >
> > > > > > >Nested exceptions are not forbidden to use. They can provide
> > > > additional
> > > > > > details on the error for debug purposes
> > > > > >
> > > > > > I see another issue, which is in the Ignite 2.x, but not attend
> > here.
> > > > We
> > > > > > can have a deep nested exception and use it for handling.
> > > > > > In the result, all time when we are handling an exception we use
> > > > > > pattern like this:
> > > > > > try{
> > > > > > ...
> > > > > > }
> > > > > > catch (Exception e) {
> > > > > >     if (X.hasCause(e, TimeoutException.class))
> > > > > >         throw e;
> > > > > >
> > > > > >     if (X.hasCause(e, ConnectException.class,
> EOFException.class))
> > > > > >         continue;
> > > > > >
> > > > > >     if (X.hasCause(e, InterruptedException.class))
> > > > > >         return false;
> > > > > > }
> > > > > >
> > > > > > If we have a pant to make only one exception to the client side,
> we
> > > can
> > > > > > also do it for an internal exception.
> > > > > >
> > > > > > On Wed, Apr 14, 2021 at 11:42 AM Alexei Scherbakov <
> > > > > > alexey.scherbakoff@gmail.com> wrote:
> > > > > >
> > > > > > > Alexey,
> > > > > > >
> > > > > > > ср, 14 апр. 2021 г. в 01:52, Alexey Kukushkin <
> > > > > kukushkinalexey@gmail.com
> > > > > > >:
> > > > > > >
> > > > > > > > Just some points looking questionable to me, although I
> realize
> > > the
> > > > > > error
> > > > > > > > handling style may be very opinionated:
> > > > > > > >
> > > > > > > >    - Would it make sense splitting the proposed composite
> error
> > > > code
> > > > > > > >    (TBL-0001) into separate numeric code (0001) and
> > > scope/category
> > > > > > > ("TBL")
> > > > > > > > to
> > > > > > > >    avoid parsing the code when an error handler needs to
> > analyze
> > > > only
> > > > > > the
> > > > > > > >    category or the code?
> > > > > > > >
> > > > > > >
> > > > > > >   TBL-0001 is a *string representation* of the error. It is
> built
> > > > from
> > > > > 2
> > > > > > > byte scope id (mapped to name TBL) and 2 byte number (0001).
> Both
> > > > > > > internally packed in int. No any kind of parsing will be
> > necessary
> > > to
> > > > > > read
> > > > > > > scope/category.
> > > > > > >
> > > > > > >
> > > > > > > >    - "*The cause - short string description of an issue,
> > readable
> > > > by
> > > > > > > > user.*".
> > > > > > > >    This terminology sounds confusing to me since that "cause"
> > > > sounds
> > > > > > like
> > > > > > > > Java
> > > > > > > >    Throwable's Message to me and the "Cause" is a lower level
> > > > > > exception.
> > > > > > > >
> > > > > > >
> > > > > > > The string describes the cause of error, so the name. I'm ok to
> > > > rename
> > > > > it
> > > > > > > to a message. It will be stored in IgniteException.message
> field
> > > > > anyway.
> > > > > > >
> > > > > > >
> > > > > > > >    - "*The action - steps for a user to resolve error...*".
> The
> > > > > action
> > > > > > is
> > > > > > > >    very important but do we want to make it part of the
> > > > > > IgniteException?
> > > > > > > I
> > > > > > > > do
> > > > > > > >    not think the recovery action text should be part of the
> > > > > exception.
> > > > > > > >    IgniteException may include a URL pointing to the
> > > corresponding
> > > > > > > >    documentation - this is discussable.
> > > > > > > >
> > > > > > >
> > > > > > > This will not be the part of the exception. A user should visit
> > the
> > > > > > > documentation page and read the action section by corresponding
> > > error
> > > > > > code.
> > > > > > >
> > > > > > >
> > > > > > > >    - "*All public methods throw only unchecked
> > IgniteException*"
> > > -
> > > > I
> > > > > > > assume
> > > > > > > >    we still respect JCache specification and prefer using
> > > standard
> > > > > Java
> > > > > > > >    exception to signal about invalid parameters.
> > > > > > > >
> > > > > > >
> > > > > > > Using standard java exceptions whenever possible makes sense.
> > > > > > >
> > > > > > >
> > > > > > > >    - Why we do not follow the "classic" structured exception
> > > > handling
> > > > > > > >    practices in Ignite:
> > > > > > > >
> > > > > > >
> > > > > > > Ignite 3 will be multi language, and other languages use other
> > > error
> > > > > > > processing models. SQL for example uses error codes.
> > > > > > > The single exception approach simplifies and unifies error
> > handling
> > > > > > across
> > > > > > > platforms for me.
> > > > > > >
> > > > > > >
> > > > > > > >       - Why do we not allow using checked exceptions? It
> seems
> > to
> > > > me
> > > > > > > >       sometimes forcing the user to handle an error serves
> as a
> > > > hint
> > > > > > and
> > > > > > > > thus
> > > > > > > >       improves usability. For example, handling an
> > > > > > optimistic/pessimistic
> > > > > > > >       transaction conflict/deadlock. Or handling a timeout
> for
> > > > > > operations
> > > > > > > > with
> > > > > > > >       timeouts.
> > > > > > > >
> > > > > > >
> > > > > > > A valid point. Checked exceptions must be used for whose
> methods,
> > > > where
> > > > > > > error handling is enforced, for example tx optimistic failure.
> > > > > > > Such errors will also have corresponding error codes.
> > > > > > >
> > > > > > >
> > > > > > > >       - Why single public IgniteException and no exception
> > > > hierarchy?
> > > > > > > Java
> > > > > > > >       is optimized for structured exception handling instead
> of
> > > > using
> > > > > > > > IF-ELSE to
> > > > > > > >       analyze the codes.
> > > > > > > >
> > > > > > >
> > > > > > > Exception hierarchy is not required when using error codes and
> > > > > applicable
> > > > > > > only to java API, so I would avoid spending efforts on it.
> > > > > > >
> > > > > > >
> > > > > > > >       - Why no nested exceptions? Sometimes an error handler
> is
> > > > > > > interested
> > > > > > > >       only in high level exceptions (like Invalid
> > Configuration)
> > > > and
> > > > > > > > sometimes
> > > > > > > >       details are needed (like specific configuration parser
> > > > > > exceptions).
> > > > > > > >
> > > > > > >
> > > > > > > Nested exceptions are not forbidden to use. They can provide
> > > > additional
> > > > > > > details on the error for debug purposes, but not strictly
> > required,
> > > > > > because
> > > > > > > error code + message should provide enough information to the
> > user.
> > > > > > >
> > > > > > >
> > > > > > > >    - For async methods returning a Future we may have a
> > universal
> > > > > rule
> > > > > > on
> > > > > > > >    how to handle exceptions. For example, we may specify that
> > any
> > > > > async
> > > > > > > > method
> > > > > > > >    can throw only invalid argument exceptions. All other
> errors
> > > are
> > > > > > > > reported
> > > > > > > >    via the exceptionally(IgniteException -> {}) callback even
> > if
> > > > the
> > > > > > > async
> > > > > > > >    method was executed synchronously.
> > > > > > > >
> > > > > > >
> > > > > > > This is ok to me.
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > вт, 13 апр. 2021 г. в 12:08, Alexei Scherbakov <
> > > > > > > > alexey.scherbakoff@gmail.com
> > > > > > > > >:
> > > > > > > >
> > > > > > > > > Igniters,
> > > > > > > > >
> > > > > > > > > I would like to start the discussion about error handling
> in
> > > > > Ignite 3
> > > > > > > and
> > > > > > > > > how we can improve it compared to Ignite 2.
> > > > > > > > >
> > > > > > > > > The error handling in Ignite 2 was not very good because of
> > > > generic
> > > > > > > > > CacheException thrown on almost any occasion, having deeply
> > > > nested
> > > > > > root
> > > > > > > > > cause and often containing no useful information on further
> > > steps
> > > > > to
> > > > > > > fix
> > > > > > > > > the issue.
> > > > > > > > >
> > > > > > > > > I aim to fix it by introducing some rules on error
> handling.
> > > > > > > > >
> > > > > > > > > *Public exception structure.*
> > > > > > > > >
> > > > > > > > > A public exception must have an error code, a cause, and an
> > > > action.
> > > > > > > > >
> > > > > > > > > * The code - the combination of 2 byte scope id and 2 byte
> > > error
> > > > > > number
> > > > > > > > > within the module. This allows up to 2^16 errors for each
> > > scope,
> > > > > > which
> > > > > > > > > should be enough. The error code string representation can
> > look
> > > > > like
> > > > > > > > > RFT-0001 or TBL-0001
> > > > > > > > > * The cause - short string description of an issue,
> readable
> > by
> > > > > user.
> > > > > > > > This
> > > > > > > > > can have dynamic parameters depending on the error type for
> > > > better
> > > > > > user
> > > > > > > > > experience, like "Can't write a snapshot, no space left on
> > > device
> > > > > > {0}"
> > > > > > > > > * The action - steps for a user to resolve error situation
> > > > > described
> > > > > > in
> > > > > > > > the
> > > > > > > > > documentation in the corresponding error section, for
> example
> > > > > "Clean
> > > > > > up
> > > > > > > > > disk space and retry the operation".
> > > > > > > > >
> > > > > > > > > Common errors should have their own scope, for example
> > IGN-0001
> > > > > > > > >
> > > > > > > > > All public methods throw only unchecked
> > > > > > > > > org.apache.ignite.lang.IgniteException containing
> > > aforementioned
> > > > > > > fields.
> > > > > > > > > Each public method must have a section in the javadoc with
> a
> > > list
> > > > > of
> > > > > > > all
> > > > > > > > > possible error codes for this method.
> > > > > > > > >
> > > > > > > > > A good example with similar structure can be found here [1]
> > > > > > > > >
> > > > > > > > > *Async timeouts.*
> > > > > > > > >
> > > > > > > > > Because almost all API methods in Ignite 3 are async, they
> > all
> > > > will
> > > > > > > have
> > > > > > > > a
> > > > > > > > > configurable default timeout and can complete with timeout
> > > error
> > > > > if a
> > > > > > > > > computation is not finished in time, for example if a
> > response
> > > > has
> > > > > > not
> > > > > > > > been
> > > > > > > > > yet received.
> > > > > > > > > I suggest to complete the async op future with
> > TimeoutException
> > > > in
> > > > > > this
> > > > > > > > > case to make it on par with synchronous execution using
> > > > future.get,
> > > > > > > which
> > > > > > > > > will throw java.util.concurrent.TimeoutException on
> timeout.
> > > > > > > > > For reference, see
> > > > java.util.concurrent.CompletableFuture#orTimeout
> > > > > > > > > No special error code should be used for this scenario.
> > > > > > > > >
> > > > > > > > > *Internal exceptions hierarchy.*
> > > > > > > > >
> > > > > > > > > All internal exceptions should extend
> > > > > > > > > org.apache.ignite.internal.lang.IgniteInternalException for
> > > > checked
> > > > > > > > > exceptions and
> > > > > > > > >
> > org.apache.ignite.internal.lang.IgniteInternalCheckedException
> > > > for
> > > > > > > > > unchecked exceptions.
> > > > > > > > >
> > > > > > > > > Thoughts ?
> > > > > > > > >
> > > > > > > > > [1]
> > > > > >
> https://docs.oracle.com/cd/B10501_01/server.920/a96525/preface.htm
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > >
> > > > > > > > > Best regards,
> > > > > > > > > Alexei Scherbakov
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Best regards,
> > > > > > > > Alexey
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > Best regards,
> > > > > > > Alexei Scherbakov
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Vladislav Pyatkov
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Best regards,
> > > > Alexei Scherbakov
> > > >
> > >
> > >
> > > --
> > > Best regards,
> > > Andrey V. Mashenkov
> > >
> >
> >
> > --
> >
> > Best regards,
> > Alexei Scherbakov
> >
>


-- 

Best regards,
Alexei Scherbakov

Re: [DISCUSSION] Error handling in Ignite 3

Posted by Alexey Goncharuk <al...@gmail.com>.
Aleksei,

> The method should always report root cause, in your example it will be
> B-xxxx, no matter which module API is called

I may be wrong, but I doubt this will be usable for an end-user. Let's
imagine that the same root exception was raised in different contexts
resulting in two outcomes. The first one is safe to retry (say, the root
cause led to a transaction prepare failure), but the second outcome may be
a state in which no matter how many retries will be attempted, the
operation will never succeed. Only the upper-level layer can tell the
difference and return a proper message to the user, so I would say that
some error conversion/wrapping will be required no matter what.

--AG

пт, 16 апр. 2021 г. в 16:31, Alexei Scherbakov <alexey.scherbakoff@gmail.com
>:

> чт, 15 апр. 2021 г. в 18:21, Andrey Mashenkov <andrey.mashenkov@gmail.com
> >:
>
> > Hi Alexey,
> > I like the idea.
> >
> > 1.
> >
> > >   TBL-0001 is a *string representation* of the error. It is built from
> 2
> > > byte scope id (mapped to name TBL) and 2 byte number (0001). Both
> > > internally packed in int. No any kind of parsing will be necessary to
> > read
> > > scope/category.
> >
> > I think Alexey mean if it will be possible to make smth like that
> >
> > catch (IgniteException e) {
> >     if (e.getScope() == "TBL" && e.getCode() == 1234)
> >         continue; // E.g. retry my TX
> > }
> >
> > It looks useful to me.
> >
>
> I have in mind something like this:
>
> public class IgniteException extends RuntimeException {
>     private int errorCode;
>
>     public IgniteException(ErrorScope scope, int code, String message,
> Throwable cause) {
>         super(message, cause);
>         this.errorCode = make(scope, code);
>     }
>
>     public boolean matches(ErrorScope scope, int code) {
>         return errorCode == make(scope, code);
>     }
>
>     private int make(ErrorScope scope, int code) {
>         return ((scope.ordinal() << 16) | code);
>     }
>
>     public ErrorScope scope() {
>         return ErrorScope.values()[errorCode >> 16];
>     }
>
>     public int code() {
>         return 0xFFFF & errorCode;
>     }
>
>     public static void main(String[] args) {
>         IgniteException e = new IgniteException(ErrorScope.RAFT, 1, "test",
> null);
>
>         System.out.println(e.matches(ErrorScope.RAFT, 2));
>         System.out.println(e.scope());
>         System.out.println(e.code());
>
>         try {
>             throw e;
>         }
>         catch (IgniteException ee) {
>             System.out.println(ee.matches(ErrorScope.RAFT, 1));
>         }
>     }
> }
>
>
> >
> > 2. How you see a cross-module exception throwing?
> > Assume, user call -> module A, which recursively call -> module B, which
> > fails.
> > So, module A component calls a module B component and got an Exception
> with
> > "B-1234" exception.
> > Module A do not expect any exception here and doesn't take care of
> "B-xxx"
> > error codes, but only "A-yyyy.
> > Should it rethrow exception with "A-unknown" (maybe "UNK-0001") code
> > or reuse "B-xxxx" code with the own message, pointing original exception
> as
> > a cause for both cases?
> >
> > The first approach may looks confusing, while the second one produces too
> > many "UNK-" codes.
> > What code should get the user in that case?
> >
> >
> >
> The method should always report root cause, in your example it will be
> B-xxxx, no matter which module API is called.
>
>
> >
> >
> >
> > On Thu, Apr 15, 2021 at 5:36 PM Alexei Scherbakov <
> > alexey.scherbakoff@gmail.com> wrote:
> >
> > > чт, 15 апр. 2021 г. в 14:26, Ilya Kasnacheev <
> ilya.kasnacheev@gmail.com
> > >:
> > >
> > > > Hello!
> > > >
> > > > > All public methods throw only unchecked
> > > > org.apache.ignite.lang.IgniteException containing aforementioned
> > fields.
> > > > > Each public method must have a section in the javadoc with a list
> of
> > > all
> > > > possible error codes for this method.
> > > >
> > > > I don't think this is feasible at all.
> > > > Imagine javadoc for cache.get() method featuring three pages of
> > possible
> > > > error codes thrown by this method.
> > > >
> > >
> > > Of course there is no need to write 3 pages of error codes, this makes
> no
> > > sense.
> > > I think we can use error ranges here, or, probably, document most
> > important
> > > error scenarios.
> > > The point here is to try to document errors as much as possible.
> > >
> > >
> > > > Also, updated every two weeks to account for changes in
> > > > underlying mechanisms.
> > > >
> > > > There is still a confusion between "error code for any error in logs"
> > and
> > > > "error code for any pair of method & exception". Which one are we
> > > > discussing really?
> > > >
> > > > This is impossible to track or test, but is susceptible for common
> > > > error-hiding antipattern where all exceptions are caught in
> cache.get()
> > > and
> > > > discarded, and instead a brand new CH-0001 "error in cache.get()" is
> > > thrown
> > > > to the user.
> > > >
> > > > Which is certainly not something that anybody wants.
> > > >
> > >
> > > Certainly not. We are talking here about root cause - what is exactly
> the
> > > reason for method call failure.
> > >
> > >
> > > >
> > > > Regards,
> > > > --
> > > > Ilya Kasnacheev
> > > >
> > > >
> > > > чт, 15 апр. 2021 г. в 13:03, Vladislav Pyatkov <vldpyatkov@gmail.com
> >:
> > > >
> > > > > Hi Alexei,
> > > > >
> > > > > > Each public method *must *have a section in the javadoc with a
> list
> > > of
> > > > > all possible error codes for this method.
> > > > >
> > > > > I consider it is redundant, because any public exception can be
> > thrown
> > > > from
> > > > > public API.
> > > > > If it not happens today, it may change tomorrow: one will be
> removed,
> > > > > another one will be added.
> > > > >
> > > > > >Nested exceptions are not forbidden to use. They can provide
> > > additional
> > > > > details on the error for debug purposes
> > > > >
> > > > > I see another issue, which is in the Ignite 2.x, but not attend
> here.
> > > We
> > > > > can have a deep nested exception and use it for handling.
> > > > > In the result, all time when we are handling an exception we use
> > > > > pattern like this:
> > > > > try{
> > > > > ...
> > > > > }
> > > > > catch (Exception e) {
> > > > >     if (X.hasCause(e, TimeoutException.class))
> > > > >         throw e;
> > > > >
> > > > >     if (X.hasCause(e, ConnectException.class, EOFException.class))
> > > > >         continue;
> > > > >
> > > > >     if (X.hasCause(e, InterruptedException.class))
> > > > >         return false;
> > > > > }
> > > > >
> > > > > If we have a pant to make only one exception to the client side, we
> > can
> > > > > also do it for an internal exception.
> > > > >
> > > > > On Wed, Apr 14, 2021 at 11:42 AM Alexei Scherbakov <
> > > > > alexey.scherbakoff@gmail.com> wrote:
> > > > >
> > > > > > Alexey,
> > > > > >
> > > > > > ср, 14 апр. 2021 г. в 01:52, Alexey Kukushkin <
> > > > kukushkinalexey@gmail.com
> > > > > >:
> > > > > >
> > > > > > > Just some points looking questionable to me, although I realize
> > the
> > > > > error
> > > > > > > handling style may be very opinionated:
> > > > > > >
> > > > > > >    - Would it make sense splitting the proposed composite error
> > > code
> > > > > > >    (TBL-0001) into separate numeric code (0001) and
> > scope/category
> > > > > > ("TBL")
> > > > > > > to
> > > > > > >    avoid parsing the code when an error handler needs to
> analyze
> > > only
> > > > > the
> > > > > > >    category or the code?
> > > > > > >
> > > > > >
> > > > > >   TBL-0001 is a *string representation* of the error. It is built
> > > from
> > > > 2
> > > > > > byte scope id (mapped to name TBL) and 2 byte number (0001). Both
> > > > > > internally packed in int. No any kind of parsing will be
> necessary
> > to
> > > > > read
> > > > > > scope/category.
> > > > > >
> > > > > >
> > > > > > >    - "*The cause - short string description of an issue,
> readable
> > > by
> > > > > > > user.*".
> > > > > > >    This terminology sounds confusing to me since that "cause"
> > > sounds
> > > > > like
> > > > > > > Java
> > > > > > >    Throwable's Message to me and the "Cause" is a lower level
> > > > > exception.
> > > > > > >
> > > > > >
> > > > > > The string describes the cause of error, so the name. I'm ok to
> > > rename
> > > > it
> > > > > > to a message. It will be stored in IgniteException.message field
> > > > anyway.
> > > > > >
> > > > > >
> > > > > > >    - "*The action - steps for a user to resolve error...*". The
> > > > action
> > > > > is
> > > > > > >    very important but do we want to make it part of the
> > > > > IgniteException?
> > > > > > I
> > > > > > > do
> > > > > > >    not think the recovery action text should be part of the
> > > > exception.
> > > > > > >    IgniteException may include a URL pointing to the
> > corresponding
> > > > > > >    documentation - this is discussable.
> > > > > > >
> > > > > >
> > > > > > This will not be the part of the exception. A user should visit
> the
> > > > > > documentation page and read the action section by corresponding
> > error
> > > > > code.
> > > > > >
> > > > > >
> > > > > > >    - "*All public methods throw only unchecked
> IgniteException*"
> > -
> > > I
> > > > > > assume
> > > > > > >    we still respect JCache specification and prefer using
> > standard
> > > > Java
> > > > > > >    exception to signal about invalid parameters.
> > > > > > >
> > > > > >
> > > > > > Using standard java exceptions whenever possible makes sense.
> > > > > >
> > > > > >
> > > > > > >    - Why we do not follow the "classic" structured exception
> > > handling
> > > > > > >    practices in Ignite:
> > > > > > >
> > > > > >
> > > > > > Ignite 3 will be multi language, and other languages use other
> > error
> > > > > > processing models. SQL for example uses error codes.
> > > > > > The single exception approach simplifies and unifies error
> handling
> > > > > across
> > > > > > platforms for me.
> > > > > >
> > > > > >
> > > > > > >       - Why do we not allow using checked exceptions? It seems
> to
> > > me
> > > > > > >       sometimes forcing the user to handle an error serves as a
> > > hint
> > > > > and
> > > > > > > thus
> > > > > > >       improves usability. For example, handling an
> > > > > optimistic/pessimistic
> > > > > > >       transaction conflict/deadlock. Or handling a timeout for
> > > > > operations
> > > > > > > with
> > > > > > >       timeouts.
> > > > > > >
> > > > > >
> > > > > > A valid point. Checked exceptions must be used for whose methods,
> > > where
> > > > > > error handling is enforced, for example tx optimistic failure.
> > > > > > Such errors will also have corresponding error codes.
> > > > > >
> > > > > >
> > > > > > >       - Why single public IgniteException and no exception
> > > hierarchy?
> > > > > > Java
> > > > > > >       is optimized for structured exception handling instead of
> > > using
> > > > > > > IF-ELSE to
> > > > > > >       analyze the codes.
> > > > > > >
> > > > > >
> > > > > > Exception hierarchy is not required when using error codes and
> > > > applicable
> > > > > > only to java API, so I would avoid spending efforts on it.
> > > > > >
> > > > > >
> > > > > > >       - Why no nested exceptions? Sometimes an error handler is
> > > > > > interested
> > > > > > >       only in high level exceptions (like Invalid
> Configuration)
> > > and
> > > > > > > sometimes
> > > > > > >       details are needed (like specific configuration parser
> > > > > exceptions).
> > > > > > >
> > > > > >
> > > > > > Nested exceptions are not forbidden to use. They can provide
> > > additional
> > > > > > details on the error for debug purposes, but not strictly
> required,
> > > > > because
> > > > > > error code + message should provide enough information to the
> user.
> > > > > >
> > > > > >
> > > > > > >    - For async methods returning a Future we may have a
> universal
> > > > rule
> > > > > on
> > > > > > >    how to handle exceptions. For example, we may specify that
> any
> > > > async
> > > > > > > method
> > > > > > >    can throw only invalid argument exceptions. All other errors
> > are
> > > > > > > reported
> > > > > > >    via the exceptionally(IgniteException -> {}) callback even
> if
> > > the
> > > > > > async
> > > > > > >    method was executed synchronously.
> > > > > > >
> > > > > >
> > > > > > This is ok to me.
> > > > > >
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > вт, 13 апр. 2021 г. в 12:08, Alexei Scherbakov <
> > > > > > > alexey.scherbakoff@gmail.com
> > > > > > > >:
> > > > > > >
> > > > > > > > Igniters,
> > > > > > > >
> > > > > > > > I would like to start the discussion about error handling in
> > > > Ignite 3
> > > > > > and
> > > > > > > > how we can improve it compared to Ignite 2.
> > > > > > > >
> > > > > > > > The error handling in Ignite 2 was not very good because of
> > > generic
> > > > > > > > CacheException thrown on almost any occasion, having deeply
> > > nested
> > > > > root
> > > > > > > > cause and often containing no useful information on further
> > steps
> > > > to
> > > > > > fix
> > > > > > > > the issue.
> > > > > > > >
> > > > > > > > I aim to fix it by introducing some rules on error handling.
> > > > > > > >
> > > > > > > > *Public exception structure.*
> > > > > > > >
> > > > > > > > A public exception must have an error code, a cause, and an
> > > action.
> > > > > > > >
> > > > > > > > * The code - the combination of 2 byte scope id and 2 byte
> > error
> > > > > number
> > > > > > > > within the module. This allows up to 2^16 errors for each
> > scope,
> > > > > which
> > > > > > > > should be enough. The error code string representation can
> look
> > > > like
> > > > > > > > RFT-0001 or TBL-0001
> > > > > > > > * The cause - short string description of an issue, readable
> by
> > > > user.
> > > > > > > This
> > > > > > > > can have dynamic parameters depending on the error type for
> > > better
> > > > > user
> > > > > > > > experience, like "Can't write a snapshot, no space left on
> > device
> > > > > {0}"
> > > > > > > > * The action - steps for a user to resolve error situation
> > > > described
> > > > > in
> > > > > > > the
> > > > > > > > documentation in the corresponding error section, for example
> > > > "Clean
> > > > > up
> > > > > > > > disk space and retry the operation".
> > > > > > > >
> > > > > > > > Common errors should have their own scope, for example
> IGN-0001
> > > > > > > >
> > > > > > > > All public methods throw only unchecked
> > > > > > > > org.apache.ignite.lang.IgniteException containing
> > aforementioned
> > > > > > fields.
> > > > > > > > Each public method must have a section in the javadoc with a
> > list
> > > > of
> > > > > > all
> > > > > > > > possible error codes for this method.
> > > > > > > >
> > > > > > > > A good example with similar structure can be found here [1]
> > > > > > > >
> > > > > > > > *Async timeouts.*
> > > > > > > >
> > > > > > > > Because almost all API methods in Ignite 3 are async, they
> all
> > > will
> > > > > > have
> > > > > > > a
> > > > > > > > configurable default timeout and can complete with timeout
> > error
> > > > if a
> > > > > > > > computation is not finished in time, for example if a
> response
> > > has
> > > > > not
> > > > > > > been
> > > > > > > > yet received.
> > > > > > > > I suggest to complete the async op future with
> TimeoutException
> > > in
> > > > > this
> > > > > > > > case to make it on par with synchronous execution using
> > > future.get,
> > > > > > which
> > > > > > > > will throw java.util.concurrent.TimeoutException on timeout.
> > > > > > > > For reference, see
> > > java.util.concurrent.CompletableFuture#orTimeout
> > > > > > > > No special error code should be used for this scenario.
> > > > > > > >
> > > > > > > > *Internal exceptions hierarchy.*
> > > > > > > >
> > > > > > > > All internal exceptions should extend
> > > > > > > > org.apache.ignite.internal.lang.IgniteInternalException for
> > > checked
> > > > > > > > exceptions and
> > > > > > > >
> org.apache.ignite.internal.lang.IgniteInternalCheckedException
> > > for
> > > > > > > > unchecked exceptions.
> > > > > > > >
> > > > > > > > Thoughts ?
> > > > > > > >
> > > > > > > > [1]
> > > > > https://docs.oracle.com/cd/B10501_01/server.920/a96525/preface.htm
> > > > > > > >
> > > > > > > > --
> > > > > > > >
> > > > > > > > Best regards,
> > > > > > > > Alexei Scherbakov
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Best regards,
> > > > > > > Alexey
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > >
> > > > > > Best regards,
> > > > > > Alexei Scherbakov
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Vladislav Pyatkov
> > > > >
> > > >
> > >
> > >
> > > --
> > >
> > > Best regards,
> > > Alexei Scherbakov
> > >
> >
> >
> > --
> > Best regards,
> > Andrey V. Mashenkov
> >
>
>
> --
>
> Best regards,
> Alexei Scherbakov
>

Re: [DISCUSSION] Error handling in Ignite 3

Posted by Alexei Scherbakov <al...@gmail.com>.
Andrey,

I've already proposed the similar string representation in the very first
message of the topic.

ср, 21 апр. 2021 г. в 12:31, Andrey Mashenkov <an...@gmail.com>:

> Alexey,
>
> As I understand, you suggest ErrorScope enum for easier analysis and it
> will be a part of a error code.
> But what about String representation?
>
> I think we should use a common prefix for error codes in error messages.
> Such codes will be more searchable and as a bonus, vendor-specific.
> String representation might look like IGN-001042 or IGN-TBL042.
>
>
>
>
> On Wed, Apr 21, 2021 at 11:43 AM Alexei Scherbakov <
> alexey.scherbakoff@gmail.com> wrote:
>
> > I've create the ticket for implementing this [1]
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-14611
> >
> > пт, 16 апр. 2021 г. в 16:30, Alexei Scherbakov <
> > alexey.scherbakoff@gmail.com
> > >:
> >
> > >
> > >
> > > чт, 15 апр. 2021 г. в 18:21, Andrey Mashenkov <
> > andrey.mashenkov@gmail.com
> > > >:
> > >
> > >> Hi Alexey,
> > >> I like the idea.
> > >>
> > >> 1.
> > >>
> > >> >   TBL-0001 is a *string representation* of the error. It is built
> > from 2
> > >> > byte scope id (mapped to name TBL) and 2 byte number (0001). Both
> > >> > internally packed in int. No any kind of parsing will be necessary
> to
> > >> read
> > >> > scope/category.
> > >>
> > >> I think Alexey mean if it will be possible to make smth like that
> > >>
> > >> catch (IgniteException e) {
> > >>     if (e.getScope() == "TBL" && e.getCode() == 1234)
> > >>         continue; // E.g. retry my TX
> > >> }
> > >>
> > >> It looks useful to me.
> > >>
> > >
> > > I have in mind something like this:
> > >
> > > public class IgniteException extends RuntimeException {
> > >     private int errorCode;
> > >
> > >     public IgniteException(ErrorScope scope, int code, String message,
> > > Throwable cause) {
> > >         super(message, cause);
> > >         this.errorCode = make(scope, code);
> > >     }
> > >
> > >     public boolean matches(ErrorScope scope, int code) {
> > >         return errorCode == make(scope, code);
> > >     }
> > >
> > >     private int make(ErrorScope scope, int code) {
> > >         return ((scope.ordinal() << 16) | code);
> > >     }
> > >
> > >     public ErrorScope scope() {
> > >         return ErrorScope.values()[errorCode >> 16];
> > >     }
> > >
> > >     public int code() {
> > >         return 0xFFFF & errorCode;
> > >     }
> > >
> > >     public static void main(String[] args) {
> > >         IgniteException e = new IgniteException(ErrorScope.RAFT, 1,
> > > "test", null);
> > >
> > >         System.out.println(e.matches(ErrorScope.RAFT, 2));
> > >         System.out.println(e.scope());
> > >         System.out.println(e.code());
> > >
> > >         try {
> > >             throw e;
> > >         }
> > >         catch (IgniteException ee) {
> > >             System.out.println(ee.matches(ErrorScope.RAFT, 1));
> > >         }
> > >     }
> > > }
> > >
> > >
> > >>
> > >> 2. How you see a cross-module exception throwing?
> > >> Assume, user call -> module A, which recursively call -> module B,
> which
> > >> fails.
> > >> So, module A component calls a module B component and got an Exception
> > >> with
> > >> "B-1234" exception.
> > >> Module A do not expect any exception here and doesn't take care of
> > "B-xxx"
> > >> error codes, but only "A-yyyy.
> > >> Should it rethrow exception with "A-unknown" (maybe "UNK-0001") code
> > >> or reuse "B-xxxx" code with the own message, pointing original
> exception
> > >> as
> > >> a cause for both cases?
> > >>
> > >> The first approach may looks confusing, while the second one produces
> > too
> > >> many "UNK-" codes.
> > >> What code should get the user in that case?
> > >>
> > >>
> > >>
> > > The method should always report root cause, in your example it will be
> > > B-xxxx, no matter which module API is called.
> > >
> > >
> > >>
> > >>
> > >>
> > >> On Thu, Apr 15, 2021 at 5:36 PM Alexei Scherbakov <
> > >> alexey.scherbakoff@gmail.com> wrote:
> > >>
> > >> > чт, 15 апр. 2021 г. в 14:26, Ilya Kasnacheev <
> > ilya.kasnacheev@gmail.com
> > >> >:
> > >> >
> > >> > > Hello!
> > >> > >
> > >> > > > All public methods throw only unchecked
> > >> > > org.apache.ignite.lang.IgniteException containing aforementioned
> > >> fields.
> > >> > > > Each public method must have a section in the javadoc with a
> list
> > of
> > >> > all
> > >> > > possible error codes for this method.
> > >> > >
> > >> > > I don't think this is feasible at all.
> > >> > > Imagine javadoc for cache.get() method featuring three pages of
> > >> possible
> > >> > > error codes thrown by this method.
> > >> > >
> > >> >
> > >> > Of course there is no need to write 3 pages of error codes, this
> makes
> > >> no
> > >> > sense.
> > >> > I think we can use error ranges here, or, probably, document most
> > >> important
> > >> > error scenarios.
> > >> > The point here is to try to document errors as much as possible.
> > >> >
> > >> >
> > >> > > Also, updated every two weeks to account for changes in
> > >> > > underlying mechanisms.
> > >> > >
> > >> > > There is still a confusion between "error code for any error in
> > logs"
> > >> and
> > >> > > "error code for any pair of method & exception". Which one are we
> > >> > > discussing really?
> > >> > >
> > >> > > This is impossible to track or test, but is susceptible for common
> > >> > > error-hiding antipattern where all exceptions are caught in
> > >> cache.get()
> > >> > and
> > >> > > discarded, and instead a brand new CH-0001 "error in cache.get()"
> is
> > >> > thrown
> > >> > > to the user.
> > >> > >
> > >> > > Which is certainly not something that anybody wants.
> > >> > >
> > >> >
> > >> > Certainly not. We are talking here about root cause - what is
> exactly
> > >> the
> > >> > reason for method call failure.
> > >> >
> > >> >
> > >> > >
> > >> > > Regards,
> > >> > > --
> > >> > > Ilya Kasnacheev
> > >> > >
> > >> > >
> > >> > > чт, 15 апр. 2021 г. в 13:03, Vladislav Pyatkov <
> > vldpyatkov@gmail.com
> > >> >:
> > >> > >
> > >> > > > Hi Alexei,
> > >> > > >
> > >> > > > > Each public method *must *have a section in the javadoc with a
> > >> list
> > >> > of
> > >> > > > all possible error codes for this method.
> > >> > > >
> > >> > > > I consider it is redundant, because any public exception can be
> > >> thrown
> > >> > > from
> > >> > > > public API.
> > >> > > > If it not happens today, it may change tomorrow: one will be
> > >> removed,
> > >> > > > another one will be added.
> > >> > > >
> > >> > > > >Nested exceptions are not forbidden to use. They can provide
> > >> > additional
> > >> > > > details on the error for debug purposes
> > >> > > >
> > >> > > > I see another issue, which is in the Ignite 2.x, but not attend
> > >> here.
> > >> > We
> > >> > > > can have a deep nested exception and use it for handling.
> > >> > > > In the result, all time when we are handling an exception we use
> > >> > > > pattern like this:
> > >> > > > try{
> > >> > > > ...
> > >> > > > }
> > >> > > > catch (Exception e) {
> > >> > > >     if (X.hasCause(e, TimeoutException.class))
> > >> > > >         throw e;
> > >> > > >
> > >> > > >     if (X.hasCause(e, ConnectException.class,
> EOFException.class))
> > >> > > >         continue;
> > >> > > >
> > >> > > >     if (X.hasCause(e, InterruptedException.class))
> > >> > > >         return false;
> > >> > > > }
> > >> > > >
> > >> > > > If we have a pant to make only one exception to the client side,
> > we
> > >> can
> > >> > > > also do it for an internal exception.
> > >> > > >
> > >> > > > On Wed, Apr 14, 2021 at 11:42 AM Alexei Scherbakov <
> > >> > > > alexey.scherbakoff@gmail.com> wrote:
> > >> > > >
> > >> > > > > Alexey,
> > >> > > > >
> > >> > > > > ср, 14 апр. 2021 г. в 01:52, Alexey Kukushkin <
> > >> > > kukushkinalexey@gmail.com
> > >> > > > >:
> > >> > > > >
> > >> > > > > > Just some points looking questionable to me, although I
> > realize
> > >> the
> > >> > > > error
> > >> > > > > > handling style may be very opinionated:
> > >> > > > > >
> > >> > > > > >    - Would it make sense splitting the proposed composite
> > error
> > >> > code
> > >> > > > > >    (TBL-0001) into separate numeric code (0001) and
> > >> scope/category
> > >> > > > > ("TBL")
> > >> > > > > > to
> > >> > > > > >    avoid parsing the code when an error handler needs to
> > analyze
> > >> > only
> > >> > > > the
> > >> > > > > >    category or the code?
> > >> > > > > >
> > >> > > > >
> > >> > > > >   TBL-0001 is a *string representation* of the error. It is
> > built
> > >> > from
> > >> > > 2
> > >> > > > > byte scope id (mapped to name TBL) and 2 byte number (0001).
> > Both
> > >> > > > > internally packed in int. No any kind of parsing will be
> > >> necessary to
> > >> > > > read
> > >> > > > > scope/category.
> > >> > > > >
> > >> > > > >
> > >> > > > > >    - "*The cause - short string description of an issue,
> > >> readable
> > >> > by
> > >> > > > > > user.*".
> > >> > > > > >    This terminology sounds confusing to me since that
> "cause"
> > >> > sounds
> > >> > > > like
> > >> > > > > > Java
> > >> > > > > >    Throwable's Message to me and the "Cause" is a lower
> level
> > >> > > > exception.
> > >> > > > > >
> > >> > > > >
> > >> > > > > The string describes the cause of error, so the name. I'm ok
> to
> > >> > rename
> > >> > > it
> > >> > > > > to a message. It will be stored in IgniteException.message
> field
> > >> > > anyway.
> > >> > > > >
> > >> > > > >
> > >> > > > > >    - "*The action - steps for a user to resolve error...*".
> > The
> > >> > > action
> > >> > > > is
> > >> > > > > >    very important but do we want to make it part of the
> > >> > > > IgniteException?
> > >> > > > > I
> > >> > > > > > do
> > >> > > > > >    not think the recovery action text should be part of the
> > >> > > exception.
> > >> > > > > >    IgniteException may include a URL pointing to the
> > >> corresponding
> > >> > > > > >    documentation - this is discussable.
> > >> > > > > >
> > >> > > > >
> > >> > > > > This will not be the part of the exception. A user should
> visit
> > >> the
> > >> > > > > documentation page and read the action section by
> corresponding
> > >> error
> > >> > > > code.
> > >> > > > >
> > >> > > > >
> > >> > > > > >    - "*All public methods throw only unchecked
> > >> IgniteException*" -
> > >> > I
> > >> > > > > assume
> > >> > > > > >    we still respect JCache specification and prefer using
> > >> standard
> > >> > > Java
> > >> > > > > >    exception to signal about invalid parameters.
> > >> > > > > >
> > >> > > > >
> > >> > > > > Using standard java exceptions whenever possible makes sense.
> > >> > > > >
> > >> > > > >
> > >> > > > > >    - Why we do not follow the "classic" structured exception
> > >> > handling
> > >> > > > > >    practices in Ignite:
> > >> > > > > >
> > >> > > > >
> > >> > > > > Ignite 3 will be multi language, and other languages use other
> > >> error
> > >> > > > > processing models. SQL for example uses error codes.
> > >> > > > > The single exception approach simplifies and unifies error
> > >> handling
> > >> > > > across
> > >> > > > > platforms for me.
> > >> > > > >
> > >> > > > >
> > >> > > > > >       - Why do we not allow using checked exceptions? It
> seems
> > >> to
> > >> > me
> > >> > > > > >       sometimes forcing the user to handle an error serves
> as
> > a
> > >> > hint
> > >> > > > and
> > >> > > > > > thus
> > >> > > > > >       improves usability. For example, handling an
> > >> > > > optimistic/pessimistic
> > >> > > > > >       transaction conflict/deadlock. Or handling a timeout
> for
> > >> > > > operations
> > >> > > > > > with
> > >> > > > > >       timeouts.
> > >> > > > > >
> > >> > > > >
> > >> > > > > A valid point. Checked exceptions must be used for whose
> > methods,
> > >> > where
> > >> > > > > error handling is enforced, for example tx optimistic failure.
> > >> > > > > Such errors will also have corresponding error codes.
> > >> > > > >
> > >> > > > >
> > >> > > > > >       - Why single public IgniteException and no exception
> > >> > hierarchy?
> > >> > > > > Java
> > >> > > > > >       is optimized for structured exception handling instead
> > of
> > >> > using
> > >> > > > > > IF-ELSE to
> > >> > > > > >       analyze the codes.
> > >> > > > > >
> > >> > > > >
> > >> > > > > Exception hierarchy is not required when using error codes and
> > >> > > applicable
> > >> > > > > only to java API, so I would avoid spending efforts on it.
> > >> > > > >
> > >> > > > >
> > >> > > > > >       - Why no nested exceptions? Sometimes an error handler
> > is
> > >> > > > > interested
> > >> > > > > >       only in high level exceptions (like Invalid
> > Configuration)
> > >> > and
> > >> > > > > > sometimes
> > >> > > > > >       details are needed (like specific configuration parser
> > >> > > > exceptions).
> > >> > > > > >
> > >> > > > >
> > >> > > > > Nested exceptions are not forbidden to use. They can provide
> > >> > additional
> > >> > > > > details on the error for debug purposes, but not strictly
> > >> required,
> > >> > > > because
> > >> > > > > error code + message should provide enough information to the
> > >> user.
> > >> > > > >
> > >> > > > >
> > >> > > > > >    - For async methods returning a Future we may have a
> > >> universal
> > >> > > rule
> > >> > > > on
> > >> > > > > >    how to handle exceptions. For example, we may specify
> that
> > >> any
> > >> > > async
> > >> > > > > > method
> > >> > > > > >    can throw only invalid argument exceptions. All other
> > errors
> > >> are
> > >> > > > > > reported
> > >> > > > > >    via the exceptionally(IgniteException -> {}) callback
> even
> > if
> > >> > the
> > >> > > > > async
> > >> > > > > >    method was executed synchronously.
> > >> > > > > >
> > >> > > > >
> > >> > > > > This is ok to me.
> > >> > > > >
> > >> > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > вт, 13 апр. 2021 г. в 12:08, Alexei Scherbakov <
> > >> > > > > > alexey.scherbakoff@gmail.com
> > >> > > > > > >:
> > >> > > > > >
> > >> > > > > > > Igniters,
> > >> > > > > > >
> > >> > > > > > > I would like to start the discussion about error handling
> in
> > >> > > Ignite 3
> > >> > > > > and
> > >> > > > > > > how we can improve it compared to Ignite 2.
> > >> > > > > > >
> > >> > > > > > > The error handling in Ignite 2 was not very good because
> of
> > >> > generic
> > >> > > > > > > CacheException thrown on almost any occasion, having
> deeply
> > >> > nested
> > >> > > > root
> > >> > > > > > > cause and often containing no useful information on
> further
> > >> steps
> > >> > > to
> > >> > > > > fix
> > >> > > > > > > the issue.
> > >> > > > > > >
> > >> > > > > > > I aim to fix it by introducing some rules on error
> handling.
> > >> > > > > > >
> > >> > > > > > > *Public exception structure.*
> > >> > > > > > >
> > >> > > > > > > A public exception must have an error code, a cause, and
> an
> > >> > action.
> > >> > > > > > >
> > >> > > > > > > * The code - the combination of 2 byte scope id and 2 byte
> > >> error
> > >> > > > number
> > >> > > > > > > within the module. This allows up to 2^16 errors for each
> > >> scope,
> > >> > > > which
> > >> > > > > > > should be enough. The error code string representation can
> > >> look
> > >> > > like
> > >> > > > > > > RFT-0001 or TBL-0001
> > >> > > > > > > * The cause - short string description of an issue,
> readable
> > >> by
> > >> > > user.
> > >> > > > > > This
> > >> > > > > > > can have dynamic parameters depending on the error type
> for
> > >> > better
> > >> > > > user
> > >> > > > > > > experience, like "Can't write a snapshot, no space left on
> > >> device
> > >> > > > {0}"
> > >> > > > > > > * The action - steps for a user to resolve error situation
> > >> > > described
> > >> > > > in
> > >> > > > > > the
> > >> > > > > > > documentation in the corresponding error section, for
> > example
> > >> > > "Clean
> > >> > > > up
> > >> > > > > > > disk space and retry the operation".
> > >> > > > > > >
> > >> > > > > > > Common errors should have their own scope, for example
> > >> IGN-0001
> > >> > > > > > >
> > >> > > > > > > All public methods throw only unchecked
> > >> > > > > > > org.apache.ignite.lang.IgniteException containing
> > >> aforementioned
> > >> > > > > fields.
> > >> > > > > > > Each public method must have a section in the javadoc
> with a
> > >> list
> > >> > > of
> > >> > > > > all
> > >> > > > > > > possible error codes for this method.
> > >> > > > > > >
> > >> > > > > > > A good example with similar structure can be found here
> [1]
> > >> > > > > > >
> > >> > > > > > > *Async timeouts.*
> > >> > > > > > >
> > >> > > > > > > Because almost all API methods in Ignite 3 are async, they
> > all
> > >> > will
> > >> > > > > have
> > >> > > > > > a
> > >> > > > > > > configurable default timeout and can complete with timeout
> > >> error
> > >> > > if a
> > >> > > > > > > computation is not finished in time, for example if a
> > response
> > >> > has
> > >> > > > not
> > >> > > > > > been
> > >> > > > > > > yet received.
> > >> > > > > > > I suggest to complete the async op future with
> > >> TimeoutException
> > >> > in
> > >> > > > this
> > >> > > > > > > case to make it on par with synchronous execution using
> > >> > future.get,
> > >> > > > > which
> > >> > > > > > > will throw java.util.concurrent.TimeoutException on
> timeout.
> > >> > > > > > > For reference, see
> > >> > java.util.concurrent.CompletableFuture#orTimeout
> > >> > > > > > > No special error code should be used for this scenario.
> > >> > > > > > >
> > >> > > > > > > *Internal exceptions hierarchy.*
> > >> > > > > > >
> > >> > > > > > > All internal exceptions should extend
> > >> > > > > > > org.apache.ignite.internal.lang.IgniteInternalException
> for
> > >> > checked
> > >> > > > > > > exceptions and
> > >> > > > > > >
> > org.apache.ignite.internal.lang.IgniteInternalCheckedException
> > >> > for
> > >> > > > > > > unchecked exceptions.
> > >> > > > > > >
> > >> > > > > > > Thoughts ?
> > >> > > > > > >
> > >> > > > > > > [1]
> > >> > > >
> > https://docs.oracle.com/cd/B10501_01/server.920/a96525/preface.htm
> > >> > > > > > >
> > >> > > > > > > --
> > >> > > > > > >
> > >> > > > > > > Best regards,
> > >> > > > > > > Alexei Scherbakov
> > >> > > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > --
> > >> > > > > > Best regards,
> > >> > > > > > Alexey
> > >> > > > > >
> > >> > > > >
> > >> > > > >
> > >> > > > > --
> > >> > > > >
> > >> > > > > Best regards,
> > >> > > > > Alexei Scherbakov
> > >> > > > >
> > >> > > >
> > >> > > >
> > >> > > > --
> > >> > > > Vladislav Pyatkov
> > >> > > >
> > >> > >
> > >> >
> > >> >
> > >> > --
> > >> >
> > >> > Best regards,
> > >> > Alexei Scherbakov
> > >> >
> > >>
> > >>
> > >> --
> > >> Best regards,
> > >> Andrey V. Mashenkov
> > >>
> > >
> > >
> > > --
> > >
> > > Best regards,
> > > Alexei Scherbakov
> > >
> >
> >
> > --
> >
> > Best regards,
> > Alexei Scherbakov
> >
>
>
> --
> Best regards,
> Andrey V. Mashenkov
>


-- 

Best regards,
Alexei Scherbakov

Re: [DISCUSSION] Error handling in Ignite 3

Posted by Andrey Mashenkov <an...@gmail.com>.
Alexey,

As I understand, you suggest ErrorScope enum for easier analysis and it
will be a part of a error code.
But what about String representation?

I think we should use a common prefix for error codes in error messages.
Such codes will be more searchable and as a bonus, vendor-specific.
String representation might look like IGN-001042 or IGN-TBL042.




On Wed, Apr 21, 2021 at 11:43 AM Alexei Scherbakov <
alexey.scherbakoff@gmail.com> wrote:

> I've create the ticket for implementing this [1]
>
> [1] https://issues.apache.org/jira/browse/IGNITE-14611
>
> пт, 16 апр. 2021 г. в 16:30, Alexei Scherbakov <
> alexey.scherbakoff@gmail.com
> >:
>
> >
> >
> > чт, 15 апр. 2021 г. в 18:21, Andrey Mashenkov <
> andrey.mashenkov@gmail.com
> > >:
> >
> >> Hi Alexey,
> >> I like the idea.
> >>
> >> 1.
> >>
> >> >   TBL-0001 is a *string representation* of the error. It is built
> from 2
> >> > byte scope id (mapped to name TBL) and 2 byte number (0001). Both
> >> > internally packed in int. No any kind of parsing will be necessary to
> >> read
> >> > scope/category.
> >>
> >> I think Alexey mean if it will be possible to make smth like that
> >>
> >> catch (IgniteException e) {
> >>     if (e.getScope() == "TBL" && e.getCode() == 1234)
> >>         continue; // E.g. retry my TX
> >> }
> >>
> >> It looks useful to me.
> >>
> >
> > I have in mind something like this:
> >
> > public class IgniteException extends RuntimeException {
> >     private int errorCode;
> >
> >     public IgniteException(ErrorScope scope, int code, String message,
> > Throwable cause) {
> >         super(message, cause);
> >         this.errorCode = make(scope, code);
> >     }
> >
> >     public boolean matches(ErrorScope scope, int code) {
> >         return errorCode == make(scope, code);
> >     }
> >
> >     private int make(ErrorScope scope, int code) {
> >         return ((scope.ordinal() << 16) | code);
> >     }
> >
> >     public ErrorScope scope() {
> >         return ErrorScope.values()[errorCode >> 16];
> >     }
> >
> >     public int code() {
> >         return 0xFFFF & errorCode;
> >     }
> >
> >     public static void main(String[] args) {
> >         IgniteException e = new IgniteException(ErrorScope.RAFT, 1,
> > "test", null);
> >
> >         System.out.println(e.matches(ErrorScope.RAFT, 2));
> >         System.out.println(e.scope());
> >         System.out.println(e.code());
> >
> >         try {
> >             throw e;
> >         }
> >         catch (IgniteException ee) {
> >             System.out.println(ee.matches(ErrorScope.RAFT, 1));
> >         }
> >     }
> > }
> >
> >
> >>
> >> 2. How you see a cross-module exception throwing?
> >> Assume, user call -> module A, which recursively call -> module B, which
> >> fails.
> >> So, module A component calls a module B component and got an Exception
> >> with
> >> "B-1234" exception.
> >> Module A do not expect any exception here and doesn't take care of
> "B-xxx"
> >> error codes, but only "A-yyyy.
> >> Should it rethrow exception with "A-unknown" (maybe "UNK-0001") code
> >> or reuse "B-xxxx" code with the own message, pointing original exception
> >> as
> >> a cause for both cases?
> >>
> >> The first approach may looks confusing, while the second one produces
> too
> >> many "UNK-" codes.
> >> What code should get the user in that case?
> >>
> >>
> >>
> > The method should always report root cause, in your example it will be
> > B-xxxx, no matter which module API is called.
> >
> >
> >>
> >>
> >>
> >> On Thu, Apr 15, 2021 at 5:36 PM Alexei Scherbakov <
> >> alexey.scherbakoff@gmail.com> wrote:
> >>
> >> > чт, 15 апр. 2021 г. в 14:26, Ilya Kasnacheev <
> ilya.kasnacheev@gmail.com
> >> >:
> >> >
> >> > > Hello!
> >> > >
> >> > > > All public methods throw only unchecked
> >> > > org.apache.ignite.lang.IgniteException containing aforementioned
> >> fields.
> >> > > > Each public method must have a section in the javadoc with a list
> of
> >> > all
> >> > > possible error codes for this method.
> >> > >
> >> > > I don't think this is feasible at all.
> >> > > Imagine javadoc for cache.get() method featuring three pages of
> >> possible
> >> > > error codes thrown by this method.
> >> > >
> >> >
> >> > Of course there is no need to write 3 pages of error codes, this makes
> >> no
> >> > sense.
> >> > I think we can use error ranges here, or, probably, document most
> >> important
> >> > error scenarios.
> >> > The point here is to try to document errors as much as possible.
> >> >
> >> >
> >> > > Also, updated every two weeks to account for changes in
> >> > > underlying mechanisms.
> >> > >
> >> > > There is still a confusion between "error code for any error in
> logs"
> >> and
> >> > > "error code for any pair of method & exception". Which one are we
> >> > > discussing really?
> >> > >
> >> > > This is impossible to track or test, but is susceptible for common
> >> > > error-hiding antipattern where all exceptions are caught in
> >> cache.get()
> >> > and
> >> > > discarded, and instead a brand new CH-0001 "error in cache.get()" is
> >> > thrown
> >> > > to the user.
> >> > >
> >> > > Which is certainly not something that anybody wants.
> >> > >
> >> >
> >> > Certainly not. We are talking here about root cause - what is exactly
> >> the
> >> > reason for method call failure.
> >> >
> >> >
> >> > >
> >> > > Regards,
> >> > > --
> >> > > Ilya Kasnacheev
> >> > >
> >> > >
> >> > > чт, 15 апр. 2021 г. в 13:03, Vladislav Pyatkov <
> vldpyatkov@gmail.com
> >> >:
> >> > >
> >> > > > Hi Alexei,
> >> > > >
> >> > > > > Each public method *must *have a section in the javadoc with a
> >> list
> >> > of
> >> > > > all possible error codes for this method.
> >> > > >
> >> > > > I consider it is redundant, because any public exception can be
> >> thrown
> >> > > from
> >> > > > public API.
> >> > > > If it not happens today, it may change tomorrow: one will be
> >> removed,
> >> > > > another one will be added.
> >> > > >
> >> > > > >Nested exceptions are not forbidden to use. They can provide
> >> > additional
> >> > > > details on the error for debug purposes
> >> > > >
> >> > > > I see another issue, which is in the Ignite 2.x, but not attend
> >> here.
> >> > We
> >> > > > can have a deep nested exception and use it for handling.
> >> > > > In the result, all time when we are handling an exception we use
> >> > > > pattern like this:
> >> > > > try{
> >> > > > ...
> >> > > > }
> >> > > > catch (Exception e) {
> >> > > >     if (X.hasCause(e, TimeoutException.class))
> >> > > >         throw e;
> >> > > >
> >> > > >     if (X.hasCause(e, ConnectException.class, EOFException.class))
> >> > > >         continue;
> >> > > >
> >> > > >     if (X.hasCause(e, InterruptedException.class))
> >> > > >         return false;
> >> > > > }
> >> > > >
> >> > > > If we have a pant to make only one exception to the client side,
> we
> >> can
> >> > > > also do it for an internal exception.
> >> > > >
> >> > > > On Wed, Apr 14, 2021 at 11:42 AM Alexei Scherbakov <
> >> > > > alexey.scherbakoff@gmail.com> wrote:
> >> > > >
> >> > > > > Alexey,
> >> > > > >
> >> > > > > ср, 14 апр. 2021 г. в 01:52, Alexey Kukushkin <
> >> > > kukushkinalexey@gmail.com
> >> > > > >:
> >> > > > >
> >> > > > > > Just some points looking questionable to me, although I
> realize
> >> the
> >> > > > error
> >> > > > > > handling style may be very opinionated:
> >> > > > > >
> >> > > > > >    - Would it make sense splitting the proposed composite
> error
> >> > code
> >> > > > > >    (TBL-0001) into separate numeric code (0001) and
> >> scope/category
> >> > > > > ("TBL")
> >> > > > > > to
> >> > > > > >    avoid parsing the code when an error handler needs to
> analyze
> >> > only
> >> > > > the
> >> > > > > >    category or the code?
> >> > > > > >
> >> > > > >
> >> > > > >   TBL-0001 is a *string representation* of the error. It is
> built
> >> > from
> >> > > 2
> >> > > > > byte scope id (mapped to name TBL) and 2 byte number (0001).
> Both
> >> > > > > internally packed in int. No any kind of parsing will be
> >> necessary to
> >> > > > read
> >> > > > > scope/category.
> >> > > > >
> >> > > > >
> >> > > > > >    - "*The cause - short string description of an issue,
> >> readable
> >> > by
> >> > > > > > user.*".
> >> > > > > >    This terminology sounds confusing to me since that "cause"
> >> > sounds
> >> > > > like
> >> > > > > > Java
> >> > > > > >    Throwable's Message to me and the "Cause" is a lower level
> >> > > > exception.
> >> > > > > >
> >> > > > >
> >> > > > > The string describes the cause of error, so the name. I'm ok to
> >> > rename
> >> > > it
> >> > > > > to a message. It will be stored in IgniteException.message field
> >> > > anyway.
> >> > > > >
> >> > > > >
> >> > > > > >    - "*The action - steps for a user to resolve error...*".
> The
> >> > > action
> >> > > > is
> >> > > > > >    very important but do we want to make it part of the
> >> > > > IgniteException?
> >> > > > > I
> >> > > > > > do
> >> > > > > >    not think the recovery action text should be part of the
> >> > > exception.
> >> > > > > >    IgniteException may include a URL pointing to the
> >> corresponding
> >> > > > > >    documentation - this is discussable.
> >> > > > > >
> >> > > > >
> >> > > > > This will not be the part of the exception. A user should visit
> >> the
> >> > > > > documentation page and read the action section by corresponding
> >> error
> >> > > > code.
> >> > > > >
> >> > > > >
> >> > > > > >    - "*All public methods throw only unchecked
> >> IgniteException*" -
> >> > I
> >> > > > > assume
> >> > > > > >    we still respect JCache specification and prefer using
> >> standard
> >> > > Java
> >> > > > > >    exception to signal about invalid parameters.
> >> > > > > >
> >> > > > >
> >> > > > > Using standard java exceptions whenever possible makes sense.
> >> > > > >
> >> > > > >
> >> > > > > >    - Why we do not follow the "classic" structured exception
> >> > handling
> >> > > > > >    practices in Ignite:
> >> > > > > >
> >> > > > >
> >> > > > > Ignite 3 will be multi language, and other languages use other
> >> error
> >> > > > > processing models. SQL for example uses error codes.
> >> > > > > The single exception approach simplifies and unifies error
> >> handling
> >> > > > across
> >> > > > > platforms for me.
> >> > > > >
> >> > > > >
> >> > > > > >       - Why do we not allow using checked exceptions? It seems
> >> to
> >> > me
> >> > > > > >       sometimes forcing the user to handle an error serves as
> a
> >> > hint
> >> > > > and
> >> > > > > > thus
> >> > > > > >       improves usability. For example, handling an
> >> > > > optimistic/pessimistic
> >> > > > > >       transaction conflict/deadlock. Or handling a timeout for
> >> > > > operations
> >> > > > > > with
> >> > > > > >       timeouts.
> >> > > > > >
> >> > > > >
> >> > > > > A valid point. Checked exceptions must be used for whose
> methods,
> >> > where
> >> > > > > error handling is enforced, for example tx optimistic failure.
> >> > > > > Such errors will also have corresponding error codes.
> >> > > > >
> >> > > > >
> >> > > > > >       - Why single public IgniteException and no exception
> >> > hierarchy?
> >> > > > > Java
> >> > > > > >       is optimized for structured exception handling instead
> of
> >> > using
> >> > > > > > IF-ELSE to
> >> > > > > >       analyze the codes.
> >> > > > > >
> >> > > > >
> >> > > > > Exception hierarchy is not required when using error codes and
> >> > > applicable
> >> > > > > only to java API, so I would avoid spending efforts on it.
> >> > > > >
> >> > > > >
> >> > > > > >       - Why no nested exceptions? Sometimes an error handler
> is
> >> > > > > interested
> >> > > > > >       only in high level exceptions (like Invalid
> Configuration)
> >> > and
> >> > > > > > sometimes
> >> > > > > >       details are needed (like specific configuration parser
> >> > > > exceptions).
> >> > > > > >
> >> > > > >
> >> > > > > Nested exceptions are not forbidden to use. They can provide
> >> > additional
> >> > > > > details on the error for debug purposes, but not strictly
> >> required,
> >> > > > because
> >> > > > > error code + message should provide enough information to the
> >> user.
> >> > > > >
> >> > > > >
> >> > > > > >    - For async methods returning a Future we may have a
> >> universal
> >> > > rule
> >> > > > on
> >> > > > > >    how to handle exceptions. For example, we may specify that
> >> any
> >> > > async
> >> > > > > > method
> >> > > > > >    can throw only invalid argument exceptions. All other
> errors
> >> are
> >> > > > > > reported
> >> > > > > >    via the exceptionally(IgniteException -> {}) callback even
> if
> >> > the
> >> > > > > async
> >> > > > > >    method was executed synchronously.
> >> > > > > >
> >> > > > >
> >> > > > > This is ok to me.
> >> > > > >
> >> > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > > вт, 13 апр. 2021 г. в 12:08, Alexei Scherbakov <
> >> > > > > > alexey.scherbakoff@gmail.com
> >> > > > > > >:
> >> > > > > >
> >> > > > > > > Igniters,
> >> > > > > > >
> >> > > > > > > I would like to start the discussion about error handling in
> >> > > Ignite 3
> >> > > > > and
> >> > > > > > > how we can improve it compared to Ignite 2.
> >> > > > > > >
> >> > > > > > > The error handling in Ignite 2 was not very good because of
> >> > generic
> >> > > > > > > CacheException thrown on almost any occasion, having deeply
> >> > nested
> >> > > > root
> >> > > > > > > cause and often containing no useful information on further
> >> steps
> >> > > to
> >> > > > > fix
> >> > > > > > > the issue.
> >> > > > > > >
> >> > > > > > > I aim to fix it by introducing some rules on error handling.
> >> > > > > > >
> >> > > > > > > *Public exception structure.*
> >> > > > > > >
> >> > > > > > > A public exception must have an error code, a cause, and an
> >> > action.
> >> > > > > > >
> >> > > > > > > * The code - the combination of 2 byte scope id and 2 byte
> >> error
> >> > > > number
> >> > > > > > > within the module. This allows up to 2^16 errors for each
> >> scope,
> >> > > > which
> >> > > > > > > should be enough. The error code string representation can
> >> look
> >> > > like
> >> > > > > > > RFT-0001 or TBL-0001
> >> > > > > > > * The cause - short string description of an issue, readable
> >> by
> >> > > user.
> >> > > > > > This
> >> > > > > > > can have dynamic parameters depending on the error type for
> >> > better
> >> > > > user
> >> > > > > > > experience, like "Can't write a snapshot, no space left on
> >> device
> >> > > > {0}"
> >> > > > > > > * The action - steps for a user to resolve error situation
> >> > > described
> >> > > > in
> >> > > > > > the
> >> > > > > > > documentation in the corresponding error section, for
> example
> >> > > "Clean
> >> > > > up
> >> > > > > > > disk space and retry the operation".
> >> > > > > > >
> >> > > > > > > Common errors should have their own scope, for example
> >> IGN-0001
> >> > > > > > >
> >> > > > > > > All public methods throw only unchecked
> >> > > > > > > org.apache.ignite.lang.IgniteException containing
> >> aforementioned
> >> > > > > fields.
> >> > > > > > > Each public method must have a section in the javadoc with a
> >> list
> >> > > of
> >> > > > > all
> >> > > > > > > possible error codes for this method.
> >> > > > > > >
> >> > > > > > > A good example with similar structure can be found here [1]
> >> > > > > > >
> >> > > > > > > *Async timeouts.*
> >> > > > > > >
> >> > > > > > > Because almost all API methods in Ignite 3 are async, they
> all
> >> > will
> >> > > > > have
> >> > > > > > a
> >> > > > > > > configurable default timeout and can complete with timeout
> >> error
> >> > > if a
> >> > > > > > > computation is not finished in time, for example if a
> response
> >> > has
> >> > > > not
> >> > > > > > been
> >> > > > > > > yet received.
> >> > > > > > > I suggest to complete the async op future with
> >> TimeoutException
> >> > in
> >> > > > this
> >> > > > > > > case to make it on par with synchronous execution using
> >> > future.get,
> >> > > > > which
> >> > > > > > > will throw java.util.concurrent.TimeoutException on timeout.
> >> > > > > > > For reference, see
> >> > java.util.concurrent.CompletableFuture#orTimeout
> >> > > > > > > No special error code should be used for this scenario.
> >> > > > > > >
> >> > > > > > > *Internal exceptions hierarchy.*
> >> > > > > > >
> >> > > > > > > All internal exceptions should extend
> >> > > > > > > org.apache.ignite.internal.lang.IgniteInternalException for
> >> > checked
> >> > > > > > > exceptions and
> >> > > > > > >
> org.apache.ignite.internal.lang.IgniteInternalCheckedException
> >> > for
> >> > > > > > > unchecked exceptions.
> >> > > > > > >
> >> > > > > > > Thoughts ?
> >> > > > > > >
> >> > > > > > > [1]
> >> > > >
> https://docs.oracle.com/cd/B10501_01/server.920/a96525/preface.htm
> >> > > > > > >
> >> > > > > > > --
> >> > > > > > >
> >> > > > > > > Best regards,
> >> > > > > > > Alexei Scherbakov
> >> > > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > > --
> >> > > > > > Best regards,
> >> > > > > > Alexey
> >> > > > > >
> >> > > > >
> >> > > > >
> >> > > > > --
> >> > > > >
> >> > > > > Best regards,
> >> > > > > Alexei Scherbakov
> >> > > > >
> >> > > >
> >> > > >
> >> > > > --
> >> > > > Vladislav Pyatkov
> >> > > >
> >> > >
> >> >
> >> >
> >> > --
> >> >
> >> > Best regards,
> >> > Alexei Scherbakov
> >> >
> >>
> >>
> >> --
> >> Best regards,
> >> Andrey V. Mashenkov
> >>
> >
> >
> > --
> >
> > Best regards,
> > Alexei Scherbakov
> >
>
>
> --
>
> Best regards,
> Alexei Scherbakov
>


-- 
Best regards,
Andrey V. Mashenkov

Re: [DISCUSSION] Error handling in Ignite 3

Posted by Alexei Scherbakov <al...@gmail.com>.
I've create the ticket for implementing this [1]

[1] https://issues.apache.org/jira/browse/IGNITE-14611

пт, 16 апр. 2021 г. в 16:30, Alexei Scherbakov <alexey.scherbakoff@gmail.com
>:

>
>
> чт, 15 апр. 2021 г. в 18:21, Andrey Mashenkov <andrey.mashenkov@gmail.com
> >:
>
>> Hi Alexey,
>> I like the idea.
>>
>> 1.
>>
>> >   TBL-0001 is a *string representation* of the error. It is built from 2
>> > byte scope id (mapped to name TBL) and 2 byte number (0001). Both
>> > internally packed in int. No any kind of parsing will be necessary to
>> read
>> > scope/category.
>>
>> I think Alexey mean if it will be possible to make smth like that
>>
>> catch (IgniteException e) {
>>     if (e.getScope() == "TBL" && e.getCode() == 1234)
>>         continue; // E.g. retry my TX
>> }
>>
>> It looks useful to me.
>>
>
> I have in mind something like this:
>
> public class IgniteException extends RuntimeException {
>     private int errorCode;
>
>     public IgniteException(ErrorScope scope, int code, String message,
> Throwable cause) {
>         super(message, cause);
>         this.errorCode = make(scope, code);
>     }
>
>     public boolean matches(ErrorScope scope, int code) {
>         return errorCode == make(scope, code);
>     }
>
>     private int make(ErrorScope scope, int code) {
>         return ((scope.ordinal() << 16) | code);
>     }
>
>     public ErrorScope scope() {
>         return ErrorScope.values()[errorCode >> 16];
>     }
>
>     public int code() {
>         return 0xFFFF & errorCode;
>     }
>
>     public static void main(String[] args) {
>         IgniteException e = new IgniteException(ErrorScope.RAFT, 1,
> "test", null);
>
>         System.out.println(e.matches(ErrorScope.RAFT, 2));
>         System.out.println(e.scope());
>         System.out.println(e.code());
>
>         try {
>             throw e;
>         }
>         catch (IgniteException ee) {
>             System.out.println(ee.matches(ErrorScope.RAFT, 1));
>         }
>     }
> }
>
>
>>
>> 2. How you see a cross-module exception throwing?
>> Assume, user call -> module A, which recursively call -> module B, which
>> fails.
>> So, module A component calls a module B component and got an Exception
>> with
>> "B-1234" exception.
>> Module A do not expect any exception here and doesn't take care of "B-xxx"
>> error codes, but only "A-yyyy.
>> Should it rethrow exception with "A-unknown" (maybe "UNK-0001") code
>> or reuse "B-xxxx" code with the own message, pointing original exception
>> as
>> a cause for both cases?
>>
>> The first approach may looks confusing, while the second one produces too
>> many "UNK-" codes.
>> What code should get the user in that case?
>>
>>
>>
> The method should always report root cause, in your example it will be
> B-xxxx, no matter which module API is called.
>
>
>>
>>
>>
>> On Thu, Apr 15, 2021 at 5:36 PM Alexei Scherbakov <
>> alexey.scherbakoff@gmail.com> wrote:
>>
>> > чт, 15 апр. 2021 г. в 14:26, Ilya Kasnacheev <ilya.kasnacheev@gmail.com
>> >:
>> >
>> > > Hello!
>> > >
>> > > > All public methods throw only unchecked
>> > > org.apache.ignite.lang.IgniteException containing aforementioned
>> fields.
>> > > > Each public method must have a section in the javadoc with a list of
>> > all
>> > > possible error codes for this method.
>> > >
>> > > I don't think this is feasible at all.
>> > > Imagine javadoc for cache.get() method featuring three pages of
>> possible
>> > > error codes thrown by this method.
>> > >
>> >
>> > Of course there is no need to write 3 pages of error codes, this makes
>> no
>> > sense.
>> > I think we can use error ranges here, or, probably, document most
>> important
>> > error scenarios.
>> > The point here is to try to document errors as much as possible.
>> >
>> >
>> > > Also, updated every two weeks to account for changes in
>> > > underlying mechanisms.
>> > >
>> > > There is still a confusion between "error code for any error in logs"
>> and
>> > > "error code for any pair of method & exception". Which one are we
>> > > discussing really?
>> > >
>> > > This is impossible to track or test, but is susceptible for common
>> > > error-hiding antipattern where all exceptions are caught in
>> cache.get()
>> > and
>> > > discarded, and instead a brand new CH-0001 "error in cache.get()" is
>> > thrown
>> > > to the user.
>> > >
>> > > Which is certainly not something that anybody wants.
>> > >
>> >
>> > Certainly not. We are talking here about root cause - what is exactly
>> the
>> > reason for method call failure.
>> >
>> >
>> > >
>> > > Regards,
>> > > --
>> > > Ilya Kasnacheev
>> > >
>> > >
>> > > чт, 15 апр. 2021 г. в 13:03, Vladislav Pyatkov <vldpyatkov@gmail.com
>> >:
>> > >
>> > > > Hi Alexei,
>> > > >
>> > > > > Each public method *must *have a section in the javadoc with a
>> list
>> > of
>> > > > all possible error codes for this method.
>> > > >
>> > > > I consider it is redundant, because any public exception can be
>> thrown
>> > > from
>> > > > public API.
>> > > > If it not happens today, it may change tomorrow: one will be
>> removed,
>> > > > another one will be added.
>> > > >
>> > > > >Nested exceptions are not forbidden to use. They can provide
>> > additional
>> > > > details on the error for debug purposes
>> > > >
>> > > > I see another issue, which is in the Ignite 2.x, but not attend
>> here.
>> > We
>> > > > can have a deep nested exception and use it for handling.
>> > > > In the result, all time when we are handling an exception we use
>> > > > pattern like this:
>> > > > try{
>> > > > ...
>> > > > }
>> > > > catch (Exception e) {
>> > > >     if (X.hasCause(e, TimeoutException.class))
>> > > >         throw e;
>> > > >
>> > > >     if (X.hasCause(e, ConnectException.class, EOFException.class))
>> > > >         continue;
>> > > >
>> > > >     if (X.hasCause(e, InterruptedException.class))
>> > > >         return false;
>> > > > }
>> > > >
>> > > > If we have a pant to make only one exception to the client side, we
>> can
>> > > > also do it for an internal exception.
>> > > >
>> > > > On Wed, Apr 14, 2021 at 11:42 AM Alexei Scherbakov <
>> > > > alexey.scherbakoff@gmail.com> wrote:
>> > > >
>> > > > > Alexey,
>> > > > >
>> > > > > ср, 14 апр. 2021 г. в 01:52, Alexey Kukushkin <
>> > > kukushkinalexey@gmail.com
>> > > > >:
>> > > > >
>> > > > > > Just some points looking questionable to me, although I realize
>> the
>> > > > error
>> > > > > > handling style may be very opinionated:
>> > > > > >
>> > > > > >    - Would it make sense splitting the proposed composite error
>> > code
>> > > > > >    (TBL-0001) into separate numeric code (0001) and
>> scope/category
>> > > > > ("TBL")
>> > > > > > to
>> > > > > >    avoid parsing the code when an error handler needs to analyze
>> > only
>> > > > the
>> > > > > >    category or the code?
>> > > > > >
>> > > > >
>> > > > >   TBL-0001 is a *string representation* of the error. It is built
>> > from
>> > > 2
>> > > > > byte scope id (mapped to name TBL) and 2 byte number (0001). Both
>> > > > > internally packed in int. No any kind of parsing will be
>> necessary to
>> > > > read
>> > > > > scope/category.
>> > > > >
>> > > > >
>> > > > > >    - "*The cause - short string description of an issue,
>> readable
>> > by
>> > > > > > user.*".
>> > > > > >    This terminology sounds confusing to me since that "cause"
>> > sounds
>> > > > like
>> > > > > > Java
>> > > > > >    Throwable's Message to me and the "Cause" is a lower level
>> > > > exception.
>> > > > > >
>> > > > >
>> > > > > The string describes the cause of error, so the name. I'm ok to
>> > rename
>> > > it
>> > > > > to a message. It will be stored in IgniteException.message field
>> > > anyway.
>> > > > >
>> > > > >
>> > > > > >    - "*The action - steps for a user to resolve error...*". The
>> > > action
>> > > > is
>> > > > > >    very important but do we want to make it part of the
>> > > > IgniteException?
>> > > > > I
>> > > > > > do
>> > > > > >    not think the recovery action text should be part of the
>> > > exception.
>> > > > > >    IgniteException may include a URL pointing to the
>> corresponding
>> > > > > >    documentation - this is discussable.
>> > > > > >
>> > > > >
>> > > > > This will not be the part of the exception. A user should visit
>> the
>> > > > > documentation page and read the action section by corresponding
>> error
>> > > > code.
>> > > > >
>> > > > >
>> > > > > >    - "*All public methods throw only unchecked
>> IgniteException*" -
>> > I
>> > > > > assume
>> > > > > >    we still respect JCache specification and prefer using
>> standard
>> > > Java
>> > > > > >    exception to signal about invalid parameters.
>> > > > > >
>> > > > >
>> > > > > Using standard java exceptions whenever possible makes sense.
>> > > > >
>> > > > >
>> > > > > >    - Why we do not follow the "classic" structured exception
>> > handling
>> > > > > >    practices in Ignite:
>> > > > > >
>> > > > >
>> > > > > Ignite 3 will be multi language, and other languages use other
>> error
>> > > > > processing models. SQL for example uses error codes.
>> > > > > The single exception approach simplifies and unifies error
>> handling
>> > > > across
>> > > > > platforms for me.
>> > > > >
>> > > > >
>> > > > > >       - Why do we not allow using checked exceptions? It seems
>> to
>> > me
>> > > > > >       sometimes forcing the user to handle an error serves as a
>> > hint
>> > > > and
>> > > > > > thus
>> > > > > >       improves usability. For example, handling an
>> > > > optimistic/pessimistic
>> > > > > >       transaction conflict/deadlock. Or handling a timeout for
>> > > > operations
>> > > > > > with
>> > > > > >       timeouts.
>> > > > > >
>> > > > >
>> > > > > A valid point. Checked exceptions must be used for whose methods,
>> > where
>> > > > > error handling is enforced, for example tx optimistic failure.
>> > > > > Such errors will also have corresponding error codes.
>> > > > >
>> > > > >
>> > > > > >       - Why single public IgniteException and no exception
>> > hierarchy?
>> > > > > Java
>> > > > > >       is optimized for structured exception handling instead of
>> > using
>> > > > > > IF-ELSE to
>> > > > > >       analyze the codes.
>> > > > > >
>> > > > >
>> > > > > Exception hierarchy is not required when using error codes and
>> > > applicable
>> > > > > only to java API, so I would avoid spending efforts on it.
>> > > > >
>> > > > >
>> > > > > >       - Why no nested exceptions? Sometimes an error handler is
>> > > > > interested
>> > > > > >       only in high level exceptions (like Invalid Configuration)
>> > and
>> > > > > > sometimes
>> > > > > >       details are needed (like specific configuration parser
>> > > > exceptions).
>> > > > > >
>> > > > >
>> > > > > Nested exceptions are not forbidden to use. They can provide
>> > additional
>> > > > > details on the error for debug purposes, but not strictly
>> required,
>> > > > because
>> > > > > error code + message should provide enough information to the
>> user.
>> > > > >
>> > > > >
>> > > > > >    - For async methods returning a Future we may have a
>> universal
>> > > rule
>> > > > on
>> > > > > >    how to handle exceptions. For example, we may specify that
>> any
>> > > async
>> > > > > > method
>> > > > > >    can throw only invalid argument exceptions. All other errors
>> are
>> > > > > > reported
>> > > > > >    via the exceptionally(IgniteException -> {}) callback even if
>> > the
>> > > > > async
>> > > > > >    method was executed synchronously.
>> > > > > >
>> > > > >
>> > > > > This is ok to me.
>> > > > >
>> > > > >
>> > > > > >
>> > > > > >
>> > > > > > вт, 13 апр. 2021 г. в 12:08, Alexei Scherbakov <
>> > > > > > alexey.scherbakoff@gmail.com
>> > > > > > >:
>> > > > > >
>> > > > > > > Igniters,
>> > > > > > >
>> > > > > > > I would like to start the discussion about error handling in
>> > > Ignite 3
>> > > > > and
>> > > > > > > how we can improve it compared to Ignite 2.
>> > > > > > >
>> > > > > > > The error handling in Ignite 2 was not very good because of
>> > generic
>> > > > > > > CacheException thrown on almost any occasion, having deeply
>> > nested
>> > > > root
>> > > > > > > cause and often containing no useful information on further
>> steps
>> > > to
>> > > > > fix
>> > > > > > > the issue.
>> > > > > > >
>> > > > > > > I aim to fix it by introducing some rules on error handling.
>> > > > > > >
>> > > > > > > *Public exception structure.*
>> > > > > > >
>> > > > > > > A public exception must have an error code, a cause, and an
>> > action.
>> > > > > > >
>> > > > > > > * The code - the combination of 2 byte scope id and 2 byte
>> error
>> > > > number
>> > > > > > > within the module. This allows up to 2^16 errors for each
>> scope,
>> > > > which
>> > > > > > > should be enough. The error code string representation can
>> look
>> > > like
>> > > > > > > RFT-0001 or TBL-0001
>> > > > > > > * The cause - short string description of an issue, readable
>> by
>> > > user.
>> > > > > > This
>> > > > > > > can have dynamic parameters depending on the error type for
>> > better
>> > > > user
>> > > > > > > experience, like "Can't write a snapshot, no space left on
>> device
>> > > > {0}"
>> > > > > > > * The action - steps for a user to resolve error situation
>> > > described
>> > > > in
>> > > > > > the
>> > > > > > > documentation in the corresponding error section, for example
>> > > "Clean
>> > > > up
>> > > > > > > disk space and retry the operation".
>> > > > > > >
>> > > > > > > Common errors should have their own scope, for example
>> IGN-0001
>> > > > > > >
>> > > > > > > All public methods throw only unchecked
>> > > > > > > org.apache.ignite.lang.IgniteException containing
>> aforementioned
>> > > > > fields.
>> > > > > > > Each public method must have a section in the javadoc with a
>> list
>> > > of
>> > > > > all
>> > > > > > > possible error codes for this method.
>> > > > > > >
>> > > > > > > A good example with similar structure can be found here [1]
>> > > > > > >
>> > > > > > > *Async timeouts.*
>> > > > > > >
>> > > > > > > Because almost all API methods in Ignite 3 are async, they all
>> > will
>> > > > > have
>> > > > > > a
>> > > > > > > configurable default timeout and can complete with timeout
>> error
>> > > if a
>> > > > > > > computation is not finished in time, for example if a response
>> > has
>> > > > not
>> > > > > > been
>> > > > > > > yet received.
>> > > > > > > I suggest to complete the async op future with
>> TimeoutException
>> > in
>> > > > this
>> > > > > > > case to make it on par with synchronous execution using
>> > future.get,
>> > > > > which
>> > > > > > > will throw java.util.concurrent.TimeoutException on timeout.
>> > > > > > > For reference, see
>> > java.util.concurrent.CompletableFuture#orTimeout
>> > > > > > > No special error code should be used for this scenario.
>> > > > > > >
>> > > > > > > *Internal exceptions hierarchy.*
>> > > > > > >
>> > > > > > > All internal exceptions should extend
>> > > > > > > org.apache.ignite.internal.lang.IgniteInternalException for
>> > checked
>> > > > > > > exceptions and
>> > > > > > > org.apache.ignite.internal.lang.IgniteInternalCheckedException
>> > for
>> > > > > > > unchecked exceptions.
>> > > > > > >
>> > > > > > > Thoughts ?
>> > > > > > >
>> > > > > > > [1]
>> > > > https://docs.oracle.com/cd/B10501_01/server.920/a96525/preface.htm
>> > > > > > >
>> > > > > > > --
>> > > > > > >
>> > > > > > > Best regards,
>> > > > > > > Alexei Scherbakov
>> > > > > > >
>> > > > > >
>> > > > > >
>> > > > > > --
>> > > > > > Best regards,
>> > > > > > Alexey
>> > > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > >
>> > > > > Best regards,
>> > > > > Alexei Scherbakov
>> > > > >
>> > > >
>> > > >
>> > > > --
>> > > > Vladislav Pyatkov
>> > > >
>> > >
>> >
>> >
>> > --
>> >
>> > Best regards,
>> > Alexei Scherbakov
>> >
>>
>>
>> --
>> Best regards,
>> Andrey V. Mashenkov
>>
>
>
> --
>
> Best regards,
> Alexei Scherbakov
>


-- 

Best regards,
Alexei Scherbakov

Re: [DISCUSSION] Error handling in Ignite 3

Posted by Alexei Scherbakov <al...@gmail.com>.
чт, 15 апр. 2021 г. в 18:21, Andrey Mashenkov <an...@gmail.com>:

> Hi Alexey,
> I like the idea.
>
> 1.
>
> >   TBL-0001 is a *string representation* of the error. It is built from 2
> > byte scope id (mapped to name TBL) and 2 byte number (0001). Both
> > internally packed in int. No any kind of parsing will be necessary to
> read
> > scope/category.
>
> I think Alexey mean if it will be possible to make smth like that
>
> catch (IgniteException e) {
>     if (e.getScope() == "TBL" && e.getCode() == 1234)
>         continue; // E.g. retry my TX
> }
>
> It looks useful to me.
>

I have in mind something like this:

public class IgniteException extends RuntimeException {
    private int errorCode;

    public IgniteException(ErrorScope scope, int code, String message,
Throwable cause) {
        super(message, cause);
        this.errorCode = make(scope, code);
    }

    public boolean matches(ErrorScope scope, int code) {
        return errorCode == make(scope, code);
    }

    private int make(ErrorScope scope, int code) {
        return ((scope.ordinal() << 16) | code);
    }

    public ErrorScope scope() {
        return ErrorScope.values()[errorCode >> 16];
    }

    public int code() {
        return 0xFFFF & errorCode;
    }

    public static void main(String[] args) {
        IgniteException e = new IgniteException(ErrorScope.RAFT, 1, "test",
null);

        System.out.println(e.matches(ErrorScope.RAFT, 2));
        System.out.println(e.scope());
        System.out.println(e.code());

        try {
            throw e;
        }
        catch (IgniteException ee) {
            System.out.println(ee.matches(ErrorScope.RAFT, 1));
        }
    }
}


>
> 2. How you see a cross-module exception throwing?
> Assume, user call -> module A, which recursively call -> module B, which
> fails.
> So, module A component calls a module B component and got an Exception with
> "B-1234" exception.
> Module A do not expect any exception here and doesn't take care of "B-xxx"
> error codes, but only "A-yyyy.
> Should it rethrow exception with "A-unknown" (maybe "UNK-0001") code
> or reuse "B-xxxx" code with the own message, pointing original exception as
> a cause for both cases?
>
> The first approach may looks confusing, while the second one produces too
> many "UNK-" codes.
> What code should get the user in that case?
>
>
>
The method should always report root cause, in your example it will be
B-xxxx, no matter which module API is called.


>
>
>
> On Thu, Apr 15, 2021 at 5:36 PM Alexei Scherbakov <
> alexey.scherbakoff@gmail.com> wrote:
>
> > чт, 15 апр. 2021 г. в 14:26, Ilya Kasnacheev <ilya.kasnacheev@gmail.com
> >:
> >
> > > Hello!
> > >
> > > > All public methods throw only unchecked
> > > org.apache.ignite.lang.IgniteException containing aforementioned
> fields.
> > > > Each public method must have a section in the javadoc with a list of
> > all
> > > possible error codes for this method.
> > >
> > > I don't think this is feasible at all.
> > > Imagine javadoc for cache.get() method featuring three pages of
> possible
> > > error codes thrown by this method.
> > >
> >
> > Of course there is no need to write 3 pages of error codes, this makes no
> > sense.
> > I think we can use error ranges here, or, probably, document most
> important
> > error scenarios.
> > The point here is to try to document errors as much as possible.
> >
> >
> > > Also, updated every two weeks to account for changes in
> > > underlying mechanisms.
> > >
> > > There is still a confusion between "error code for any error in logs"
> and
> > > "error code for any pair of method & exception". Which one are we
> > > discussing really?
> > >
> > > This is impossible to track or test, but is susceptible for common
> > > error-hiding antipattern where all exceptions are caught in cache.get()
> > and
> > > discarded, and instead a brand new CH-0001 "error in cache.get()" is
> > thrown
> > > to the user.
> > >
> > > Which is certainly not something that anybody wants.
> > >
> >
> > Certainly not. We are talking here about root cause - what is exactly the
> > reason for method call failure.
> >
> >
> > >
> > > Regards,
> > > --
> > > Ilya Kasnacheev
> > >
> > >
> > > чт, 15 апр. 2021 г. в 13:03, Vladislav Pyatkov <vl...@gmail.com>:
> > >
> > > > Hi Alexei,
> > > >
> > > > > Each public method *must *have a section in the javadoc with a list
> > of
> > > > all possible error codes for this method.
> > > >
> > > > I consider it is redundant, because any public exception can be
> thrown
> > > from
> > > > public API.
> > > > If it not happens today, it may change tomorrow: one will be removed,
> > > > another one will be added.
> > > >
> > > > >Nested exceptions are not forbidden to use. They can provide
> > additional
> > > > details on the error for debug purposes
> > > >
> > > > I see another issue, which is in the Ignite 2.x, but not attend here.
> > We
> > > > can have a deep nested exception and use it for handling.
> > > > In the result, all time when we are handling an exception we use
> > > > pattern like this:
> > > > try{
> > > > ...
> > > > }
> > > > catch (Exception e) {
> > > >     if (X.hasCause(e, TimeoutException.class))
> > > >         throw e;
> > > >
> > > >     if (X.hasCause(e, ConnectException.class, EOFException.class))
> > > >         continue;
> > > >
> > > >     if (X.hasCause(e, InterruptedException.class))
> > > >         return false;
> > > > }
> > > >
> > > > If we have a pant to make only one exception to the client side, we
> can
> > > > also do it for an internal exception.
> > > >
> > > > On Wed, Apr 14, 2021 at 11:42 AM Alexei Scherbakov <
> > > > alexey.scherbakoff@gmail.com> wrote:
> > > >
> > > > > Alexey,
> > > > >
> > > > > ср, 14 апр. 2021 г. в 01:52, Alexey Kukushkin <
> > > kukushkinalexey@gmail.com
> > > > >:
> > > > >
> > > > > > Just some points looking questionable to me, although I realize
> the
> > > > error
> > > > > > handling style may be very opinionated:
> > > > > >
> > > > > >    - Would it make sense splitting the proposed composite error
> > code
> > > > > >    (TBL-0001) into separate numeric code (0001) and
> scope/category
> > > > > ("TBL")
> > > > > > to
> > > > > >    avoid parsing the code when an error handler needs to analyze
> > only
> > > > the
> > > > > >    category or the code?
> > > > > >
> > > > >
> > > > >   TBL-0001 is a *string representation* of the error. It is built
> > from
> > > 2
> > > > > byte scope id (mapped to name TBL) and 2 byte number (0001). Both
> > > > > internally packed in int. No any kind of parsing will be necessary
> to
> > > > read
> > > > > scope/category.
> > > > >
> > > > >
> > > > > >    - "*The cause - short string description of an issue, readable
> > by
> > > > > > user.*".
> > > > > >    This terminology sounds confusing to me since that "cause"
> > sounds
> > > > like
> > > > > > Java
> > > > > >    Throwable's Message to me and the "Cause" is a lower level
> > > > exception.
> > > > > >
> > > > >
> > > > > The string describes the cause of error, so the name. I'm ok to
> > rename
> > > it
> > > > > to a message. It will be stored in IgniteException.message field
> > > anyway.
> > > > >
> > > > >
> > > > > >    - "*The action - steps for a user to resolve error...*". The
> > > action
> > > > is
> > > > > >    very important but do we want to make it part of the
> > > > IgniteException?
> > > > > I
> > > > > > do
> > > > > >    not think the recovery action text should be part of the
> > > exception.
> > > > > >    IgniteException may include a URL pointing to the
> corresponding
> > > > > >    documentation - this is discussable.
> > > > > >
> > > > >
> > > > > This will not be the part of the exception. A user should visit the
> > > > > documentation page and read the action section by corresponding
> error
> > > > code.
> > > > >
> > > > >
> > > > > >    - "*All public methods throw only unchecked IgniteException*"
> -
> > I
> > > > > assume
> > > > > >    we still respect JCache specification and prefer using
> standard
> > > Java
> > > > > >    exception to signal about invalid parameters.
> > > > > >
> > > > >
> > > > > Using standard java exceptions whenever possible makes sense.
> > > > >
> > > > >
> > > > > >    - Why we do not follow the "classic" structured exception
> > handling
> > > > > >    practices in Ignite:
> > > > > >
> > > > >
> > > > > Ignite 3 will be multi language, and other languages use other
> error
> > > > > processing models. SQL for example uses error codes.
> > > > > The single exception approach simplifies and unifies error handling
> > > > across
> > > > > platforms for me.
> > > > >
> > > > >
> > > > > >       - Why do we not allow using checked exceptions? It seems to
> > me
> > > > > >       sometimes forcing the user to handle an error serves as a
> > hint
> > > > and
> > > > > > thus
> > > > > >       improves usability. For example, handling an
> > > > optimistic/pessimistic
> > > > > >       transaction conflict/deadlock. Or handling a timeout for
> > > > operations
> > > > > > with
> > > > > >       timeouts.
> > > > > >
> > > > >
> > > > > A valid point. Checked exceptions must be used for whose methods,
> > where
> > > > > error handling is enforced, for example tx optimistic failure.
> > > > > Such errors will also have corresponding error codes.
> > > > >
> > > > >
> > > > > >       - Why single public IgniteException and no exception
> > hierarchy?
> > > > > Java
> > > > > >       is optimized for structured exception handling instead of
> > using
> > > > > > IF-ELSE to
> > > > > >       analyze the codes.
> > > > > >
> > > > >
> > > > > Exception hierarchy is not required when using error codes and
> > > applicable
> > > > > only to java API, so I would avoid spending efforts on it.
> > > > >
> > > > >
> > > > > >       - Why no nested exceptions? Sometimes an error handler is
> > > > > interested
> > > > > >       only in high level exceptions (like Invalid Configuration)
> > and
> > > > > > sometimes
> > > > > >       details are needed (like specific configuration parser
> > > > exceptions).
> > > > > >
> > > > >
> > > > > Nested exceptions are not forbidden to use. They can provide
> > additional
> > > > > details on the error for debug purposes, but not strictly required,
> > > > because
> > > > > error code + message should provide enough information to the user.
> > > > >
> > > > >
> > > > > >    - For async methods returning a Future we may have a universal
> > > rule
> > > > on
> > > > > >    how to handle exceptions. For example, we may specify that any
> > > async
> > > > > > method
> > > > > >    can throw only invalid argument exceptions. All other errors
> are
> > > > > > reported
> > > > > >    via the exceptionally(IgniteException -> {}) callback even if
> > the
> > > > > async
> > > > > >    method was executed synchronously.
> > > > > >
> > > > >
> > > > > This is ok to me.
> > > > >
> > > > >
> > > > > >
> > > > > >
> > > > > > вт, 13 апр. 2021 г. в 12:08, Alexei Scherbakov <
> > > > > > alexey.scherbakoff@gmail.com
> > > > > > >:
> > > > > >
> > > > > > > Igniters,
> > > > > > >
> > > > > > > I would like to start the discussion about error handling in
> > > Ignite 3
> > > > > and
> > > > > > > how we can improve it compared to Ignite 2.
> > > > > > >
> > > > > > > The error handling in Ignite 2 was not very good because of
> > generic
> > > > > > > CacheException thrown on almost any occasion, having deeply
> > nested
> > > > root
> > > > > > > cause and often containing no useful information on further
> steps
> > > to
> > > > > fix
> > > > > > > the issue.
> > > > > > >
> > > > > > > I aim to fix it by introducing some rules on error handling.
> > > > > > >
> > > > > > > *Public exception structure.*
> > > > > > >
> > > > > > > A public exception must have an error code, a cause, and an
> > action.
> > > > > > >
> > > > > > > * The code - the combination of 2 byte scope id and 2 byte
> error
> > > > number
> > > > > > > within the module. This allows up to 2^16 errors for each
> scope,
> > > > which
> > > > > > > should be enough. The error code string representation can look
> > > like
> > > > > > > RFT-0001 or TBL-0001
> > > > > > > * The cause - short string description of an issue, readable by
> > > user.
> > > > > > This
> > > > > > > can have dynamic parameters depending on the error type for
> > better
> > > > user
> > > > > > > experience, like "Can't write a snapshot, no space left on
> device
> > > > {0}"
> > > > > > > * The action - steps for a user to resolve error situation
> > > described
> > > > in
> > > > > > the
> > > > > > > documentation in the corresponding error section, for example
> > > "Clean
> > > > up
> > > > > > > disk space and retry the operation".
> > > > > > >
> > > > > > > Common errors should have their own scope, for example IGN-0001
> > > > > > >
> > > > > > > All public methods throw only unchecked
> > > > > > > org.apache.ignite.lang.IgniteException containing
> aforementioned
> > > > > fields.
> > > > > > > Each public method must have a section in the javadoc with a
> list
> > > of
> > > > > all
> > > > > > > possible error codes for this method.
> > > > > > >
> > > > > > > A good example with similar structure can be found here [1]
> > > > > > >
> > > > > > > *Async timeouts.*
> > > > > > >
> > > > > > > Because almost all API methods in Ignite 3 are async, they all
> > will
> > > > > have
> > > > > > a
> > > > > > > configurable default timeout and can complete with timeout
> error
> > > if a
> > > > > > > computation is not finished in time, for example if a response
> > has
> > > > not
> > > > > > been
> > > > > > > yet received.
> > > > > > > I suggest to complete the async op future with TimeoutException
> > in
> > > > this
> > > > > > > case to make it on par with synchronous execution using
> > future.get,
> > > > > which
> > > > > > > will throw java.util.concurrent.TimeoutException on timeout.
> > > > > > > For reference, see
> > java.util.concurrent.CompletableFuture#orTimeout
> > > > > > > No special error code should be used for this scenario.
> > > > > > >
> > > > > > > *Internal exceptions hierarchy.*
> > > > > > >
> > > > > > > All internal exceptions should extend
> > > > > > > org.apache.ignite.internal.lang.IgniteInternalException for
> > checked
> > > > > > > exceptions and
> > > > > > > org.apache.ignite.internal.lang.IgniteInternalCheckedException
> > for
> > > > > > > unchecked exceptions.
> > > > > > >
> > > > > > > Thoughts ?
> > > > > > >
> > > > > > > [1]
> > > > https://docs.oracle.com/cd/B10501_01/server.920/a96525/preface.htm
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > Best regards,
> > > > > > > Alexei Scherbakov
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best regards,
> > > > > > Alexey
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Best regards,
> > > > > Alexei Scherbakov
> > > > >
> > > >
> > > >
> > > > --
> > > > Vladislav Pyatkov
> > > >
> > >
> >
> >
> > --
> >
> > Best regards,
> > Alexei Scherbakov
> >
>
>
> --
> Best regards,
> Andrey V. Mashenkov
>


-- 

Best regards,
Alexei Scherbakov

Re: [DISCUSSION] Error handling in Ignite 3

Posted by Andrey Mashenkov <an...@gmail.com>.
Hi Alexey,
I like the idea.

1.

>   TBL-0001 is a *string representation* of the error. It is built from 2
> byte scope id (mapped to name TBL) and 2 byte number (0001). Both
> internally packed in int. No any kind of parsing will be necessary to read
> scope/category.

I think Alexey mean if it will be possible to make smth like that

catch (IgniteException e) {
    if (e.getScope() == "TBL" && e.getCode() == 1234)
        continue; // E.g. retry my TX
}

It looks useful to me.

2. How you see a cross-module exception throwing?
Assume, user call -> module A, which recursively call -> module B, which
fails.
So, module A component calls a module B component and got an Exception with
"B-1234" exception.
Module A do not expect any exception here and doesn't take care of "B-xxx"
error codes, but only "A-yyyy.
Should it rethrow exception with "A-unknown" (maybe "UNK-0001") code
or reuse "B-xxxx" code with the own message, pointing original exception as
a cause for both cases?

The first approach may looks confusing, while the second one produces too
many "UNK-" codes.
What code should get the user in that case?





On Thu, Apr 15, 2021 at 5:36 PM Alexei Scherbakov <
alexey.scherbakoff@gmail.com> wrote:

> чт, 15 апр. 2021 г. в 14:26, Ilya Kasnacheev <il...@gmail.com>:
>
> > Hello!
> >
> > > All public methods throw only unchecked
> > org.apache.ignite.lang.IgniteException containing aforementioned fields.
> > > Each public method must have a section in the javadoc with a list of
> all
> > possible error codes for this method.
> >
> > I don't think this is feasible at all.
> > Imagine javadoc for cache.get() method featuring three pages of possible
> > error codes thrown by this method.
> >
>
> Of course there is no need to write 3 pages of error codes, this makes no
> sense.
> I think we can use error ranges here, or, probably, document most important
> error scenarios.
> The point here is to try to document errors as much as possible.
>
>
> > Also, updated every two weeks to account for changes in
> > underlying mechanisms.
> >
> > There is still a confusion between "error code for any error in logs" and
> > "error code for any pair of method & exception". Which one are we
> > discussing really?
> >
> > This is impossible to track or test, but is susceptible for common
> > error-hiding antipattern where all exceptions are caught in cache.get()
> and
> > discarded, and instead a brand new CH-0001 "error in cache.get()" is
> thrown
> > to the user.
> >
> > Which is certainly not something that anybody wants.
> >
>
> Certainly not. We are talking here about root cause - what is exactly the
> reason for method call failure.
>
>
> >
> > Regards,
> > --
> > Ilya Kasnacheev
> >
> >
> > чт, 15 апр. 2021 г. в 13:03, Vladislav Pyatkov <vl...@gmail.com>:
> >
> > > Hi Alexei,
> > >
> > > > Each public method *must *have a section in the javadoc with a list
> of
> > > all possible error codes for this method.
> > >
> > > I consider it is redundant, because any public exception can be thrown
> > from
> > > public API.
> > > If it not happens today, it may change tomorrow: one will be removed,
> > > another one will be added.
> > >
> > > >Nested exceptions are not forbidden to use. They can provide
> additional
> > > details on the error for debug purposes
> > >
> > > I see another issue, which is in the Ignite 2.x, but not attend here.
> We
> > > can have a deep nested exception and use it for handling.
> > > In the result, all time when we are handling an exception we use
> > > pattern like this:
> > > try{
> > > ...
> > > }
> > > catch (Exception e) {
> > >     if (X.hasCause(e, TimeoutException.class))
> > >         throw e;
> > >
> > >     if (X.hasCause(e, ConnectException.class, EOFException.class))
> > >         continue;
> > >
> > >     if (X.hasCause(e, InterruptedException.class))
> > >         return false;
> > > }
> > >
> > > If we have a pant to make only one exception to the client side, we can
> > > also do it for an internal exception.
> > >
> > > On Wed, Apr 14, 2021 at 11:42 AM Alexei Scherbakov <
> > > alexey.scherbakoff@gmail.com> wrote:
> > >
> > > > Alexey,
> > > >
> > > > ср, 14 апр. 2021 г. в 01:52, Alexey Kukushkin <
> > kukushkinalexey@gmail.com
> > > >:
> > > >
> > > > > Just some points looking questionable to me, although I realize the
> > > error
> > > > > handling style may be very opinionated:
> > > > >
> > > > >    - Would it make sense splitting the proposed composite error
> code
> > > > >    (TBL-0001) into separate numeric code (0001) and scope/category
> > > > ("TBL")
> > > > > to
> > > > >    avoid parsing the code when an error handler needs to analyze
> only
> > > the
> > > > >    category or the code?
> > > > >
> > > >
> > > >   TBL-0001 is a *string representation* of the error. It is built
> from
> > 2
> > > > byte scope id (mapped to name TBL) and 2 byte number (0001). Both
> > > > internally packed in int. No any kind of parsing will be necessary to
> > > read
> > > > scope/category.
> > > >
> > > >
> > > > >    - "*The cause - short string description of an issue, readable
> by
> > > > > user.*".
> > > > >    This terminology sounds confusing to me since that "cause"
> sounds
> > > like
> > > > > Java
> > > > >    Throwable's Message to me and the "Cause" is a lower level
> > > exception.
> > > > >
> > > >
> > > > The string describes the cause of error, so the name. I'm ok to
> rename
> > it
> > > > to a message. It will be stored in IgniteException.message field
> > anyway.
> > > >
> > > >
> > > > >    - "*The action - steps for a user to resolve error...*". The
> > action
> > > is
> > > > >    very important but do we want to make it part of the
> > > IgniteException?
> > > > I
> > > > > do
> > > > >    not think the recovery action text should be part of the
> > exception.
> > > > >    IgniteException may include a URL pointing to the corresponding
> > > > >    documentation - this is discussable.
> > > > >
> > > >
> > > > This will not be the part of the exception. A user should visit the
> > > > documentation page and read the action section by corresponding error
> > > code.
> > > >
> > > >
> > > > >    - "*All public methods throw only unchecked IgniteException*" -
> I
> > > > assume
> > > > >    we still respect JCache specification and prefer using standard
> > Java
> > > > >    exception to signal about invalid parameters.
> > > > >
> > > >
> > > > Using standard java exceptions whenever possible makes sense.
> > > >
> > > >
> > > > >    - Why we do not follow the "classic" structured exception
> handling
> > > > >    practices in Ignite:
> > > > >
> > > >
> > > > Ignite 3 will be multi language, and other languages use other error
> > > > processing models. SQL for example uses error codes.
> > > > The single exception approach simplifies and unifies error handling
> > > across
> > > > platforms for me.
> > > >
> > > >
> > > > >       - Why do we not allow using checked exceptions? It seems to
> me
> > > > >       sometimes forcing the user to handle an error serves as a
> hint
> > > and
> > > > > thus
> > > > >       improves usability. For example, handling an
> > > optimistic/pessimistic
> > > > >       transaction conflict/deadlock. Or handling a timeout for
> > > operations
> > > > > with
> > > > >       timeouts.
> > > > >
> > > >
> > > > A valid point. Checked exceptions must be used for whose methods,
> where
> > > > error handling is enforced, for example tx optimistic failure.
> > > > Such errors will also have corresponding error codes.
> > > >
> > > >
> > > > >       - Why single public IgniteException and no exception
> hierarchy?
> > > > Java
> > > > >       is optimized for structured exception handling instead of
> using
> > > > > IF-ELSE to
> > > > >       analyze the codes.
> > > > >
> > > >
> > > > Exception hierarchy is not required when using error codes and
> > applicable
> > > > only to java API, so I would avoid spending efforts on it.
> > > >
> > > >
> > > > >       - Why no nested exceptions? Sometimes an error handler is
> > > > interested
> > > > >       only in high level exceptions (like Invalid Configuration)
> and
> > > > > sometimes
> > > > >       details are needed (like specific configuration parser
> > > exceptions).
> > > > >
> > > >
> > > > Nested exceptions are not forbidden to use. They can provide
> additional
> > > > details on the error for debug purposes, but not strictly required,
> > > because
> > > > error code + message should provide enough information to the user.
> > > >
> > > >
> > > > >    - For async methods returning a Future we may have a universal
> > rule
> > > on
> > > > >    how to handle exceptions. For example, we may specify that any
> > async
> > > > > method
> > > > >    can throw only invalid argument exceptions. All other errors are
> > > > > reported
> > > > >    via the exceptionally(IgniteException -> {}) callback even if
> the
> > > > async
> > > > >    method was executed synchronously.
> > > > >
> > > >
> > > > This is ok to me.
> > > >
> > > >
> > > > >
> > > > >
> > > > > вт, 13 апр. 2021 г. в 12:08, Alexei Scherbakov <
> > > > > alexey.scherbakoff@gmail.com
> > > > > >:
> > > > >
> > > > > > Igniters,
> > > > > >
> > > > > > I would like to start the discussion about error handling in
> > Ignite 3
> > > > and
> > > > > > how we can improve it compared to Ignite 2.
> > > > > >
> > > > > > The error handling in Ignite 2 was not very good because of
> generic
> > > > > > CacheException thrown on almost any occasion, having deeply
> nested
> > > root
> > > > > > cause and often containing no useful information on further steps
> > to
> > > > fix
> > > > > > the issue.
> > > > > >
> > > > > > I aim to fix it by introducing some rules on error handling.
> > > > > >
> > > > > > *Public exception structure.*
> > > > > >
> > > > > > A public exception must have an error code, a cause, and an
> action.
> > > > > >
> > > > > > * The code - the combination of 2 byte scope id and 2 byte error
> > > number
> > > > > > within the module. This allows up to 2^16 errors for each scope,
> > > which
> > > > > > should be enough. The error code string representation can look
> > like
> > > > > > RFT-0001 or TBL-0001
> > > > > > * The cause - short string description of an issue, readable by
> > user.
> > > > > This
> > > > > > can have dynamic parameters depending on the error type for
> better
> > > user
> > > > > > experience, like "Can't write a snapshot, no space left on device
> > > {0}"
> > > > > > * The action - steps for a user to resolve error situation
> > described
> > > in
> > > > > the
> > > > > > documentation in the corresponding error section, for example
> > "Clean
> > > up
> > > > > > disk space and retry the operation".
> > > > > >
> > > > > > Common errors should have their own scope, for example IGN-0001
> > > > > >
> > > > > > All public methods throw only unchecked
> > > > > > org.apache.ignite.lang.IgniteException containing aforementioned
> > > > fields.
> > > > > > Each public method must have a section in the javadoc with a list
> > of
> > > > all
> > > > > > possible error codes for this method.
> > > > > >
> > > > > > A good example with similar structure can be found here [1]
> > > > > >
> > > > > > *Async timeouts.*
> > > > > >
> > > > > > Because almost all API methods in Ignite 3 are async, they all
> will
> > > > have
> > > > > a
> > > > > > configurable default timeout and can complete with timeout error
> > if a
> > > > > > computation is not finished in time, for example if a response
> has
> > > not
> > > > > been
> > > > > > yet received.
> > > > > > I suggest to complete the async op future with TimeoutException
> in
> > > this
> > > > > > case to make it on par with synchronous execution using
> future.get,
> > > > which
> > > > > > will throw java.util.concurrent.TimeoutException on timeout.
> > > > > > For reference, see
> java.util.concurrent.CompletableFuture#orTimeout
> > > > > > No special error code should be used for this scenario.
> > > > > >
> > > > > > *Internal exceptions hierarchy.*
> > > > > >
> > > > > > All internal exceptions should extend
> > > > > > org.apache.ignite.internal.lang.IgniteInternalException for
> checked
> > > > > > exceptions and
> > > > > > org.apache.ignite.internal.lang.IgniteInternalCheckedException
> for
> > > > > > unchecked exceptions.
> > > > > >
> > > > > > Thoughts ?
> > > > > >
> > > > > > [1]
> > > https://docs.oracle.com/cd/B10501_01/server.920/a96525/preface.htm
> > > > > >
> > > > > > --
> > > > > >
> > > > > > Best regards,
> > > > > > Alexei Scherbakov
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best regards,
> > > > > Alexey
> > > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Best regards,
> > > > Alexei Scherbakov
> > > >
> > >
> > >
> > > --
> > > Vladislav Pyatkov
> > >
> >
>
>
> --
>
> Best regards,
> Alexei Scherbakov
>


-- 
Best regards,
Andrey V. Mashenkov

Re: [DISCUSSION] Error handling in Ignite 3

Posted by Alexei Scherbakov <al...@gmail.com>.
чт, 15 апр. 2021 г. в 14:26, Ilya Kasnacheev <il...@gmail.com>:

> Hello!
>
> > All public methods throw only unchecked
> org.apache.ignite.lang.IgniteException containing aforementioned fields.
> > Each public method must have a section in the javadoc with a list of all
> possible error codes for this method.
>
> I don't think this is feasible at all.
> Imagine javadoc for cache.get() method featuring three pages of possible
> error codes thrown by this method.
>

Of course there is no need to write 3 pages of error codes, this makes no
sense.
I think we can use error ranges here, or, probably, document most important
error scenarios.
The point here is to try to document errors as much as possible.


> Also, updated every two weeks to account for changes in
> underlying mechanisms.
>
> There is still a confusion between "error code for any error in logs" and
> "error code for any pair of method & exception". Which one are we
> discussing really?
>
> This is impossible to track or test, but is susceptible for common
> error-hiding antipattern where all exceptions are caught in cache.get() and
> discarded, and instead a brand new CH-0001 "error in cache.get()" is thrown
> to the user.
>
> Which is certainly not something that anybody wants.
>

Certainly not. We are talking here about root cause - what is exactly the
reason for method call failure.


>
> Regards,
> --
> Ilya Kasnacheev
>
>
> чт, 15 апр. 2021 г. в 13:03, Vladislav Pyatkov <vl...@gmail.com>:
>
> > Hi Alexei,
> >
> > > Each public method *must *have a section in the javadoc with a list of
> > all possible error codes for this method.
> >
> > I consider it is redundant, because any public exception can be thrown
> from
> > public API.
> > If it not happens today, it may change tomorrow: one will be removed,
> > another one will be added.
> >
> > >Nested exceptions are not forbidden to use. They can provide additional
> > details on the error for debug purposes
> >
> > I see another issue, which is in the Ignite 2.x, but not attend here. We
> > can have a deep nested exception and use it for handling.
> > In the result, all time when we are handling an exception we use
> > pattern like this:
> > try{
> > ...
> > }
> > catch (Exception e) {
> >     if (X.hasCause(e, TimeoutException.class))
> >         throw e;
> >
> >     if (X.hasCause(e, ConnectException.class, EOFException.class))
> >         continue;
> >
> >     if (X.hasCause(e, InterruptedException.class))
> >         return false;
> > }
> >
> > If we have a pant to make only one exception to the client side, we can
> > also do it for an internal exception.
> >
> > On Wed, Apr 14, 2021 at 11:42 AM Alexei Scherbakov <
> > alexey.scherbakoff@gmail.com> wrote:
> >
> > > Alexey,
> > >
> > > ср, 14 апр. 2021 г. в 01:52, Alexey Kukushkin <
> kukushkinalexey@gmail.com
> > >:
> > >
> > > > Just some points looking questionable to me, although I realize the
> > error
> > > > handling style may be very opinionated:
> > > >
> > > >    - Would it make sense splitting the proposed composite error code
> > > >    (TBL-0001) into separate numeric code (0001) and scope/category
> > > ("TBL")
> > > > to
> > > >    avoid parsing the code when an error handler needs to analyze only
> > the
> > > >    category or the code?
> > > >
> > >
> > >   TBL-0001 is a *string representation* of the error. It is built from
> 2
> > > byte scope id (mapped to name TBL) and 2 byte number (0001). Both
> > > internally packed in int. No any kind of parsing will be necessary to
> > read
> > > scope/category.
> > >
> > >
> > > >    - "*The cause - short string description of an issue, readable by
> > > > user.*".
> > > >    This terminology sounds confusing to me since that "cause" sounds
> > like
> > > > Java
> > > >    Throwable's Message to me and the "Cause" is a lower level
> > exception.
> > > >
> > >
> > > The string describes the cause of error, so the name. I'm ok to rename
> it
> > > to a message. It will be stored in IgniteException.message field
> anyway.
> > >
> > >
> > > >    - "*The action - steps for a user to resolve error...*". The
> action
> > is
> > > >    very important but do we want to make it part of the
> > IgniteException?
> > > I
> > > > do
> > > >    not think the recovery action text should be part of the
> exception.
> > > >    IgniteException may include a URL pointing to the corresponding
> > > >    documentation - this is discussable.
> > > >
> > >
> > > This will not be the part of the exception. A user should visit the
> > > documentation page and read the action section by corresponding error
> > code.
> > >
> > >
> > > >    - "*All public methods throw only unchecked IgniteException*" - I
> > > assume
> > > >    we still respect JCache specification and prefer using standard
> Java
> > > >    exception to signal about invalid parameters.
> > > >
> > >
> > > Using standard java exceptions whenever possible makes sense.
> > >
> > >
> > > >    - Why we do not follow the "classic" structured exception handling
> > > >    practices in Ignite:
> > > >
> > >
> > > Ignite 3 will be multi language, and other languages use other error
> > > processing models. SQL for example uses error codes.
> > > The single exception approach simplifies and unifies error handling
> > across
> > > platforms for me.
> > >
> > >
> > > >       - Why do we not allow using checked exceptions? It seems to me
> > > >       sometimes forcing the user to handle an error serves as a hint
> > and
> > > > thus
> > > >       improves usability. For example, handling an
> > optimistic/pessimistic
> > > >       transaction conflict/deadlock. Or handling a timeout for
> > operations
> > > > with
> > > >       timeouts.
> > > >
> > >
> > > A valid point. Checked exceptions must be used for whose methods, where
> > > error handling is enforced, for example tx optimistic failure.
> > > Such errors will also have corresponding error codes.
> > >
> > >
> > > >       - Why single public IgniteException and no exception hierarchy?
> > > Java
> > > >       is optimized for structured exception handling instead of using
> > > > IF-ELSE to
> > > >       analyze the codes.
> > > >
> > >
> > > Exception hierarchy is not required when using error codes and
> applicable
> > > only to java API, so I would avoid spending efforts on it.
> > >
> > >
> > > >       - Why no nested exceptions? Sometimes an error handler is
> > > interested
> > > >       only in high level exceptions (like Invalid Configuration) and
> > > > sometimes
> > > >       details are needed (like specific configuration parser
> > exceptions).
> > > >
> > >
> > > Nested exceptions are not forbidden to use. They can provide additional
> > > details on the error for debug purposes, but not strictly required,
> > because
> > > error code + message should provide enough information to the user.
> > >
> > >
> > > >    - For async methods returning a Future we may have a universal
> rule
> > on
> > > >    how to handle exceptions. For example, we may specify that any
> async
> > > > method
> > > >    can throw only invalid argument exceptions. All other errors are
> > > > reported
> > > >    via the exceptionally(IgniteException -> {}) callback even if the
> > > async
> > > >    method was executed synchronously.
> > > >
> > >
> > > This is ok to me.
> > >
> > >
> > > >
> > > >
> > > > вт, 13 апр. 2021 г. в 12:08, Alexei Scherbakov <
> > > > alexey.scherbakoff@gmail.com
> > > > >:
> > > >
> > > > > Igniters,
> > > > >
> > > > > I would like to start the discussion about error handling in
> Ignite 3
> > > and
> > > > > how we can improve it compared to Ignite 2.
> > > > >
> > > > > The error handling in Ignite 2 was not very good because of generic
> > > > > CacheException thrown on almost any occasion, having deeply nested
> > root
> > > > > cause and often containing no useful information on further steps
> to
> > > fix
> > > > > the issue.
> > > > >
> > > > > I aim to fix it by introducing some rules on error handling.
> > > > >
> > > > > *Public exception structure.*
> > > > >
> > > > > A public exception must have an error code, a cause, and an action.
> > > > >
> > > > > * The code - the combination of 2 byte scope id and 2 byte error
> > number
> > > > > within the module. This allows up to 2^16 errors for each scope,
> > which
> > > > > should be enough. The error code string representation can look
> like
> > > > > RFT-0001 or TBL-0001
> > > > > * The cause - short string description of an issue, readable by
> user.
> > > > This
> > > > > can have dynamic parameters depending on the error type for better
> > user
> > > > > experience, like "Can't write a snapshot, no space left on device
> > {0}"
> > > > > * The action - steps for a user to resolve error situation
> described
> > in
> > > > the
> > > > > documentation in the corresponding error section, for example
> "Clean
> > up
> > > > > disk space and retry the operation".
> > > > >
> > > > > Common errors should have their own scope, for example IGN-0001
> > > > >
> > > > > All public methods throw only unchecked
> > > > > org.apache.ignite.lang.IgniteException containing aforementioned
> > > fields.
> > > > > Each public method must have a section in the javadoc with a list
> of
> > > all
> > > > > possible error codes for this method.
> > > > >
> > > > > A good example with similar structure can be found here [1]
> > > > >
> > > > > *Async timeouts.*
> > > > >
> > > > > Because almost all API methods in Ignite 3 are async, they all will
> > > have
> > > > a
> > > > > configurable default timeout and can complete with timeout error
> if a
> > > > > computation is not finished in time, for example if a response has
> > not
> > > > been
> > > > > yet received.
> > > > > I suggest to complete the async op future with TimeoutException in
> > this
> > > > > case to make it on par with synchronous execution using future.get,
> > > which
> > > > > will throw java.util.concurrent.TimeoutException on timeout.
> > > > > For reference, see java.util.concurrent.CompletableFuture#orTimeout
> > > > > No special error code should be used for this scenario.
> > > > >
> > > > > *Internal exceptions hierarchy.*
> > > > >
> > > > > All internal exceptions should extend
> > > > > org.apache.ignite.internal.lang.IgniteInternalException for checked
> > > > > exceptions and
> > > > > org.apache.ignite.internal.lang.IgniteInternalCheckedException for
> > > > > unchecked exceptions.
> > > > >
> > > > > Thoughts ?
> > > > >
> > > > > [1]
> > https://docs.oracle.com/cd/B10501_01/server.920/a96525/preface.htm
> > > > >
> > > > > --
> > > > >
> > > > > Best regards,
> > > > > Alexei Scherbakov
> > > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > > Alexey
> > > >
> > >
> > >
> > > --
> > >
> > > Best regards,
> > > Alexei Scherbakov
> > >
> >
> >
> > --
> > Vladislav Pyatkov
> >
>


-- 

Best regards,
Alexei Scherbakov

Re: [DISCUSSION] Error handling in Ignite 3

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

> All public methods throw only unchecked
org.apache.ignite.lang.IgniteException containing aforementioned fields.
> Each public method must have a section in the javadoc with a list of all
possible error codes for this method.

I don't think this is feasible at all.
Imagine javadoc for cache.get() method featuring three pages of possible
error codes thrown by this method.
Also, updated every two weeks to account for changes in
underlying mechanisms.

There is still a confusion between "error code for any error in logs" and
"error code for any pair of method & exception". Which one are we
discussing really?

This is impossible to track or test, but is susceptible for common
error-hiding antipattern where all exceptions are caught in cache.get() and
discarded, and instead a brand new CH-0001 "error in cache.get()" is thrown
to the user.

Which is certainly not something that anybody wants.

Regards,
-- 
Ilya Kasnacheev


чт, 15 апр. 2021 г. в 13:03, Vladislav Pyatkov <vl...@gmail.com>:

> Hi Alexei,
>
> > Each public method *must *have a section in the javadoc with a list of
> all possible error codes for this method.
>
> I consider it is redundant, because any public exception can be thrown from
> public API.
> If it not happens today, it may change tomorrow: one will be removed,
> another one will be added.
>
> >Nested exceptions are not forbidden to use. They can provide additional
> details on the error for debug purposes
>
> I see another issue, which is in the Ignite 2.x, but not attend here. We
> can have a deep nested exception and use it for handling.
> In the result, all time when we are handling an exception we use
> pattern like this:
> try{
> ...
> }
> catch (Exception e) {
>     if (X.hasCause(e, TimeoutException.class))
>         throw e;
>
>     if (X.hasCause(e, ConnectException.class, EOFException.class))
>         continue;
>
>     if (X.hasCause(e, InterruptedException.class))
>         return false;
> }
>
> If we have a pant to make only one exception to the client side, we can
> also do it for an internal exception.
>
> On Wed, Apr 14, 2021 at 11:42 AM Alexei Scherbakov <
> alexey.scherbakoff@gmail.com> wrote:
>
> > Alexey,
> >
> > ср, 14 апр. 2021 г. в 01:52, Alexey Kukushkin <kukushkinalexey@gmail.com
> >:
> >
> > > Just some points looking questionable to me, although I realize the
> error
> > > handling style may be very opinionated:
> > >
> > >    - Would it make sense splitting the proposed composite error code
> > >    (TBL-0001) into separate numeric code (0001) and scope/category
> > ("TBL")
> > > to
> > >    avoid parsing the code when an error handler needs to analyze only
> the
> > >    category or the code?
> > >
> >
> >   TBL-0001 is a *string representation* of the error. It is built from 2
> > byte scope id (mapped to name TBL) and 2 byte number (0001). Both
> > internally packed in int. No any kind of parsing will be necessary to
> read
> > scope/category.
> >
> >
> > >    - "*The cause - short string description of an issue, readable by
> > > user.*".
> > >    This terminology sounds confusing to me since that "cause" sounds
> like
> > > Java
> > >    Throwable's Message to me and the "Cause" is a lower level
> exception.
> > >
> >
> > The string describes the cause of error, so the name. I'm ok to rename it
> > to a message. It will be stored in IgniteException.message field anyway.
> >
> >
> > >    - "*The action - steps for a user to resolve error...*". The action
> is
> > >    very important but do we want to make it part of the
> IgniteException?
> > I
> > > do
> > >    not think the recovery action text should be part of the exception.
> > >    IgniteException may include a URL pointing to the corresponding
> > >    documentation - this is discussable.
> > >
> >
> > This will not be the part of the exception. A user should visit the
> > documentation page and read the action section by corresponding error
> code.
> >
> >
> > >    - "*All public methods throw only unchecked IgniteException*" - I
> > assume
> > >    we still respect JCache specification and prefer using standard Java
> > >    exception to signal about invalid parameters.
> > >
> >
> > Using standard java exceptions whenever possible makes sense.
> >
> >
> > >    - Why we do not follow the "classic" structured exception handling
> > >    practices in Ignite:
> > >
> >
> > Ignite 3 will be multi language, and other languages use other error
> > processing models. SQL for example uses error codes.
> > The single exception approach simplifies and unifies error handling
> across
> > platforms for me.
> >
> >
> > >       - Why do we not allow using checked exceptions? It seems to me
> > >       sometimes forcing the user to handle an error serves as a hint
> and
> > > thus
> > >       improves usability. For example, handling an
> optimistic/pessimistic
> > >       transaction conflict/deadlock. Or handling a timeout for
> operations
> > > with
> > >       timeouts.
> > >
> >
> > A valid point. Checked exceptions must be used for whose methods, where
> > error handling is enforced, for example tx optimistic failure.
> > Such errors will also have corresponding error codes.
> >
> >
> > >       - Why single public IgniteException and no exception hierarchy?
> > Java
> > >       is optimized for structured exception handling instead of using
> > > IF-ELSE to
> > >       analyze the codes.
> > >
> >
> > Exception hierarchy is not required when using error codes and applicable
> > only to java API, so I would avoid spending efforts on it.
> >
> >
> > >       - Why no nested exceptions? Sometimes an error handler is
> > interested
> > >       only in high level exceptions (like Invalid Configuration) and
> > > sometimes
> > >       details are needed (like specific configuration parser
> exceptions).
> > >
> >
> > Nested exceptions are not forbidden to use. They can provide additional
> > details on the error for debug purposes, but not strictly required,
> because
> > error code + message should provide enough information to the user.
> >
> >
> > >    - For async methods returning a Future we may have a universal rule
> on
> > >    how to handle exceptions. For example, we may specify that any async
> > > method
> > >    can throw only invalid argument exceptions. All other errors are
> > > reported
> > >    via the exceptionally(IgniteException -> {}) callback even if the
> > async
> > >    method was executed synchronously.
> > >
> >
> > This is ok to me.
> >
> >
> > >
> > >
> > > вт, 13 апр. 2021 г. в 12:08, Alexei Scherbakov <
> > > alexey.scherbakoff@gmail.com
> > > >:
> > >
> > > > Igniters,
> > > >
> > > > I would like to start the discussion about error handling in Ignite 3
> > and
> > > > how we can improve it compared to Ignite 2.
> > > >
> > > > The error handling in Ignite 2 was not very good because of generic
> > > > CacheException thrown on almost any occasion, having deeply nested
> root
> > > > cause and often containing no useful information on further steps to
> > fix
> > > > the issue.
> > > >
> > > > I aim to fix it by introducing some rules on error handling.
> > > >
> > > > *Public exception structure.*
> > > >
> > > > A public exception must have an error code, a cause, and an action.
> > > >
> > > > * The code - the combination of 2 byte scope id and 2 byte error
> number
> > > > within the module. This allows up to 2^16 errors for each scope,
> which
> > > > should be enough. The error code string representation can look like
> > > > RFT-0001 or TBL-0001
> > > > * The cause - short string description of an issue, readable by user.
> > > This
> > > > can have dynamic parameters depending on the error type for better
> user
> > > > experience, like "Can't write a snapshot, no space left on device
> {0}"
> > > > * The action - steps for a user to resolve error situation described
> in
> > > the
> > > > documentation in the corresponding error section, for example "Clean
> up
> > > > disk space and retry the operation".
> > > >
> > > > Common errors should have their own scope, for example IGN-0001
> > > >
> > > > All public methods throw only unchecked
> > > > org.apache.ignite.lang.IgniteException containing aforementioned
> > fields.
> > > > Each public method must have a section in the javadoc with a list of
> > all
> > > > possible error codes for this method.
> > > >
> > > > A good example with similar structure can be found here [1]
> > > >
> > > > *Async timeouts.*
> > > >
> > > > Because almost all API methods in Ignite 3 are async, they all will
> > have
> > > a
> > > > configurable default timeout and can complete with timeout error if a
> > > > computation is not finished in time, for example if a response has
> not
> > > been
> > > > yet received.
> > > > I suggest to complete the async op future with TimeoutException in
> this
> > > > case to make it on par with synchronous execution using future.get,
> > which
> > > > will throw java.util.concurrent.TimeoutException on timeout.
> > > > For reference, see java.util.concurrent.CompletableFuture#orTimeout
> > > > No special error code should be used for this scenario.
> > > >
> > > > *Internal exceptions hierarchy.*
> > > >
> > > > All internal exceptions should extend
> > > > org.apache.ignite.internal.lang.IgniteInternalException for checked
> > > > exceptions and
> > > > org.apache.ignite.internal.lang.IgniteInternalCheckedException for
> > > > unchecked exceptions.
> > > >
> > > > Thoughts ?
> > > >
> > > > [1]
> https://docs.oracle.com/cd/B10501_01/server.920/a96525/preface.htm
> > > >
> > > > --
> > > >
> > > > Best regards,
> > > > Alexei Scherbakov
> > > >
> > >
> > >
> > > --
> > > Best regards,
> > > Alexey
> > >
> >
> >
> > --
> >
> > Best regards,
> > Alexei Scherbakov
> >
>
>
> --
> Vladislav Pyatkov
>

Re: [DISCUSSION] Error handling in Ignite 3

Posted by Vladislav Pyatkov <vl...@gmail.com>.
Hi Alexei,

> Each public method *must *have a section in the javadoc with a list of
all possible error codes for this method.

I consider it is redundant, because any public exception can be thrown from
public API.
If it not happens today, it may change tomorrow: one will be removed,
another one will be added.

>Nested exceptions are not forbidden to use. They can provide additional
details on the error for debug purposes

I see another issue, which is in the Ignite 2.x, but not attend here. We
can have a deep nested exception and use it for handling.
In the result, all time when we are handling an exception we use
pattern like this:
try{
...
}
catch (Exception e) {
    if (X.hasCause(e, TimeoutException.class))
        throw e;

    if (X.hasCause(e, ConnectException.class, EOFException.class))
        continue;

    if (X.hasCause(e, InterruptedException.class))
        return false;
}

If we have a pant to make only one exception to the client side, we can
also do it for an internal exception.

On Wed, Apr 14, 2021 at 11:42 AM Alexei Scherbakov <
alexey.scherbakoff@gmail.com> wrote:

> Alexey,
>
> ср, 14 апр. 2021 г. в 01:52, Alexey Kukushkin <ku...@gmail.com>:
>
> > Just some points looking questionable to me, although I realize the error
> > handling style may be very opinionated:
> >
> >    - Would it make sense splitting the proposed composite error code
> >    (TBL-0001) into separate numeric code (0001) and scope/category
> ("TBL")
> > to
> >    avoid parsing the code when an error handler needs to analyze only the
> >    category or the code?
> >
>
>   TBL-0001 is a *string representation* of the error. It is built from 2
> byte scope id (mapped to name TBL) and 2 byte number (0001). Both
> internally packed in int. No any kind of parsing will be necessary to read
> scope/category.
>
>
> >    - "*The cause - short string description of an issue, readable by
> > user.*".
> >    This terminology sounds confusing to me since that "cause" sounds like
> > Java
> >    Throwable's Message to me and the "Cause" is a lower level exception.
> >
>
> The string describes the cause of error, so the name. I'm ok to rename it
> to a message. It will be stored in IgniteException.message field anyway.
>
>
> >    - "*The action - steps for a user to resolve error...*". The action is
> >    very important but do we want to make it part of the IgniteException?
> I
> > do
> >    not think the recovery action text should be part of the exception.
> >    IgniteException may include a URL pointing to the corresponding
> >    documentation - this is discussable.
> >
>
> This will not be the part of the exception. A user should visit the
> documentation page and read the action section by corresponding error code.
>
>
> >    - "*All public methods throw only unchecked IgniteException*" - I
> assume
> >    we still respect JCache specification and prefer using standard Java
> >    exception to signal about invalid parameters.
> >
>
> Using standard java exceptions whenever possible makes sense.
>
>
> >    - Why we do not follow the "classic" structured exception handling
> >    practices in Ignite:
> >
>
> Ignite 3 will be multi language, and other languages use other error
> processing models. SQL for example uses error codes.
> The single exception approach simplifies and unifies error handling across
> platforms for me.
>
>
> >       - Why do we not allow using checked exceptions? It seems to me
> >       sometimes forcing the user to handle an error serves as a hint and
> > thus
> >       improves usability. For example, handling an optimistic/pessimistic
> >       transaction conflict/deadlock. Or handling a timeout for operations
> > with
> >       timeouts.
> >
>
> A valid point. Checked exceptions must be used for whose methods, where
> error handling is enforced, for example tx optimistic failure.
> Such errors will also have corresponding error codes.
>
>
> >       - Why single public IgniteException and no exception hierarchy?
> Java
> >       is optimized for structured exception handling instead of using
> > IF-ELSE to
> >       analyze the codes.
> >
>
> Exception hierarchy is not required when using error codes and applicable
> only to java API, so I would avoid spending efforts on it.
>
>
> >       - Why no nested exceptions? Sometimes an error handler is
> interested
> >       only in high level exceptions (like Invalid Configuration) and
> > sometimes
> >       details are needed (like specific configuration parser exceptions).
> >
>
> Nested exceptions are not forbidden to use. They can provide additional
> details on the error for debug purposes, but not strictly required, because
> error code + message should provide enough information to the user.
>
>
> >    - For async methods returning a Future we may have a universal rule on
> >    how to handle exceptions. For example, we may specify that any async
> > method
> >    can throw only invalid argument exceptions. All other errors are
> > reported
> >    via the exceptionally(IgniteException -> {}) callback even if the
> async
> >    method was executed synchronously.
> >
>
> This is ok to me.
>
>
> >
> >
> > вт, 13 апр. 2021 г. в 12:08, Alexei Scherbakov <
> > alexey.scherbakoff@gmail.com
> > >:
> >
> > > Igniters,
> > >
> > > I would like to start the discussion about error handling in Ignite 3
> and
> > > how we can improve it compared to Ignite 2.
> > >
> > > The error handling in Ignite 2 was not very good because of generic
> > > CacheException thrown on almost any occasion, having deeply nested root
> > > cause and often containing no useful information on further steps to
> fix
> > > the issue.
> > >
> > > I aim to fix it by introducing some rules on error handling.
> > >
> > > *Public exception structure.*
> > >
> > > A public exception must have an error code, a cause, and an action.
> > >
> > > * The code - the combination of 2 byte scope id and 2 byte error number
> > > within the module. This allows up to 2^16 errors for each scope, which
> > > should be enough. The error code string representation can look like
> > > RFT-0001 or TBL-0001
> > > * The cause - short string description of an issue, readable by user.
> > This
> > > can have dynamic parameters depending on the error type for better user
> > > experience, like "Can't write a snapshot, no space left on device {0}"
> > > * The action - steps for a user to resolve error situation described in
> > the
> > > documentation in the corresponding error section, for example "Clean up
> > > disk space and retry the operation".
> > >
> > > Common errors should have their own scope, for example IGN-0001
> > >
> > > All public methods throw only unchecked
> > > org.apache.ignite.lang.IgniteException containing aforementioned
> fields.
> > > Each public method must have a section in the javadoc with a list of
> all
> > > possible error codes for this method.
> > >
> > > A good example with similar structure can be found here [1]
> > >
> > > *Async timeouts.*
> > >
> > > Because almost all API methods in Ignite 3 are async, they all will
> have
> > a
> > > configurable default timeout and can complete with timeout error if a
> > > computation is not finished in time, for example if a response has not
> > been
> > > yet received.
> > > I suggest to complete the async op future with TimeoutException in this
> > > case to make it on par with synchronous execution using future.get,
> which
> > > will throw java.util.concurrent.TimeoutException on timeout.
> > > For reference, see java.util.concurrent.CompletableFuture#orTimeout
> > > No special error code should be used for this scenario.
> > >
> > > *Internal exceptions hierarchy.*
> > >
> > > All internal exceptions should extend
> > > org.apache.ignite.internal.lang.IgniteInternalException for checked
> > > exceptions and
> > > org.apache.ignite.internal.lang.IgniteInternalCheckedException for
> > > unchecked exceptions.
> > >
> > > Thoughts ?
> > >
> > > [1] https://docs.oracle.com/cd/B10501_01/server.920/a96525/preface.htm
> > >
> > > --
> > >
> > > Best regards,
> > > Alexei Scherbakov
> > >
> >
> >
> > --
> > Best regards,
> > Alexey
> >
>
>
> --
>
> Best regards,
> Alexei Scherbakov
>


-- 
Vladislav Pyatkov

Re: [DISCUSSION] Error handling in Ignite 3

Posted by Alexei Scherbakov <al...@gmail.com>.
Alexey,

ср, 14 апр. 2021 г. в 01:52, Alexey Kukushkin <ku...@gmail.com>:

> Just some points looking questionable to me, although I realize the error
> handling style may be very opinionated:
>
>    - Would it make sense splitting the proposed composite error code
>    (TBL-0001) into separate numeric code (0001) and scope/category ("TBL")
> to
>    avoid parsing the code when an error handler needs to analyze only the
>    category or the code?
>

  TBL-0001 is a *string representation* of the error. It is built from 2
byte scope id (mapped to name TBL) and 2 byte number (0001). Both
internally packed in int. No any kind of parsing will be necessary to read
scope/category.


>    - "*The cause - short string description of an issue, readable by
> user.*".
>    This terminology sounds confusing to me since that "cause" sounds like
> Java
>    Throwable's Message to me and the "Cause" is a lower level exception.
>

The string describes the cause of error, so the name. I'm ok to rename it
to a message. It will be stored in IgniteException.message field anyway.


>    - "*The action - steps for a user to resolve error...*". The action is
>    very important but do we want to make it part of the IgniteException? I
> do
>    not think the recovery action text should be part of the exception.
>    IgniteException may include a URL pointing to the corresponding
>    documentation - this is discussable.
>

This will not be the part of the exception. A user should visit the
documentation page and read the action section by corresponding error code.


>    - "*All public methods throw only unchecked IgniteException*" - I assume
>    we still respect JCache specification and prefer using standard Java
>    exception to signal about invalid parameters.
>

Using standard java exceptions whenever possible makes sense.


>    - Why we do not follow the "classic" structured exception handling
>    practices in Ignite:
>

Ignite 3 will be multi language, and other languages use other error
processing models. SQL for example uses error codes.
The single exception approach simplifies and unifies error handling across
platforms for me.


>       - Why do we not allow using checked exceptions? It seems to me
>       sometimes forcing the user to handle an error serves as a hint and
> thus
>       improves usability. For example, handling an optimistic/pessimistic
>       transaction conflict/deadlock. Or handling a timeout for operations
> with
>       timeouts.
>

A valid point. Checked exceptions must be used for whose methods, where
error handling is enforced, for example tx optimistic failure.
Such errors will also have corresponding error codes.


>       - Why single public IgniteException and no exception hierarchy? Java
>       is optimized for structured exception handling instead of using
> IF-ELSE to
>       analyze the codes.
>

Exception hierarchy is not required when using error codes and applicable
only to java API, so I would avoid spending efforts on it.


>       - Why no nested exceptions? Sometimes an error handler is interested
>       only in high level exceptions (like Invalid Configuration) and
> sometimes
>       details are needed (like specific configuration parser exceptions).
>

Nested exceptions are not forbidden to use. They can provide additional
details on the error for debug purposes, but not strictly required, because
error code + message should provide enough information to the user.


>    - For async methods returning a Future we may have a universal rule on
>    how to handle exceptions. For example, we may specify that any async
> method
>    can throw only invalid argument exceptions. All other errors are
> reported
>    via the exceptionally(IgniteException -> {}) callback even if the async
>    method was executed synchronously.
>

This is ok to me.


>
>
> вт, 13 апр. 2021 г. в 12:08, Alexei Scherbakov <
> alexey.scherbakoff@gmail.com
> >:
>
> > Igniters,
> >
> > I would like to start the discussion about error handling in Ignite 3 and
> > how we can improve it compared to Ignite 2.
> >
> > The error handling in Ignite 2 was not very good because of generic
> > CacheException thrown on almost any occasion, having deeply nested root
> > cause and often containing no useful information on further steps to fix
> > the issue.
> >
> > I aim to fix it by introducing some rules on error handling.
> >
> > *Public exception structure.*
> >
> > A public exception must have an error code, a cause, and an action.
> >
> > * The code - the combination of 2 byte scope id and 2 byte error number
> > within the module. This allows up to 2^16 errors for each scope, which
> > should be enough. The error code string representation can look like
> > RFT-0001 or TBL-0001
> > * The cause - short string description of an issue, readable by user.
> This
> > can have dynamic parameters depending on the error type for better user
> > experience, like "Can't write a snapshot, no space left on device {0}"
> > * The action - steps for a user to resolve error situation described in
> the
> > documentation in the corresponding error section, for example "Clean up
> > disk space and retry the operation".
> >
> > Common errors should have their own scope, for example IGN-0001
> >
> > All public methods throw only unchecked
> > org.apache.ignite.lang.IgniteException containing aforementioned fields.
> > Each public method must have a section in the javadoc with a list of all
> > possible error codes for this method.
> >
> > A good example with similar structure can be found here [1]
> >
> > *Async timeouts.*
> >
> > Because almost all API methods in Ignite 3 are async, they all will have
> a
> > configurable default timeout and can complete with timeout error if a
> > computation is not finished in time, for example if a response has not
> been
> > yet received.
> > I suggest to complete the async op future with TimeoutException in this
> > case to make it on par with synchronous execution using future.get, which
> > will throw java.util.concurrent.TimeoutException on timeout.
> > For reference, see java.util.concurrent.CompletableFuture#orTimeout
> > No special error code should be used for this scenario.
> >
> > *Internal exceptions hierarchy.*
> >
> > All internal exceptions should extend
> > org.apache.ignite.internal.lang.IgniteInternalException for checked
> > exceptions and
> > org.apache.ignite.internal.lang.IgniteInternalCheckedException for
> > unchecked exceptions.
> >
> > Thoughts ?
> >
> > [1] https://docs.oracle.com/cd/B10501_01/server.920/a96525/preface.htm
> >
> > --
> >
> > Best regards,
> > Alexei Scherbakov
> >
>
>
> --
> Best regards,
> Alexey
>


-- 

Best regards,
Alexei Scherbakov

Re: [DISCUSSION] Error handling in Ignite 3

Posted by Alexey Kukushkin <ku...@gmail.com>.
Just some points looking questionable to me, although I realize the error
handling style may be very opinionated:

   - Would it make sense splitting the proposed composite error code
   (TBL-0001) into separate numeric code (0001) and scope/category ("TBL") to
   avoid parsing the code when an error handler needs to analyze only the
   category or the code?
   - "*The cause - short string description of an issue, readable by user.*".
   This terminology sounds confusing to me since that "cause" sounds like Java
   Throwable's Message to me and the "Cause" is a lower level exception.
   - "*The action - steps for a user to resolve error...*". The action is
   very important but do we want to make it part of the IgniteException? I do
   not think the recovery action text should be part of the exception.
   IgniteException may include a URL pointing to the corresponding
   documentation - this is discussable.
   - "*All public methods throw only unchecked IgniteException*" - I assume
   we still respect JCache specification and prefer using standard Java
   exception to signal about invalid parameters.
   - Why we do not follow the "classic" structured exception handling
   practices in Ignite:
      - Why do we not allow using checked exceptions? It seems to me
      sometimes forcing the user to handle an error serves as a hint and thus
      improves usability. For example, handling an optimistic/pessimistic
      transaction conflict/deadlock. Or handling a timeout for operations with
      timeouts.
      - Why single public IgniteException and no exception hierarchy? Java
      is optimized for structured exception handling instead of using
IF-ELSE to
      analyze the codes.
      - Why no nested exceptions? Sometimes an error handler is interested
      only in high level exceptions (like Invalid Configuration) and sometimes
      details are needed (like specific configuration parser exceptions).
   - For async methods returning a Future we may have a universal rule on
   how to handle exceptions. For example, we may specify that any async method
   can throw only invalid argument exceptions. All other errors are reported
   via the exceptionally(IgniteException -> {}) callback even if the async
   method was executed synchronously.


вт, 13 апр. 2021 г. в 12:08, Alexei Scherbakov <alexey.scherbakoff@gmail.com
>:

> Igniters,
>
> I would like to start the discussion about error handling in Ignite 3 and
> how we can improve it compared to Ignite 2.
>
> The error handling in Ignite 2 was not very good because of generic
> CacheException thrown on almost any occasion, having deeply nested root
> cause and often containing no useful information on further steps to fix
> the issue.
>
> I aim to fix it by introducing some rules on error handling.
>
> *Public exception structure.*
>
> A public exception must have an error code, a cause, and an action.
>
> * The code - the combination of 2 byte scope id and 2 byte error number
> within the module. This allows up to 2^16 errors for each scope, which
> should be enough. The error code string representation can look like
> RFT-0001 or TBL-0001
> * The cause - short string description of an issue, readable by user. This
> can have dynamic parameters depending on the error type for better user
> experience, like "Can't write a snapshot, no space left on device {0}"
> * The action - steps for a user to resolve error situation described in the
> documentation in the corresponding error section, for example "Clean up
> disk space and retry the operation".
>
> Common errors should have their own scope, for example IGN-0001
>
> All public methods throw only unchecked
> org.apache.ignite.lang.IgniteException containing aforementioned fields.
> Each public method must have a section in the javadoc with a list of all
> possible error codes for this method.
>
> A good example with similar structure can be found here [1]
>
> *Async timeouts.*
>
> Because almost all API methods in Ignite 3 are async, they all will have a
> configurable default timeout and can complete with timeout error if a
> computation is not finished in time, for example if a response has not been
> yet received.
> I suggest to complete the async op future with TimeoutException in this
> case to make it on par with synchronous execution using future.get, which
> will throw java.util.concurrent.TimeoutException on timeout.
> For reference, see java.util.concurrent.CompletableFuture#orTimeout
> No special error code should be used for this scenario.
>
> *Internal exceptions hierarchy.*
>
> All internal exceptions should extend
> org.apache.ignite.internal.lang.IgniteInternalException for checked
> exceptions and
> org.apache.ignite.internal.lang.IgniteInternalCheckedException for
> unchecked exceptions.
>
> Thoughts ?
>
> [1] https://docs.oracle.com/cd/B10501_01/server.920/a96525/preface.htm
>
> --
>
> Best regards,
> Alexei Scherbakov
>


-- 
Best regards,
Alexey