You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@ignite.apache.org by Александр Меньшиков <sh...@gmail.com> on 2017/03/31 14:01:45 UTC

Re: IGNITE-1948 ClusterTopologyCheckedException can return null for retryReadyFuture()

Alexey, I just remind you about issue.

2017-02-20 16:42 GMT+03:00 Александр Меньшиков <sh...@gmail.com>:

> Alexey,
>
> I'm ready to make some conclusions. You can see result immediately here:
> PR:          https://github.com/apache/ignite/pull/1537/files
> CI Tests:  http://ci.ignite.apache.org/project.html?projectId=
> IgniteTests&tab=projectOverview&branch_IgniteTests=pull/1537/head
>
> I fixed only "ClusterTopologyCheckedException", and didn't touched
> "ClusterTopologyException". "ClusterTopologyException" has a same problem
> like "ClusterTopologyCheckedException", but even now changes is huge (80
> files). So if you think fixing of "ClusterTopologyException" is necessary,
> you could add different issue in JIRA (or I can do that). And I will fix it
> in future. I can't implement your idea about using Runtime exception,
> because right now a lot of logic tied to the fact that this is
> "IgniteCheckedException". I real tried to mix it but I couldn't make it
> work.
>
> So before my changes we have 3 affected classes:
> 1. "ClusterTopologyCheckedException".
> 2. "ClusterTopologyServerNotFoundException".
> 3. "ClusterGroupEmptyCheckedException".
>
> Now we there are 5 affected classes:
>
> "ClusterTopologyCheckedException" split into 2 classes:
> 1. "ClusterTopologyCheckedException" (with future).
> 2. "ClusterTopologyLocalException" (without future, and it's parent for "
> ClusterTopologyCheckedException").
>
> "ClusterTopologyServerNotFoundException" rename to
> 3. "ClusterTopologyServerNotFoundLocalException" (For consistency of
> names, I didn't find any cases where "ClusterTopologyServerNotFoundException"
> need his future).
>
> "ClusterGroupEmptyCheckedException" split into 2 classes:
> 4. "ClusterGroupEmptyCheckedException".
> 5. "ClusterGroupEmptyLocalException".
>
> Also in code "ready future" is using for waiting, and don't using for get
> real value (just look at code, all values just ignored). In fact "ready
> future" has type "IgniteFuture<?>". It suggests that result doesn't matter.
>
> 2017-02-10 11:06 GMT+03:00 Alexey Goncharuk <al...@gmail.com>:
>
>> Alexander,
>>
>> I do not see why you would need to overwrite the cause field.
>>
>> What I meant in the previous email is that instead of setting a retry
>> ready future in the checked exception, you would set a correct affinity
>> topology version here. Then, during a construction of CacheException
>> (unchecked, which is guaranteed to be thrown locally and will not be
>> serialized), you would create the retry ready future based on the topology
>> version set.
>>
>> Hope this helps,
>> AG
>>
>> 2017-02-07 17:18 GMT+03:00 Александр Меньшиков <sh...@gmail.com>:
>>
>>> Alexey, I ran into some difficulties.
>>>
>>> Look at the method: GridNearTxFinishFuture.CheckBa
>>> ckupMiniFuture#onDhtFinishResponse(GridDhtTxFinishResponse res)
>>>
>>>             *Throwable err* = res.checkCommittedError();
>>>
>>>             if (*err* != null) {
>>>                 if (*err* instanceof IgniteCheckedException) {
>>>                     *ClusterTopologyCheckedException cause* =
>>>                         ((IgniteCheckedException)*err*).
>>> *getCause(ClusterTopologyCheckedException.class)*;
>>>
>>>                     if (*cause* != null)
>>>                         *cause.retryReadyFuture(*cctx.ne
>>> xtAffinityReadyFuture(tx.topologyVersion()));
>>> *                                   //^_____Here update the readyFut in
>>> some first "cause". *
>>>                 }
>>>
>>>                 onDone(*err*);
>>>             }
>>>
>>>
>>> So I can't rewrite "cause" field in exception without reflection. It
>>> means if I try to use two exception: one with "readyFut" and second without
>>> "readyFut", I see no way to do it in this place.
>>>
>>> Perhaps I misunderstood your original idea. Or maybe this code some kind
>>> of a crutch and should be removed. What do you think?
>>>
>>> 2017-01-30 16:54 GMT+03:00 Alexey Goncharuk <al...@gmail.com>
>>> :
>>>
>>>> In the example above there is no point of setting the retry future in
>>>> the
>>>> exception because it will be serialized and sent to a remote node.
>>>>
>>>> I see the only possible way to ensure this: make this be verified at
>>>> compile time. This looks like a big change, but the draft may look like
>>>> so:
>>>> 1) Introduce new exception type, like CacheTopology*Checked*Exception
>>>> which
>>>> *must* take the minimum topology version to wait for
>>>> 2) In all cases when a cache operation fails because of topology change,
>>>> provide the appropriate exception
>>>> 3) When CacheTopologyException is being constructed, create a
>>>> corresponding
>>>> local future based on wait version and throw the exception.
>>>>
>>>> Even though this still does not give us 100% guarantee that we did not
>>>> miss
>>>> anything, it should cover a lot more cases than now.
>>>>
>>>> 2017-01-30 14:20 GMT+03:00 Александр Меньшиков <sh...@gmail.com>:
>>>>
>>>> > Alexey,
>>>> >
>>>> > For GridCacheIoManager#sendNoRetry it's not necessary because
>>>> exception
>>>> > will be ignored (or will cast to String). Perhaps my message was
>>>> unclear.
>>>> > I try to say that three methods can throw
>>>> "ClusterTopologyCheckedException"
>>>> > without any problem.
>>>> >
>>>> > But you are right, in some methods I can't set "retryFuture". We must
>>>> > ensure that exception doesn't escape away. The best option is to
>>>> ignore it
>>>> > (or cast to String).
>>>> >
>>>> > But any way, look here:
>>>> >
>>>> > IgniteTxHandler#sendReply(UUID nodeId, GridDhtTxFinishRequest req,
>>>> boolean
>>>> > committed, GridCacheVersion nearTxId)
>>>> >
>>>> > Inside you can found next lines:
>>>> > __________________________
>>>> > ClusterTopologyCheckedException *cause* =
>>>> >     new ClusterTopologyCheckedException("Primary node left grid.");
>>>> >
>>>> > res.checkCommittedError(new IgniteTxRollbackCheckedException("Failed
>>>> to
>>>> > commit transaction " +
>>>> >     "(transaction has been rolled back on backup node): " +
>>>> req.version(),
>>>> > *cause*));
>>>> > __________________________
>>>> >
>>>> > How do you unsure that *"cause"* can't come to
>>>> GridCacheUtils#retryTopologySafe(final
>>>>
>>>> > Callable<S> c) ?
>>>> >
>>>> >
>>>> > Perhaps I don't know some rules which you use to maintain it now?
>>>> >
>>>> >
>>>> > 2017-01-30 12:24 GMT+03:00 Alexey Goncharuk <
>>>> alexey.goncharuk@gmail.com>:
>>>> >
>>>> >> Alexander,
>>>> >>
>>>> >> Do you have a use-case in mind which results in
>>>> IgniteTopologyException
>>>> >> thrown from public API with retryFuture unset?
>>>> >>
>>>> >> retryFuture makes sense only for certain cache operations and may be
>>>> set
>>>> >> only in certain places in the code where we already know what to
>>>> wait for.
>>>> >> I do not see how you can initialize this future, for example, in
>>>> >>  GridCacheIoManager#sendNoRetry(ClusterNode node, GridCacheMessage
>>>> msg,
>>>> >> byte
>>>> >> plc)
>>>> >>
>>>> >> --AG
>>>> >>
>>>> >> 2017-01-27 15:45 GMT+03:00 Александр Меньшиков <sharplermc@gmail.com
>>>> >:
>>>> >>
>>>> >> > I found a lot of using "ClusterTopologyCheckedException"
>>>> exception. In
>>>> >> > most
>>>> >> > use-case these exceptions were just ignored. It's easier to see if
>>>> >> > issue-4612 will be applied (I'm waiting for code review by my team
>>>> >> leader)
>>>> >> > where I renamed all unused exceptions in the "ignored".
>>>> >> >
>>>> >> > It means that in some case "retryReadyFuture" can left unset. But
>>>> there
>>>> >> are
>>>> >> > code where exception is being getting from "get()" method in some
>>>> >> "future"
>>>> >> > class and don't ignored (exception is sending to method
>>>> >> > "GridFutureAdapter#onDone()" for example).
>>>> >> >
>>>> >> > So I decided not to do a deep global analysis of code and make some
>>>> >> simple
>>>> >> > rules.
>>>> >> > 1. If some method "X" throws ClusterTopologyCheckedException and
>>>> >> ignores
>>>> >> > it
>>>> >> > (or converts to String, there are some variants to lose exception
>>>> like
>>>> >> > object) in all methods that call the method "X", then we can don't
>>>> set
>>>> >> > "retryReadyFuture" for optimization. But this should be clarified
>>>> in
>>>> >> > javadoc.
>>>> >> > 2. But if exception escapes at least once like object: we must set
>>>> >> > "retryReadyFuture" in that method.
>>>> >> >
>>>> >> > A few times when call tree was simple, I went deeper.
>>>> >> >
>>>> >> > So I found only three methods which can throw
>>>> >> > "ClusterTopologyCheckedException" without setting
>>>> "retryReadyFuture":
>>>> >> >
>>>> >> > 1. IgfsFragmentizerManager#sendWithRetries(UUID nodeId,
>>>> >> > IgfsCommunicationMessage msg)
>>>> >> > 2. GridCacheIoManager#sendNoRetry(ClusterNode node,
>>>> GridCacheMessage
>>>> >> msg,
>>>> >> > byte plc)
>>>> >> > 3. GridCacheIoManager#sendOrderedMessage(ClusterNode node, Object
>>>> >> topic,
>>>> >> > GridCacheMessage msg, byte plc, long timeout)
>>>> >> >
>>>> >> > In all other methods "ClusterTopologyCheckedException" escapes
>>>> away too
>>>> >> > far.
>>>> >> >
>>>> >> > What do you think about this approach to make compromise between
>>>> >> > optimization and correctness?
>>>> >> >
>>>> >>
>>>> >
>>>> >
>>>>
>>>
>>>
>>
>