Posted to issues@ignite.apache.org by "Denis Chudov (Jira)" <ji...@apache.org> on 2021/04/29 12:46:00 UTC

[jira] [Comment Edited] (IGNITE-14474) Improve error message in case rebalance fails

    [ https://issues.apache.org/jira/browse/IGNITE-14474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17335430#comment-17335430 ] 

Denis Chudov edited comment on IGNITE-14474 at 4/29/21, 12:45 PM:
------------------------------------------------------------------

[~Smolnikov] LGTM, pls fix the typo in the test (you can just apply the suggested change on GitHub) and proceed to committer's review.


was (Author: denis chudov):
[~Smolnikov] LGTM, pls fix typo in test (you can just apply suggested change on github) and proceed to core team review.

> Improve error message in case rebalance fails
> ---------------------------------------------
>
>                 Key: IGNITE-14474
>                 URL: https://issues.apache.org/jira/browse/IGNITE-14474
>             Project: Ignite
>          Issue Type: Improvement
>    Affects Versions: 2.5
>            Reporter: Denis Chudov
>            Assignee: Rodion
>            Priority: Major
>             Fix For: 2.9.2
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently we can get a message like this when rebalance fails with an exception (examples are from Ignite 2.5; in newer versions the log messages were changed, but the problem is still relevant):
> {code:java}
> 2019-11-27 13:41:14,504[WARN ][utility-#79%xxx%][GridDhtPartitionDemander] Rebalancing from node cancelled [grp=ignite-sys-cache, topVer=AffinityTopologyVersion [topVer=1932, minorTopVer=1], supplier=f014f30a-77f2-4459-aa5b-6c12907a7449, topic=0]. Supply message couldn't be unmarshalled: class o.a.i.IgniteCheckedException: Failed to unmarshal object with optimized marshaller
> 2019-11-27 13:41:14,504[INFO ][utility-#79%xxx%][GridDhtPartitionDemander] Cancelled rebalancing [grp=ignite-sys-cache, supplier=f014f30a-77f2-4459-aa5b-6c12907a7449, topVer=AffinityTopologyVersion [topVer=1932, minorTopVer=1], time=88 ms]
> 2019-11-27 13:41:14,508[WARN ][utility-#76%xxx%][GridDhtPartitionDemander] Rebalancing from node cancelled [grp=ignite-sys-cache, topVer=AffinityTopologyVersion [topVer=1932, minorTopVer=1], supplier=dfa5ee06-48c9-4458-ae55-48cc6ceda998, topic=0]. Supply message couldn't be unmarshalled: class o.a.i.IgniteCheckedException: Failed to unmarshal object with optimized marshaller
> {code}
> In the case above, a marshalling exception leads to a rebalance failure that will never be resolved, i.e. the cluster enters an erroneous state.
> We should report issues like this as ERROR. The message should explain that the rebalance has failed, data for the cache was not fully copied to the node, the backup factor is not recovered and the cluster may not work correctly.
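
A minimal sketch of what such an ERROR-level report could look like, assuming nothing about the actual fix: the class RebalanceFailureMessage, its methods, and their parameters are hypothetical and are not Ignite APIs. It uses java.util.logging only to stay self-contained; the message covers the points requested in the description above (rebalance failed, data not fully copied, backup factor not recovered, cluster may not work correctly).

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper, not part of Ignite: builds and logs a rebalance failure report. */
final class RebalanceFailureMessage {
    private static final Logger LOG = Logger.getLogger(RebalanceFailureMessage.class.getName());

    private RebalanceFailureMessage() {
        // Static helper, no instances.
    }

    /**
     * Builds the message text. All parameter names are assumptions chosen to
     * mirror the fields visible in the log lines quoted in the issue.
     */
    static String build(String grp, String supplierId, long topVer, int minorTopVer, Throwable cause) {
        return "Rebalancing failed [grp=" + grp
            + ", supplier=" + supplierId
            + ", topVer=" + topVer
            + ", minorTopVer=" + minorTopVer + "]. "
            + "Data for the cache group was not fully copied to this node, "
            + "the backup factor is not recovered, "
            + "and the cluster may not work correctly. "
            + "Cause: " + cause;
    }

    /** Logs the failure at the highest standard level (SEVERE stands in for ERROR here). */
    static void report(String grp, String supplierId, long topVer, int minorTopVer, Throwable cause) {
        LOG.log(Level.SEVERE, build(grp, supplierId, topVer, minorTopVer, cause), cause);
    }
}
```

Keeping the text construction separate from the logging call makes the wording easy to check in a test without capturing log output.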



--
This message was sent by Atlassian Jira
(v8.3.4#803005)