Posted to issues@ignite.apache.org by "Matija Polajnar (Jira)" <ji...@apache.org> on 2020/05/29 12:05:00 UTC
[jira] [Comment Edited] (IGNITE-12297) Detect lost partitions is not happened during cluster activation

    [ https://issues.apache.org/jira/browse/IGNITE-12297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16975026#comment-16975026 ] 

Matija Polajnar edited comment on IGNITE-12297 at 5/29/20, 12:04 PM:
---------------------------------------------------------------------

For the record, as discussed in IGNITE-10226, this resulted in us getting a *org.apache.ignite.internal.cluster.ClusterTopologyCheckedException: Cannot run update query. Node must own all the necessary partitions.* This was happening on a one-node "cluster".

A sane but sometimes difficult-to-execute workaround was provided by [~jokser]:
{quote}1) Start another node, this is a topology event that will trigger detecting lost partitions.
2) Stop started node
3) If you have partition loss policy != IGNORE trigger explicitly `resetLostPartitions`
It should help to return back partition to OWNING state.
{quote}
It works, but you need to configure another node for the cluster. A dangerous and ugly but more practical workaround is to have this reflection-based method ready to invoke when you need it:

 
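{code:java}
import java.lang.reflect.Field;

import org.apache.ignite.IgniteSpringBean;
import org.apache.ignite.internal.GridKernalContextImpl;
import org.apache.ignite.internal.IgniteKernal;

// Hypothetical subclass name: "this" must be the IgniteSpringBean whose private field is read below.
public class PartitionResettingIgniteSpringBean extends IgniteSpringBean {
    public void resetMovingPartitions() {
        try {
            // Internal API: IgniteSpringBean keeps the started Ignite instance in the private field "g".
            Field igniteKernalField = IgniteSpringBean.class.getDeclaredField("g");
            igniteKernalField.setAccessible(true);
            IgniteKernal igniteKernal = (IgniteKernal)igniteKernalField.get(this);
            GridKernalContextImpl kernalContext = (GridKernalContextImpl)igniteKernal.context();
            // Ask the partition exchange manager to resend partition maps.
            kernalContext.cache().context().exchange().scheduleResendPartitions();
        } catch (IllegalAccessException | NoSuchFieldException | ClassCastException e) {
            throw new AssertionError(e);
        }
    }
}
{code}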
It works for us.
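To see why the three quoted steps help, here is a deliberately simplified, single-node model in plain Java. It is not Ignite code: the class, the State enum and all method names are hypothetical. The IGNORE branch assumes, as the quote implies, that under that policy detection alone returns the partitions to OWNING, while any other policy needs the explicit reset.

{code:java}
import java.util.Arrays;

// Hypothetical, simplified model of one node's partitions; NOT Ignite's implementation.
final class LostPartitionRecoveryModel {
    enum State { MOVING, OWNING, LOST }

    private final State[] parts;
    private final boolean ignorePolicy; // true stands in for a partition loss policy of IGNORE

    LostPartitionRecoveryModel(int partitions, boolean ignorePolicy) {
        this.parts = new State[partitions];
        // After activation every partition is stuck in MOVING and no detection has run yet.
        Arrays.fill(parts, State.MOVING);
        this.ignorePolicy = ignorePolicy;
    }

    // Steps 1 and 2: starting or stopping a node is a topology event that runs lost-partition detection.
    // With a single node there is no other owner, so every MOVING partition counts as lost.
    void onTopologyEvent() {
        for (int p = 0; p < parts.length; p++) {
            if (parts[p] == State.MOVING) {
                // Assumption: with IGNORE the partition is owned again; otherwise it is marked LOST.
                parts[p] = ignorePolicy ? State.OWNING : State.LOST;
            }
        }
    }

    // Step 3: explicit reset, only needed when the policy is not IGNORE.
    void resetLostPartitions() {
        for (int p = 0; p < parts.length; p++) {
            if (parts[p] == State.LOST) {
                parts[p] = State.OWNING;
            }
        }
    }

    // An update query needs every partition in OWNING state on this one-node "cluster".
    boolean canRunUpdateQuery() {
        return Arrays.stream(parts).allMatch(s -> s == State.OWNING);
    }
}
{code}

With a policy other than IGNORE, canRunUpdateQuery() stays false after onTopologyEvent() and becomes true only after resetLostPartitions(); with IGNORE it becomes true right after the topology event.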

 



> Detect lost partitions is not happened during cluster activation
> ----------------------------------------------------------------
>
>                 Key: IGNITE-12297
>                 URL: https://issues.apache.org/jira/browse/IGNITE-12297
>             Project: Ignite
>          Issue Type: Bug
>          Components: cache
>    Affects Versions: 2.4
>            Reporter: Pavel Kovalenko
>            Priority: Major
>              Labels: newbie
>
> We invoke `detectLostPartitions` during PME only when a server node joins or leaves.
> However, we can activate a persistent cluster in which a partition has MOVING status on all nodes. In this case, the partition may stay in the MOVING state indefinitely, until some other topology event occurs.
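The triggering rule described above can be illustrated with a small plain-Java sketch. The event names, the two trigger sets and firstDetectionIndex are hypothetical and only mirror the wording of this description; the real exchange logic in Ignite is more involved, and WITH_ACTIVATION is just one possible shape of a fix, not the committed one.

{code:java}
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of when lost-partition detection runs during PME; NOT Ignite's exchange code.
final class LostPartitionDetectionTrigger {
    enum ExchangeEvent { SERVER_JOINED, SERVER_LEFT, CLIENT_JOINED, CLUSTER_ACTIVATED }

    // Behaviour described in the issue: detection only on server join or server leave.
    static final Set<ExchangeEvent> CURRENT =
        EnumSet.of(ExchangeEvent.SERVER_JOINED, ExchangeEvent.SERVER_LEFT);

    // One possible fix: also run detection when a cluster is activated.
    static final Set<ExchangeEvent> WITH_ACTIVATION =
        EnumSet.of(ExchangeEvent.SERVER_JOINED, ExchangeEvent.SERVER_LEFT, ExchangeEvent.CLUSTER_ACTIVATED);

    // Returns the index of the first event that runs detection, or -1 if none does,
    // in which case partitions left in MOVING after activation never leave that state.
    static int firstDetectionIndex(Set<ExchangeEvent> triggers, List<ExchangeEvent> history) {
        for (int i = 0; i < history.size(); i++) {
            if (triggers.contains(history.get(i))) {
                return i;
            }
        }
        return -1;
    }
}
{code}

With CURRENT, a history of CLUSTER_ACTIVATED followed by CLIENT_JOINED yields -1, and a later SERVER_JOINED is the first event that would detect the stuck partitions, which is why starting another node works around the problem.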


