Posted to issues@ignite.apache.org by "Stanislav Lukyanov (Jira)" <ji...@apache.org> on 2021/09/30 10:49:00 UTC

[jira] [Created] (IGNITE-15653) Automatic resetLostPartitions()

Stanislav Lukyanov created IGNITE-15653:
-------------------------------------------

             Summary: Automatic resetLostPartitions()
                 Key: IGNITE-15653
                 URL: https://issues.apache.org/jira/browse/IGNITE-15653
             Project: Ignite
          Issue Type: Bug
            Reporter: Stanislav Lukyanov


h1. Motivation

Requests to automate `resetLostPartitions()` are incredibly popular among Ignite users. One could say that, in the world of k8s and other orchestrators, handling partition loss, rebalancing, and graceful shutdown remains the most difficult thing to automate.

 

`shutdownPolicy=GRACEFUL` should already be enough to prevent partition loss under normal operation (given proper configuration). However, it would be really nice to implement automatic recovery from partition loss for cases when nodes crash uncontrollably (e.g. OOM from an SQL query).

 

`resetLostPartitions()` is potentially dangerous because it fixes the state of the partitions as it currently exists in the cluster: if some data has not been returned to the cluster by then, it is lost forever. That's why any automation is tricky here.
h1. Proposal

I suggest that we introduce a distributed boolean property `resetLostPartitionsAutomatically` that is `false` by default (for safety and compatibility). When the property is enabled, it allows the cluster to reset lost partitions automatically whenever it determines that doing so should be safe.

To implement this, we store the maximum update counter for each partition in the metastore on each PME (partition map exchange). That way we know a lower bound for each partition's update counter at all times.
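
As a rough illustration of the bookkeeping described above, here is a minimal sketch in plain Java. `PartitionCounterStore`, `recordOnExchange`, and `lastKnown` are hypothetical names used only for this example, and an in-memory map stands in for the metastore; none of this is existing Ignite API. The sketch also assumes that the stored value should never decrease, so each new observation is merged with `max`.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the per-partition record kept in the metastore (not an Ignite API). */
final class PartitionCounterStore {
    /** Partition id -> highest update counter recorded so far. */
    private final Map<Integer, Long> lastKnown = new HashMap<>();

    /**
     * Intended to be called once per PME with the highest update counter
     * observed among the owners of each partition.
     */
    void recordOnExchange(Map<Integer, Long> maxCounterByPartition) {
        // Assumption: keep the larger value so the stored bound never moves backwards.
        maxCounterByPartition.forEach((part, cntr) -> lastKnown.merge(part, cntr, Long::max));
    }

    /** Returns the recorded lower bound, or 0 if the partition was never recorded. */
    long lastKnown(int partition) {
        return lastKnown.getOrDefault(partition, 0L);
    }
}
```

With this shape, an exchange that happens while some owners are offline, and therefore reports a smaller counter, does not lower the recorded bound.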

When the cluster has partition P lost, on each PME the coordinator checks P's state in the cluster (`pCluster`) vs what's in the metastore (`pStore`) with the following pseudocode:
{code:java}
// Highest update counter among the current owners of the lost partition
maxCurrentUpdateCounter = max(map(pCluster.owners, o -> o.updateCounter));
// Update counter recorded in the metastore on the last PME
lastKnownUpdateCounter = pStore.updateCounter;

// The current would-be primary has an update counter at least as high as the one
// recorded on the last PME, so there should be no data loss and the reset is safe
if (maxCurrentUpdateCounter >= lastKnownUpdateCounter)
    resetLostPartition(P);
{code}
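
The check in the pseudocode can also be written as a small self-contained Java method, which makes the edge cases easier to discuss. This is only a sketch: `LostPartitionCheck` and `isResetSafe` are hypothetical names, the owners' counters are passed in as plain numbers rather than read from the cluster, and treating a partition with no surviving owners as unsafe is an assumption that the proposal itself does not spell out.

```java
import java.util.Collection;

/** Hypothetical helper illustrating the coordinator's safety check (not an Ignite API). */
final class LostPartitionCheck {
    /**
     * @param ownerCounters    update counters of the current owners of the lost partition
     * @param lastKnownCounter update counter recorded in the metastore on the last PME
     * @return true if resetting the lost partition should not lose data
     */
    static boolean isResetSafe(Collection<Long> ownerCounters, long lastKnownCounter) {
        // Assumption: with no surviving owner there is nothing to compare, so do not reset.
        if (ownerCounters.isEmpty())
            return false;

        long maxCurrent = ownerCounters.stream()
            .mapToLong(Long::longValue)
            .max()
            .getAsLong();

        // Safe only if some owner has caught up to the last recorded counter.
        return maxCurrent >= lastKnownCounter;
    }
}
```

For example, owners with counters 10 and 12 against a recorded value of 12 pass the check, while owners with counters 10 and 11 do not, and the partition stays lost until a more up-to-date node returns.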


