Posted to issues@ignite.apache.org by "Alexander Lapin (Jira)" <ji...@apache.org> on 2023/10/31 13:57:00 UTC

[jira] [Commented] (IGNITE-20678) Adding ReplicaMeta#getLeaseholderId to avoid errors during node recovery

    [ https://issues.apache.org/jira/browse/IGNITE-20678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781371#comment-17781371 ] 

Alexander Lapin commented on IGNITE-20678:
------------------------------------------

[~ktkalenko@gridgain.com] LGTM, thanks!

> Adding ReplicaMeta#getLeaseholderId to avoid errors during node recovery
> ------------------------------------------------------------------------
>
>                 Key: IGNITE-20678
>                 URL: https://issues.apache.org/jira/browse/IGNITE-20678
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Kirill Tkalenko
>            Assignee: Kirill Tkalenko
>            Priority: Major
>              Labels: ignite-3
>             Fix For: 3.0.0-beta2
>
>          Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> After discussions and code analysis, I concluded that this problem should be solved by calling *org.apache.ignite.internal.placementdriver.PlacementDriver#getPrimaryReplica* during recovery.
> h3. But now it has the following bug:
> When the cluster is restarted (for simplicity, a cluster of one node), calling *PlacementDriver#getPrimaryReplica* during recovery can report that the local node is the primary replica and that its lease has not yet expired (*org.apache.ignite.internal.placementdriver.ReplicaMeta#getExpirationTime* > now). The node then starts building the index even though the index was already built; the replication log simply has not been applied yet.
> h3. How to fix the bug:
> Add the field *org.apache.ignite.internal.placementdriver.ReplicaMeta#getLeaseholderId*, which holds the node ID assigned when the node starts (*org.apache.ignite.network.ClusterNode#id*). It is needed to check whether the local replica is the primary one.
> If, during recovery, *PlacementDriver#getPrimaryReplica* reports that we are not the primary replica, we will not build the index; otherwise, an honest election of the primary replica will take place using the replication log.
> h3. Total:
> Corrections related to index building recovery will be made in IGNITE-20544 and IGNITE-20637.
> In this ticket, *org.apache.ignite.internal.placementdriver.ReplicaMeta#getLeaseholderId* will be added, along with the changes needed for it to work correctly: for example, its serialization/deserialization in *org.apache.ignite.internal.placementdriver.leases.Lease*, and a change so that the lease is not prolonged if *ReplicaMeta#getLeaseholderId* changes.
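
For illustration only, the check described above could be sketched roughly as follows. This is not the actual Ignite code: LeaseholderCheckSketch, LeaseInfo, isLocalNodePrimary and canProlong are hypothetical names, the node ID is assumed to be a String, and time is simplified to plain milliseconds. The sketch only shows the idea that a restarted node gets a new ID, so an unexpired lease granted to the previous incarnation no longer matches the local node.

```java
import java.util.Objects;

/** Hypothetical sketch; none of these types are the real Ignite classes. */
public class LeaseholderCheckSketch {
    /** Simplified stand-in for the lease metadata (assumed shape, not ReplicaMeta itself). */
    static final class LeaseInfo {
        final String leaseholder;        // consistent node name, survives restarts
        final String leaseholderId;      // node ID assigned at node start, changes on restart
        final long expirationTimeMillis; // lease expiration time

        LeaseInfo(String leaseholder, String leaseholderId, long expirationTimeMillis) {
            this.leaseholder = leaseholder;
            this.leaseholderId = leaseholderId;
            this.expirationTimeMillis = expirationTimeMillis;
        }
    }

    /** True only if the lease is unexpired AND was granted to this incarnation of the node. */
    static boolean isLocalNodePrimary(LeaseInfo lease, String localNodeId, long nowMillis) {
        if (lease == null) {
            return false;
        }
        boolean notExpired = nowMillis < lease.expirationTimeMillis;
        return notExpired && Objects.equals(lease.leaseholderId, localNodeId);
    }

    /** Prolongation is allowed only while the leaseholder ID stays the same. */
    static boolean canProlong(LeaseInfo current, String candidateLeaseholderId) {
        return current != null && Objects.equals(current.leaseholderId, candidateLeaseholderId);
    }

    public static void main(String[] args) {
        // Lease granted before the restart, still unexpired at time 5_000.
        LeaseInfo lease = new LeaseInfo("node-A", "id-before-restart", 10_000L);

        // Same node name, but a new ID after restart: not treated as primary.
        System.out.println(isLocalNodePrimary(lease, "id-after-restart", 5_000L));  // false
        // Same incarnation, lease unexpired: treated as primary.
        System.out.println(isLocalNodePrimary(lease, "id-before-restart", 5_000L)); // true
        // The leaseholder ID changed, so the lease must not be prolonged.
        System.out.println(canProlong(lease, "id-after-restart"));                  // false
    }
}
```

With a name-only comparison, the first call would have returned true and the node would have started building the index again.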


