Posted to user@ignite.apache.org by 38797715 <38...@qq.com> on 2022/10/12 13:49:10 UTC
Re: partitionLossPolicy confused
https://issues.apache.org/jira/browse/IGNITE-17835
On 2022/9/30 18:14, Вячеслав Коптилин wrote:
> Hello,
>
> In general there are two possible ways to handle lost partitions for a
> cluster that uses Ignite Native Persistence:
> 1.
> - Return all failed nodes to baseline topology.
> - Call resetLostPartitions
>
> 2.
> - Stop all remaining nodes in the cluster.
> - Start all nodes in the cluster (including previously failed
> nodes) and activate the cluster.
>
> It's important to return all failed nodes to the topology before
> calling resetLostPartitions; otherwise the cluster could end up with
> stale data.
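To illustrate the stale-data risk, here is a toy model in Java. It is an illustration under stated assumptions, not Ignite's implementation: it assumes each copy of a partition carries an update counter and that a reset adopts the newest copy among the online nodes. If the node that left last, and therefore holds the newest copy, is still offline, the reset adopts an older copy. The class name, node names, and counter values are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Toy model (hypothetical, NOT Ignite's implementation): each copy of one
// partition carries an update counter; a reset is assumed to adopt the newest
// copy among the online nodes.
final class StaleResetModel {
    /** Counter of the copy a reset would adopt now, or -1 if no online node has a copy. */
    static long adoptedCounter(Map<String, Long> copies, Set<String> online) {
        long best = -1;
        for (Map.Entry<String, Long> e : copies.entrySet()) {
            if (online.contains(e.getKey())) {
                best = Math.max(best, e.getValue());
            }
        }
        return best;
    }

    /** True if resetting now would adopt an older copy than the newest one in the cluster. */
    static boolean resetWouldBeStale(Map<String, Long> copies, Set<String> online) {
        long newest = copies.values().stream().mapToLong(Long::longValue).max().orElse(-1);
        return adoptedCounter(copies, online) < newest;
    }

    public static void main(String[] args) {
        Map<String, Long> copies = new HashMap<>();
        copies.put("nodeB", 100L); // nodeB failed first and missed later updates
        copies.put("nodeA", 120L); // nodeA failed last, holding the newest updates
        // Only nodeB has come back: resetting now would keep the stale copy.
        System.out.println(resetWouldBeStale(copies, Set.of("nodeB")));           // true
        // Both owners returned first, as recommended: no stale data.
        System.out.println(resetWouldBeStale(copies, Set.of("nodeA", "nodeB"))); // false
    }
}
```

Returning every failed owner before the reset is what makes the newest copy available to the comparison.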
>
> If some owners cannot be returned to the topology for some reason,
> they should be excluded from the baseline before attempting to reset the
> lost partition state; otherwise a ClusterTopologyCheckedException will be
> thrown with the message "Cannot reset lost partitions because no baseline
> nodes are online [cache=someCache, partition=someLostPart]", indicating
> that safe recovery is not possible.
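A toy Java model of the precondition behind that message (hypothetical names, not Ignite code): every lost partition needs at least one online node among the baseline nodes it is currently assigned to. It assumes that excluding a failed owner from the baseline makes the affinity function assign the partition to the remaining baseline nodes; here that reassignment is simply passed in as a new map. The exception class is a stand-in, not Ignite's ClusterTopologyCheckedException.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model (hypothetical, NOT Ignite code) of the reset precondition.
final class ResetPrecondition {
    /** Hypothetical stand-in for Ignite's ClusterTopologyCheckedException. */
    static final class TopologyCheckException extends Exception {
        TopologyCheckException(String msg) { super(msg); }
    }

    /**
     * Throws if some lost partition has no online node among the baseline
     * nodes it is assigned to (assignment: partition -> assigned baseline nodes).
     */
    static void checkCanReset(String cache, Map<Integer, List<String>> assignment,
                              Set<Integer> lostParts, Set<String> online)
            throws TopologyCheckException {
        for (int part : lostParts) {
            List<String> assigned = assignment.getOrDefault(part, List.of());
            if (assigned.stream().noneMatch(online::contains)) {
                throw new TopologyCheckException("Cannot reset lost partitions because no baseline nodes"
                        + " are online [cache=" + cache + ", partition=" + part + "]");
            }
        }
    }

    public static void main(String[] args) {
        Set<String> online = Set.of("node1");
        try {
            // node2 is still in the baseline and still assigned partition 0, but offline.
            checkCanReset("someCache", Map.of(0, List.of("node2")), Set.of(0), online);
        } catch (TopologyCheckException e) {
            System.out.println(e.getMessage());
        }
        try {
            // Assumed effect of excluding node2 from the baseline: partition 0 moves to node1.
            checkCanReset("someCache", Map.of(0, List.of("node1")), Set.of(0), online);
            System.out.println("reset allowed");
        } catch (TopologyCheckException e) {
            System.out.println(e.getMessage());
        }
    }
}
```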
>
> In your particular case, the cache has no backups, so returning a
> node that holds a lost partition should not lead to data
> inconsistencies.
> This particular case can be detected and automatically "resolved". I
> will file a JIRA ticket to address this improvement.
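A hypothetical sketch of the proposed improvement (not Ignite code, and not necessarily how the ticket would be implemented): with zero backups a lost partition has exactly one copy, so once the node that last owned it is back online that copy cannot be stale, and its lost state could be cleared automatically. The class, method, and the partition-to-last-owner map are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the zero-backup auto-resolution idea (NOT Ignite code).
final class AutoResolveSketch {
    /** Lost partitions whose lost state could be cleared without an explicit reset. */
    static Set<Integer> autoResolvable(int backups, Map<Integer, String> lastOwnerOfLostPart,
                                       Set<String> onlineNodes) {
        Set<Integer> result = new HashSet<>();
        if (backups != 0) {
            return result; // only the zero-backup case from the proposal is covered
        }
        for (Map.Entry<Integer, String> e : lastOwnerOfLostPart.entrySet()) {
            // The single copy lives on its last owner; if that node is back, it is current.
            if (onlineNodes.contains(e.getValue())) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, String> lost = new HashMap<>();
        lost.put(0, "node2"); // partition 0 of cache City was held only by node2
        System.out.println(autoResolvable(0, lost, Set.of("node1")));          // []
        System.out.println(autoResolvable(0, lost, Set.of("node1", "node2"))); // [0]
        System.out.println(autoResolvable(1, lost, Set.of("node1", "node2"))); // []
    }
}
```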
>
> Thanks,
> Slava.
>
> On Mon, 26 Sep 2022 at 16:51, 38797715 <38...@qq.com> wrote:
>
> Hello,
>
> I start two nodes with native persistence enabled and then activate the cluster.
>
> Then I create a table with no backups, using SQL like the following:
>
> CREATE TABLE City (
> ID INT,
> Name VARCHAR,
> CountryCode CHAR(3),
> District VARCHAR,
> Population INT,
> PRIMARY KEY (ID, CountryCode)
> ) WITH "template=partitioned, affinityKey=CountryCode,
> CACHE_NAME=City, KEY_TYPE=demo.model.CityKey,
> VALUE_TYPE=demo.model.City";
>
> INSERT INTO City(ID, Name, CountryCode, District, Population)
> VALUES (1,'Kabul','AFG','Kabol',1780000);
> INSERT INTO City(ID, Name, CountryCode, District, Population)
> VALUES (2,'Qandahar','AFG','Qandahar',237500);
>
> Then I execute SELECT COUNT(*) FROM city;
>
> The result is normal.
>
> Then I kill one node.
>
> Then I execute SELECT COUNT(*) FROM city; again and get:
>
> Failed to execute query because cache partition has been lostPart
> [cacheName=City, part=0]
>
> This is also expected.
>
> Next, I start the node that was shut down before.
>
> Then I execute SELECT COUNT(*) FROM city; again and still get:
>
> Failed to execute query because cache partition has been lostPart
> [cacheName=City, part=0]
>
> At this point all partitions have been recovered and all baseline
> nodes are ONLINE. Why is this error still reported? It is very
> confusing. Executing the reset_lost_partitions operation at this point
> seems redundant. Are there any special considerations here?
>
> If I restart the whole cluster at this point and then execute SELECT
> COUNT(*) FROM city;, the query works normally. The cluster is in the
> same state as before the restart, but the behavior is different.
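The sequence above can be reproduced with a toy state model in Java (hypothetical, not Ignite's implementation). The lost flag is set when a partition's only owner leaves, survives the owner's return, and is cleared only by resetLostPartitions or by restarting the whole cluster, matching the two recovery options in the reply. The partition-to-node assignment and all names are made up for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model (hypothetical, NOT Ignite's implementation) of the reported
// behavior for a cache with zero backups.
final class LostFlagModel {
    private final Map<Integer, String> owner = new HashMap<>(); // partition -> sole owner
    private final Set<String> online = new HashSet<>();
    private final Set<Integer> lost = new HashSet<>();

    LostFlagModel(Map<Integer, String> assignment) {
        owner.putAll(assignment);
        online.addAll(assignment.values()); // all owners start online
    }

    /** The only owner leaves, so its partitions are marked lost. */
    void killNode(String node) {
        online.remove(node);
        owner.forEach((part, n) -> { if (n.equals(node)) lost.add(part); });
    }

    /** Rejoining brings the data back but, as observed, does NOT clear the flag. */
    void startNode(String node) { online.add(node); }

    /** Models resetLostPartitions: allowed only once every owner is back online. */
    void resetLostPartitions() {
        for (int part : lost) {
            if (!online.contains(owner.get(part))) {
                throw new IllegalStateException("owner of lost partition " + part + " is offline");
            }
        }
        lost.clear();
    }

    /** Models stopping all nodes, starting them all, and activating the cluster. */
    void restartWholeCluster() {
        online.addAll(owner.values());
        lost.clear();
    }

    /** Mirrors SELECT COUNT(*) FROM City failing while any partition is lost. */
    boolean queryFails() { return !lost.isEmpty(); }

    public static void main(String[] args) {
        LostFlagModel m = new LostFlagModel(Map.of(0, "node2", 1, "node1"));
        m.killNode("node2");
        System.out.println(m.queryFails()); // true
        m.startNode("node2");
        System.out.println(m.queryFails()); // true: the flag is sticky
        m.resetLostPartitions();
        System.out.println(m.queryFails()); // false
    }
}
```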