Posted to user@ignite.apache.org by 18624049226 <18...@163.com> on 2021/04/19 07:56:04 UTC

Re: Partition loss and data balancing

Hello Alexei,

Has this problem been solved? What is the corresponding JIRA ID?

On 2020/5/4 9:51 PM, Alexei Scherbakov wrote:
> Hi.
>
> 1. org.apache.ignite.IgniteCache#lostPartitions.
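>
> A minimal sketch of checking this from code (the cache name below is just a
> placeholder for one of your tables):
>
>     import java.util.Collection;
>     import org.apache.ignite.Ignite;
>     import org.apache.ignite.IgniteCache;
>     import org.apache.ignite.Ignition;
>
>     public class LostPartitionsCheck {
>         public static void main(String[] args) {
>             // Start (or join as) a node with the default configuration.
>             try (Ignite ignite = Ignition.start()) {
>                 // Assumes a cache named "tableName" already exists.
>                 IgniteCache<Object, Object> cache = ignite.cache("tableName");
>
>                 // IDs of the partitions currently marked as LOST for this cache.
>                 Collection<Integer> lost = cache.lostPartitions();
>
>                 System.out.println("Lost partitions: " + lost);
>             }
>         }
>     }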
>
> 2. If the number of simultaneously failed nodes is >= backups + 1, data
> loss will happen.
> The loss is permanent for in-memory caches and temporary for
> persistent caches, until the failed nodes with the data return to the grid
> and resetLostPartitions is called.
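>
> A minimal sketch of calling it from the API (the cache name is a placeholder;
> control.sh --cache reset_lost_partitions does the same):
>
>     import java.util.Collections;
>     import org.apache.ignite.Ignite;
>     import org.apache.ignite.Ignition;
>
>     public class ResetLostPartitions {
>         public static void main(String[] args) {
>             // Start (or join as) a node with the default configuration.
>             try (Ignite ignite = Ignition.start()) {
>                 // Clears the LOST state once the nodes owning the data are back.
>                 ignite.resetLostPartitions(Collections.singleton("tableName"));
>             }
>         }
>     }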
>
> 3. Looks like this is a known issue (not related to partition loss):
> if a node is restarted in persistent mode and there is no write load on
> the grid, the partition distribution will not be switched back to the
> ideal one after the restart, causing load imbalance.
> The fix is ready but not yet merged into Ignite.
> The workaround is to restart all nodes.
>
>
> Sun, 26 Apr 2020 at 15:10, 18624049226 <18624049226@163.com>:
>
>     The screenshot of the data imbalance is as follows:
>
>     On 2020/4/26 5:52 PM, 18624049226 wrote:
>>
>>     Hi community,
>>
>>     We have 4 servers and 4 tables, with backups = 1, cacheMode =
>>     PARTITIONED, partitionLossPolicy = READ_ONLY_SAFE,
>>     persistenceEnabled = true, Ignite version 2.7.6.
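>>
>>     In code, this setup corresponds roughly to the following sketch (class
>>     and cache names are only placeholders):
>>
>>         import org.apache.ignite.cache.CacheMode;
>>         import org.apache.ignite.cache.PartitionLossPolicy;
>>         import org.apache.ignite.configuration.CacheConfiguration;
>>         import org.apache.ignite.configuration.DataStorageConfiguration;
>>         import org.apache.ignite.configuration.IgniteConfiguration;
>>
>>         public class ClusterConfig {
>>             public static IgniteConfiguration create() {
>>                 IgniteConfiguration igniteCfg = new IgniteConfiguration();
>>
>>                 // Native persistence is enabled for the default data region.
>>                 DataStorageConfiguration storageCfg = new DataStorageConfiguration();
>>                 storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);
>>                 igniteCfg.setDataStorageConfiguration(storageCfg);
>>
>>                 // Each cache/table: PARTITIONED, 1 backup, READ_ONLY_SAFE loss policy.
>>                 CacheConfiguration<Object, Object> cacheCfg = new CacheConfiguration<>("tableName");
>>                 cacheCfg.setCacheMode(CacheMode.PARTITIONED);
>>                 cacheCfg.setBackups(1);
>>                 cacheCfg.setPartitionLossPolicy(PartitionLossPolicy.READ_ONLY_SAFE);
>>                 igniteCfg.setCacheConfiguration(cacheCfg);
>>
>>                 return igniteCfg;
>>             }
>>         }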
>>
>>     While deleting the data of a table that holds a large amount of data,
>>     three servers failed due to OutOfMemory.
>>
>>     After restarting the 3 failed servers, the table data cannot be
>>     queried, and the following error is thrown:
>>
>>      err=Failed to execute query because cache partition has been lost
>>
>>     At this time, execute the following command:
>>
>>     ./control.sh --cache reset_lost_partitions tableName;
>>
>>     After that, the table data can be queried, and the total amount of
>>     data is correct, with no data loss.
>>
>>     However, if you execute the cache -a command in Visor, the
>>     following situation occurs:
>>
>>     We find that there is no primary partition data on one server and no
>>     backup partition data on one server, which leads to significant data
>>     imbalance. All partitioned tables have the same imbalanced
>>     distribution pattern as in the figure above.
>>
>>     At this point, if the entire cluster is restarted, everything returns
>>     to normal; the data distribution is as follows:
>>
>>     My questions are:
>>
>>     1. Is there any way to see which partitions on which nodes are lost?
>>
>>     2. In the end, it seems that there was no real partition loss, only a
>>     wrong status. What is the reason for the partition loss?
>>
>>     3. What are the reasons for the data imbalance? Besides adding /
>>     removing nodes, is there any way to trigger data rebalancing
>>     manually?
>>
>>
>
>
> -- 
>
> Best regards,
> Alexei Scherbakov

Re: Partition loss and data balancing

Posted by Alexei Scherbakov <al...@gmail.com>.
Hi,

If I recall correctly, it was fixed by [1]:

https://issues.apache.org/jira/browse/IGNITE-13147


Mon, 19 Apr 2021 at 10:56, 18624049226 <18...@163.com>:

> Hello Alexei,
>
> Has this problem been solved? What is the corresponding JIRA ID?

-- 

Best regards,
Alexei Scherbakov