Posted to dev@ignite.apache.org by Denis Magda <dm...@apache.org> on 2017/10/26 18:09:21 UTC

Re: Ignite 2.3 - replicated cache lost data after restart cluster nodes with persistence enabled

+ dev list

This scenario has to be handled automatically by Ignite. Seems like a bug. Please refer to the initial description of the issue. Alex G, please have a look:

To reproduce: 
1. Create a replicated cache with multiple indexed types, with some indexes
2. Start first server node
3. Insert data into cache (1000000 entries)
4. Start second server node

At this point everything seems OK: judging by SQL queries (count(*)),
the data appears to have been rebalanced successfully.

5. Stop server nodes
6. Restart server nodes
7. Running the same SQL queries (count(*)) now returns fewer rows (a reproducer sketch follows below)
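
A minimal Java sketch of these steps for reference (the cache name PERSON, the Person value class, and all configuration details below are illustrative assumptions, not taken from the report):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.query.SqlFieldsQuery;
import org.apache.ignite.cache.query.annotations.QuerySqlField;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class ReplicatedRestartSketch {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Enable native persistence for the default data region.
        DataStorageConfiguration storageCfg = new DataStorageConfiguration();
        storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);
        cfg.setDataStorageConfiguration(storageCfg);

        // Step 1: replicated cache with an indexed value type.
        CacheConfiguration<Long, Person> cacheCfg = new CacheConfiguration<>("PERSON");
        cacheCfg.setCacheMode(CacheMode.REPLICATED);
        cacheCfg.setIndexedTypes(Long.class, Person.class);
        cfg.setCacheConfiguration(cacheCfg);

        // Step 2: start the first server node and activate the persistent cluster.
        Ignite ignite = Ignition.start(cfg);
        ignite.cluster().active(true);

        // Step 3: insert data (1,000,000 entries in the reported scenario).
        IgniteCache<Long, Person> cache = ignite.cache("PERSON");
        for (long i = 0; i < 1_000_000; i++)
            cache.put(i, new Person(i, "name-" + i));

        // SQL check used throughout the thread.
        long cnt = (Long)cache.query(new SqlFieldsQuery("select count(*) from Person"))
            .getAll().get(0).get(0);
        System.out.println("count(*) = " + cnt);
    }

    public static class Person {
        @QuerySqlField(index = true)
        private long id;

        @QuerySqlField
        private String name;

        Person(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }
}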

—
Denis

> On Oct 23, 2017, at 5:11 AM, Dmitry Pavlov <dp...@gmail.com> wrote:
> 
> Hi,
>  
> I wrote code that executes the described scenario. The results are as follows:
> If I do not give the cluster enough time to completely rebalance the partitions, then the newly started node will not have enough data for count(*).
> If I do not wait long enough for the data to be distributed across the grid, the query returns a smaller number - the number of records that have already been delivered to that node. I guess GridDhtPartitionDemandMessages can be found in the Ignite debug log at this moment.
>  
> If I wait for a sufficient amount of time, or explicitly wait on the newly joined node with
> ignite2.cache(CACHE).rebalance().get();
> then all results are correct.
> 
> About your question > what happens if one cluster node crashes in the middle of the rebalance process?
> In this case the normal failover scenario is started and data is rebalanced within the cluster. If there are enough WAL records on the nodes to represent the history from the crash point, only the recent changes (delta) will be sent over the network. If there is not enough history to rebalance using only the most recent changes, the partition will be rebalanced from scratch to the new node.
> 
> Sincerely,
> Pavlov Dmitry
> 
> 
> Sat, Oct 21, 2017 at 2:07, Manu <maxnu00@hotmail.com <ma...@hotmail.com>>:
> Hi,
> 
> after restart, the data does not seem to be consistent.
> 
> We waited until the rebalance was fully completed before restarting the
> cluster, to check whether durable memory data rebalance works correctly and
> SQL queries still work.
> Another question (it's not about this case): what happens if one cluster node
> crashes in the middle of the rebalance process?
> 
> Thanks!
> 
> 
> 
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/ <http://apache-ignite-users.70518.x6.nabble.com/>
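
Regarding the WAL-based (historical) rebalance described in the quoted reply above: whether a delta can be sent depends on how much WAL history a node retains. A minimal configuration sketch for the Ignite 2.3-era API (the history size of 100 checkpoints is an illustrative assumption):

import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class WalHistoryConfigSketch {
    public static IgniteConfiguration configure() {
        IgniteConfiguration cfg = new IgniteConfiguration();

        DataStorageConfiguration storageCfg = new DataStorageConfiguration();
        storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

        // Number of checkpoints for which WAL history is kept; keeping more history
        // increases the chance that a restarted or lagging node can be caught up with
        // a delta instead of a full partition rebalance.
        storageCfg.setWalHistorySize(100);

        cfg.setDataStorageConfiguration(storageCfg);
        return cfg;
    }
}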


Re: Ignite 2.3 - replicated cache lost data after restart cluster nodes with persistence enabled

Posted by Dmitry Pavlov <dp...@gmail.com>.
Hi, I've created an issue for this case:
https://issues.apache.org/jira/browse/IGNITE-6792

The reproducer is attached to the JIRA ticket, and the test was added to a branch.

According to my brief testing, the cache.size() method also returns a smaller count
than the initial load.

Sincerely,
Dmitriy Pavlov
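
A small sketch of the check mentioned above, comparing the cache API count with the SQL count after a restart (the cache name PERSON and the Person table are assumptions carried over from the reproduction steps):

import java.util.List;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CachePeekMode;
import org.apache.ignite.cache.query.SqlFieldsQuery;

public class SizeVsSqlCountSketch {
    public static void main(String[] args) {
        // Starts a node; persistence and cache configuration are omitted in this sketch.
        Ignite ignite = Ignition.start();
        ignite.cluster().active(true);

        IgniteCache<Long, Object> cache = ignite.cache("PERSON");

        // Cluster-wide count of primary entries as seen by the cache API.
        long cacheSize = cache.sizeLong(CachePeekMode.PRIMARY);

        // Row count as seen by the SQL engine.
        List<List<?>> rows = cache.query(new SqlFieldsQuery("select count(*) from Person")).getAll();
        long sqlCount = (Long)rows.get(0).get(0);

        System.out.println("cache.size = " + cacheSize + ", sql count(*) = " + sqlCount);
    }
}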

Fri, Oct 27, 2017 at 16:51, Dmitry Pavlov <dp...@gmail.com>:

> Hi Denis,
>
> I had a short chat with Alex G.
>
> You're right, it may be a bug. I'll prepare my reproducer and add it as a
> test. I will also raise a ticket if count(*) gives an incorrect result.
>
> Sincerely,
> Dmitry Pavlov
>
> Fri, Oct 27, 2017, 1:48 Denis Magda <dm...@apache.org>:
>
>> Dmitriy,
>>
>> I don’t see why the result of a simple query such as “select count(*) from
>> t;” should be different while rebalancing is in progress or after a cluster
>> restart. Ignite’s SQL engine claims that it’s fault-tolerant and returns a
>> consistent result set at all times unless a partition loss happened. Here
>> we don’t have a partition loss, thus it seems we caught a bug.
>>
>> Vladimir O., please chime in.
>>
>> —
>> Denis
>>
>> On Oct 26, 2017, at 3:34 PM, Dmitry Pavlov <dp...@gmail.com> wrote:
>>
>> Hi Denis
>>
>> It seems to me that this is not a bug for my scenario, because the data
>> was not loaded within a single transaction using a transactional cache. In
>> this case it is OK that the cache data is rebalanced according to partition
>> update counters, isn't it?
>>
>> I suppose in this case the data was not lost; it was just not completely
>> transferred to the second node.
>>
>> Sincerely,
>>
>> Thu, Oct 26, 2017, 21:09 Denis Magda <dm...@apache.org>:
>>
>>> + dev list
>>>
>>> This scenario has to be handled automatically by Ignite. Seems like a
>>> bug. Please refer to the initial description of the issue. Alex G, please
>>> have a look:
>>>
>>> To reproduce:
>>> 1. Create a replicated cache with multiple indexed types, with some
>>> indexes
>>> 2. Start first server node
>>> 3. Insert data into cache (1000000 entries)
>>> 4. Start second server node
>>>
>>> At this point everything seems OK: judging by SQL queries (count(*)),
>>> the data appears to have been rebalanced successfully.
>>>
>>> 5. Stop server nodes
>>> 6. Restart server nodes
>>> 7. Running the same SQL queries (count(*)) now returns fewer rows
>>>
>>> —
>>> Denis
>>>
>>> > On Oct 23, 2017, at 5:11 AM, Dmitry Pavlov <dp...@gmail.com>
>>> wrote:
>>> >
>>> > Hi,
>>> >
>>> > I wrote code that executes the described scenario. The results are as
>>> follows:
>>> > If I do not give the cluster enough time to completely rebalance the
>>> partitions, then the newly started node will not have enough data for count(*).
>>> > If I do not wait long enough for the data to be distributed across the
>>> grid, the query returns a smaller number - the number of records that have
>>> already been delivered to that node. I guess GridDhtPartitionDemandMessages
>>> can be found in the Ignite debug log at this moment.
>>> >
>>> > If I wait for a sufficient amount of time, or explicitly wait on
>>> the newly joined node with
>>> > ignite2.cache(CACHE).rebalance().get();
>>> > then all results are correct.
>>> >
>>> > About your question > what happens if one cluster node crashes in the
>>> middle of the rebalance process?
>>> > In this case the normal failover scenario is started and data is rebalanced
>>> within the cluster. If there are enough WAL records on the nodes to represent
>>> the history from the crash point, only the recent changes (delta) will be sent
>>> over the network. If there is not enough history to rebalance using only the
>>> most recent changes, the partition will be rebalanced from scratch to the new node.
>>> >
>>> > Sincerely,
>>> > Pavlov Dmitry
>>> >
>>> >
>>> > Sat, Oct 21, 2017 at 2:07, Manu <maxnu00@hotmail.com <mailto:
>>> maxnu00@hotmail.com>>:
>>> > Hi,
>>> >
>>> > after restart, the data does not seem to be consistent.
>>> >
>>> > We waited until the rebalance was fully completed before restarting the
>>> > cluster, to check whether durable memory data rebalance works correctly and
>>> > SQL queries still work.
>>> > Another question (it's not about this case): what happens if one cluster
>>> > node crashes in the middle of the rebalance process?
>>> >
>>> > Thanks!
>>> >
>>> >
>>> >
>>> > --
>>> > Sent from: http://apache-ignite-users.70518.x6.nabble.com/ <
>>> http://apache-ignite-users.70518.x6.nabble.com/>
>>
>>
>>

Re: Ignite 2.3 - replicated cache lost data after restart cluster nodes with persistence enabled

Posted by Dmitry Pavlov <dp...@gmail.com>.
Hi Denis,

I had a short chat with Alex G.

You're right, it may be a bug. I'll prepare my reproducer and add it as a
test. I will also raise a ticket if count(*) gives an incorrect result.

Sincerely,
Dmitry Pavlov

Fri, Oct 27, 2017, 1:48 Denis Magda <dm...@apache.org>:

> Dmitriy,
>
> I don’t see why the result of a simple query such as “select count(*) from
> t;” should be different while rebalancing is in progress or after a cluster
> restart. Ignite’s SQL engine claims that it’s fault-tolerant and returns a
> consistent result set at all times unless a partition loss happened. Here
> we don’t have a partition loss, thus it seems we caught a bug.
>
> Vladimir O., please chime in.
>
> —
> Denis
>
> On Oct 26, 2017, at 3:34 PM, Dmitry Pavlov <dp...@gmail.com> wrote:
>
> Hi Denis
>
> It seems to me that this is not a bug for my scenario, because the data
> was not loaded within a single transaction using a transactional cache. In
> this case it is OK that the cache data is rebalanced according to partition
> update counters, isn't it?
>
> I suppose in this case the data was not lost; it was just not completely
> transferred to the second node.
>
> Sincerely,
>
> Thu, Oct 26, 2017, 21:09 Denis Magda <dm...@apache.org>:
>
>> + dev list
>>
>> This scenario has to be handled automatically by Ignite. Seems like a
>> bug. Please refer to the initial description of the issue. Alex G, please
>> have a look:
>>
>> To reproduce:
>> 1. Create a replicated cache with multiple indexed types, with some indexes
>> 2. Start first server node
>> 3. Insert data into cache (1000000 entries)
>> 4. Start second server node
>>
>> At this point everything seems OK: judging by SQL queries (count(*)),
>> the data appears to have been rebalanced successfully.
>>
>> 5. Stop server nodes
>> 6. Restart server nodes
>> 7. Running the same SQL queries (count(*)) now returns fewer rows
>>
>> —
>> Denis
>>
>> > On Oct 23, 2017, at 5:11 AM, Dmitry Pavlov <dp...@gmail.com>
>> wrote:
>> >
>> > Hi,
>> >
>> > I wrote code that executes the described scenario. The results are as
>> follows:
>> > If I do not give the cluster enough time to completely rebalance the
>> partitions, then the newly started node will not have enough data for count(*).
>> > If I do not wait long enough for the data to be distributed across the
>> grid, the query returns a smaller number - the number of records that have
>> already been delivered to that node. I guess GridDhtPartitionDemandMessages
>> can be found in the Ignite debug log at this moment.
>> >
>> > If I wait for a sufficient amount of time, or explicitly wait on
>> the newly joined node with
>> > ignite2.cache(CACHE).rebalance().get();
>> > then all results are correct.
>> >
>> > About your question > what happens if one cluster node crashes in the
>> middle of the rebalance process?
>> > In this case the normal failover scenario is started and data is rebalanced
>> within the cluster. If there are enough WAL records on the nodes to represent
>> the history from the crash point, only the recent changes (delta) will be sent
>> over the network. If there is not enough history to rebalance using only the
>> most recent changes, the partition will be rebalanced from scratch to the new node.
>> >
>> > Sincerely,
>> > Pavlov Dmitry
>> >
>> >
>> > Sat, Oct 21, 2017 at 2:07, Manu <maxnu00@hotmail.com <mailto:
>> maxnu00@hotmail.com>>:
>> > Hi,
>> >
>> > after restart, the data does not seem to be consistent.
>> >
>> > We waited until the rebalance was fully completed before restarting the
>> > cluster, to check whether durable memory data rebalance works correctly and
>> > SQL queries still work.
>> > Another question (it's not about this case): what happens if one cluster node
>> > crashes in the middle of the rebalance process?
>> >
>> > Thanks!
>> >
>> >
>> >
>> > --
>> > Sent from: http://apache-ignite-users.70518.x6.nabble.com/ <
>> http://apache-ignite-users.70518.x6.nabble.com/>
>
>
>

Re: Ignite 2.3 - replicated cache lost data after restart cluster nodes with persistence enabled

Posted by Denis Magda <dm...@apache.org>.
Dmitriy,

I don’t see why the result of a simple query such as “select count(*) from t;” should be different while rebalancing is in progress or after a cluster restart. Ignite’s SQL engine claims that it’s fault-tolerant and returns a consistent result set at all times unless a partition loss happened. Here we don’t have a partition loss, thus it seems we caught a bug.
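
For context, a minimal sketch of how partition loss can be made explicit and checked, assuming the cache from the reproduction steps (the cache name and the chosen policy are illustrative assumptions):

import java.util.Collection;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.PartitionLossPolicy;
import org.apache.ignite.configuration.CacheConfiguration;

public class PartitionLossSketch {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start();

        // Fail reads/writes against lost partitions instead of silently returning partial data.
        CacheConfiguration<Long, Object> cacheCfg = new CacheConfiguration<>("PERSON");
        cacheCfg.setPartitionLossPolicy(PartitionLossPolicy.READ_WRITE_SAFE);

        IgniteCache<Long, Object> cache = ignite.getOrCreateCache(cacheCfg);

        // An empty collection means no partitions were declared lost,
        // so count(*) is expected to be consistent.
        Collection<Integer> lost = cache.lostPartitions();
        System.out.println("Lost partitions: " + lost);
    }
}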

Vladimir O., please chime in.

—
Denis

> On Oct 26, 2017, at 3:34 PM, Dmitry Pavlov <dp...@gmail.com> wrote:
> 
> Hi Denis 
> 
> It seems to me that this is not a bug for my scenario, because the data was not loaded within a single transaction using a transactional cache. In this case it is OK that the cache data is rebalanced according to partition update counters, isn't it?
> 
> I suppose in this case the data was not lost; it was just not completely transferred to the second node.
> 
> Sincerely, 
> 
> Thu, Oct 26, 2017, 21:09 Denis Magda <dmagda@apache.org <ma...@apache.org>>:
> + dev list
> 
> This scenario has to be handled automatically by Ignite. Seems like a bug. Please refer to the initial description of the issue. Alex G, please have a look:
> 
> To reproduce:
> 1. Create a replicated cache with multiple indexed types, with some indexes
> 2. Start first server node
> 3. Insert data into cache (1000000 entries)
> 4. Start second server node
> 
> At this point everything seems OK: judging by SQL queries (count(*)),
> the data appears to have been rebalanced successfully.
> 
> 5. Stop server nodes
> 6. Restart server nodes
> 7. Running the same SQL queries (count(*)) now returns fewer rows
> 
> —
> Denis
> 
> > On Oct 23, 2017, at 5:11 AM, Dmitry Pavlov <dpavlov.spb@gmail.com <ma...@gmail.com>> wrote:
> >
> > Hi,
> >
> > I wrote code that executes the described scenario. The results are as follows:
> > If I do not give the cluster enough time to completely rebalance the partitions, then the newly started node will not have enough data for count(*).
> > If I do not wait long enough for the data to be distributed across the grid, the query returns a smaller number - the number of records that have already been delivered to that node. I guess GridDhtPartitionDemandMessages can be found in the Ignite debug log at this moment.
> >
> > If I wait for a sufficient amount of time, or explicitly wait on the newly joined node with
> > ignite2.cache(CACHE).rebalance().get();
> > then all results are correct.
> >
> > About your question > what happens if one cluster node crashes in the middle of the rebalance process?
> > In this case the normal failover scenario is started and data is rebalanced within the cluster. If there are enough WAL records on the nodes to represent the history from the crash point, only the recent changes (delta) will be sent over the network. If there is not enough history to rebalance using only the most recent changes, the partition will be rebalanced from scratch to the new node.
> >
> > Sincerely,
> > Pavlov Dmitry
> >
> >
> > Sat, Oct 21, 2017 at 2:07, Manu <maxnu00@hotmail.com <ma...@hotmail.com> <mailto:maxnu00@hotmail.com <ma...@hotmail.com>>>:
> > Hi,
> >
> > after restart, the data does not seem to be consistent.
> >
> > We waited until the rebalance was fully completed before restarting the
> > cluster, to check whether durable memory data rebalance works correctly and
> > SQL queries still work.
> > Another question (it's not about this case): what happens if one cluster node
> > crashes in the middle of the rebalance process?
> >
> > Thanks!
> >
> >
> >
> > --
> > Sent from: http://apache-ignite-users.70518.x6.nabble.com/ <http://apache-ignite-users.70518.x6.nabble.com/> <http://apache-ignite-users.70518.x6.nabble.com/ <http://apache-ignite-users.70518.x6.nabble.com/>>


Re: Ignite 2.3 - replicated cache lost data after restart cluster nodes with persistence enabled

Posted by Dmitry Pavlov <dp...@gmail.com>.
Hi Denis

It seems to me that this is not a bug for my scenario, because the data was
not loaded within a single transaction using a transactional cache. In this
case it is OK that the cache data is rebalanced according to partition update
counters, isn't it?

I suppose in this case the data was not lost; it was just not completely
transferred to the second node.
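
One way to rule this out is to wait for rebalancing explicitly on the newly joined node before stopping the cluster, as mentioned earlier in the thread. A minimal sketch, assuming a cache named PERSON and an external ignite-config.xml (both are assumptions):

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;

public class WaitForRebalanceSketch {
    public static void main(String[] args) {
        // Start the second server node from an externally provided configuration file.
        Ignite ignite = Ignition.start("ignite-config.xml");

        // Block until rebalancing of the cache has finished on this node,
        // so a subsequent cluster restart does not leave it with partial data.
        ignite.cache("PERSON").rebalance().get();
    }
}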

Sincerely,

Thu, Oct 26, 2017, 21:09 Denis Magda <dm...@apache.org>:

> + dev list
>
> This scenario has to be handled automatically by Ignite. Seems like a bug.
> Please refer to the initial description of the issue. Alex G, please have a
> look:
>
> To reproduce:
> 1. Create a replicated cache with multiple indexed types, with some indexes
> 2. Start first server node
> 3. Insert data into cache (1000000 entries)
> 4. Start second server node
>
> At this point everything seems OK: judging by SQL queries (count(*)),
> the data appears to have been rebalanced successfully.
>
> 5. Stop server nodes
> 6. Restart server nodes
> 7. Running the same SQL queries (count(*)) now returns fewer rows
>
> —
> Denis
>
> > On Oct 23, 2017, at 5:11 AM, Dmitry Pavlov <dp...@gmail.com>
> wrote:
> >
> > Hi,
> >
> > I wrote code that executes the described scenario.
> The results are as follows:
> > If I do not give the cluster enough time to completely rebalance the
> partitions, then the newly started node will not have enough data for count(*).
> > If I do not wait long enough for the data to be distributed across the
> grid, the query returns a smaller number - the number of records that have
> already been delivered to that node. I guess GridDhtPartitionDemandMessages
> can be found in the Ignite debug log at this moment.
> >
> > If I wait for a sufficient amount of time, or explicitly wait on
> the newly joined node with
> > ignite2.cache(CACHE).rebalance().get();
> > then all results are correct.
> >
> > About your question > what happens if one cluster node crashes in the
> middle of the rebalance process?
> > In this case the normal failover scenario is started and data is rebalanced
> within the cluster. If there are enough WAL records on the nodes to represent
> the history from the crash point, only the recent changes (delta) will be sent
> over the network. If there is not enough history to rebalance using only the
> most recent changes, the partition will be rebalanced from scratch to the new node.
> >
> > Sincerely,
> > Pavlov Dmitry
> >
> >
> > Sat, Oct 21, 2017 at 2:07, Manu <maxnu00@hotmail.com <mailto:
> maxnu00@hotmail.com>>:
> > Hi,
> >
> > after restart, the data does not seem to be consistent.
> >
> > We waited until the rebalance was fully completed before restarting the
> > cluster, to check whether durable memory data rebalance works correctly and
> > SQL queries still work.
> > Another question (it's not about this case): what happens if one cluster node
> > crashes in the middle of the rebalance process?
> >
> > Thanks!
> >
> >
> >
> > --
> > Sent from: http://apache-ignite-users.70518.x6.nabble.com/ <
> http://apache-ignite-users.70518.x6.nabble.com/>
>
>
