Posted to user@ignite.apache.org by j_recuerda <je...@shapelets.io> on 2020/01/23 17:06:01 UTC

Lost partitions automatically reset

Hello

I have a cluster of 3 nodes with persistence enabled. I have a distributed
cache with backups = 1 where I put some data.

After I shut down NODE-2 and NODE-3, the state of some partitions becomes
LOST. Then I start NODE-2 again; all the data in the cache is available
again, but the partition loss policy is still applied.

Do I have to manually check which partitions are on the node that is coming
back up and call resetLostPartitions() manually? I would expect this to be
detected "automatically".

Thank you!




Re: Lost partitions automatically reset

Posted by j_recuerda <je...@shapelets.io>.
akorensh wrote
>   The issue you described is a bit different from the original topic.
>   This one deals with an incorrect lostPartitions() count.

Sorry about that. As I mentioned, I am seeing two different behaviors, hence
the mix-up. I am trying to reproduce the original issue, the one I am
experiencing in my project, in a toy project, but I have not been able to
yet. I can't figure out why.

Thank you.




Re: Lost partitions automatically reset

Posted by akorensh <al...@gmail.com>.
Hi,
  Thanks for the reproducer project.
  The issue you described is a bit different from the original topic.
  This one deals with an incorrect lostPartitions() count.

(Original issue:
I have a cluster of 3 nodes with persistence enabled and a distributed cache
with backups = 1 where I put some data. After NODE-2 and NODE-3 are shut
down, the state of some partitions becomes LOST. When NODE-2 is started
again and cache.lostPartitions() is called from it, the call still returns
all the lost partitions. At this point, I would expect those partitions not
to be lost, since all the data is available again.)

Reproducer issue:
  - Run 5 nodes NodeStartup.kt
  - Run Client.kt, which inserts some data into an IgniteCache<Long, Long>
  - Shut down 2 of the 5 nodes.
  - Run Client.kt again.
          * Calling lostPartitions() returns an empty list; I would expect
it to return some partitions, since backups is set to one and two nodes were
turned off.
   

  We were able to reproduce the issue with the incorrect lostPartitions()
count and are planning a fix.

Thanks, Alex




Re: Lost partitions automatically reset

Posted by Igor Belyakov <ig...@gmail.com>.
Hi,

I've tried to run the provided example with partitionLossPolicy changed to
"READ_WRITE_SAFE", as described in the initial message, and got the
following results:

1. After shutting down 2 nodes (out of 5), I see lost partitions on the
client:
JRH: LostData = [6, 32, 35, 41, 66, 83, 112, 115, 134, 136, 137, 171, 188,
195, 227, 231, 233, 243, 265, 273, 277, 289, 298, 300, 306, 314, 328, 347,
366, 371, 382, 383, 391, 394, 401, 410, 413, 417, 420, 426, 433, 461, 475,
484, 494, 496, 527, 537, 542, 547, 550, 570, 584, 599, 604, 608, 612, 616,
639, 653, 655, 660, 661, 693, 695, 701, 707, 711, 715, 717, 731, 752, 764,
776, 782, 789, 810, 817, 818, 834, 847, 849, 854, 856, 862, 879, 893, 897,
909, 921, 924, 926, 932, 938, 955, 969, 970, 974, 978, 979, 980, 994, 1007,
1013, 1021]
And "Failed to map keys for cache (all partition nodes left the grid)." is
thrown since we don't have partitions for such keys.

2. After starting 1 of the shut-down nodes again, no more lost partitions
are found:
JRH: LostData = []
Populating the cache...
Done: 1000
Done: 2000
Done: 3000
Done: 4000
Done: 5000
Done: 6000
Done: 7000
Done: 8000
Done: 9000
LOST PARTITION = []

It seems like the policy works correctly. Did you clean up your work
directory between test runs?
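
For reference, the only change I made was the loss policy on the cache
configuration, roughly like this (a minimal sketch; the cache name and
key/value types are placeholders, not taken from your project):

import org.apache.ignite.cache.CacheMode
import org.apache.ignite.cache.PartitionLossPolicy
import org.apache.ignite.configuration.CacheConfiguration

// Partitioned cache with one backup and the READ_WRITE_SAFE loss policy.
val cacheCfg = CacheConfiguration<Long, Long>("testCache")
    .setCacheMode(CacheMode.PARTITIONED)
    .setBackups(1)
    .setPartitionLossPolicy(PartitionLossPolicy.READ_WRITE_SAFE)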

On Mon, Jan 27, 2020 at 2:59 PM j_recuerda <je...@shapelets.io>
wrote:

> I have two different scenarios or behaviors, neither of which works as I
> would expect. One is what I am experiencing in my project (the whole
> project) and the other is what I am experiencing in a toy project created
> to try to reproduce it. In both cases, I am using Ignite 2.7.6.
>
> This is the code I am using for the toy project (GitHub:
> https://github.com/jrecuerda/IgnitePlayground/tree/master/PartitionLossPolicy
> ). It is written in Kotlin, but I think it is simple enough to be
> understandable even if you don't know Kotlin.
>
> Steps to reproduce it:
>   - Run 5 nodes NodeStartup.kt
>   - Run Client.kt, which activates the cluster and inserts some data into
> an IgniteCache<Long, Long>
>   - Shut down 2 of the 5 nodes.
>   - Run Client.kt again.
>           * Calling lostPartitions() returns an empty list; I would expect
> it to return some partitions, since backups is set to one and two nodes
> were turned off.
>           * Trying to put data into the cache, even with
> partitionLossPolicy set to IGNORE, throws:
> Exception in thread "main"
> org.apache.ignite.cache.CacheServerNotFoundException: Failed to map keys
> for
> cache (all partition nodes left the grid).
>         at
>
> org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1321)
>         at
>
> org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.cacheException(IgniteCacheProxyImpl.java:1758)
>         at
>
> org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.put(IgniteCacheProxyImpl.java:1108)
>         at
>
> org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.put(GatewayProtectedCacheProxy.java:820)
>         at jrh.ClientKt.insertData(Client.kt:30)
>         at jrh.ClientKt.main(Client.kt:42)
>         at jrh.ClientKt.main(Client.kt)
> Caused by: class
> org.apache.ignite.internal.cluster.ClusterTopologyServerNotFoundException:
> Failed to map keys for cache (all partition nodes left the grid).
>         at
>
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.mapSingleUpdate(GridNearAtomicSingleUpdateFuture.java:562)
>         at
>
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.map(GridNearAtomicSingleUpdateFuture.java:454)
>         at
>
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.mapOnTopology(GridNearAtomicSingleUpdateFuture.java:443)
>         at
>
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.map(GridNearAtomicAbstractUpdateFuture.java:248)
>         at
>
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update0(GridDhtAtomicCache.java:1153)
>         at
>
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.put0(GridDhtAtomicCache.java:611)
>         at
>
> org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2449)
>         at
>
> org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2426)
>         at
>
> org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.put(IgniteCacheProxyImpl.java:1105)
>         ... 4 more
>
> Thank you!
>
>
>
>

Re: Lost partitions automatically reset

Posted by j_recuerda <je...@shapelets.io>.
I have two different scenarios or behaviors, neither of which works as I
would expect. One is what I am experiencing in my project (the whole
project) and the other is what I am experiencing in a toy project created to
try to reproduce it. In both cases, I am using Ignite 2.7.6.

This is the code I am using for the toy project (GitHub:
https://github.com/jrecuerda/IgnitePlayground/tree/master/PartitionLossPolicy
). It is written in Kotlin, but I think it is simple enough to be
understandable even if you don't know Kotlin.

Steps to reproduce it:
  - Run 5 nodes NodeStartup.kt
  - Run Client.kt, which activates the cluster and inserts some data into an
IgniteCache<Long, Long>
  - Shut down 2 of the 5 nodes.
  - Run Client.kt again.
          * Calling lostPartitions() returns an empty list; I would expect
it to return some partitions, since backups is set to one and two nodes were
turned off.
          * Trying to put data into the cache, even with partitionLossPolicy
set to IGNORE, throws:
Exception in thread "main"
org.apache.ignite.cache.CacheServerNotFoundException: Failed to map keys for
cache (all partition nodes left the grid).
	at
org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1321)
	at
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.cacheException(IgniteCacheProxyImpl.java:1758)
	at
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.put(IgniteCacheProxyImpl.java:1108)
	at
org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.put(GatewayProtectedCacheProxy.java:820)
	at jrh.ClientKt.insertData(Client.kt:30)
	at jrh.ClientKt.main(Client.kt:42)
	at jrh.ClientKt.main(Client.kt)
Caused by: class
org.apache.ignite.internal.cluster.ClusterTopologyServerNotFoundException:
Failed to map keys for cache (all partition nodes left the grid).
	at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.mapSingleUpdate(GridNearAtomicSingleUpdateFuture.java:562)
	at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.map(GridNearAtomicSingleUpdateFuture.java:454)
	at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.mapOnTopology(GridNearAtomicSingleUpdateFuture.java:443)
	at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.map(GridNearAtomicAbstractUpdateFuture.java:248)
	at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update0(GridDhtAtomicCache.java:1153)
	at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.put0(GridDhtAtomicCache.java:611)
	at
org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2449)
	at
org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2426)
	at
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.put(IgniteCacheProxyImpl.java:1105)
	... 4 more
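
For context, this is roughly what Client.kt does (a simplified sketch, not
the exact code from the repository; the cache name is a placeholder):

import org.apache.ignite.Ignition
import org.apache.ignite.cache.PartitionLossPolicy
import org.apache.ignite.configuration.CacheConfiguration
import org.apache.ignite.configuration.IgniteConfiguration

fun main() {
    // Start a client node and activate the persistent cluster.
    val ignite = Ignition.start(IgniteConfiguration().setClientMode(true))
    ignite.cluster().active(true)

    // Partitioned cache with one backup; the loss policy is the one under test.
    val cache = ignite.getOrCreateCache(
        CacheConfiguration<Long, Long>("testCache")
            .setBackups(1)
            .setPartitionLossPolicy(PartitionLossPolicy.IGNORE)
    )

    // Print lost partitions, populate the cache, then check again.
    println("JRH: LostData = ${cache.lostPartitions()}")

    println("Populating the cache...")
    for (i in 1L..9000L) {
        cache.put(i, i)
        if (i % 1000L == 0L) println("Done: $i")
    }

    println("LOST PARTITION = ${cache.lostPartitions()}")
}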

Thank you!




Re: Lost partitions automatically reset

Posted by akorensh <al...@gmail.com>.
In 2.7.6 I did not observe the behavior you described. There were some issues
with partition loss policies in earlier versions --
https://issues.apache.org/jira/browse/IGNITE-10043 -- that were subsequently
fixed.

What version of Ignite are you using?

Can you attach a reproducer project?




Re: Lost partitions automatically reset

Posted by j_recuerda <je...@shapelets.io>.
Hi,

The partition loss policy is READ_WRITE_SAFE.

I know the partitions were lost because:
  - I am subscribed to EVT_CACHE_REBALANCE_PART_DATA_LOST, which fires for
each lost partition.
  - I am calling cache.lostPartitions(), which returns the collection of all
the lost partitions. This call is done from NODE-2 when it is run again. At
this point, I would expect those partitions not to be lost, since all the
data is available again.
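
For completeness, the event subscription looks roughly like this (a minimal
sketch, not the exact code from my project):

import org.apache.ignite.Ignition
import org.apache.ignite.configuration.IgniteConfiguration
import org.apache.ignite.events.CacheRebalancingEvent
import org.apache.ignite.events.Event
import org.apache.ignite.events.EventType
import org.apache.ignite.lang.IgnitePredicate

fun main() {
    // The event type has to be enabled explicitly, otherwise the listener
    // is never invoked.
    val cfg = IgniteConfiguration()
        .setIncludeEventTypes(EventType.EVT_CACHE_REBALANCE_PART_DATA_LOST)

    val ignite = Ignition.start(cfg)

    // Log every partition reported as lost on this node.
    ignite.events().localListen(IgnitePredicate<Event> { evt ->
        val e = evt as CacheRebalancingEvent
        println("Lost partition ${e.partition()} of cache ${e.cacheName()}")
        true // keep listening
    }, EventType.EVT_CACHE_REBALANCE_PART_DATA_LOST)
}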

Thanks




Re: Lost partitions automatically reset

Posted by akorensh <al...@gmail.com>.
Hi,  
    What is your partition loss policy?
    How did you determine that partitions were lost? Did you call
cache.lostPartitions()? Use the logs? Try iterating through all the keys?

https://www.gridgain.com/docs/latest/developers-guide/data-modeling/data-partitioning#partition-loss-policy

   What version of Ignite are you using? 
Thanks, Alex
 
    


