Posted to user@ignite.apache.org by the_palakkaran <ji...@suntecsbs.com> on 2018/11/26 19:18:04 UTC

Values lost when cache is in partitioned mode in cluster

Hi,

I have two nodes. A cache with an Object key and a list of Objects as the
value is configured in partitioned mode. On a single node, every value in
the list is loaded correctly. When started in a cluster of two nodes, even
though the data is queried and put into the cache correctly, when I later
check the entries on both nodes, many keys have lost values from the list
stored against them.

To be specific, I expect a list of 4 values against a particular key, but
when I check, there are only 2 or 3 entries for roughly 20% of the keys.

I think this is a partitioning or rebalancing problem because it works fine
on a single node and no data is lost at all.

Once, when I started, all entries were present in the cache; every other
time I had this loss problem, which really confuses me because nothing
changed other than a restart.

The cache is partitioned. Data is loaded using a data streamer through an
Ignite service on one node and should get rebalanced onto the other nodes
in the cluster. Persistence is enabled, and off-heap data storage is
configured.

Am I missing any configuration? Is there any chance of a serialization
issue, such that I should create a serializable object as the value and set
the list into it?
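
For reference, here is a minimal sketch of the setup described above (the
key/value classes and all cache/region names are placeholders, not my real
ones; configuration classes are from org.apache.ignite.configuration):

    IgniteConfiguration cfg = new IgniteConfiguration();

    // Persistent data region, matching "persistence is enabled" above.
    DataStorageConfiguration storageCfg = new DataStorageConfiguration();
    DataRegionConfiguration regionCfg = new DataRegionConfiguration();
    regionCfg.setName("defaultDataRegion");
    regionCfg.setPersistenceEnabled(true);
    storageCfg.setDefaultDataRegionConfiguration(regionCfg);
    cfg.setDataStorageConfiguration(storageCfg);

    // Partitioned cache keyed by an object, holding a list of objects.
    CacheConfiguration<MyKey, List<MyValue>> ccfg =
        new CacheConfiguration<>("myCache");
    ccfg.setCacheMode(CacheMode.PARTITIONED);
    ccfg.setBackups(1); // one backup copy of each partition
    cfg.setCacheConfiguration(ccfg);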




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Values lost when cache is in partitioned mode in cluster

Posted by Evgenii Zhuravlev <e....@gmail.com>.
Well, it's not a reproducer - it doesn't have any test or main classes and,
moreover, it has a lot of dependencies on classes that were not provided.

As for writeSynchronizationMode: with the default PRIMARY_SYNC value, Ignite
waits only until the primary copy has been updated, so you may need to wait
some time until everything is updated on the backups. FULL_SYNC is not a
workaround; you just need to choose what works for your use case.
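
If you decide you need it, FULL_SYNC is set on the cache configuration,
for example (a sketch; the cache name is just an example):

    CacheConfiguration<Object, Object> ccfg = new CacheConfiguration<>("myCache");
    // FULL_SYNC: a write completes only after primary AND backup copies are updated.
    ccfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);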

Evgenii

Mon, Nov 26, 2018 at 23:53, the_palakkaran <ji...@suntecsbs.com>:

> Hi,
>
> An update :
>
> When I set writeSynchronizationMode to FULL_SYNC, this works now.
> [This seems like a workaround; I am hoping for a proper solution.]
>
> Earlier, the cache store was getting executed even though I had never
> invoked its load method explicitly. Also, one time I got this message in
> the log:
>
> [snip: the full log appears in the message below]
>

Re: Values lost when cache is in partitioned mode in cluster

Posted by the_palakkaran <ji...@suntecsbs.com>.
Hi,

An update:

When I set writeSynchronizationMode to FULL_SYNC, this works now.
[This seems like a workaround; I am hoping for a proper solution.]

Earlier, the cache store was getting executed even though I had never
invoked its load method explicitly. Also, one time I got this message in
the log:

[2018-11-27T11:47:43,728][INFO ][tcp-disco-sock-reader-#5][TcpDiscoverySpi]
Finished serving remote node connection [rmtAddr=/192.168.79.97:63949,
rmtPort=63949
[2018-11-27T11:47:43,731][WARN ][grid-timeout-worker-#39][TcpDiscoverySpi]
Socket write has timed out (consider increasing
'IgniteConfiguration.failureDetectionTimeout' configuration property)
[failureDetectionTimeout=10000, rmtAddr=/192.168.79.97:63949, rmtPort=63949,
sockTimeout=5000]
[2018-11-27T11:48:13,729][WARN
][disco-event-worker-#61][GridDiscoveryManager] Node FAILED:
TcpDiscoveryNode [id=dff2089b-1c8c-4077-8176-389d52241823,
addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, 192.168.79.97],
sockAddrs=[SBSKKWSVM314.suntecsbs.com/192.168.79.97:0, /0:0:0:0:0:0:0:1:0,
/127.0.0.1:0], discPort=0, order=3, intOrder=3,
lastExchangeTime=1543299352342, loc=false, ver=2.6.0#20180710-sha1:669feacc,
isClient=true]
[2018-11-27T11:48:13,730][INFO
][disco-event-worker-#61][GridDiscoveryManager] Topology snapshot [ver=4,
servers=2, clients=0, CPUs=32, offheap=29.0GB, heap=2.0GB]
[2018-11-27T11:48:13,730][INFO
][disco-event-worker-#61][GridDiscoveryManager]   ^-- Node
[id=96FB77AE-5D5A-4D53-B40A-7DBB1A90E429, clusterState=ACTIVE]
[2018-11-27T11:48:13,730][INFO
][disco-event-worker-#61][GridDiscoveryManager]   ^-- Baseline [id=0,
size=2, online=2, offline=0]
[2018-11-27T11:48:13,730][INFO
][disco-event-worker-#61][GridDiscoveryManager] Data Regions Configured:
[2018-11-27T11:48:13,731][INFO
][disco-event-worker-#61][GridDiscoveryManager]   ^-- default
[initSize=256.0 MiB, maxSize=6.3 GiB, persistenceEnabled=false]
[2018-11-27T11:48:13,731][INFO
][disco-event-worker-#61][GridDiscoveryManager]   ^-- defaultDataRegion
[initSize=1.0 GiB, maxSize=8.0 GiB, persistenceEnabled=true]
[2018-11-27T11:48:13,743][INFO ][exchange-worker-#62][time] Started exchange
init [topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], crd=false,
evt=NODE_FAILED, evtNode=dff2089b-1c8c-4077-8176-389d52241823,
customEvt=null, allowMerge=true]
[2018-11-27T11:48:13,744][INFO
][exchange-worker-#62][GridDhtPartitionsExchangeFuture] Finish exchange
future [startVer=AffinityTopologyVersion [topVer=4, minorTopVer=0],
resVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], err=null]
[2018-11-27T11:48:13,744][INFO ][exchange-worker-#62][time] Finished
exchange init [topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0],
crd=false]
[2018-11-27T11:48:13,757][INFO
][exchange-worker-#62][GridCachePartitionExchangeManager] Skipping
rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=4,
minorTopVer=0], evt=NODE_FAILED, node=dff2089b-1c8c-4077-8176-389d52241823]
[2018-11-27T11:48:42,973][INFO ][grid-timeout-worker-#39][IgniteKernal] 
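
(Could the cache store behavior be read-through? If readThrough is enabled,
I believe Ignite calls CacheStore.load() automatically on any cache miss,
without an explicit loadCache() call. My store is wired roughly like this;
MyCacheStore is a placeholder name:)

    ccfg.setReadThrough(true); // a get() that misses triggers CacheStore.load()
    ccfg.setCacheStoreFactory(FactoryBuilder.factoryOf(MyCacheStore.class));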




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Values lost when cache is in partitioned mode in cluster

Posted by the_palakkaran <ji...@suntecsbs.com>.
I do flush the data in the streamer. A reproducer is attached:
reproducer.zip
<http://apache-ignite-users.70518.x6.nabble.com/file/t1795/reproducer.zip>  



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Values lost when cache is in partitioned mode in cluster

Posted by Evgenii Zhuravlev <e....@gmail.com>.
Hi,

Do you flush the data in the data streamer? Can you share a simple
reproducer?
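
For context, flushing with a data streamer usually looks like this (a
sketch; the cache name and key/value types are assumptions):

    // Closing the streamer (here via try-with-resources) also flushes it.
    try (IgniteDataStreamer<MyKey, List<MyValue>> streamer =
             ignite.dataStreamer("myCache")) {
        streamer.allowOverwrite(true); // default is false: existing keys are silently skipped
        streamer.addData(key, values);
        streamer.flush(); // block until buffered entries reach the cache
    }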

Best Regards,
Evgenii

Mon, Nov 26, 2018 at 11:18, the_palakkaran <ji...@suntecsbs.com>:

> [snip: original message quoted in full at the top of this thread]
>