
Posted to user@ignite.apache.org by Vladimir <vl...@yandex.ru> on 2017/06/30 13:06:02 UTC

NodeFilter for cache and GridDhtPartitionsExchangeFuture (Failed to wait for partition release future)

Hi,

Could anyone please explain to me what happens with my small test cluster
when I use NodeFilter for a cache? A node cannot join the grid. Another
working node says (in short):

[exchange-worker-#29%null%] ... GridDhtPartitionsExchangeFuture: Failed to
wait for partition release future [topVer=AffinityTopologyVersion [topVer=2,
minorTopVer=2], node=...]. Dumping pending objects that might be the cause: 

GridCachePartitionExchangeManager: Pending exchange futures:
GridCachePartitionExchangeManager: Pending transactions: 
GridCachePartitionExchangeManager: Pending explicit locks:
GridCachePartitionExchangeManager: Pending cache futures:
GridCachePartitionExchangeManager: >>> GridDhtTxPrepareFuture [...
GridCachePartitionExchangeManager: >>> GridNearPessimisticTxPrepareFuture
[...
GridCachePartitionExchangeManager: >>> GridNearTxFinishFuture [...
GridCachePartitionExchangeManager: Pending atomic cache futures:
GridCachePartitionExchangeManager: Pending data streamer futures:
GridCachePartitionExchangeManager: Pending transaction deadlock detection
futures:
TcpCommunicationSpi: Communication SPI recovery descriptors:  [...]
   Communication SPI clients [...]
TcpCommunicationSpi: NIO server statistics [readerSesBalanceCnt=0,
writerSesBalanceCnt=0]


org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi: 
>> Selector info [idx=3, keysCnt=0, bytesRcvd=0, bytesRcvd0=0, bytesSent=0,
>> bytesSent0=0]
...
...


and it repeats this again and again. No pending objects are listed in the
log: no locks, no transactions, etc.


All I do is launch just 2 very simple nodes. Let's name them A and B.
Node A deploys two indexed caches with the read/write-through feature. Both
caches have the same node filter, which binds them strictly to node A. Then
node B starts and tries to get these caches, and the problem appears. I
thought I could get a proxy of a cache even if it is not held on the current
node. Can't I? What’s wrong?
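For readers following along, the setup described above can be modeled in
plain Java. This is only an illustrative sketch, not Ignite code: in Ignite
the filter is a predicate over cluster nodes passed to
CacheConfiguration#setNodeFilter, while the Node class, the "role" attribute,
and its values below are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class NodeFilterSketch {
    /** Hypothetical stand-in for a cluster node: a name plus user attributes. */
    static final class Node {
        final String name;
        final Map<String, String> attrs;

        Node(String name, Map<String, String> attrs) {
            this.name = name;
            this.attrs = attrs;
        }
    }

    /** Assumed filter: only nodes tagged role=dictionary hold the cache data. */
    static final Predicate<Node> DICTIONARY_NODES =
        node -> "dictionary".equals(node.attrs.get("role"));

    /** Two-node topology from the post: A matches the filter, B does not. */
    static List<Node> sampleTopology() {
        return Arrays.asList(
            new Node("A", Collections.singletonMap("role", "dictionary")),
            new Node("B", Collections.singletonMap("role", "other")));
    }

    /** Names of the nodes that pass the filter and would therefore host the cache. */
    static List<String> cacheHosts(List<Node> topology, Predicate<Node> filter) {
        List<String> hosts = new ArrayList<>();
        for (Node node : topology) {
            if (filter.test(node))
                hosts.add(node.name);
        }
        return hosts;
    }

    public static void main(String[] args) {
        // Node B is filtered out, so it can only reach the cache remotely.
        System.out.println(cacheHosts(sampleTopology(), DICTIONARY_NODES));
    }
}
```

In this model node B never stores cache data; the question in the post is
whether it can still obtain a working proxy to the cache hosted on node A.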



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/NodeFilter-for-cache-and-GridDhtPartitionsExchangeFuture-Failed-to-wait-for-partition-release-future-tp14179.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: NodeFilter for cache and GridDhtPartitionsExchangeFuture (Failed to wait for partition release future)

Posted by Vladimir <vl...@yandex.ru>.

Here it is:

logs_dumps.zip
<http://apache-ignite-users.70518.x6.nabble.com/file/n14270/logs_dumps.zip>  




Re: NodeFilter for cache and GridDhtPartitionsExchangeFuture (Failed to wait for partition release future)

Posted by vkulichenko <va...@gmail.com>.

Vladimir wrote
> It is just a test. In the real project, of course, there are many nodes
> holding the cache. Node A represents a part of the cluster.

You can lose the whole cluster group as well. My point is that getting the
cache during initialization doesn't make it safe, because the cache can
disappear at runtime and you have to handle that scenario anyway.

-Val




Re: NodeFilter for cache and GridDhtPartitionsExchangeFuture (Failed to wait for partition release future)

Posted by Vladimir <vl...@yandex.ru>.

>>> What is the exact scenario when the cache is not available? If this means
>>> absence of node A, then you can lose it after initialization of B,

It is just a test. In the real project, of course, there are many nodes
holding the cache. Node A represents a part of the cluster.


>>> I would recommend using the Ignite#getOrCreateCache method

Yes, on the nodes keeping and serving the cache. "Client" nodes don't
provide the cache configuration. They can only use cache(). I moved the
cache acquisition into a handler of ContextRefreshedEvent. That looks
sufficient for now.


>>> In addition, it sounds like node B can be a client which eliminates a
>>> requirement to have a node filter.

No. It's a server node used for other purposes. We don't want to spread some
caches across the whole cluster, but only onto dedicated nodes. Often they are
the ones that need faster access to the cache.


>>> This sounds weird, I think there is some other factor that we're
>>> missing. Can you create a simple GitHub project that reproduces this
>>> behavior and share it with us? 

Ok. Once I get time I'll create a special topic.

Thanks




Re: NodeFilter for cache and GridDhtPartitionsExchangeFuture (Failed to wait for partition release future)

Posted by vkulichenko <va...@gmail.com>.

Hi Vladimir,


Vladimir wrote
> Node B must know whether the cache is already created. If not, it must not
> start, to avoid such runtime problems. That's why the cache is acquired at
> initialization. That looks like a reasonable and comfortable way.

What is the exact scenario when the cache is not available? If this means
absence of node A, then you can lose it after initialization of B, so I
believe you have the issue anyway. It all boils down to proper API use and
exception handling. I would recommend using the Ignite#getOrCreateCache method
and handling exceptions that can be thrown by cache operations (for example,
if there are no server nodes left). In addition, it sounds like node B can be
a client, which eliminates the requirement to have a node filter.
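The exception-handling advice above can be sketched in plain Java as follows.
This is a minimal illustration under stated assumptions: the Supplier stands
in for a real cache call, the readOrFallback helper is hypothetical, and
RuntimeException is a stand-in for whatever exception type the cache API
actually throws when, for example, no server nodes are left.

```java
import java.util.function.Supplier;

final class GuardedCacheReadSketch {
    /**
     * Runs a cache read and returns the fallback if the read fails.
     * Assumption: failures surface as unchecked exceptions.
     */
    static <T> T readOrFallback(Supplier<T> cacheRead, T fallback) {
        try {
            return cacheRead.get();
        }
        catch (RuntimeException e) {
            // The cache (or every node holding it) may have gone away at runtime.
            System.err.println("Cache read failed: " + e.getMessage());
            return fallback;
        }
    }

    public static void main(String[] args) {
        // Healthy read: the supplier's value is returned.
        System.out.println(readOrFallback(() -> "value-from-cache", "fallback"));

        // Failed read: the exception is handled and the fallback is returned.
        Supplier<String> failing = () -> {
            throw new IllegalStateException("no server nodes left");
        };
        System.out.println(readOrFallback(failing, "fallback"));
    }
}
```

The point of the design is that availability is checked on every operation
rather than once at startup, since the cache can disappear after node B has
initialized.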


Vladimir wrote
> 1) Why does node B acquire that cache store bean? This node does not hold,
> serve, or create the cache because of the node filter. Note that this node
> doesn't even have the cache config. Node B is supposed to use the cache only
> occasionally, as a remote source, without working directly with the
> database. I was forced to create a properly named datasource bean only to
> satisfy a requirement that actually relates to another node (node A)!

The cache store is initialized on all nodes, including clients. In the current
implementation this is actually required only for transactional caches, but
all caches are still processed in the same way for consistency. So this is
correct behavior.


Vladimir wrote
> 2) Why is there no problem when node B matches the node filter?

This sounds weird, I think there is some other factor that we're missing.
Can you create a simple GitHub project that reproduces this behavior and
share it with us?

-Val




Re: NodeFilter for cache and GridDhtPartitionsExchangeFuture (Failed to wait for partition release future)

Posted by Vladimir <vl...@yandex.ru>.

Node B must know whether the cache is already created. If not, it must not
start, to avoid such runtime problems. That's why the cache is acquired at
initialization. That looks like a reasonable and comfortable way. There
are several interesting issues:

1) Why does node B acquire that cache store bean? This node does not hold,
serve, or create the cache because of the node filter. Note that this node
doesn't even have the cache config. Node B is supposed to use the cache only
occasionally, as a remote source, without working directly with the database.
I was forced to create a properly named datasource bean only to satisfy a
requirement that actually relates to another node (node A)!

2) Why is there no problem when node B matches the node filter?




Re: NodeFilter for cache and GridDhtPartitionsExchangeFuture (Failed to wait for partition release future)

Posted by vkulichenko <va...@gmail.com>.

What is the IgniteDictionaryMapper class? It creates a cache during Spring bean
initialization, and then the cache tries to acquire a bean for the cache store
in another thread. That thread tries to acquire the same Spring lock, which
causes a deadlock. See the 'main' thread in the same thread dump.

I think you should move cache creation out of the init method to fix this.
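The lock pattern described above can be reproduced in plain Java. This is a
simplified sketch, not Spring or Ignite code: SINGLETON_LOCK is a
hypothetical stand-in for the lock held during singleton bean creation, the
worker thread stands in for the exchange worker that looks up the store bean,
and a timeout replaces the real indefinite hang so the program terminates.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

final class InitLockSketch {
    /** Stand-in for the lock held while singleton beans are being created. */
    private static final ReentrantLock SINGLETON_LOCK = new ReentrantLock();

    /** Starts a worker that needs the same lock before the "cache" is ready. */
    private static CountDownLatch startCache() {
        CountDownLatch started = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            SINGLETON_LOCK.lock(); // models the store bean lookup
            try {
                started.countDown();
            }
            finally {
                SINGLETON_LOCK.unlock();
            }
        }, "exchange-worker-sketch");
        worker.setDaemon(true);
        worker.start();
        return started;
    }

    private static boolean await(CountDownLatch latch, long timeoutMs) {
        try {
            return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Problem: wait for the cache while still holding the init lock. */
    static boolean acquireInsideInit(long timeoutMs) {
        SINGLETON_LOCK.lock();
        try {
            return await(startCache(), timeoutMs); // worker is blocked on our lock
        }
        finally {
            SINGLETON_LOCK.unlock();
        }
    }

    /** Fix: finish initialization first, then start and wait for the cache. */
    static boolean acquireAfterInit(long timeoutMs) {
        SINGLETON_LOCK.lock();
        try {
            // Bean initialization that does not touch the cache goes here.
        }
        finally {
            SINGLETON_LOCK.unlock();
        }
        return await(startCache(), timeoutMs);
    }

    public static void main(String[] args) {
        System.out.println("inside init: " + acquireInsideInit(200));
        System.out.println("after init:  " + acquireAfterInit(2000));
    }
}
```

In the first method the wait can never succeed before the timeout, because
the worker needs the lock that the waiting thread holds. In the second, the
lock is released before the cache is requested, so the worker can proceed.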

-Val




Re: NodeFilter for cache and GridDhtPartitionsExchangeFuture (Failed to wait for partition release future)

Posted by Vladimir <vl...@yandex.ru>.

YourKit thinks:

Frozen threads found (potential deadlock)

It seems that the following threads have not changed their stack for more
than 10 seconds.
These threads are possibly (but not necessarily!) in a deadlock or hung.

exchange-worker-#29%null% <--- Frozen for at least 14s
org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(String, ObjectFactory) DefaultSingletonBeanRegistry.java:213
org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(String, Class, Object[], boolean) AbstractBeanFactory.java:302
org.springframework.beans.factory.support.AbstractBeanFactory.getBean(String) AbstractBeanFactory.java:197
org.springframework.context.support.AbstractApplicationContext.getBean(String) AbstractApplicationContext.java:1081
org.apache.ignite.internal.util.spring.IgniteSpringHelperImpl.loadBeanFromAppContext(Object, String) IgniteSpringHelperImpl.java:217
org.apache.ignite.cache.store.jdbc.CacheJdbcPojoStoreFactory.create() CacheJdbcPojoStoreFactory.java:178
org.apache.ignite.cache.store.jdbc.CacheJdbcPojoStoreFactory.create() CacheJdbcPojoStoreFactory.java:100
org.apache.ignite.internal.processors.cache.GridCacheProcessor.createCache(CacheConfiguration, CachePluginManager, CacheType, CacheObjectContext, boolean) GridCacheProcessor.java:1458
org.apache.ignite.internal.processors.cache.GridCacheProcessor.prepareCacheStart(CacheConfiguration, NearCacheConfiguration, CacheType, boolean, UUID, IgniteUuid, AffinityTopologyVersion, QuerySchema) GridCacheProcessor.java:1931
org.apache.ignite.internal.processors.cache.GridCacheProcessor.prepareCacheStart(DynamicCacheChangeRequest, AffinityTopologyVersion) GridCacheProcessor.java:1833
org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.onCacheChangeRequest(GridDhtPartitionsExchangeFuture, boolean, Collection) CacheAffinitySharedManager.java:379
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onCacheChangeRequest(boolean) GridDhtPartitionsExchangeFuture.java:688
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init() GridDhtPartitionsExchangeFuture.java:529
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body() GridCachePartitionExchangeManager.java:1806
org.apache.ignite.internal.util.worker.GridWorker.run() GridWorker.java:110
java.lang.Thread.run() Thread.java:745





Re: NodeFilter for cache and GridDhtPartitionsExchangeFuture (Failed to wait for partition release future)

Posted by vkulichenko <va...@gmail.com>.

Please attach full verbose logs and thread dumps from both nodes.

-Val


