Posted to user@ignite.apache.org by Deepesh Malviya <de...@gmail.com> on 2018/04/21 15:11:46 UTC

Failed to wait for partition map exchange

Hi,

What are the possible causes of the error below? The cluster becomes very slow and
there are long-running transactions once this warning starts appearing.

WARN exchange-worker-#35%vertx.ignite.node.554c44b2-0f87-4eea-90aa-d5431eb6d444%
GridCachePartitionExchangeManager:480 - Failed to wait for partition map exchange
[topVer=AffinityTopologyVersion [topVer=86, minorTopVer=2],
node=554c44b2-0f87-4eea-90aa-d5431eb6d444]. Dumping pending objects that might be
the cause:
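
Could the long-running transactions be what the exchange is waiting on? If so,
would starting transactions with an explicit timeout along these lines be a
reasonable mitigation? This is only a sketch; the class name, concurrency and
isolation settings, and the timeout value are illustrative, not our actual code.

import java.util.concurrent.TimeUnit;

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.transactions.Transaction;
import org.apache.ignite.transactions.TransactionConcurrency;
import org.apache.ignite.transactions.TransactionIsolation;

public class TxTimeoutSketch {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            // Illustrative timeout so a stuck transaction cannot hold up
            // the partition map exchange indefinitely.
            long timeoutMs = TimeUnit.SECONDS.toMillis(30);

            try (Transaction tx = ignite.transactions().txStart(
                    TransactionConcurrency.PESSIMISTIC,
                    TransactionIsolation.REPEATABLE_READ,
                    timeoutMs,
                    0 /* unknown number of entries */)) {
                // ... cache operations would go here ...
                tx.commit();
            }
        }
    }
}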

Regards,
_DM

Re: Failed to wait for partition map exchange

Posted by Deepesh Malviya <de...@gmail.com>.
Hi Arseny,

Thank you for the help. I checked the logs further, starting from the earliest
entries, and I see the following warnings. How should these be interpreted?

2018-04-12 04:41:49 126 WARN grid-timeout-worker-#17%vertx.ignite.node.d46efc6f-702e-47cb-a498-29cb7f7e4461% d46efc6f-702e-47cb-a498-29cb7f7e4461:480 - Possible thread pool starvation detected (no task completed in last 30000ms, is public thread pool size large enough?)

2018-04-12 07:50:50 873 WARN grid-timeout-worker-#17%vertx.ignite.node.d46efc6f-702e-47cb-a498-29cb7f7e4461% G:480 - >>> Possible starvation in striped pool:
    sys-stripe-1-#2%vertx.ignite.node.d46efc6f-702e-47cb-a498-29cb7f7e4461% []
    deadlock: false
    completed: 935028
    Thread [name="sys-stripe-1-#2%vertx.ignite.node.d46efc6f-702e-47cb-a498-29cb7f7e4461%", id=18, state=RUNNABLE, blockCnt=13549, waitCnt=541162]
        at sun.misc.Unsafe.unpark(Native Method)
        at java.util.concurrent.locks.LockSupport.unpark(LockSupport.java:141)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.unparkSuccessor(AbstractQueuedSynchronizer.java:662)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doReleaseShared(AbstractQueuedSynchronizer.java:689)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.releaseShared(AbstractQueuedSynchronizer.java:1342)
        at o.a.i.i.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:380)
        at o.a.i.i.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:355)
        at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onDone(GridDhtPartitionsExchangeFuture.java:1063)
        at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onDone(GridDhtPartitionsExchangeFuture.java:86)
        at o.a.i.i.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:332)
        at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.processMessage(GridDhtPartitionsExchangeFuture.java:1411)
        at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.access$400(GridDhtPartitionsExchangeFuture.java:86)
        at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$4.apply(GridDhtPartitionsExchangeFuture.java:1380)
        at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$4.apply(GridDhtPartitionsExchangeFuture.java:1368)
        at o.a.i.i.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:271)
        at o.a.i.i.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:228)
        at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onReceive(GridDhtPartitionsExchangeFuture.java:1368)
        at o.a.i.i.processors.cache.GridCachePartitionExchangeManager.processFullPartitionUpdate(GridCachePartitionExchangeManager.java:1238)
        at o.a.i.i.processors.cache.GridCachePartitionExchangeManager.access$1300(GridCachePartitionExchangeManager.java:116)
        at o.a.i.i.processors.cache.GridCachePartitionExchangeManager$3.onMessage(GridCachePartitionExchangeManager.java:317)
        at o.a.i.i.processors.cache.GridCachePartitionExchangeManager$3.onMessage(GridCachePartitionExchangeManager.java:315)
        at o.a.i.i.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:1992)
        at o.a.i.i.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:1974)
        at o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:827)
        at o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:369)
        at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:293)
        at o.a.i.i.processors.cache.GridCacheIoManager.access$000(GridCacheIoManager.java:95)
        at o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:238)
        at o.a.i.i.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1222)
        at o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:850)
        at o.a.i.i.managers.communication.GridIoManager.access$2100(GridIoManager.java:108)
        at o.a.i.i.managers.communication.GridIoManager$7.run(GridIoManager.java:790)
        at o.a.i.i.util.StripedExecutor$Stripe.run(StripedExecutor.java:428)
        at java.lang.Thread.run(Thread.java:748)
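
Given that the first warning asks whether the public thread pool is large
enough, would increasing the pool sizes at node startup be the right
direction? A minimal sketch of what I have in mind follows; the class name and
the sizes are illustrative only, not what we currently use (the defaults are
derived from the available CPU count).

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class NodeStartupSketch {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Illustrative sizes; tune to the actual workload and hardware.
        cfg.setPublicThreadPoolSize(32);
        cfg.setSystemThreadPoolSize(32);

        Ignite ignite = Ignition.start(cfg);
    }
}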


Regards,
Deepesh

On Mon, Apr 23, 2018 at 1:36 PM, Arseny Kovalchuk <
arseny.kovalchuk@synesis.ru> wrote:

> Hi Deepesh,
>
> This kind of message means that one or more nodes cannot process the
> exchange message, which usually happens when a server node crashes or an
> exception is thrown on a node. I would check the full logs from all server
> nodes, starting from the beginning, for a stack trace. Pay particular
> attention to errors that may happen inside compute closures: wrap the
> closure code in try-catch and log the error in the catch block. Also
> increase the log level.
>
> Hope that it will help to catch the reason.
>
>
> Arseny Kovalchuk
>
> Senior Software Engineer at Synesis
> skype: arseny.kovalchuk
> mobile: +375 (29) 666-16-16
> LinkedIn Profile <http://www.linkedin.com/in/arsenykovalchuk/en>
>
> On 23 April 2018 at 09:58, Deepesh Malviya <de...@gmail.com> wrote:
>
>> This is the complete log that is being printed on all nodes repeatedly.
>>
>> Regards,
>> Deepesh
>>
>> On Sun, Apr 22, 2018 at 12:00 PM, begineer <re...@gmail.com> wrote:
>>
>>> Could you please paste the complete log? This log is not enough.
>>>
>>>
>>>
>>> --
>>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>>>
>>
>>
>>
>> --
>> _Deepesh
>>
>
>


-- 
_Deepesh

Re: Failed to wait for partition map exchange

Posted by Arseny Kovalchuk <ar...@synesis.ru>.
Hi Deepesh,

This kind of message means that one or more nodes cannot process the
exchange message, which usually happens when a server node crashes or an
exception is thrown on a node. I would check the full logs from all server
nodes, starting from the beginning, for a stack trace. Pay particular
attention to errors that may happen inside compute closures: wrap the
closure code in try-catch and log the error in the catch block. Also
increase the log level.
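
For example, a minimal sketch of the try-catch pattern around a compute
closure; the class name, logger, and doWork() placeholder are illustrative,
not your actual code.

import java.util.logging.Level;
import java.util.logging.Logger;

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.lang.IgniteRunnable;

public class SafeClosureExample {
    private static final Logger LOG = Logger.getLogger(SafeClosureExample.class.getName());

    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            // Broadcast a closure to every node; any failure inside it is
            // logged instead of escaping, so the failing node shows up in
            // its own log instead of silently breaking the exchange.
            ignite.compute().broadcast((IgniteRunnable) () -> {
                try {
                    doWork(); // placeholder for the real closure logic
                }
                catch (Exception e) {
                    LOG.log(Level.SEVERE, "Compute closure failed on this node", e);
                }
            });
        }
    }

    private static void doWork() {
        // Placeholder for the application-specific work done in the closure.
    }
}

With the error logged on the node where the closure runs, the node that
triggers the problem can be identified from its own log.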

Hope that it will help to catch the reason.


​
Arseny Kovalchuk

Senior Software Engineer at Synesis
skype: arseny.kovalchuk
mobile: +375 (29) 666-16-16
LinkedIn Profile <http://www.linkedin.com/in/arsenykovalchuk/en>

On 23 April 2018 at 09:58, Deepesh Malviya <de...@gmail.com> wrote:

> This is the complete log that is being printed on all nodes repeatedly.
>
> Regards,
> Deepesh
>
> On Sun, Apr 22, 2018 at 12:00 PM, begineer <re...@gmail.com> wrote:
>
>> Could you please paste the complete log? This log is not enough.
>>
>>
>>
>> --
>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>>
>
>
>
> --
> _Deepesh
>

Re: Failed to wait for partition map exchange

Posted by Deepesh Malviya <de...@gmail.com>.
This is the complete log that is being printed on all nodes repeatedly.

Regards,
Deepesh

On Sun, Apr 22, 2018 at 12:00 PM, begineer <re...@gmail.com> wrote:

> Could you please paste the complete log? This log is not enough.
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>



-- 
_Deepesh

Re: Failed to wait for partition map exchange

Posted by begineer <re...@gmail.com>.
Could you please paste the complete log? This log is not enough.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/