You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@ignite.apache.org by Andrey Davydov <an...@gmail.com> on 2019/01/09 17:43:37 UTC

Failed to wait for partition map exchange on cluster activation

Hello, 
I found in test logs of my project that Ignite warns about failed partition maps exchange. In test environment 3 Ignite 2.7 server nodes run in the same JVM8 on Win10, using localhost networking.

2019-01-09 20:15:27,719 [sys-#164%TestNode-2%] INFO  org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102 - Affinity changes applied in 10 ms.
2019-01-09 20:15:27,719 [sys-#163%TestNode-1%] INFO  org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102 - Affinity changes applied in 10 ms.
2019-01-09 20:15:27,724 [sys-#164%TestNode-2%] INFO  org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102 - Full map updating for 5 groups performed in 4 ms.
2019-01-09 20:15:27,724 [sys-#163%TestNode-1%] INFO  org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102 - Full map updating for 5 groups performed in 5 ms.
2019-01-09 20:15:27,725 [sys-#163%TestNode-1%] INFO  org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102 - Finish exchange future [startVer=AffinityTopologyVersion [topVer=3, minorTopVer=1], resVer=AffinityTopologyVersion [topVer=3, minorTopVer=1], err=null]
2019-01-09 20:15:27,725 [sys-#164%TestNode-2%] INFO  org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102 - Finish exchange future [startVer=AffinityTopologyVersion [topVer=3, minorTopVer=1], resVer=AffinityTopologyVersion [topVer=3, minorTopVer=1], err=null]
2019-01-09 20:15:28,710 [db-checkpoint-thread-#157%TestNode-1%] INFO  org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102 - Checkpoint started [checkpointId=443748a9-c1a5-4b3b-96e4-04a0862829ec, startPtr=FileWALPointer [idx=0, fileOff=929726, len=31143], checkpointLockWait=0ms, checkpointLockHoldTime=6ms, walCpRecordFsyncDuration=248ms, pages=204, reason='node started']
2019-01-09 20:15:28,713 [db-checkpoint-thread-#151%TestNode-0%] INFO  org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102 - Checkpoint started [checkpointId=cbc928e1-4ecd-40ae-9791-c6ba20c3669b, startPtr=FileWALPointer [idx=0, fileOff=929726, len=31143], checkpointLockWait=0ms, checkpointLockHoldTime=8ms, walCpRecordFsyncDuration=257ms, pages=204, reason='node started']
2019-01-09 20:15:28,715 [db-checkpoint-thread-#146%TestNode-2%] INFO  org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102 - Checkpoint started [checkpointId=ef4c3d02-ca01-4d67-8128-48d4dc99aabc, startPtr=FileWALPointer [idx=0, fileOff=929726, len=31143], checkpointLockWait=0ms, checkpointLockHoldTime=22ms, walCpRecordFsyncDuration=289ms, pages=204, reason='node started']
2019-01-09 20:15:30,788 [db-checkpoint-thread-#157%TestNode-1%] INFO  org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102 - Checkpoint finished [cpId=443748a9-c1a5-4b3b-96e4-04a0862829ec, pages=204, markPos=FileWALPointer [idx=0, fileOff=929726, len=31143], walSegmentsCleared=0, walSegmentsCovered=[], markDuration=1103ms, pagesWrite=84ms, fsync=1992ms, total=3179ms]
2019-01-09 20:15:30,858 [db-checkpoint-thread-#151%TestNode-0%] INFO  org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102 - Checkpoint finished [cpId=cbc928e1-4ecd-40ae-9791-c6ba20c3669b, pages=204, markPos=FileWALPointer [idx=0, fileOff=929726, len=31143], walSegmentsCleared=0, walSegmentsCovered=[], markDuration=1213ms, pagesWrite=79ms, fsync=2066ms, total=3358ms]
2019-01-09 20:15:30,998 [db-checkpoint-thread-#146%TestNode-2%] INFO  org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102 - Checkpoint finished [cpId=ef4c3d02-ca01-4d67-8128-48d4dc99aabc, pages=204, markPos=FileWALPointer [idx=0, fileOff=929726, len=31143], walSegmentsCleared=0, walSegmentsCovered=[], markDuration=1262ms, pagesWrite=79ms, fsync=2203ms, total=3544ms]
2019-01-09 20:15:37,510 [exchange-worker-#44%TestNode-0%] WARN  org.apache.ignite.internal.diagnostic:118 - Failed to wait for partition map exchange [topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1], node=454d2051-cea6-4f2c-99a7-7c5698494175]. Dumping pending objects that might be the cause: 
2019-01-09 20:15:37,510 [exchange-worker-#44%TestNode-0%] WARN  org.apache.ignite.internal.diagnostic:118 - Ready affinity version: AffinityTopologyVersion [topVer=-1, minorTopVer=0]
2019-01-09 20:15:37,515 [exchange-worker-#44%TestNode-0%] WARN  org.apache.ignite.internal.diagnostic:118 - Last exchange future: …
2019-01-09 20:15:37,515 [exchange-worker-#44%TestNode-0%] WARN  org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager:118 - First 10 pending exchange futures [total=0]
2019-01-09 20:15:37,518 [exchange-worker-#44%TestNode-0%] WARN  org.apache.ignite.internal.diagnostic:118 - Last 10 exchange futures (total: 1):
2019-01-09 20:15:37,519 [exchange-worker-#44%TestNode-0%] WARN  org.apache.ignite.internal.diagnostic:118 - >>> GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1], evt=DISCOVERY_CUSTOM_EVT, evtNode=TcpDiscoveryNode [id=454d2051-cea6-4f2c-99a7-7c5698494175, addrs=[127.0.0.1], sockAddrs=[/127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1547054137330, loc=true, ver=2.7.0#20181130-sha1:256ae401, isClient=false], done=false]
2019-01-09 20:15:37,520 [exchange-worker-#44%TestNode-0%] WARN  org.apache.ignite.internal.diagnostic:118 - Latch manager state: ExchangeLatchManager [serverLatches={}, clientLatches={}]
2019-01-09 20:15:37,535 [exchange-worker-#132%TestNode-2%] WARN  org.apache.ignite.internal.diagnostic:118 - Failed to wait for partition map exchange [topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1], node=a8e212dc-303e-4bc6-8bb6-fa2bc4c0478d]. Dumping pending objects that might be the cause: 
2019-01-09 20:15:37,535 [exchange-worker-#132%TestNode-2%] WARN  org.apache.ignite.internal.diagnostic:118 - Ready affinity version: AffinityTopologyVersion [topVer=-1, minorTopVer=0]
2019-01-09 20:15:37,536 [exchange-worker-#132%TestNode-2%] WARN  org.apache.ignite.internal.diagnostic:118 - Last exchange future: 
2019-01-09 20:15:37,537 [exchange-worker-#132%TestNode-2%] WARN  org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager:118 - First 10 pending exchange futures [total=0]
2019-01-09 20:15:37,538 [exchange-worker-#132%TestNode-2%] WARN  org.apache.ignite.internal.diagnostic:118 - Last 10 exchange futures (total: 1):
2019-01-09 20:15:37,539 [exchange-worker-#132%TestNode-2%] WARN  org.apache.ignite.internal.diagnostic:118 - >>> GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1], evt=DISCOVERY_CUSTOM_EVT, evtNode=TcpDiscoveryNode [id=454d2051-cea6-4f2c-99a7-7c5698494175, addrs=[127.0.0.1], sockAddrs=[/127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1547054124462, loc=false, ver=2.7.0#20181130-sha1:256ae401, isClient=false], done=false]
2019-01-09 20:15:37,540 [exchange-worker-#132%TestNode-2%] WARN  org.apache.ignite.internal.diagnostic:118 - Latch manager state: ExchangeLatchManager [serverLatches={}, clientLatches={}]

After that, everything works fine, but it cause every test to take more 10 seconds to run. And I am afraid to use it in prod before I understand problem.

Andrey.

RE: Failed to wait for partition map exchange on cluster activation

Posted by Andrey Davydov <an...@gmail.com>.

No, I don’t start services during activation (because I got some problems with it on older ignite some months ago). There is one broadcasted service in system, and I start it manually just after activation.

I will provide dumps and more detailed logs bit later.

Andrey.

От: Ilya Kasnacheev
Отправлено: 16 января 2019 г. в 15:13
Кому: user@ignite.apache.org
Тема: Re: Failed to wait for partition map exchange on cluster activation

Hello!

Sorry, wrong thread:)

It is hard to say what happens here exactly. Can you collect several thread dumps during this prolonged activation, share them with us?

Do you have e.g. services? I was told that services would start during activation.

Regards,
-- 
Ilya Kasnacheev


ср, 16 янв. 2019 г. в 15:10, Ilya Kasnacheev <il...@gmail.com>:
Hello!

I can see multiple "Failed to process selector key" errors in your log. Are you sure that your nodes can communicate via network freely and without delay?

Regards,
-- 
Ilya Kasnacheev


вт, 15 янв. 2019 г. в 20:12, Andrey Davydov <an...@gmail.com>:
Hello,

You can find full log there: https://drive.google.com/file/d/1FwCjsXMw5LQJnKO0x5GNJ2w9gVsDbXlc/view?usp=sharing

I can rerun tests with additional logging settings if needed

Andrey.






On Tue, Jan 15, 2019 at 6:23 PM Ilya Kasnacheev <il...@gmail.com> wrote:
Hello!

Can you please upload the full verbose log somewhere?

Regards,
-- 
Ilya Kasnacheev


ср, 9 янв. 2019 г. в 20:43, Andrey Davydov <an...@gmail.com>:
 
Hello, 
I found in test logs of my project that Ignite warns about failed partition maps exchange. In test environment 3 Ignite 2.7 server nodes run in the same JVM8 on Win10, using localhost networking.
 
2019-01-09 20:15:27,719 [sys-#164%TestNode-2%] INFO  org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102 - Affinity changes applied in 10 ms.
2019-01-09 20:15:27,719 [sys-#163%TestNode-1%] INFO  org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102 - Affinity changes applied in 10 ms.
2019-01-09 20:15:27,724 [sys-#164%TestNode-2%] INFO  org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102 - Full map updating for 5 groups performed in 4 ms.
2019-01-09 20:15:27,724 [sys-#163%TestNode-1%] INFO  org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102 - Full map updating for 5 groups performed in 5 ms.
2019-01-09 20:15:27,725 [sys-#163%TestNode-1%] INFO  org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102 - Finish exchange future [startVer=AffinityTopologyVersion [topVer=3, minorTopVer=1], resVer=AffinityTopologyVersion [topVer=3, minorTopVer=1], err=null]
2019-01-09 20:15:27,725 [sys-#164%TestNode-2%] INFO  org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102 - Finish exchange future [startVer=AffinityTopologyVersion [topVer=3, minorTopVer=1], resVer=AffinityTopologyVersion [topVer=3, minorTopVer=1], err=null]
2019-01-09 20:15:28,710 [db-checkpoint-thread-#157%TestNode-1%] INFO  org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102 - Checkpoint started [checkpointId=443748a9-c1a5-4b3b-96e4-04a0862829ec, startPtr=FileWALPointer [idx=0, fileOff=929726, len=31143], checkpointLockWait=0ms, checkpointLockHoldTime=6ms, walCpRecordFsyncDuration=248ms, pages=204, reason='node started']
2019-01-09 20:15:28,713 [db-checkpoint-thread-#151%TestNode-0%] INFO  org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102 - Checkpoint started [checkpointId=cbc928e1-4ecd-40ae-9791-c6ba20c3669b, startPtr=FileWALPointer [idx=0, fileOff=929726, len=31143], checkpointLockWait=0ms, checkpointLockHoldTime=8ms, walCpRecordFsyncDuration=257ms, pages=204, reason='node started']
2019-01-09 20:15:28,715 [db-checkpoint-thread-#146%TestNode-2%] INFO  org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102 - Checkpoint started [checkpointId=ef4c3d02-ca01-4d67-8128-48d4dc99aabc, startPtr=FileWALPointer [idx=0, fileOff=929726, len=31143], checkpointLockWait=0ms, checkpointLockHoldTime=22ms, walCpRecordFsyncDuration=289ms, pages=204, reason='node started']
2019-01-09 20:15:30,788 [db-checkpoint-thread-#157%TestNode-1%] INFO  org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102 - Checkpoint finished [cpId=443748a9-c1a5-4b3b-96e4-04a0862829ec, pages=204, markPos=FileWALPointer [idx=0, fileOff=929726, len=31143], walSegmentsCleared=0, walSegmentsCovered=[], markDuration=1103ms, pagesWrite=84ms, fsync=1992ms, total=3179ms]
2019-01-09 20:15:30,858 [db-checkpoint-thread-#151%TestNode-0%] INFO  org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102 - Checkpoint finished [cpId=cbc928e1-4ecd-40ae-9791-c6ba20c3669b, pages=204, markPos=FileWALPointer [idx=0, fileOff=929726, len=31143], walSegmentsCleared=0, walSegmentsCovered=[], markDuration=1213ms, pagesWrite=79ms, fsync=2066ms, total=3358ms]
2019-01-09 20:15:30,998 [db-checkpoint-thread-#146%TestNode-2%] INFO  org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102 - Checkpoint finished [cpId=ef4c3d02-ca01-4d67-8128-48d4dc99aabc, pages=204, markPos=FileWALPointer [idx=0, fileOff=929726, len=31143], walSegmentsCleared=0, walSegmentsCovered=[], markDuration=1262ms, pagesWrite=79ms, fsync=2203ms, total=3544ms]
2019-01-09 20:15:37,510 [exchange-worker-#44%TestNode-0%] WARN  org.apache.ignite.internal.diagnostic:118 - Failed to wait for partition map exchange [topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1], node=454d2051-cea6-4f2c-99a7-7c5698494175]. Dumping pending objects that might be the cause: 
2019-01-09 20:15:37,510 [exchange-worker-#44%TestNode-0%] WARN  org.apache.ignite.internal.diagnostic:118 - Ready affinity version: AffinityTopologyVersion [topVer=-1, minorTopVer=0]
2019-01-09 20:15:37,515 [exchange-worker-#44%TestNode-0%] WARN  org.apache.ignite.internal.diagnostic:118 - Last exchange future: …
2019-01-09 20:15:37,515 [exchange-worker-#44%TestNode-0%] WARN  org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager:118 - First 10 pending exchange futures [total=0]
2019-01-09 20:15:37,518 [exchange-worker-#44%TestNode-0%] WARN  org.apache.ignite.internal.diagnostic:118 - Last 10 exchange futures (total: 1):
2019-01-09 20:15:37,519 [exchange-worker-#44%TestNode-0%] WARN  org.apache.ignite.internal.diagnostic:118 - >>> GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1], evt=DISCOVERY_CUSTOM_EVT, evtNode=TcpDiscoveryNode [id=454d2051-cea6-4f2c-99a7-7c5698494175, addrs=[127.0.0.1], sockAddrs=[/127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1547054137330, loc=true, ver=2.7.0#20181130-sha1:256ae401, isClient=false], done=false]
2019-01-09 20:15:37,520 [exchange-worker-#44%TestNode-0%] WARN  org.apache.ignite.internal.diagnostic:118 - Latch manager state: ExchangeLatchManager [serverLatches={}, clientLatches={}]
2019-01-09 20:15:37,535 [exchange-worker-#132%TestNode-2%] WARN  org.apache.ignite.internal.diagnostic:118 - Failed to wait for partition map exchange [topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1], node=a8e212dc-303e-4bc6-8bb6-fa2bc4c0478d]. Dumping pending objects that might be the cause: 
2019-01-09 20:15:37,535 [exchange-worker-#132%TestNode-2%] WARN  org.apache.ignite.internal.diagnostic:118 - Ready affinity version: AffinityTopologyVersion [topVer=-1, minorTopVer=0]
2019-01-09 20:15:37,536 [exchange-worker-#132%TestNode-2%] WARN  org.apache.ignite.internal.diagnostic:118 - Last exchange future: 
2019-01-09 20:15:37,537 [exchange-worker-#132%TestNode-2%] WARN  org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager:118 - First 10 pending exchange futures [total=0]
2019-01-09 20:15:37,538 [exchange-worker-#132%TestNode-2%] WARN  org.apache.ignite.internal.diagnostic:118 - Last 10 exchange futures (total: 1):
2019-01-09 20:15:37,539 [exchange-worker-#132%TestNode-2%] WARN  org.apache.ignite.internal.diagnostic:118 - >>> GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1], evt=DISCOVERY_CUSTOM_EVT, evtNode=TcpDiscoveryNode [id=454d2051-cea6-4f2c-99a7-7c5698494175, addrs=[127.0.0.1], sockAddrs=[/127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1547054124462, loc=false, ver=2.7.0#20181130-sha1:256ae401, isClient=false], done=false]
2019-01-09 20:15:37,540 [exchange-worker-#132%TestNode-2%] WARN  org.apache.ignite.internal.diagnostic:118 - Latch manager state: ExchangeLatchManager [serverLatches={}, clientLatches={}]
 
After that, everything works fine, but it cause every test to take more 10 seconds to run. And I am afraid to use it in prod before I understand problem.
 
Andrey.

Re: Failed to wait for partition map exchange on cluster activation

Posted by Ilya Kasnacheev <il...@gmail.com>.

Hello!

Sorry, wrong thread:)

It is hard to say what happens here exactly. Can you collect several thread
dumps during this prolonged activation, share them with us?

Do you have e.g. services? I was told that services would start during
activation.

Regards,
-- 
Ilya Kasnacheev


ср, 16 янв. 2019 г. в 15:10, Ilya Kasnacheev <il...@gmail.com>:

> Hello!
>
> I can see multiple "Failed to process selector key" errors in your log.
> Are you sure that your nodes can communicate via network freely and without
> delay?
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> вт, 15 янв. 2019 г. в 20:12, Andrey Davydov <an...@gmail.com>:
>
>> Hello,
>>
>> You can find full log there:
>> https://drive.google.com/file/d/1FwCjsXMw5LQJnKO0x5GNJ2w9gVsDbXlc/view?usp=sharing
>>
>> I can rerun tests with additional logging settings if needed
>>
>> Andrey.
>>
>>
>>
>>
>>
>>
>> On Tue, Jan 15, 2019 at 6:23 PM Ilya Kasnacheev <
>> ilya.kasnacheev@gmail.com> wrote:
>>
>>> Hello!
>>>
>>> Can you please upload the full verbose log somewhere?
>>>
>>> Regards,
>>> --
>>> Ilya Kasnacheev
>>>
>>>
>>> ср, 9 янв. 2019 г. в 20:43, Andrey Davydov <an...@gmail.com>:
>>>
>>>>
>>>>
>>>> Hello,
>>>>
>>>> I found in test logs of my project that Ignite warns about failed
>>>> partition maps exchange. In test environment 3 Ignite 2.7 server nodes run
>>>> in the same JVM8 on Win10, using localhost networking.
>>>>
>>>>
>>>>
>>>> 2019-01-09 20:15:27,719 [sys-#164%TestNode-2%] INFO
>>>> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
>>>> - Affinity changes applied in 10 ms.
>>>>
>>>> 2019-01-09 20:15:27,719 [sys-#163%TestNode-1%] INFO
>>>> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
>>>> - Affinity changes applied in 10 ms.
>>>>
>>>> 2019-01-09 20:15:27,724 [sys-#164%TestNode-2%] INFO
>>>> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
>>>> - Full map updating for 5 groups performed in 4 ms.
>>>>
>>>> 2019-01-09 20:15:27,724 [sys-#163%TestNode-1%] INFO
>>>> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
>>>> - Full map updating for 5 groups performed in 5 ms.
>>>>
>>>> 2019-01-09 20:15:27,725 [sys-#163%TestNode-1%] INFO
>>>> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
>>>> - Finish exchange future [startVer=AffinityTopologyVersion [topVer=3,
>>>> minorTopVer=1], resVer=AffinityTopologyVersion [topVer=3, minorTopVer=1],
>>>> err=null]
>>>>
>>>> 2019-01-09 20:15:27,725 [sys-#164%TestNode-2%] INFO
>>>> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
>>>> - Finish exchange future [startVer=AffinityTopologyVersion [topVer=3,
>>>> minorTopVer=1], resVer=AffinityTopologyVersion [topVer=3, minorTopVer=1],
>>>> err=null]
>>>>
>>>> 2019-01-09 20:15:28,710 [db-checkpoint-thread-#157%TestNode-1%] INFO
>>>> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
>>>> - Checkpoint started [checkpointId=443748a9-c1a5-4b3b-96e4-04a0862829ec,
>>>> startPtr=FileWALPointer [idx=0, fileOff=929726, len=31143],
>>>> checkpointLockWait=0ms, checkpointLockHoldTime=6ms,
>>>> walCpRecordFsyncDuration=248ms, pages=204, reason='node started']
>>>>
>>>> 2019-01-09 20:15:28,713 [db-checkpoint-thread-#151%TestNode-0%] INFO
>>>> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
>>>> - Checkpoint started [checkpointId=cbc928e1-4ecd-40ae-9791-c6ba20c3669b,
>>>> startPtr=FileWALPointer [idx=0, fileOff=929726, len=31143],
>>>> checkpointLockWait=0ms, checkpointLockHoldTime=8ms,
>>>> walCpRecordFsyncDuration=257ms, pages=204, reason='node started']
>>>>
>>>> 2019-01-09 20:15:28,715 [db-checkpoint-thread-#146%TestNode-2%] INFO
>>>> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
>>>> - Checkpoint started [checkpointId=ef4c3d02-ca01-4d67-8128-48d4dc99aabc,
>>>> startPtr=FileWALPointer [idx=0, fileOff=929726, len=31143],
>>>> checkpointLockWait=0ms, checkpointLockHoldTime=22ms,
>>>> walCpRecordFsyncDuration=289ms, pages=204, reason='node started']
>>>>
>>>> 2019-01-09 20:15:30,788 [db-checkpoint-thread-#157%TestNode-1%] INFO
>>>> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
>>>> - Checkpoint finished [cpId=443748a9-c1a5-4b3b-96e4-04a0862829ec,
>>>> pages=204, markPos=FileWALPointer [idx=0, fileOff=929726, len=31143],
>>>> walSegmentsCleared=0, walSegmentsCovered=[], markDuration=1103ms,
>>>> pagesWrite=84ms, fsync=1992ms, total=3179ms]
>>>>
>>>> 2019-01-09 20:15:30,858 [db-checkpoint-thread-#151%TestNode-0%] INFO
>>>> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
>>>> - Checkpoint finished [cpId=cbc928e1-4ecd-40ae-9791-c6ba20c3669b,
>>>> pages=204, markPos=FileWALPointer [idx=0, fileOff=929726, len=31143],
>>>> walSegmentsCleared=0, walSegmentsCovered=[], markDuration=1213ms,
>>>> pagesWrite=79ms, fsync=2066ms, total=3358ms]
>>>>
>>>> 2019-01-09 20:15:30,998 [db-checkpoint-thread-#146%TestNode-2%] INFO
>>>> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
>>>> - Checkpoint finished [cpId=ef4c3d02-ca01-4d67-8128-48d4dc99aabc,
>>>> pages=204, markPos=FileWALPointer [idx=0, fileOff=929726, len=31143],
>>>> walSegmentsCleared=0, walSegmentsCovered=[], markDuration=1262ms,
>>>> pagesWrite=79ms, fsync=2203ms, total=3544ms]
>>>>
>>>> 2019-01-09 20:15:37,510 [exchange-worker-#44%TestNode-0%] WARN
>>>> org.apache.ignite.internal.diagnostic:118 - Failed to wait for partition
>>>> map exchange [topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1],
>>>> node=454d2051-cea6-4f2c-99a7-7c5698494175]. Dumping pending objects that
>>>> might be the cause:
>>>>
>>>> 2019-01-09 20:15:37,510 [exchange-worker-#44%TestNode-0%] WARN
>>>> org.apache.ignite.internal.diagnostic:118 - Ready affinity version:
>>>> AffinityTopologyVersion [topVer=-1, minorTopVer=0]
>>>>
>>>> 2019-01-09 20:15:37,515 [exchange-worker-#44%TestNode-0%] WARN
>>>> org.apache.ignite.internal.diagnostic:118 - Last exchange future: …
>>>>
>>>> 2019-01-09 20:15:37,515 [exchange-worker-#44%TestNode-0%] WARN
>>>> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager:118
>>>> - First 10 pending exchange futures [total=0]
>>>>
>>>> 2019-01-09 20:15:37,518 [exchange-worker-#44%TestNode-0%] WARN
>>>> org.apache.ignite.internal.diagnostic:118 - Last 10 exchange futures
>>>> (total: 1):
>>>>
>>>> 2019-01-09 20:15:37,519 [exchange-worker-#44%TestNode-0%] WARN
>>>> org.apache.ignite.internal.diagnostic:118 - >>>
>>>> GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=3,
>>>> minorTopVer=1], evt=DISCOVERY_CUSTOM_EVT, evtNode=TcpDiscoveryNode
>>>> [id=454d2051-cea6-4f2c-99a7-7c5698494175, addrs=[127.0.0.1], sockAddrs=[/
>>>> 127.0.0.1:47500], discPort=47500, order=1, intOrder=1,
>>>> lastExchangeTime=1547054137330, loc=true, ver=2.7.0#20181130-sha1:256ae401,
>>>> isClient=false], done=false]
>>>>
>>>> 2019-01-09 20:15:37,520 [exchange-worker-#44%TestNode-0%] WARN
>>>> org.apache.ignite.internal.diagnostic:118 - Latch manager state:
>>>> ExchangeLatchManager [serverLatches={}, clientLatches={}]
>>>>
>>>> 2019-01-09 20:15:37,535 [exchange-worker-#132%TestNode-2%] WARN
>>>> org.apache.ignite.internal.diagnostic:118 - Failed to wait for partition
>>>> map exchange [topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1],
>>>> node=a8e212dc-303e-4bc6-8bb6-fa2bc4c0478d]. Dumping pending objects that
>>>> might be the cause:
>>>>
>>>> 2019-01-09 20:15:37,535 [exchange-worker-#132%TestNode-2%] WARN
>>>> org.apache.ignite.internal.diagnostic:118 - Ready affinity version:
>>>> AffinityTopologyVersion [topVer=-1, minorTopVer=0]
>>>>
>>>> 2019-01-09 20:15:37,536 [exchange-worker-#132%TestNode-2%] WARN
>>>> org.apache.ignite.internal.diagnostic:118 - Last exchange future:
>>>>
>>>> 2019-01-09 20:15:37,537 [exchange-worker-#132%TestNode-2%] WARN
>>>> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager:118
>>>> - First 10 pending exchange futures [total=0]
>>>>
>>>> 2019-01-09 20:15:37,538 [exchange-worker-#132%TestNode-2%] WARN
>>>> org.apache.ignite.internal.diagnostic:118 - Last 10 exchange futures
>>>> (total: 1):
>>>>
>>>> 2019-01-09 20:15:37,539 [exchange-worker-#132%TestNode-2%] WARN
>>>> org.apache.ignite.internal.diagnostic:118 - >>>
>>>> GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=3,
>>>> minorTopVer=1], evt=DISCOVERY_CUSTOM_EVT, evtNode=TcpDiscoveryNode
>>>> [id=454d2051-cea6-4f2c-99a7-7c5698494175, addrs=[127.0.0.1], sockAddrs=[/
>>>> 127.0.0.1:47500], discPort=47500, order=1, intOrder=1,
>>>> lastExchangeTime=1547054124462, loc=false,
>>>> ver=2.7.0#20181130-sha1:256ae401, isClient=false], done=false]
>>>>
>>>> 2019-01-09 20:15:37,540 [exchange-worker-#132%TestNode-2%] WARN
>>>> org.apache.ignite.internal.diagnostic:118 - Latch manager state:
>>>> ExchangeLatchManager [serverLatches={}, clientLatches={}]
>>>>
>>>>
>>>>
>>>> After that, everything works fine, but it cause every test to take
>>>> more 10 seconds to run. And I am afraid to use it in prod before I
>>>> understand problem.
>>>>
>>>>
>>>>
>>>> Andrey.
>>>>
>>>>
>>>>
>>>

Re: Failed to wait for partition map exchange on cluster activation

Posted by Ilya Kasnacheev <il...@gmail.com>.

Hello!

I can see multiple "Failed to process selector key" errors in your log. Are
you sure that your nodes can communicate via network freely and without
delay?

Regards,
-- 
Ilya Kasnacheev


вт, 15 янв. 2019 г. в 20:12, Andrey Davydov <an...@gmail.com>:

> Hello,
>
> You can find full log there:
> https://drive.google.com/file/d/1FwCjsXMw5LQJnKO0x5GNJ2w9gVsDbXlc/view?usp=sharing
>
> I can rerun tests with additional logging settings if needed
>
> Andrey.
>
>
>
>
>
>
> On Tue, Jan 15, 2019 at 6:23 PM Ilya Kasnacheev <il...@gmail.com>
> wrote:
>
>> Hello!
>>
>> Can you please upload the full verbose log somewhere?
>>
>> Regards,
>> --
>> Ilya Kasnacheev
>>
>>
>> ср, 9 янв. 2019 г. в 20:43, Andrey Davydov <an...@gmail.com>:
>>
>>>
>>>
>>> Hello,
>>>
>>> I found in test logs of my project that Ignite warns about failed
>>> partition maps exchange. In test environment 3 Ignite 2.7 server nodes run
>>> in the same JVM8 on Win10, using localhost networking.
>>>
>>>
>>>
>>> 2019-01-09 20:15:27,719 [sys-#164%TestNode-2%] INFO
>>> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
>>> - Affinity changes applied in 10 ms.
>>>
>>> 2019-01-09 20:15:27,719 [sys-#163%TestNode-1%] INFO
>>> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
>>> - Affinity changes applied in 10 ms.
>>>
>>> 2019-01-09 20:15:27,724 [sys-#164%TestNode-2%] INFO
>>> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
>>> - Full map updating for 5 groups performed in 4 ms.
>>>
>>> 2019-01-09 20:15:27,724 [sys-#163%TestNode-1%] INFO
>>> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
>>> - Full map updating for 5 groups performed in 5 ms.
>>>
>>> 2019-01-09 20:15:27,725 [sys-#163%TestNode-1%] INFO
>>> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
>>> - Finish exchange future [startVer=AffinityTopologyVersion [topVer=3,
>>> minorTopVer=1], resVer=AffinityTopologyVersion [topVer=3, minorTopVer=1],
>>> err=null]
>>>
>>> 2019-01-09 20:15:27,725 [sys-#164%TestNode-2%] INFO
>>> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
>>> - Finish exchange future [startVer=AffinityTopologyVersion [topVer=3,
>>> minorTopVer=1], resVer=AffinityTopologyVersion [topVer=3, minorTopVer=1],
>>> err=null]
>>>
>>> 2019-01-09 20:15:28,710 [db-checkpoint-thread-#157%TestNode-1%] INFO
>>> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
>>> - Checkpoint started [checkpointId=443748a9-c1a5-4b3b-96e4-04a0862829ec,
>>> startPtr=FileWALPointer [idx=0, fileOff=929726, len=31143],
>>> checkpointLockWait=0ms, checkpointLockHoldTime=6ms,
>>> walCpRecordFsyncDuration=248ms, pages=204, reason='node started']
>>>
>>> 2019-01-09 20:15:28,713 [db-checkpoint-thread-#151%TestNode-0%] INFO
>>> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
>>> - Checkpoint started [checkpointId=cbc928e1-4ecd-40ae-9791-c6ba20c3669b,
>>> startPtr=FileWALPointer [idx=0, fileOff=929726, len=31143],
>>> checkpointLockWait=0ms, checkpointLockHoldTime=8ms,
>>> walCpRecordFsyncDuration=257ms, pages=204, reason='node started']
>>>
>>> 2019-01-09 20:15:28,715 [db-checkpoint-thread-#146%TestNode-2%] INFO
>>> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
>>> - Checkpoint started [checkpointId=ef4c3d02-ca01-4d67-8128-48d4dc99aabc,
>>> startPtr=FileWALPointer [idx=0, fileOff=929726, len=31143],
>>> checkpointLockWait=0ms, checkpointLockHoldTime=22ms,
>>> walCpRecordFsyncDuration=289ms, pages=204, reason='node started']
>>>
>>> 2019-01-09 20:15:30,788 [db-checkpoint-thread-#157%TestNode-1%] INFO
>>> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
>>> - Checkpoint finished [cpId=443748a9-c1a5-4b3b-96e4-04a0862829ec,
>>> pages=204, markPos=FileWALPointer [idx=0, fileOff=929726, len=31143],
>>> walSegmentsCleared=0, walSegmentsCovered=[], markDuration=1103ms,
>>> pagesWrite=84ms, fsync=1992ms, total=3179ms]
>>>
>>> 2019-01-09 20:15:30,858 [db-checkpoint-thread-#151%TestNode-0%] INFO
>>> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
>>> - Checkpoint finished [cpId=cbc928e1-4ecd-40ae-9791-c6ba20c3669b,
>>> pages=204, markPos=FileWALPointer [idx=0, fileOff=929726, len=31143],
>>> walSegmentsCleared=0, walSegmentsCovered=[], markDuration=1213ms,
>>> pagesWrite=79ms, fsync=2066ms, total=3358ms]
>>>
>>> 2019-01-09 20:15:30,998 [db-checkpoint-thread-#146%TestNode-2%] INFO
>>> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
>>> - Checkpoint finished [cpId=ef4c3d02-ca01-4d67-8128-48d4dc99aabc,
>>> pages=204, markPos=FileWALPointer [idx=0, fileOff=929726, len=31143],
>>> walSegmentsCleared=0, walSegmentsCovered=[], markDuration=1262ms,
>>> pagesWrite=79ms, fsync=2203ms, total=3544ms]
>>>
>>> 2019-01-09 20:15:37,510 [exchange-worker-#44%TestNode-0%] WARN
>>> org.apache.ignite.internal.diagnostic:118 - Failed to wait for partition
>>> map exchange [topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1],
>>> node=454d2051-cea6-4f2c-99a7-7c5698494175]. Dumping pending objects that
>>> might be the cause:
>>>
>>> 2019-01-09 20:15:37,510 [exchange-worker-#44%TestNode-0%] WARN
>>> org.apache.ignite.internal.diagnostic:118 - Ready affinity version:
>>> AffinityTopologyVersion [topVer=-1, minorTopVer=0]
>>>
>>> 2019-01-09 20:15:37,515 [exchange-worker-#44%TestNode-0%] WARN
>>> org.apache.ignite.internal.diagnostic:118 - Last exchange future: …
>>>
>>> 2019-01-09 20:15:37,515 [exchange-worker-#44%TestNode-0%] WARN
>>> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager:118
>>> - First 10 pending exchange futures [total=0]
>>>
>>> 2019-01-09 20:15:37,518 [exchange-worker-#44%TestNode-0%] WARN
>>> org.apache.ignite.internal.diagnostic:118 - Last 10 exchange futures
>>> (total: 1):
>>>
>>> 2019-01-09 20:15:37,519 [exchange-worker-#44%TestNode-0%] WARN
>>> org.apache.ignite.internal.diagnostic:118 - >>>
>>> GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=3,
>>> minorTopVer=1], evt=DISCOVERY_CUSTOM_EVT, evtNode=TcpDiscoveryNode
>>> [id=454d2051-cea6-4f2c-99a7-7c5698494175, addrs=[127.0.0.1], sockAddrs=[/
>>> 127.0.0.1:47500], discPort=47500, order=1, intOrder=1,
>>> lastExchangeTime=1547054137330, loc=true, ver=2.7.0#20181130-sha1:256ae401,
>>> isClient=false], done=false]
>>>
>>> 2019-01-09 20:15:37,520 [exchange-worker-#44%TestNode-0%] WARN
>>> org.apache.ignite.internal.diagnostic:118 - Latch manager state:
>>> ExchangeLatchManager [serverLatches={}, clientLatches={}]
>>>
>>> 2019-01-09 20:15:37,535 [exchange-worker-#132%TestNode-2%] WARN
>>> org.apache.ignite.internal.diagnostic:118 - Failed to wait for partition
>>> map exchange [topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1],
>>> node=a8e212dc-303e-4bc6-8bb6-fa2bc4c0478d]. Dumping pending objects that
>>> might be the cause:
>>>
>>> 2019-01-09 20:15:37,535 [exchange-worker-#132%TestNode-2%] WARN
>>> org.apache.ignite.internal.diagnostic:118 - Ready affinity version:
>>> AffinityTopologyVersion [topVer=-1, minorTopVer=0]
>>>
>>> 2019-01-09 20:15:37,536 [exchange-worker-#132%TestNode-2%] WARN
>>> org.apache.ignite.internal.diagnostic:118 - Last exchange future:
>>>
>>> 2019-01-09 20:15:37,537 [exchange-worker-#132%TestNode-2%] WARN
>>> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager:118
>>> - First 10 pending exchange futures [total=0]
>>>
>>> 2019-01-09 20:15:37,538 [exchange-worker-#132%TestNode-2%] WARN
>>> org.apache.ignite.internal.diagnostic:118 - Last 10 exchange futures
>>> (total: 1):
>>>
>>> 2019-01-09 20:15:37,539 [exchange-worker-#132%TestNode-2%] WARN
>>> org.apache.ignite.internal.diagnostic:118 - >>>
>>> GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=3,
>>> minorTopVer=1], evt=DISCOVERY_CUSTOM_EVT, evtNode=TcpDiscoveryNode
>>> [id=454d2051-cea6-4f2c-99a7-7c5698494175, addrs=[127.0.0.1], sockAddrs=[/
>>> 127.0.0.1:47500], discPort=47500, order=1, intOrder=1,
>>> lastExchangeTime=1547054124462, loc=false,
>>> ver=2.7.0#20181130-sha1:256ae401, isClient=false], done=false]
>>>
>>> 2019-01-09 20:15:37,540 [exchange-worker-#132%TestNode-2%] WARN
>>> org.apache.ignite.internal.diagnostic:118 - Latch manager state:
>>> ExchangeLatchManager [serverLatches={}, clientLatches={}]
>>>
>>>
>>>
>>> After that, everything works fine, but it cause every test to take more
>>> 10 seconds to run. And I am afraid to use it in prod before I
>>> understand problem.
>>>
>>>
>>>
>>> Andrey.
>>>
>>>
>>>
>>

Re: Failed to wait for partition map exchange on cluster activation

Posted by Andrey Davydov <an...@gmail.com>.

Hello,

You can find full log there:
https://drive.google.com/file/d/1FwCjsXMw5LQJnKO0x5GNJ2w9gVsDbXlc/view?usp=sharing

I can rerun tests with additional logging settings if needed

Andrey.






On Tue, Jan 15, 2019 at 6:23 PM Ilya Kasnacheev <il...@gmail.com>
wrote:

> Hello!
>
> Can you please upload the full verbose log somewhere?
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> ср, 9 янв. 2019 г. в 20:43, Andrey Davydov <an...@gmail.com>:
>
>>
>>
>> Hello,
>>
>> I found in test logs of my project that Ignite warns about failed
>> partition maps exchange. In test environment 3 Ignite 2.7 server nodes run
>> in the same JVM8 on Win10, using localhost networking.
>>
>>
>>
>> 2019-01-09 20:15:27,719 [sys-#164%TestNode-2%] INFO
>> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
>> - Affinity changes applied in 10 ms.
>>
>> 2019-01-09 20:15:27,719 [sys-#163%TestNode-1%] INFO
>> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
>> - Affinity changes applied in 10 ms.
>>
>> 2019-01-09 20:15:27,724 [sys-#164%TestNode-2%] INFO
>> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
>> - Full map updating for 5 groups performed in 4 ms.
>>
>> 2019-01-09 20:15:27,724 [sys-#163%TestNode-1%] INFO
>> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
>> - Full map updating for 5 groups performed in 5 ms.
>>
>> 2019-01-09 20:15:27,725 [sys-#163%TestNode-1%] INFO
>> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
>> - Finish exchange future [startVer=AffinityTopologyVersion [topVer=3,
>> minorTopVer=1], resVer=AffinityTopologyVersion [topVer=3, minorTopVer=1],
>> err=null]
>>
>> 2019-01-09 20:15:27,725 [sys-#164%TestNode-2%] INFO
>> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
>> - Finish exchange future [startVer=AffinityTopologyVersion [topVer=3,
>> minorTopVer=1], resVer=AffinityTopologyVersion [topVer=3, minorTopVer=1],
>> err=null]
>>
>> 2019-01-09 20:15:28,710 [db-checkpoint-thread-#157%TestNode-1%] INFO
>> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
>> - Checkpoint started [checkpointId=443748a9-c1a5-4b3b-96e4-04a0862829ec,
>> startPtr=FileWALPointer [idx=0, fileOff=929726, len=31143],
>> checkpointLockWait=0ms, checkpointLockHoldTime=6ms,
>> walCpRecordFsyncDuration=248ms, pages=204, reason='node started']
>>
>> 2019-01-09 20:15:28,713 [db-checkpoint-thread-#151%TestNode-0%] INFO
>> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
>> - Checkpoint started [checkpointId=cbc928e1-4ecd-40ae-9791-c6ba20c3669b,
>> startPtr=FileWALPointer [idx=0, fileOff=929726, len=31143],
>> checkpointLockWait=0ms, checkpointLockHoldTime=8ms,
>> walCpRecordFsyncDuration=257ms, pages=204, reason='node started']
>>
>> 2019-01-09 20:15:28,715 [db-checkpoint-thread-#146%TestNode-2%] INFO
>> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
>> - Checkpoint started [checkpointId=ef4c3d02-ca01-4d67-8128-48d4dc99aabc,
>> startPtr=FileWALPointer [idx=0, fileOff=929726, len=31143],
>> checkpointLockWait=0ms, checkpointLockHoldTime=22ms,
>> walCpRecordFsyncDuration=289ms, pages=204, reason='node started']
>>
>> 2019-01-09 20:15:30,788 [db-checkpoint-thread-#157%TestNode-1%] INFO
>> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
>> - Checkpoint finished [cpId=443748a9-c1a5-4b3b-96e4-04a0862829ec,
>> pages=204, markPos=FileWALPointer [idx=0, fileOff=929726, len=31143],
>> walSegmentsCleared=0, walSegmentsCovered=[], markDuration=1103ms,
>> pagesWrite=84ms, fsync=1992ms, total=3179ms]
>>
>> 2019-01-09 20:15:30,858 [db-checkpoint-thread-#151%TestNode-0%] INFO
>> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
>> - Checkpoint finished [cpId=cbc928e1-4ecd-40ae-9791-c6ba20c3669b,
>> pages=204, markPos=FileWALPointer [idx=0, fileOff=929726, len=31143],
>> walSegmentsCleared=0, walSegmentsCovered=[], markDuration=1213ms,
>> pagesWrite=79ms, fsync=2066ms, total=3358ms]
>>
>> 2019-01-09 20:15:30,998 [db-checkpoint-thread-#146%TestNode-2%] INFO
>> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
>> - Checkpoint finished [cpId=ef4c3d02-ca01-4d67-8128-48d4dc99aabc,
>> pages=204, markPos=FileWALPointer [idx=0, fileOff=929726, len=31143],
>> walSegmentsCleared=0, walSegmentsCovered=[], markDuration=1262ms,
>> pagesWrite=79ms, fsync=2203ms, total=3544ms]
>>
>> 2019-01-09 20:15:37,510 [exchange-worker-#44%TestNode-0%] WARN
>> org.apache.ignite.internal.diagnostic:118 - Failed to wait for partition
>> map exchange [topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1],
>> node=454d2051-cea6-4f2c-99a7-7c5698494175]. Dumping pending objects that
>> might be the cause:
>>
>> 2019-01-09 20:15:37,510 [exchange-worker-#44%TestNode-0%] WARN
>> org.apache.ignite.internal.diagnostic:118 - Ready affinity version:
>> AffinityTopologyVersion [topVer=-1, minorTopVer=0]
>>
>> 2019-01-09 20:15:37,515 [exchange-worker-#44%TestNode-0%] WARN
>> org.apache.ignite.internal.diagnostic:118 - Last exchange future: …
>>
>> 2019-01-09 20:15:37,515 [exchange-worker-#44%TestNode-0%] WARN
>> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager:118
>> - First 10 pending exchange futures [total=0]
>>
>> 2019-01-09 20:15:37,518 [exchange-worker-#44%TestNode-0%] WARN
>> org.apache.ignite.internal.diagnostic:118 - Last 10 exchange futures
>> (total: 1):
>>
>> 2019-01-09 20:15:37,519 [exchange-worker-#44%TestNode-0%] WARN
>> org.apache.ignite.internal.diagnostic:118 - >>>
>> GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=3,
>> minorTopVer=1], evt=DISCOVERY_CUSTOM_EVT, evtNode=TcpDiscoveryNode
>> [id=454d2051-cea6-4f2c-99a7-7c5698494175, addrs=[127.0.0.1], sockAddrs=[/
>> 127.0.0.1:47500], discPort=47500, order=1, intOrder=1,
>> lastExchangeTime=1547054137330, loc=true, ver=2.7.0#20181130-sha1:256ae401,
>> isClient=false], done=false]
>>
>> 2019-01-09 20:15:37,520 [exchange-worker-#44%TestNode-0%] WARN
>> org.apache.ignite.internal.diagnostic:118 - Latch manager state:
>> ExchangeLatchManager [serverLatches={}, clientLatches={}]
>>
>> 2019-01-09 20:15:37,535 [exchange-worker-#132%TestNode-2%] WARN
>> org.apache.ignite.internal.diagnostic:118 - Failed to wait for partition
>> map exchange [topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1],
>> node=a8e212dc-303e-4bc6-8bb6-fa2bc4c0478d]. Dumping pending objects that
>> might be the cause:
>>
>> 2019-01-09 20:15:37,535 [exchange-worker-#132%TestNode-2%] WARN
>> org.apache.ignite.internal.diagnostic:118 - Ready affinity version:
>> AffinityTopologyVersion [topVer=-1, minorTopVer=0]
>>
>> 2019-01-09 20:15:37,536 [exchange-worker-#132%TestNode-2%] WARN
>> org.apache.ignite.internal.diagnostic:118 - Last exchange future:
>>
>> 2019-01-09 20:15:37,537 [exchange-worker-#132%TestNode-2%] WARN
>> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager:118
>> - First 10 pending exchange futures [total=0]
>>
>> 2019-01-09 20:15:37,538 [exchange-worker-#132%TestNode-2%] WARN
>> org.apache.ignite.internal.diagnostic:118 - Last 10 exchange futures
>> (total: 1):
>>
>> 2019-01-09 20:15:37,539 [exchange-worker-#132%TestNode-2%] WARN
>> org.apache.ignite.internal.diagnostic:118 - >>>
>> GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=3,
>> minorTopVer=1], evt=DISCOVERY_CUSTOM_EVT, evtNode=TcpDiscoveryNode
>> [id=454d2051-cea6-4f2c-99a7-7c5698494175, addrs=[127.0.0.1], sockAddrs=[/
>> 127.0.0.1:47500], discPort=47500, order=1, intOrder=1,
>> lastExchangeTime=1547054124462, loc=false,
>> ver=2.7.0#20181130-sha1:256ae401, isClient=false], done=false]
>>
>> 2019-01-09 20:15:37,540 [exchange-worker-#132%TestNode-2%] WARN
>> org.apache.ignite.internal.diagnostic:118 - Latch manager state:
>> ExchangeLatchManager [serverLatches={}, clientLatches={}]
>>
>>
>>
>> After that, everything works fine, but it cause every test to take more
>> 10 seconds to run. And I am afraid to use it in prod before I understand
>> problem.
>>
>>
>>
>> Andrey.
>>
>>
>>
>

Re: Failed to wait for partition map exchange on cluster activation

Posted by Ilya Kasnacheev <il...@gmail.com>.

Hello!

Can you please upload the full verbose log somewhere?

Regards,
-- 
Ilya Kasnacheev


ср, 9 янв. 2019 г. в 20:43, Andrey Davydov <an...@gmail.com>:

>
>
> Hello,
>
> I found in test logs of my project that Ignite warns about failed
> partition maps exchange. In test environment 3 Ignite 2.7 server nodes run
> in the same JVM8 on Win10, using localhost networking.
>
>
>
> 2019-01-09 20:15:27,719 [sys-#164%TestNode-2%] INFO
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
> - Affinity changes applied in 10 ms.
>
> 2019-01-09 20:15:27,719 [sys-#163%TestNode-1%] INFO
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
> - Affinity changes applied in 10 ms.
>
> 2019-01-09 20:15:27,724 [sys-#164%TestNode-2%] INFO
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
> - Full map updating for 5 groups performed in 4 ms.
>
> 2019-01-09 20:15:27,724 [sys-#163%TestNode-1%] INFO
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
> - Full map updating for 5 groups performed in 5 ms.
>
> 2019-01-09 20:15:27,725 [sys-#163%TestNode-1%] INFO
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
> - Finish exchange future [startVer=AffinityTopologyVersion [topVer=3,
> minorTopVer=1], resVer=AffinityTopologyVersion [topVer=3, minorTopVer=1],
> err=null]
>
> 2019-01-09 20:15:27,725 [sys-#164%TestNode-2%] INFO
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
> - Finish exchange future [startVer=AffinityTopologyVersion [topVer=3,
> minorTopVer=1], resVer=AffinityTopologyVersion [topVer=3, minorTopVer=1],
> err=null]
>
> 2019-01-09 20:15:28,710 [db-checkpoint-thread-#157%TestNode-1%] INFO
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
> - Checkpoint started [checkpointId=443748a9-c1a5-4b3b-96e4-04a0862829ec,
> startPtr=FileWALPointer [idx=0, fileOff=929726, len=31143],
> checkpointLockWait=0ms, checkpointLockHoldTime=6ms,
> walCpRecordFsyncDuration=248ms, pages=204, reason='node started']
>
> 2019-01-09 20:15:28,713 [db-checkpoint-thread-#151%TestNode-0%] INFO
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
> - Checkpoint started [checkpointId=cbc928e1-4ecd-40ae-9791-c6ba20c3669b,
> startPtr=FileWALPointer [idx=0, fileOff=929726, len=31143],
> checkpointLockWait=0ms, checkpointLockHoldTime=8ms,
> walCpRecordFsyncDuration=257ms, pages=204, reason='node started']
>
> 2019-01-09 20:15:28,715 [db-checkpoint-thread-#146%TestNode-2%] INFO
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
> - Checkpoint started [checkpointId=ef4c3d02-ca01-4d67-8128-48d4dc99aabc,
> startPtr=FileWALPointer [idx=0, fileOff=929726, len=31143],
> checkpointLockWait=0ms, checkpointLockHoldTime=22ms,
> walCpRecordFsyncDuration=289ms, pages=204, reason='node started']
>
> 2019-01-09 20:15:30,788 [db-checkpoint-thread-#157%TestNode-1%] INFO
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
> - Checkpoint finished [cpId=443748a9-c1a5-4b3b-96e4-04a0862829ec,
> pages=204, markPos=FileWALPointer [idx=0, fileOff=929726, len=31143],
> walSegmentsCleared=0, walSegmentsCovered=[], markDuration=1103ms,
> pagesWrite=84ms, fsync=1992ms, total=3179ms]
>
> 2019-01-09 20:15:30,858 [db-checkpoint-thread-#151%TestNode-0%] INFO
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
> - Checkpoint finished [cpId=cbc928e1-4ecd-40ae-9791-c6ba20c3669b,
> pages=204, markPos=FileWALPointer [idx=0, fileOff=929726, len=31143],
> walSegmentsCleared=0, walSegmentsCovered=[], markDuration=1213ms,
> pagesWrite=79ms, fsync=2066ms, total=3358ms]
>
> 2019-01-09 20:15:30,998 [db-checkpoint-thread-#146%TestNode-2%] INFO
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
> - Checkpoint finished [cpId=ef4c3d02-ca01-4d67-8128-48d4dc99aabc,
> pages=204, markPos=FileWALPointer [idx=0, fileOff=929726, len=31143],
> walSegmentsCleared=0, walSegmentsCovered=[], markDuration=1262ms,
> pagesWrite=79ms, fsync=2203ms, total=3544ms]
>
> 2019-01-09 20:15:37,510 [exchange-worker-#44%TestNode-0%] WARN
> org.apache.ignite.internal.diagnostic:118 - Failed to wait for partition
> map exchange [topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1],
> node=454d2051-cea6-4f2c-99a7-7c5698494175]. Dumping pending objects that
> might be the cause:
>
> 2019-01-09 20:15:37,510 [exchange-worker-#44%TestNode-0%] WARN
> org.apache.ignite.internal.diagnostic:118 - Ready affinity version:
> AffinityTopologyVersion [topVer=-1, minorTopVer=0]
>
> 2019-01-09 20:15:37,515 [exchange-worker-#44%TestNode-0%] WARN
> org.apache.ignite.internal.diagnostic:118 - Last exchange future: …
>
> 2019-01-09 20:15:37,515 [exchange-worker-#44%TestNode-0%] WARN
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager:118
> - First 10 pending exchange futures [total=0]
>
> 2019-01-09 20:15:37,518 [exchange-worker-#44%TestNode-0%] WARN
> org.apache.ignite.internal.diagnostic:118 - Last 10 exchange futures
> (total: 1):
>
> 2019-01-09 20:15:37,519 [exchange-worker-#44%TestNode-0%] WARN
> org.apache.ignite.internal.diagnostic:118 - >>>
> GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=3,
> minorTopVer=1], evt=DISCOVERY_CUSTOM_EVT, evtNode=TcpDiscoveryNode
> [id=454d2051-cea6-4f2c-99a7-7c5698494175, addrs=[127.0.0.1], sockAddrs=[/
> 127.0.0.1:47500], discPort=47500, order=1, intOrder=1,
> lastExchangeTime=1547054137330, loc=true, ver=2.7.0#20181130-sha1:256ae401,
> isClient=false], done=false]
>
> 2019-01-09 20:15:37,520 [exchange-worker-#44%TestNode-0%] WARN
> org.apache.ignite.internal.diagnostic:118 - Latch manager state:
> ExchangeLatchManager [serverLatches={}, clientLatches={}]
>
> 2019-01-09 20:15:37,535 [exchange-worker-#132%TestNode-2%] WARN
> org.apache.ignite.internal.diagnostic:118 - Failed to wait for partition
> map exchange [topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1],
> node=a8e212dc-303e-4bc6-8bb6-fa2bc4c0478d]. Dumping pending objects that
> might be the cause:
>
> 2019-01-09 20:15:37,535 [exchange-worker-#132%TestNode-2%] WARN
> org.apache.ignite.internal.diagnostic:118 - Ready affinity version:
> AffinityTopologyVersion [topVer=-1, minorTopVer=0]
>
> 2019-01-09 20:15:37,536 [exchange-worker-#132%TestNode-2%] WARN
> org.apache.ignite.internal.diagnostic:118 - Last exchange future:
>
> 2019-01-09 20:15:37,537 [exchange-worker-#132%TestNode-2%] WARN
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager:118
> - First 10 pending exchange futures [total=0]
>
> 2019-01-09 20:15:37,538 [exchange-worker-#132%TestNode-2%] WARN
> org.apache.ignite.internal.diagnostic:118 - Last 10 exchange futures
> (total: 1):
>
> 2019-01-09 20:15:37,539 [exchange-worker-#132%TestNode-2%] WARN
> org.apache.ignite.internal.diagnostic:118 - >>>
> GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=3,
> minorTopVer=1], evt=DISCOVERY_CUSTOM_EVT, evtNode=TcpDiscoveryNode
> [id=454d2051-cea6-4f2c-99a7-7c5698494175, addrs=[127.0.0.1], sockAddrs=[/
> 127.0.0.1:47500], discPort=47500, order=1, intOrder=1,
> lastExchangeTime=1547054124462, loc=false,
> ver=2.7.0#20181130-sha1:256ae401, isClient=false], done=false]
>
> 2019-01-09 20:15:37,540 [exchange-worker-#132%TestNode-2%] WARN
> org.apache.ignite.internal.diagnostic:118 - Latch manager state:
> ExchangeLatchManager [serverLatches={}, clientLatches={}]
>
>
>
> After that, everything works fine, but it cause every test to take more
> 10 seconds to run. And I am afraid to use it in prod before I understand
> problem.
>
>
>
> Andrey.
>
>
>

Re: Failed to wait for partition map exchange on cluster activation

Posted by Ilya Kasnacheev <il...@gmail.com>.

Hello!

It's not a deadlock it's just a (warning) timeout: It is expected that all
work is to be done by this moment, but due to slow I/O cluster is still
busy.

Regards,
-- 
Ilya Kasnacheev


сб, 19 янв. 2019 г. в 21:51, Andrey Davydov <an...@gmail.com>:

> Yes. It seems like safe to use. Seems like some race cause fail or
> deadlock on the first try of exchange and after timeout, ignite run second
> try which is successful. As I see in my tests (60 ignite activation per
> build), second try is stable.
>   But it cause unnecessary 10-15 seconds delay on each activation if run
> tests on portable drive.
>
> On Thursday, January 17, 2019, Ilya Kasnacheev <il...@gmail.com>
> wrote:
> > Hello!
> >> As I investigate during reproduction, evrething works fine when I run
> test on local drive (SSD in case of my laptop), but if I copy project to
> external HDD mounted over USB3.0, problem present. So it may be race caused
> by condition on slow IO. Initialy I got problem when I build my project on
> external drive.
> > "Failed to wait for PME" is not a problem but a symptom. It is a simple
> timeout. If you are hitting this timeout due to slow I/O you will see PME
> messages which will then go away. This is by design.
> > Nevertheless I will try to run your reproducer.
> >
> > Regards,
> > --
> > Ilya Kasnacheev
> >
> >
> > ср, 16 янв. 2019 г. в 21:28, Andrey Davydov <an...@gmail.com>:
> >>
> >> I create small reproducer project/ It is available on:
> >>
> https://drive.google.com/file/d/1A2_i1YBI7OGtJM0b8bxuIJrTI61ZQXoR/view?usp=sharing
> >>
> >> Thera are project, and some logs and dumps in archive.
> >> There is only one DemoTest.java to run to reproduce.
> >> test run 3 ignite nodes with configuration very similar to our real
> project without any logic or additional services.
> >> IMPORTANT. As I investigate during reproduction, evrething works fine
> when I run test on local drive (SSD in case of my laptop), but if I copy
> project to external HDD mounted over USB3.0, problem present. So it may be
> race caused by condition on slow IO. Initialy I got problem when I build my
> project on external drive.
> >> Andrey.
> >>
> >>
> >> On Wed, Jan 9, 2019 at 8:43 PM Andrey Davydov <an...@gmail.com>
> wrote:
> >>>
> >>>
> >>>
> >>> Hello,
> >>>
> >>> I found in test logs of my project that Ignite warns about failed
> partition maps exchange. In test environment 3 Ignite 2.7 server nodes run
> in the same JVM8 on Win10, using localhost networking.
> >>>
> >>>
> >>>
> >>> 2019-01-09 20:15:27,719 [sys-#164%TestNode-2%] INFO
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
> - Affinity changes applied in 10 ms.
> >>>
> >>> 2019-01-09 20:15:27,719 [sys-#163%TestNode-1%] INFO
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
> - Affinity changes applied in 10 ms.
> >>>
> >>> 2019-01-09 20:15:27,724 [sys-#164%TestNode-2%] INFO
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
> - Full map updating for 5 groups performed in 4 ms.
> >>>
> >>> 2019-01-09 20:15:27,724 [sys-#163%TestNode-1%] INFO
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
> - Full map updating for 5 groups performed in 5 ms.
> >>>
> >>> 2019-01-09 20:15:27,725 [sys-#163%TestNode-1%] INFO
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
> - Finish exchange future [startVer=AffinityTopologyVersion [topVer=3,
> minorTopVer=1], resVer=AffinityTopologyVersion [topVer=3, minorTopVer=1],
> err=null]
> >>>
> >>> 2019-01-09 20:15:27,725 [sys-#164%TestNode-2%] INFO
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
> - Finish exchange future [startVer=AffinityTopologyVersion [topVer=3,
> minorTopVer=1], resVer=AffinityTopologyVersion [topVer=3, minorTopVer=1],
> err=null]
> >>>
> >>> 2019-01-09 20:15:28,710 [db-checkpoint-thread-#157%TestNode-1%] INFO
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
> - Checkpoint started [checkpointId=443748a9-c1a5-4b3b-96e4-04a0862829ec,
> startPtr=FileWALPointer [idx=0, fileOff=929726, len=31143],
> checkpointLockWait=0ms, checkpointLockHoldTime=6ms,
> walCpRecordFsyncDuration=248ms, pages=204, reason='node started']
> >>>
> >>> 2019-01-09 20:15:28,713 [db-checkpoint-thread-#151%TestNode-0%] INFO
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
> - Checkpoint started [checkpointId=cbc928e1-4ecd-40ae-9791-c6ba20c3669b,
> startPtr=FileWALPointer [idx=0, fileOff=929726, len=31143],
> checkpointLockWait=0ms, checkpointLockHoldTime=8ms,
> walCpRecordFsyncDuration=257ms, pages=204, reason='node started']
> >>>
> >>> 2019-01-09 20:15:28,715 [db-checkpoint-thread-#146%TestNode-2%] INFO
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
> - Checkpoint started [checkpointId=ef4c3d02-ca01-4d67-8128-48d4dc99aabc,
> startPtr=FileWALPointer [idx=0, fileOff=929726, len=31143],
> checkpointLockWait=0ms, checkpointLockHoldTime=22ms,
> walCpRecordFsyncDuration=289ms, pages=204, reason='node started']
> >>>
> >>> 2019-01-09 20:15:30,788 [db-checkpoint-thread-#157%TestNode-1%] INFO
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
> - Checkpoint finished [cpId=443748a9-c1a5-4b3b-96e4-04a0862829ec,
> pages=204, markPos=FileWALPointer [idx=0, fileOff=929726, len=31143],
> walSegmentsCleared=0, walSegmentsCovered=[], markDuration=1103ms,
> pagesWrite=84ms, fsync=1992ms, total=3179ms]
> >>>
> >>> 2019-01-09 20:15:30,858 [db-checkpoint-thread-#151%TestNode-0%] INFO
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
> - Checkpoint finished [cpId=cbc928e1-4ecd-40ae-9791-c6ba20c3669b,
> pages=204, markPos=FileWALPointer [idx=0, fileOff=929726, len=31143],
> walSegmentsCleared=0, walSegmentsCovered=[], markDuration=1213ms,
> pagesWrite=79ms, fsync=2066ms, total=3358ms]
> >>>
> >>> 2019-01-09 20:15:30,998 [db-checkpoint-thread-#146%TestNode-2%] INFO
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
> - Checkpoint finished [cpId=ef4c3d02-ca01-4d67-8128-48d4dc99aabc,
> pages=204, markPos=FileWALPointer [idx=0, fileOff=929726, len=31143],
> walSegmentsCleared=0, walSegmentsCovered=[], markDuration=1262ms,
> pagesWrite=79ms, fsync=2203ms, total=3544ms]
> >>>
> >>> 2019-01-09 20:15:37,510 [exchange-worker-#44%TestNode-0%] WARN
> org.apache.ignite.internal.diagnostic:118 - Failed to wait for partition
> map exchange [topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1],
> node=454d2051-cea6-4f2c-99a7-7c5698494175]. Dumping pending objects that
> might be the cause:
> >>>
> >>> 2019-01-09 20:15:37,510 [exchange-worker-#44%TestNode-0%] WARN
> org.apache.ignite.internal.diagnostic:118 - Ready affinity version:
> AffinityTopologyVersion [topVer=-1, minorTopVer=0]
> >>>
> >>> 2019-01-09 20:15:37,515 [exchange-worker-#44%TestNode-0%] WARN
> org.apache.ignite.internal.diagnostic:118 - Last exchange future: …
> >>>
> >>> 2019-01-09 20:15:37,515 [exchange-worker-#44%TestNode-0%] WARN
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager:118
> - First 10 pending exchange futures [total=0]
> >>>
> >>> 2019-01-09 20:15:37,518 [exchange-worker-#44%TestNode-0%] WARN
> org.apache.ignite.internal.diagnostic:118 - Last 10 exchange futures
> (total: 1):
> >>>
> >>> 2019-01-09 20:15:37,519 [exchange-worker-#44%TestNode-0%] WARN
> org.apache.ignite.internal.diagnostic:118 - >>>
> GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=3,
> minorTopVer=1], evt=DISCOVERY_CUSTOM_EVT, evtNode=TcpDiscoveryNode
> [id=454d2051-cea6-4f2c-99a7-7c5698494175, addrs=[127.0.0.1], sockAddrs=[/
> 127.0.0.1:47500], discPort=47500, order=1, intOrder=1,
> lastExchangeTime=1547054137330, loc=true, ver=2.7.0#20181130-sha1:256ae401,
> isClient=false], done=false]
> >>>
> >>> 2019-01-09 20:15:37,520 [exchange-worker-#44%TestNode-0%] WARN
> org.apache.ignite.internal.diagnostic:118 - Latch manager state:
> ExchangeLatchManager [serverLatches={}, clientLatches={}]
> >>>
> >>> 2019-01-09 20:15:37,535 [exchange-worker-#132%TestNode-2%] WARN
> org.apache.ignite.internal.diagnostic:118 - Failed to wait for partition
> map exchange [topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1],
> node=a8e212dc-303e-4bc6-8bb6-fa2bc4c0478d]. Dumping pending objects that
> might be the cause:
> >>>
> >>> 2019-01-09 20:15:37,535 [exchange-worker-#132%TestNode-2%] WARN
> org.apache.ignite.internal.diagnostic:118 - Ready affinity version:
> AffinityTopologyVersion [topVer=-1, minorTopVer=0]
> >>>
> >>> 2019-01-09 20:15:37,536 [exchange-worker-#132%TestNode-2%] WARN
> org.apache.ignite.internal.diagnostic:118 - Last exchange future:
> >>>
> >>> 2019-01-09 20:15:37,537 [exchange-worker-#132%TestNode-2%] WARN
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager:118
> - First 10 pending exchange futures [total=0]
> >>>
> >>> 2019-01-09 20:15:37,538 [exchange-worker-#132%TestNode-2%] WARN
> org.apache.ignite.internal.diagnostic:118 - Last 10 exchange futures
> (total: 1):
> >>>
> >>> 2019-01-09 20:15:37,539 [exchange-worker-#132%TestNode-2%] WARN
> org.apache.ignite.internal.diagnostic:118 - >>>
> GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=3,
> minorTopVer=1], evt=DISCOVERY_CUSTOM_EVT, evtNode=TcpDiscoveryNode
> [id=454d2051-cea6-4f2c-99a7-7c5698494175, addrs=[127.0.0.1], sockAddrs=[/
> 127.0.0.1:47500], discPort=47500, order=1, intOrder=1,
> lastExchangeTime=1547054124462, loc=false,
> ver=2.7.0#20181130-sha1:256ae401, isClient=false], done=false]
> >>>
> >>> 2019-01-09 20:15:37,540 [exchange-worker-#132%TestNode-2%] WARN
> org.apache.ignite.internal.diagnostic:118 - Latch manager state:
> ExchangeLatchManager [serverLatches={}, clientLatches={}]
> >>>
> >>>
> >>>
> >>> After that, everything works fine, but it cause every test to take
> more 10 seconds to run. And I am afraid to use it in prod before I
> understand problem.
> >>>
> >>>
> >>>
> >>> Andrey.
> >>>
> >>>

Re: Failed to wait for partition map exchange on cluster activation

Posted by Andrey Davydov <an...@gmail.com>.

Yes. It seems like safe to use. Seems like some race cause fail or deadlock
on the first try of exchange and after timeout, ignite run second try which
is successful. As I see in my tests (60 ignite activation per build),
second try is stable.
  But it cause unnecessary 10-15 seconds delay on each activation if run
tests on portable drive.

On Thursday, January 17, 2019, Ilya Kasnacheev <il...@gmail.com>
wrote:
> Hello!
>> As I investigate during reproduction, evrething works fine when I run
test on local drive (SSD in case of my laptop), but if I copy project to
external HDD mounted over USB3.0, problem present. So it may be race caused
by condition on slow IO. Initialy I got problem when I build my project on
external drive.
> "Failed to wait for PME" is not a problem but a symptom. It is a simple
timeout. If you are hitting this timeout due to slow I/O you will see PME
messages which will then go away. This is by design.
> Nevertheless I will try to run your reproducer.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> ср, 16 янв. 2019 г. в 21:28, Andrey Davydov <an...@gmail.com>:
>>
>> I create small reproducer project/ It is available on:
>>
https://drive.google.com/file/d/1A2_i1YBI7OGtJM0b8bxuIJrTI61ZQXoR/view?usp=sharing
>>
>> Thera are project, and some logs and dumps in archive.
>> There is only one DemoTest.java to run to reproduce.
>> test run 3 ignite nodes with configuration very similar to our real
project without any logic or additional services.
>> IMPORTANT. As I investigate during reproduction, evrething works fine
when I run test on local drive (SSD in case of my laptop), but if I copy
project to external HDD mounted over USB3.0, problem present. So it may be
race caused by condition on slow IO. Initialy I got problem when I build my
project on external drive.
>> Andrey.
>>
>>
>> On Wed, Jan 9, 2019 at 8:43 PM Andrey Davydov <an...@gmail.com>
wrote:
>>>
>>>
>>>
>>> Hello,
>>>
>>> I found in test logs of my project that Ignite warns about failed
partition maps exchange. In test environment 3 Ignite 2.7 server nodes run
in the same JVM8 on Win10, using localhost networking.
>>>
>>>
>>>
>>> 2019-01-09 20:15:27,719 [sys-#164%TestNode-2%] INFO
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
- Affinity changes applied in 10 ms.
>>>
>>> 2019-01-09 20:15:27,719 [sys-#163%TestNode-1%] INFO
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
- Affinity changes applied in 10 ms.
>>>
>>> 2019-01-09 20:15:27,724 [sys-#164%TestNode-2%] INFO
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
- Full map updating for 5 groups performed in 4 ms.
>>>
>>> 2019-01-09 20:15:27,724 [sys-#163%TestNode-1%] INFO
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
- Full map updating for 5 groups performed in 5 ms.
>>>
>>> 2019-01-09 20:15:27,725 [sys-#163%TestNode-1%] INFO
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
- Finish exchange future [startVer=AffinityTopologyVersion [topVer=3,
minorTopVer=1], resVer=AffinityTopologyVersion [topVer=3, minorTopVer=1],
err=null]
>>>
>>> 2019-01-09 20:15:27,725 [sys-#164%TestNode-2%] INFO
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
- Finish exchange future [startVer=AffinityTopologyVersion [topVer=3,
minorTopVer=1], resVer=AffinityTopologyVersion [topVer=3, minorTopVer=1],
err=null]
>>>
>>> 2019-01-09 20:15:28,710 [db-checkpoint-thread-#157%TestNode-1%] INFO
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
- Checkpoint started [checkpointId=443748a9-c1a5-4b3b-96e4-04a0862829ec,
startPtr=FileWALPointer [idx=0, fileOff=929726, len=31143],
checkpointLockWait=0ms, checkpointLockHoldTime=6ms,
walCpRecordFsyncDuration=248ms, pages=204, reason='node started']
>>>
>>> 2019-01-09 20:15:28,713 [db-checkpoint-thread-#151%TestNode-0%] INFO
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
- Checkpoint started [checkpointId=cbc928e1-4ecd-40ae-9791-c6ba20c3669b,
startPtr=FileWALPointer [idx=0, fileOff=929726, len=31143],
checkpointLockWait=0ms, checkpointLockHoldTime=8ms,
walCpRecordFsyncDuration=257ms, pages=204, reason='node started']
>>>
>>> 2019-01-09 20:15:28,715 [db-checkpoint-thread-#146%TestNode-2%] INFO
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
- Checkpoint started [checkpointId=ef4c3d02-ca01-4d67-8128-48d4dc99aabc,
startPtr=FileWALPointer [idx=0, fileOff=929726, len=31143],
checkpointLockWait=0ms, checkpointLockHoldTime=22ms,
walCpRecordFsyncDuration=289ms, pages=204, reason='node started']
>>>
>>> 2019-01-09 20:15:30,788 [db-checkpoint-thread-#157%TestNode-1%] INFO
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
- Checkpoint finished [cpId=443748a9-c1a5-4b3b-96e4-04a0862829ec,
pages=204, markPos=FileWALPointer [idx=0, fileOff=929726, len=31143],
walSegmentsCleared=0, walSegmentsCovered=[], markDuration=1103ms,
pagesWrite=84ms, fsync=1992ms, total=3179ms]
>>>
>>> 2019-01-09 20:15:30,858 [db-checkpoint-thread-#151%TestNode-0%] INFO
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
- Checkpoint finished [cpId=cbc928e1-4ecd-40ae-9791-c6ba20c3669b,
pages=204, markPos=FileWALPointer [idx=0, fileOff=929726, len=31143],
walSegmentsCleared=0, walSegmentsCovered=[], markDuration=1213ms,
pagesWrite=79ms, fsync=2066ms, total=3358ms]
>>>
>>> 2019-01-09 20:15:30,998 [db-checkpoint-thread-#146%TestNode-2%] INFO
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
- Checkpoint finished [cpId=ef4c3d02-ca01-4d67-8128-48d4dc99aabc,
pages=204, markPos=FileWALPointer [idx=0, fileOff=929726, len=31143],
walSegmentsCleared=0, walSegmentsCovered=[], markDuration=1262ms,
pagesWrite=79ms, fsync=2203ms, total=3544ms]
>>>
>>> 2019-01-09 20:15:37,510 [exchange-worker-#44%TestNode-0%] WARN
org.apache.ignite.internal.diagnostic:118 - Failed to wait for partition
map exchange [topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1],
node=454d2051-cea6-4f2c-99a7-7c5698494175]. Dumping pending objects that
might be the cause:
>>>
>>> 2019-01-09 20:15:37,510 [exchange-worker-#44%TestNode-0%] WARN
org.apache.ignite.internal.diagnostic:118 - Ready affinity version:
AffinityTopologyVersion [topVer=-1, minorTopVer=0]
>>>
>>> 2019-01-09 20:15:37,515 [exchange-worker-#44%TestNode-0%] WARN
org.apache.ignite.internal.diagnostic:118 - Last exchange future: …
>>>
>>> 2019-01-09 20:15:37,515 [exchange-worker-#44%TestNode-0%] WARN
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager:118
- First 10 pending exchange futures [total=0]
>>>
>>> 2019-01-09 20:15:37,518 [exchange-worker-#44%TestNode-0%] WARN
org.apache.ignite.internal.diagnostic:118 - Last 10 exchange futures
(total: 1):
>>>
>>> 2019-01-09 20:15:37,519 [exchange-worker-#44%TestNode-0%] WARN
org.apache.ignite.internal.diagnostic:118 - >>>
GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=3,
minorTopVer=1], evt=DISCOVERY_CUSTOM_EVT, evtNode=TcpDiscoveryNode
[id=454d2051-cea6-4f2c-99a7-7c5698494175, addrs=[127.0.0.1], sockAddrs=[/
127.0.0.1:47500], discPort=47500, order=1, intOrder=1,
lastExchangeTime=1547054137330, loc=true, ver=2.7.0#20181130-sha1:256ae401,
isClient=false], done=false]
>>>
>>> 2019-01-09 20:15:37,520 [exchange-worker-#44%TestNode-0%] WARN
org.apache.ignite.internal.diagnostic:118 - Latch manager state:
ExchangeLatchManager [serverLatches={}, clientLatches={}]
>>>
>>> 2019-01-09 20:15:37,535 [exchange-worker-#132%TestNode-2%] WARN
org.apache.ignite.internal.diagnostic:118 - Failed to wait for partition
map exchange [topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1],
node=a8e212dc-303e-4bc6-8bb6-fa2bc4c0478d]. Dumping pending objects that
might be the cause:
>>>
>>> 2019-01-09 20:15:37,535 [exchange-worker-#132%TestNode-2%] WARN
org.apache.ignite.internal.diagnostic:118 - Ready affinity version:
AffinityTopologyVersion [topVer=-1, minorTopVer=0]
>>>
>>> 2019-01-09 20:15:37,536 [exchange-worker-#132%TestNode-2%] WARN
org.apache.ignite.internal.diagnostic:118 - Last exchange future:
>>>
>>> 2019-01-09 20:15:37,537 [exchange-worker-#132%TestNode-2%] WARN
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager:118
- First 10 pending exchange futures [total=0]
>>>
>>> 2019-01-09 20:15:37,538 [exchange-worker-#132%TestNode-2%] WARN
org.apache.ignite.internal.diagnostic:118 - Last 10 exchange futures
(total: 1):
>>>
>>> 2019-01-09 20:15:37,539 [exchange-worker-#132%TestNode-2%] WARN
org.apache.ignite.internal.diagnostic:118 - >>>
GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=3,
minorTopVer=1], evt=DISCOVERY_CUSTOM_EVT, evtNode=TcpDiscoveryNode
[id=454d2051-cea6-4f2c-99a7-7c5698494175, addrs=[127.0.0.1], sockAddrs=[/
127.0.0.1:47500], discPort=47500, order=1, intOrder=1,
lastExchangeTime=1547054124462, loc=false,
ver=2.7.0#20181130-sha1:256ae401, isClient=false], done=false]
>>>
>>> 2019-01-09 20:15:37,540 [exchange-worker-#132%TestNode-2%] WARN
org.apache.ignite.internal.diagnostic:118 - Latch manager state:
ExchangeLatchManager [serverLatches={}, clientLatches={}]
>>>
>>>
>>>
>>> After that, everything works fine, but it cause every test to take more
10 seconds to run. And I am afraid to use it in prod before I understand
problem.
>>>
>>>
>>>
>>> Andrey.
>>>
>>>

Re: Failed to wait for partition map exchange on cluster activation

Posted by Andrey Davydov <an...@gmail.com>.

I create small reproducer project/ It is available on:

https://drive.google.com/file/d/1A2_i1YBI7OGtJM0b8bxuIJrTI61ZQXoR/view?usp=sharing

Thera are project, and some logs and dumps in archive.

There is only one DemoTest.java to run to reproduce.
test run 3 ignite nodes with configuration very similar to our real project
without any logic or additional services.

IMPORTANT. As I investigate during reproduction, evrething works fine when
I run test on local drive (SSD in case of my laptop), but if I copy project
to external HDD mounted over USB3.0, problem present. So it may be race
caused by condition on slow IO. Initialy I got problem when I build my
project on external drive.

Andrey.



On Wed, Jan 9, 2019 at 8:43 PM Andrey Davydov <an...@gmail.com>
wrote:

>
>
> Hello,
>
> I found in test logs of my project that Ignite warns about failed
> partition maps exchange. In test environment 3 Ignite 2.7 server nodes run
> in the same JVM8 on Win10, using localhost networking.
>
>
>
> 2019-01-09 20:15:27,719 [sys-#164%TestNode-2%] INFO
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
> - Affinity changes applied in 10 ms.
>
> 2019-01-09 20:15:27,719 [sys-#163%TestNode-1%] INFO
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
> - Affinity changes applied in 10 ms.
>
> 2019-01-09 20:15:27,724 [sys-#164%TestNode-2%] INFO
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
> - Full map updating for 5 groups performed in 4 ms.
>
> 2019-01-09 20:15:27,724 [sys-#163%TestNode-1%] INFO
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
> - Full map updating for 5 groups performed in 5 ms.
>
> 2019-01-09 20:15:27,725 [sys-#163%TestNode-1%] INFO
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
> - Finish exchange future [startVer=AffinityTopologyVersion [topVer=3,
> minorTopVer=1], resVer=AffinityTopologyVersion [topVer=3, minorTopVer=1],
> err=null]
>
> 2019-01-09 20:15:27,725 [sys-#164%TestNode-2%] INFO
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture:102
> - Finish exchange future [startVer=AffinityTopologyVersion [topVer=3,
> minorTopVer=1], resVer=AffinityTopologyVersion [topVer=3, minorTopVer=1],
> err=null]
>
> 2019-01-09 20:15:28,710 [db-checkpoint-thread-#157%TestNode-1%] INFO
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
> - Checkpoint started [checkpointId=443748a9-c1a5-4b3b-96e4-04a0862829ec,
> startPtr=FileWALPointer [idx=0, fileOff=929726, len=31143],
> checkpointLockWait=0ms, checkpointLockHoldTime=6ms,
> walCpRecordFsyncDuration=248ms, pages=204, reason='node started']
>
> 2019-01-09 20:15:28,713 [db-checkpoint-thread-#151%TestNode-0%] INFO
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
> - Checkpoint started [checkpointId=cbc928e1-4ecd-40ae-9791-c6ba20c3669b,
> startPtr=FileWALPointer [idx=0, fileOff=929726, len=31143],
> checkpointLockWait=0ms, checkpointLockHoldTime=8ms,
> walCpRecordFsyncDuration=257ms, pages=204, reason='node started']
>
> 2019-01-09 20:15:28,715 [db-checkpoint-thread-#146%TestNode-2%] INFO
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
> - Checkpoint started [checkpointId=ef4c3d02-ca01-4d67-8128-48d4dc99aabc,
> startPtr=FileWALPointer [idx=0, fileOff=929726, len=31143],
> checkpointLockWait=0ms, checkpointLockHoldTime=22ms,
> walCpRecordFsyncDuration=289ms, pages=204, reason='node started']
>
> 2019-01-09 20:15:30,788 [db-checkpoint-thread-#157%TestNode-1%] INFO
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
> - Checkpoint finished [cpId=443748a9-c1a5-4b3b-96e4-04a0862829ec,
> pages=204, markPos=FileWALPointer [idx=0, fileOff=929726, len=31143],
> walSegmentsCleared=0, walSegmentsCovered=[], markDuration=1103ms,
> pagesWrite=84ms, fsync=1992ms, total=3179ms]
>
> 2019-01-09 20:15:30,858 [db-checkpoint-thread-#151%TestNode-0%] INFO
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
> - Checkpoint finished [cpId=cbc928e1-4ecd-40ae-9791-c6ba20c3669b,
> pages=204, markPos=FileWALPointer [idx=0, fileOff=929726, len=31143],
> walSegmentsCleared=0, walSegmentsCovered=[], markDuration=1213ms,
> pagesWrite=79ms, fsync=2066ms, total=3358ms]
>
> 2019-01-09 20:15:30,998 [db-checkpoint-thread-#146%TestNode-2%] INFO
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager:102
> - Checkpoint finished [cpId=ef4c3d02-ca01-4d67-8128-48d4dc99aabc,
> pages=204, markPos=FileWALPointer [idx=0, fileOff=929726, len=31143],
> walSegmentsCleared=0, walSegmentsCovered=[], markDuration=1262ms,
> pagesWrite=79ms, fsync=2203ms, total=3544ms]
>
> 2019-01-09 20:15:37,510 [exchange-worker-#44%TestNode-0%] WARN
> org.apache.ignite.internal.diagnostic:118 - Failed to wait for partition
> map exchange [topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1],
> node=454d2051-cea6-4f2c-99a7-7c5698494175]. Dumping pending objects that
> might be the cause:
>
> 2019-01-09 20:15:37,510 [exchange-worker-#44%TestNode-0%] WARN
> org.apache.ignite.internal.diagnostic:118 - Ready affinity version:
> AffinityTopologyVersion [topVer=-1, minorTopVer=0]
>
> 2019-01-09 20:15:37,515 [exchange-worker-#44%TestNode-0%] WARN
> org.apache.ignite.internal.diagnostic:118 - Last exchange future: …
>
> 2019-01-09 20:15:37,515 [exchange-worker-#44%TestNode-0%] WARN
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager:118
> - First 10 pending exchange futures [total=0]
>
> 2019-01-09 20:15:37,518 [exchange-worker-#44%TestNode-0%] WARN
> org.apache.ignite.internal.diagnostic:118 - Last 10 exchange futures
> (total: 1):
>
> 2019-01-09 20:15:37,519 [exchange-worker-#44%TestNode-0%] WARN
> org.apache.ignite.internal.diagnostic:118 - >>>
> GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=3,
> minorTopVer=1], evt=DISCOVERY_CUSTOM_EVT, evtNode=TcpDiscoveryNode
> [id=454d2051-cea6-4f2c-99a7-7c5698494175, addrs=[127.0.0.1], sockAddrs=[/
> 127.0.0.1:47500], discPort=47500, order=1, intOrder=1,
> lastExchangeTime=1547054137330, loc=true, ver=2.7.0#20181130-sha1:256ae401,
> isClient=false], done=false]
>
> 2019-01-09 20:15:37,520 [exchange-worker-#44%TestNode-0%] WARN
> org.apache.ignite.internal.diagnostic:118 - Latch manager state:
> ExchangeLatchManager [serverLatches={}, clientLatches={}]
>
> 2019-01-09 20:15:37,535 [exchange-worker-#132%TestNode-2%] WARN
> org.apache.ignite.internal.diagnostic:118 - Failed to wait for partition
> map exchange [topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1],
> node=a8e212dc-303e-4bc6-8bb6-fa2bc4c0478d]. Dumping pending objects that
> might be the cause:
>
> 2019-01-09 20:15:37,535 [exchange-worker-#132%TestNode-2%] WARN
> org.apache.ignite.internal.diagnostic:118 - Ready affinity version:
> AffinityTopologyVersion [topVer=-1, minorTopVer=0]
>
> 2019-01-09 20:15:37,536 [exchange-worker-#132%TestNode-2%] WARN
> org.apache.ignite.internal.diagnostic:118 - Last exchange future:
>
> 2019-01-09 20:15:37,537 [exchange-worker-#132%TestNode-2%] WARN
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager:118
> - First 10 pending exchange futures [total=0]
>
> 2019-01-09 20:15:37,538 [exchange-worker-#132%TestNode-2%] WARN
> org.apache.ignite.internal.diagnostic:118 - Last 10 exchange futures
> (total: 1):
>
> 2019-01-09 20:15:37,539 [exchange-worker-#132%TestNode-2%] WARN
> org.apache.ignite.internal.diagnostic:118 - >>>
> GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=3,
> minorTopVer=1], evt=DISCOVERY_CUSTOM_EVT, evtNode=TcpDiscoveryNode
> [id=454d2051-cea6-4f2c-99a7-7c5698494175, addrs=[127.0.0.1], sockAddrs=[/
> 127.0.0.1:47500], discPort=47500, order=1, intOrder=1,
> lastExchangeTime=1547054124462, loc=false,
> ver=2.7.0#20181130-sha1:256ae401, isClient=false], done=false]
>
> 2019-01-09 20:15:37,540 [exchange-worker-#132%TestNode-2%] WARN
> org.apache.ignite.internal.diagnostic:118 - Latch manager state:
> ExchangeLatchManager [serverLatches={}, clientLatches={}]
>
>
>
> After that, everything works fine, but it cause every test to take more
> 10 seconds to run. And I am afraid to use it in prod before I understand
> problem.
>
>
>
> Andrey.
>
>
>