You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@ignite.apache.org by "Pavel Voronkin (JIRA)" <ji...@apache.org> on 2018/12/14 06:24:00 UTC

[jira] [Created] (IGNITE-10679) Add more debug info for 'Affinity changes' PME stage.

Pavel Voronkin created IGNITE-10679:
---------------------------------------

             Summary: Add more debug info for 'Affinity changes' PME stage.
                 Key: IGNITE-10679
                 URL: https://issues.apache.org/jira/browse/IGNITE-10679
             Project: Ignite
          Issue Type: Improvement
            Reporter: Pavel Voronkin


At current stage of PME time optimizations, next possible target to optimize is 'Affinity changes' apply stage, because it starts to slowdown PME when configured number of partitions is increased (up to 32k) 

We need to add more debug information to understand what is the sources of slowdown.

Example log:
{noformat}

[15:21:55,326][INFO][sys-#121][GridDhtPartitionsExchangeFuture] Received full message, will finish exchange [node=8ca2f878-70cf-4d34-a3ea-97b244f8d3d6, resVer=AffinityTopologyVersion [topVer=13, minorTopVer=0]]
[15:21:55,725][INFO][sys-#121][CacheAffinitySharedManager] Affinity applying from full message performed in 398 ms.
[15:21:56,173][INFO][db-checkpoint-thread-#85][GridCacheDatabaseSharedManager] Skipping checkpoint (no pages were modified) [checkpointLockWait=0ms, checkpointLockHoldTime=740ms, reason='timeout']
[15:21:57,150][INFO][db-checkpoint-thread-#85][GridCacheDatabaseSharedManager] Skipping checkpoint (no pages were modified) [checkpointLockWait=13ms, checkpointLockHoldTime=699ms, reason='timeout']
[15:21:58,189][INFO][db-checkpoint-thread-#85][GridCacheDatabaseSharedManager] Skipping checkpoint (no pages were modified) [checkpointLockWait=4ms, checkpointLockHoldTime=730ms, reason='timeout']
[15:21:58,340][INFO][sys-#121][GridDhtPartitionsExchangeFuture] Affinity changes applied in 3013 ms.
[15:21:59,131][INFO][db-checkpoint-thread-#85][GridCacheDatabaseSharedManager] Skipping checkpoint (no pages were modified) [checkpointLockWait=6ms, checkpointLockHoldTime=664ms, reason='timeout']
[15:21:59,311][INFO][sys-#121][GridDhtPartitionsExchangeFuture] Full map updating for 67 groups performed in 971 ms.
[15:21:59,311][INFO][sys-#121][GridDhtPartitionsExchangeFuture] Finish exchange future [startVer=AffinityTopologyVersion [topVer=13, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=13, minorTopVer=0], err=null]
[15:22:00,039][INFO][db-checkpoint-thread-#85][GridCacheDatabaseSharedManager] Skipping checkpoint (no pages were modified) [checkpointLockWait=4ms, checkpointLockHoldTime=569ms, reason='timeout']
[15:22:00,108][INFO][sys-#121][GridDhtPartitionsExchangeFuture] Detecting lost partitions performed in 796 ms.
[15:22:00,376][WARNING][sys-stripe-14-#15][finish] Received finish request for completed transaction (the message may be too late) [txId=GridCacheVersion [topVer=153677859, order=1542197978690, nodeOrder=10], dhtTxId=null, node=ca22eac2-862a-4fed-a84b-4abe85bfab25, commit=false]
[15:22:00,376][WARNING][sys-stripe-9-#10][finish] Received finish request for completed transaction (the message may be too late) [txId=GridCacheVersion [topVer=153677859, order=1542197978695, nodeOrder=10], dhtTxId=null, node=ca22eac2-862a-4fed-a84b-4abe85bfab25, commit=false]
[15:22:00,376][WARNING][sys-stripe-10-#11][GridDhtColocatedCache] <cache_group_2_043> Failed to acquire lock (transaction has been completed): GridCacheVersion [topVer=153677859, order=1542197978695, nodeOrder=10]
[15:22:00,376][WARNING][sys-stripe-11-#12][GridDhtColocatedCache] <cache_group_1_027> Failed to acquire lock (transaction has been completed): GridCacheVersion [topVer=153677859, order=1542197978690, nodeOrder=10]
[15:22:01,404][INFO][db-checkpoint-thread-#85][GridCacheDatabaseSharedManager] Skipping checkpoint (no pages were modified) [checkpointLockWait=2ms, checkpointLockHoldTime=928ms, reason='timeout']
[15:22:02,069][INFO][db-checkpoint-thread-#85][GridCacheDatabaseSharedManager] Checkpoint started [checkpointId=4a3ff280-b07c-48fc-8352-dd1b09e4ddbb, startPtr=FileWALPointer [idx=35, fileOff=11841470, len=11791007], checkpointLockWait=5ms, checkpointLockHoldTime=568ms, walCpRecordFsyncDuration=14ms, pages=12, reason='timeout']
[15:22:02,070][INFO][db-checkpoint-thread-#85][GridCacheDatabaseSharedManager] Checkpoint finished [cpId=4a3ff280-b07c-48fc-8352-dd1b09e4ddbb, pages=12, markPos=FileWALPointer [idx=35, fileOff=11841470, len=11791007], walSegmentsCleared=0, walSegmentsCovered=[], markDuration=584ms, pagesWrite=0ms, fsync=1ms, total=590ms]
[15:22:02,476][INFO][exchange-worker-#66][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=13, minorTopVer=0], evt=NODE_FAILED, node=a10011b8-7431-4bba-8b17-565a2a2be222]
[15:22:02,760][WARNING][sys-stripe-11-#12][finish] Received finish request for completed transaction (the message may be too late) [txId=GridCacheVersion [topVer=153677859, order=1542197978805, nodeOrder=10], dhtTxId=null, node=ca22eac2-862a-4fed-a84b-4abe85bfab25, commit=false]
[15:22:03,044][INFO][db-checkpoint-thread-#85][GridCacheDatabaseSharedManager] Skipping checkpoint (no pages were modified) [checkpointLockWait=0ms, checkpointLockHoldTime=557ms, reason='timeout']
[15:22:04,298][INFO][db-checkpoint-thread-#85][GridCacheDatabaseSharedManager] Checkpoint started [checkpointId=4c2fd666-4fba-41a5-8ca8-d5a0306ed757, startPtr=FileWALPointer [idx=35, fileOff=23657694, len=11791007], checkpointLockWait=0ms, checkpointLockHoldTime=794ms, walCpRecordFsyncDuration=9ms, pages=6, reason='timeout']
{noformat}

{noformat}
Affinity applying from full message performed in 398 ms.
..
Affinity changes applied in 3013 ms.
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)