Posted to dev@ignite.apache.org by "Anton Kalashnikov (Jira)" <ji...@apache.org> on 2021/02/17 11:23:00 UTC

[jira] [Created] (IGNITE-14197) Checkpoint thread can't take checkpoint write lock because it waits for parked threads to complete their work

Anton Kalashnikov created IGNITE-14197:
------------------------------------------

             Summary: Checkpoint thread can't take checkpoint write lock because it waits for parked threads to complete their work
                 Key: IGNITE-14197
                 URL: https://issues.apache.org/jira/browse/IGNITE-14197
             Project: Ignite
          Issue Type: Bug
            Reporter: Anton Kalashnikov
            Assignee: Anton Kalashnikov


When write throttling is enabled and the node parks a thread (for example, a data streamer thread), the parked thread still holds the checkpoint read lock. The checkpoint thread then waits a long time to acquire the checkpoint write lock:
[2020-07-23 07:09:21,614][INFO ][db-checkpoint-thread-#371][GridCacheDatabaseSharedManager] Checkpoint started [checkpointId=f964c8f2-daa5-41b2-80ef-944326f26f8a, startPtr=FileWALPointer [idx=56913, fileOff=10362905, len=41972], checkpointBeforeLockTime=1983ms, *checkpointLockWait=812117ms*, checkpointListenersExecuteTime=90ms, checkpointLockHoldTime=93ms, walCpRecordFsyncDuration=123ms, writeCheckpointEntryDuration=4ms, splitAndSortCpPagesDuration=4155ms, pages=10516815, reason='too big size of WAL without checkpoint']
All operations are blocked while the checkpoint thread waits for the lock.
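
The effect can be reproduced outside Ignite with a small sketch. It assumes the checkpoint lock behaves like a plain java.util.concurrent ReentrantReadWriteLock; the class name CheckpointLockDemo, the method writeLockWaitMillis, and the thread name are hypothetical and are not part of the Ignite code base. A "streamer" thread parks while holding the read lock, and the caller, which plays the checkpoint thread, measures how long it waits for the write lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for the scenario; none of these names come from Ignite. */
public class CheckpointLockDemo {
    /**
     * Parks a "streamer" thread for parkMillis while it holds the read lock and
     * returns how long the calling ("checkpoint") thread waited for the write lock.
     */
    public static long writeLockWaitMillis(long parkMillis) {
        // Assumption: the checkpoint lock is modeled as a plain read-write lock.
        ReentrantReadWriteLock checkpointLock = new ReentrantReadWriteLock();
        CountDownLatch readLockTaken = new CountDownLatch(1);

        Thread streamer = new Thread(() -> {
            checkpointLock.readLock().lock();
            try {
                readLockTaken.countDown();
                // Throttling analogue: park without releasing the read lock.
                long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(parkMillis);
                long remaining;
                // The loop guards against spurious wakeups from parkNanos.
                while ((remaining = deadline - System.nanoTime()) > 0)
                    LockSupport.parkNanos(remaining);
            }
            finally {
                checkpointLock.readLock().unlock();
            }
        }, "data-streamer-demo");

        streamer.start();

        try {
            // Make sure the reader owns the lock before the writer asks for it.
            readLockTaken.await();
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        long start = System.nanoTime();
        checkpointLock.writeLock().lock(); // Blocks until the parked reader unlocks.
        try {
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        }
        finally {
            checkpointLock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("checkpointLockWait=" + writeLockWaitMillis(500) + "ms");
    }
}
```

In this sketch the measured wait tracks the park timeout, because the writer cannot proceed until every reader has released the lock. It illustrates the symptom only and says nothing about how the issue should be fixed in Ignite.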

Sometimes it leads to a complete disaster, with a park timeout of several hours (21278855 ms is about 5.9 hours):
Parking thread=data-streamer-stripe-47-#144 for timeout(ms)=*21278855*
{quote}"data-streamer-stripe-78-#175" #209 prio=5 os_prio=0 tid=0x00007f6161d6a800 nid=0xf932 waiting on condition [0x00007f5c292d1000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:338)
at org.apache.ignite.internal.processors.cache.persistence.pagemem.PagesWriteSpeedBasedThrottle.doPark(PagesWriteSpeedBasedThrottle.java:244)
at org.apache.ignite.internal.processors.cache.persistence.pagemem.PagesWriteSpeedBasedThrottle.onMarkDirty(PagesWriteSpeedBasedThrottle.java:227)
at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlockPage(PageMemoryImpl.java:1730)
at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlock(PageMemoryImpl.java:491)
at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlock(PageMemoryImpl.java:483)
at org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.writeUnlock(PageHandler.java:394)
at org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.writePage(PageHandler.java:369)
at org.apache.ignite.internal.processors.cache.persistence.DataStructure.write(DataStructure.java:296)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.access$11300(BPlusTree.java:98)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Put.tryInsert(BPlusTree.java:3864)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Put.access$7100(BPlusTree.java:3544)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.onNotFound(BPlusTree.java:4103)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.access$5800(BPlusTree.java:3894)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:2022)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:1997)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1904)
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1662)
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1645)
at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:2473)
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:436)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:4306)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.initialValue(GridCacheMapEntry.java:3441)
at org.apache.ignite.internal.processors.cache.GridCacheEntryEx.initialValue(GridCacheEntryEx.java:770)
at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$IsolatedUpdater.receive(DataStreamerImpl.java:2278)
at org.apache.ignite.internal.processors.datastreamer.DataStreamerUpdateJob.call(DataStreamerUpdateJob.java:139)
at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7104)
at org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:966)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:559)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
at java.lang.Thread.run(Thread.java:748)
{quote}


