Posted to issues@ignite.apache.org by "Maxim Muzafarov (Jira)" <ji...@apache.org> on 2021/07/21 23:25:00 UTC

[jira] [Assigned] (IGNITE-15146) Checking the snapshot creates a large number of unused threads that do not terminate.

     [ https://issues.apache.org/jira/browse/IGNITE-15146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maxim Muzafarov reassigned IGNITE-15146:
----------------------------------------

    Assignee: Maxim Muzafarov

> Checking the snapshot creates a large number of unused threads that do not terminate.
> -------------------------------------------------------------------------------------
>
>                 Key: IGNITE-15146
>                 URL: https://issues.apache.org/jira/browse/IGNITE-15146
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.11
>            Reporter: Pavel Pereslegin
>            Assignee: Maxim Muzafarov
>            Priority: Critical
>             Fix For: 2.12
>
>
> Each new run of snapshot verification creates dozens of new threads that do not terminate after the procedure is complete. Over time, this can lead to an OutOfMemoryError and node failure.
> {code:java}
>     @Test
>     public void testClusterSnapshotCheckMultipleTimes() throws Exception {
>         IgniteEx ignite = startGridsWithCache(3, dfltCacheCfg, CACHE_KEYS_RANGE);
>         startClientGrid();
>
>         ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get();
>
>         // Baseline taken before any snapshot check has run.
>         int activeThreadsCntBefore = Thread.activeCount();
>         int iterations = 10;
>
>         for (int i = 0; i < iterations; i++)
>             snp(ignite).checkSnapshot(SNAPSHOT_NAME).get();
>
>         // Net number of threads added by the checks (Thread.activeCount() is an estimate).
>         int createdThreads = Thread.activeCount() - activeThreadsCntBefore;
>
>         assertTrue("Threads created: " + createdThreads, createdThreads < iterations);
>     }
> {code}
> The reproducer shows that 10 snapshot checks add approximately *{color:#de350b}250{color}* new threads.
> The dump of a "leaked" thread looks like this:
> {noformat}
> "binary-metadata-writer-#2208" #2249 prio=5 os_prio=0 tid=0x00007f9974087000 nid=0x65b38 waiting on condition [0x00007f986cf9c000]
>    java.lang.Thread.State: WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <merged>(a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> 	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> 	at org.apache.ignite.internal.processors.cache.binary.BinaryMetadataFileStore$BinaryMetadataAsyncWriter.body0(BinaryMetadataFileStore.java:460)
> 	at org.apache.ignite.internal.processors.cache.binary.BinaryMetadataFileStore$BinaryMetadataAsyncWriter.body(BinaryMetadataFileStore.java:441)
> 	at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
> 	at java.lang.Thread.run(Thread.java:748)
> {noformat}
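
For illustration only, the pattern visible in the dump (a worker parked in LinkedBlockingQueue.take() that nobody stops) can be reproduced outside Ignite with a minimal sketch. This is not Ignite code: the class name LeakedWorkerSketch, the thread-name prefix, and the helper methods are all hypothetical, and the sketch assumes the worker exits only when interrupted. It does not claim to show the actual cause or fix of this issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class LeakedWorkerSketch {
    /** Hypothetical thread-name prefix, chosen only to resemble the one in the dump. */
    static final String PREFIX = "sketch-metadata-writer-#";

    /** Starts a worker that blocks in take() until a task arrives or it is interrupted. */
    static Thread startWorker(int idx) {
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

        Thread t = new Thread(() -> {
            try {
                while (true)
                    queue.take().run(); // Parks here indefinitely while the queue is empty.
            }
            catch (InterruptedException e) {
                // Interruption is the only exit path in this sketch.
            }
        }, PREFIX + idx);

        t.start();

        return t;
    }

    /** Counts live threads whose name starts with the given prefix. */
    static long countLive(String prefix) {
        return Thread.getAllStackTraces().keySet().stream()
            .filter(t -> t.isAlive() && t.getName().startsWith(prefix))
            .count();
    }

    /** Interrupts every worker and waits for it to exit. */
    static void stopAll(List<Thread> workers) throws InterruptedException {
        for (Thread t : workers)
            t.interrupt();

        for (Thread t : workers)
            t.join();
    }

    public static void main(String[] args) throws InterruptedException {
        List<Thread> workers = new ArrayList<>();

        for (int i = 0; i < 5; i++)
            workers.add(startWorker(i));

        // Without an explicit stop, these workers would stay parked forever.
        System.out.println("Live workers before stop: " + countLive(PREFIX));

        stopAll(workers);

        System.out.println("Live workers after stop: " + countLive(PREFIX));
    }
}
```

Counting threads by name prefix, as countLive does here, is narrower than comparing Thread.activeCount() values and may make it easier to attribute growth to one kind of worker.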



--
This message was sent by Atlassian Jira
(v8.3.4#803005)