You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "Piotr Nowojski (JIRA)" <ji...@apache.org> on 2019/05/29 14:00:00 UTC

[jira] [Commented] (FLINK-12313) SynchronousCheckpointITCase.taskCachedThreadPoolAllowsForSynchronousCheckpoints is unstable

    [ https://issues.apache.org/jira/browse/FLINK-12313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16850873#comment-16850873 ] 

Piotr Nowojski commented on FLINK-12313:
----------------------------------------

Another instance: [https://api.travis-ci.org/v3/job/538336222/log.txt]

> SynchronousCheckpointITCase.taskCachedThreadPoolAllowsForSynchronousCheckpoints is unstable
> -------------------------------------------------------------------------------------------
>
>                 Key: FLINK-12313
>                 URL: https://issues.apache.org/jira/browse/FLINK-12313
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>            Reporter: Ufuk Celebi
>            Priority: Critical
>              Labels: test-stability
>
> {{SynchronousCheckpointITCase.taskCachedThreadPoolAllowsForSynchronousCheckpoints}} fails and prints the Thread stack traces due to no output on Travis occasionally.
> {code}
> ==============================================================================
> Printing stack trace of Java process 10071
> ==============================================================================
> 2019-04-24 07:55:29
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.151-b12 mixed mode):
> "Attach Listener" #17 daemon prio=9 os_prio=0 tid=0x00007f2948920000 nid=0x2cf5 waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "Async calls on Test Task (1/1)" #15 daemon prio=5 os_prio=0 tid=0x00007f2948dd1800 nid=0x27a9 waiting on condition [0x00007f292cea9000]
>    java.lang.Thread.State: WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x000000008bb5e558> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> 	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> 	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> "Async calls on Test Task (1/1)" #14 daemon prio=5 os_prio=0 tid=0x00007f2948dce800 nid=0x27a8 in Object.wait() [0x00007f292cfaa000]
>    java.lang.Thread.State: WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <0x000000008bac58f8> (a java.lang.Object)
> 	at java.lang.Object.wait(Object.java:502)
> 	at org.apache.flink.streaming.runtime.tasks.SynchronousSavepointLatch.blockUntilCheckpointIsAcknowledged(SynchronousSavepointLatch.java:66)
> 	- locked <0x000000008bac58f8> (a java.lang.Object)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:726)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpoint(StreamTask.java:604)
> 	at org.apache.flink.streaming.runtime.tasks.SynchronousCheckpointITCase$SynchronousCheckpointTestingTask.triggerCheckpoint(SynchronousCheckpointITCase.java:174)
> 	at org.apache.flink.runtime.taskmanager.Task$1.run(Task.java:1182)
> 	at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> "CloseableReaperThread" #13 daemon prio=5 os_prio=0 tid=0x00007f2948d9b800 nid=0x27a7 in Object.wait() [0x00007f292d0ab000]
>    java.lang.Thread.State: WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <0x000000008bbe3990> (a java.lang.ref.ReferenceQueue$Lock)
> 	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
> 	- locked <0x000000008bbe3990> (a java.lang.ref.ReferenceQueue$Lock)
> 	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
> 	at org.apache.flink.core.fs.SafetyNetCloseableRegistry$CloseableReaperThread.run(SafetyNetCloseableRegistry.java:193)
> "Test Task (1/1)" #12 prio=5 os_prio=0 tid=0x00007f2948d97000 nid=0x27a6 in Object.wait() [0x00007f292d1ac000]
>    java.lang.Thread.State: WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <0x000000008e63f7d8> (a java.lang.Object)
> 	at java.lang.Object.wait(Object.java:502)
> 	at org.apache.flink.core.testutils.OneShotLatch.await(OneShotLatch.java:63)
> 	- locked <0x000000008e63f7d8> (a java.lang.Object)
> 	at org.apache.flink.streaming.runtime.tasks.SynchronousCheckpointITCase$SynchronousCheckpointTestingTask.run(SynchronousCheckpointITCase.java:161)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:335)
> 	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:724)
> 	at java.lang.Thread.run(Thread.java:748)
> "process reaper" #11 daemon prio=10 os_prio=0 tid=0x00007f294885e000 nid=0x2793 waiting on condition [0x00007f292d7e5000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x0000000080089928> (a java.util.concurrent.SynchronousQueue$TransferStack)
> 	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> 	at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
> 	at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
> 	at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
> 	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> "surefire-forkedjvm-ping-30s" #10 daemon prio=5 os_prio=0 tid=0x00007f2948433000 nid=0x2775 waiting on condition [0x00007f292dd20000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x0000000080089b88> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
> 	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> "surefire-forkedjvm-command-thread" #9 daemon prio=5 os_prio=0 tid=0x00007f2948416000 nid=0x276e runnable [0x00007f292e02a000]
>    java.lang.Thread.State: RUNNABLE
> 	at java.io.FileInputStream.readBytes(Native Method)
> 	at java.io.FileInputStream.read(FileInputStream.java:255)
> 	at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
> 	at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
> 	- locked <0x000000008008be38> (a java.io.BufferedInputStream)
> 	at java.io.DataInputStream.readInt(DataInputStream.java:387)
> 	at org.apache.maven.surefire.booter.MasterProcessCommand.decode(MasterProcessCommand.java:115)
> 	at org.apache.maven.surefire.booter.CommandReader$CommandRunnable.run(CommandReader.java:391)
> 	at java.lang.Thread.run(Thread.java:748)
> "Service Thread" #8 daemon prio=9 os_prio=0 tid=0x00007f2948214800 nid=0x276a runnable [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "C1 CompilerThread1" #7 daemon prio=9 os_prio=0 tid=0x00007f2948207000 nid=0x2769 waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "C2 CompilerThread0" #6 daemon prio=9 os_prio=0 tid=0x00007f2948205000 nid=0x2768 waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "Signal Dispatcher" #5 daemon prio=9 os_prio=0 tid=0x00007f2948203000 nid=0x2767 runnable [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "Surrogate Locker Thread (Concurrent GC)" #4 daemon prio=9 os_prio=0 tid=0x00007f2948201800 nid=0x2766 waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00007f29481c9000 nid=0x2765 in Object.wait() [0x00007f292f346000]
>    java.lang.Thread.State: WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <0x000000008008c7b8> (a java.lang.ref.ReferenceQueue$Lock)
> 	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
> 	- locked <0x000000008008c7b8> (a java.lang.ref.ReferenceQueue$Lock)
> 	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
> 	at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)
> "Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00007f29481c4800 nid=0x2764 in Object.wait() [0x00007f292f447000]
>    java.lang.Thread.State: WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <0x000000008008c970> (a java.lang.ref.Reference$Lock)
> 	at java.lang.Object.wait(Object.java:502)
> 	at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
> 	- locked <0x000000008008c970> (a java.lang.ref.Reference$Lock)
> 	at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)
> "main" #1 prio=5 os_prio=0 tid=0x00007f294800a800 nid=0x2759 in Object.wait() [0x00007f294df6f000]
>    java.lang.Thread.State: WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <0x000000008e63f4a0> (a java.lang.Object)
> 	at java.lang.Object.wait(Object.java:502)
> 	at org.apache.flink.core.testutils.MultiShotLatch.await(MultiShotLatch.java:50)
> 	- locked <0x000000008e63f4a0> (a java.lang.Object)
> 	at org.apache.flink.streaming.runtime.tasks.SynchronousCheckpointITCase.taskCachedThreadPoolAllowsForSynchronousCheckpoints(SynchronousCheckpointITCase.java:140)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> 	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> 	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> 	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> 	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> 	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> 	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> 	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> 	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> 	at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
> 	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}
> Full log: https://api.travis-ci.org/v3/job/523820519/log.txt
> The changes of the commit that caused this are in an independent module ({{flink-container}}) and they succeeded on Travis before (https://travis-ci.org/apache/flink/builds/520877649).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)