Posted to issues@flink.apache.org by "Weijie Guo (Jira)" <ji...@apache.org> on 2022/11/24 09:17:00 UTC

[jira] [Commented] (FLINK-29419) HybridShuffle.testHybridFullExchangesRestart hangs

    [ https://issues.apache.org/jira/browse/FLINK-29419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638218#comment-17638218 ] 

Weijie Guo commented on FLINK-29419:
------------------------------------

Through further investigation, we found two possible causes. One is that the hybrid result partition may load data from the file even though that data has already been consumed from memory. The other is a bug in LocalBufferPool (the hybrid shuffle scenario makes this bug much more likely to occur).

After these two tickets are resolved, we should be able to enable the relevant tests.
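
To make the first cause easier to picture, here is a minimal toy sketch, not Flink's actual code. All names in it (ToyHybridPartition, write, consumeFromMemory, releaseMemory, consumeFromFile) are hypothetical and only model the idea: when data exists both in memory and in the spilled file, a read from the file should skip every buffer index the consumer has already received from memory.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.OptionalInt;

/** Toy model only; none of these names exist in Flink. */
public class ToyHybridPartition {
    // Buffer indexes still held in memory, in write order.
    private final Deque<Integer> inMemory = new ArrayDeque<>();
    // Buffer indexes that were also written to the (simulated) spill file.
    private final List<Integer> spilled = new ArrayList<>();
    // Highest buffer index already handed to the consumer, -1 if none.
    private int lastConsumed = -1;

    public void write(int bufferIndex, boolean alsoSpill) {
        inMemory.addLast(bufferIndex);
        if (alsoSpill) {
            spilled.add(bufferIndex);
        }
    }

    public OptionalInt consumeFromMemory() {
        Integer next = inMemory.pollFirst();
        if (next == null) {
            return OptionalInt.empty();
        }
        lastConsumed = next;
        return OptionalInt.of(next);
    }

    /** Simulates releasing memory buffers, so the rest must come from the file. */
    public void releaseMemory() {
        inMemory.clear();
    }

    /** Reads the next buffer from the file, skipping anything already consumed. */
    public OptionalInt consumeFromFile() {
        for (int index : spilled) {
            // Without this check, buffers 0 and 1 below would be delivered twice.
            if (index > lastConsumed) {
                lastConsumed = index;
                return OptionalInt.of(index);
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        ToyHybridPartition partition = new ToyHybridPartition();
        for (int i = 0; i < 3; i++) {
            partition.write(i, true);
        }
        partition.consumeFromMemory(); // consumer receives buffer 0
        partition.consumeFromMemory(); // consumer receives buffer 1
        partition.releaseMemory();
        System.out.println("next buffer from file: " + partition.consumeFromFile());
    }
}
```

In this toy, dropping the `index > lastConsumed` check would hand buffers 0 and 1 to the consumer a second time. Whether the real hang comes from duplicated data, a wrong read offset, or something else is not established here; the sketch only illustrates the kind of bookkeeping involved.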

> HybridShuffle.testHybridFullExchangesRestart hangs
> --------------------------------------------------
>
>                 Key: FLINK-29419
>                 URL: https://issues.apache.org/jira/browse/FLINK-29419
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Network
>    Affects Versions: 1.16.0, 1.17.0
>            Reporter: Huang Xingbo
>            Assignee: Weijie Guo
>            Priority: Critical
>              Labels: pull-request-available, test-stability
>
> {code:java}
> 2022-09-26T10:56:44.0766792Z Sep 26 10:56:44 "ForkJoinPool-1-worker-25" #27 daemon prio=5 os_prio=0 tid=0x00007f41a4efa000 nid=0x6d76 waiting on condition [0x00007f40ac135000]
> 2022-09-26T10:56:44.0767432Z Sep 26 10:56:44    java.lang.Thread.State: WAITING (parking)
> 2022-09-26T10:56:44.0767892Z Sep 26 10:56:44 	at sun.misc.Unsafe.park(Native Method)
> 2022-09-26T10:56:44.0768644Z Sep 26 10:56:44 	- parking to wait for  <0x00000000a0704e18> (a java.util.concurrent.CompletableFuture$Signaller)
> 2022-09-26T10:56:44.0769287Z Sep 26 10:56:44 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> 2022-09-26T10:56:44.0769949Z Sep 26 10:56:44 	at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
> 2022-09-26T10:56:44.0770623Z Sep 26 10:56:44 	at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3313)
> 2022-09-26T10:56:44.0771349Z Sep 26 10:56:44 	at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
> 2022-09-26T10:56:44.0772092Z Sep 26 10:56:44 	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
> 2022-09-26T10:56:44.0772777Z Sep 26 10:56:44 	at org.apache.flink.test.runtime.JobGraphRunningUtil.execute(JobGraphRunningUtil.java:57)
> 2022-09-26T10:56:44.0773534Z Sep 26 10:56:44 	at org.apache.flink.test.runtime.BatchShuffleITCaseBase.executeJob(BatchShuffleITCaseBase.java:115)
> 2022-09-26T10:56:44.0774333Z Sep 26 10:56:44 	at org.apache.flink.test.runtime.HybridShuffleITCase.testHybridFullExchangesRestart(HybridShuffleITCase.java:59)
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=41343&view=logs&j=a57e0635-3fad-5b08-57c7-a4142d7d6fa9&t=2ef0effc-1da1-50e5-c2bd-aab434b1c5b7



--
This message was sent by Atlassian Jira
(v8.20.10#820010)