Posted to issues@flink.apache.org by "Ankur Goenka (JIRA)" <ji...@apache.org> on 2018/10/24 22:38:00 UTC

[jira] [Created] (FLINK-10672) Task stuck while writing output to flink

Ankur Goenka created FLINK-10672:
------------------------------------

             Summary: Task stuck while writing output to flink
                 Key: FLINK-10672
                 URL: https://issues.apache.org/jira/browse/FLINK-10672
             Project: Flink
          Issue Type: Bug
    Affects Versions: 1.5.4
         Environment: OS: Debian rodente 4.17

Flink version: 1.5.4
||Key||Value||
|jobmanager.heap.mb|1024|
|jobmanager.rpc.address|localhost|
|jobmanager.rpc.port|6123|
|metrics.reporter.jmx.class|org.apache.flink.metrics.jmx.JMXReporter|
|metrics.reporter.jmx.port|9250-9260|
|metrics.reporters|jmx|
|parallelism.default|1|
|rest.port|8081|
|taskmanager.heap.mb|1024|
|taskmanager.numberOfTaskSlots|1|
|web.tmpdir|/tmp/flink-web-bdb73d6c-5b9e-47b5-9ebf-eed0a7c82c26|

 
h1. Overview
||Data Port||All Slots||Free Slots||CPU Cores||Physical Memory||JVM Heap Size||Flink Managed Memory||
|43501|1|0|12|62.9 GB|922 MB|642 MB|
h1. Memory
h2. JVM (Heap/Non-Heap)
||Type||Committed||Used||Maximum||
|Heap|922 MB|575 MB|922 MB|
|Non-Heap|68.8 MB|64.3 MB|-1 B|
|Total|991 MB|639 MB|922 MB|
h2. Outside JVM
||Type||Count||Used||Capacity||
|Direct|3,292|105 MB|105 MB|
|Mapped|0|0 B|0 B|
h1. Network
h2. Memory Segments
||Type||Count||
|Available|3,194|
|Total|3,278|
h1. Garbage Collection
||Collector||Count||Time||
|G1_Young_Generation|13|336|
|G1_Old_Generation|1|21|
            Reporter: Ankur Goenka
         Attachments: 1uruvakHxBu.png, 3aDKQ24WvKk.png, Po89UGDn58V.png, jmx_dump.json, jmx_dump_detailed.json, jstack_129827.log, jstack_163822.log, jstack_66985.log

I am running a fairly complex pipeline with 200+ tasks.

The pipeline works fine with small data (on the order of 10 KB of input) but gets stuck with slightly larger data (300 KB of input).

 

The task gets stuck while writing output to Flink; more specifically, it gets stuck while requesting a memory segment from the local buffer pool. The Task Manager UI shows that it has enough memory and memory segments to work with.
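For context, the blocking pattern visible in the stack trace (a timed wait on an {{ArrayDeque}} monitor inside {{LocalBufferPool.requestMemorySegment}}) looks roughly like the following. This is a simplified sketch for illustration only, not Flink's actual implementation; the class and method names here are hypothetical:

{noformat}
import java.util.ArrayDeque;

// Simplified sketch (NOT Flink's code) of the blocking request pattern
// seen in the attached jstack output: the requester waits on the deque's
// monitor until another thread recycles a segment back into the pool.
public class BoundedSegmentPool {
    private final ArrayDeque<byte[]> availableSegments = new ArrayDeque<>();

    public BoundedSegmentPool(int numSegments, int segmentSize) {
        for (int i = 0; i < numSegments; i++) {
            availableSegments.add(new byte[segmentSize]);
        }
    }

    // Blocks while the pool is empty -- this is where the reported task hangs,
    // showing up as TIMED_WAITING on the ArrayDeque monitor.
    public byte[] requestMemorySegment() throws InterruptedException {
        synchronized (availableSegments) {
            while (availableSegments.isEmpty()) {
                availableSegments.wait(2000);
            }
            return availableSegments.poll();
        }
    }

    // Called when a downstream consumer is done with a segment; if the
    // consumer itself is stuck, segments are never returned and the
    // requester above waits forever.
    public void recycle(byte[] segment) {
        synchronized (availableSegments) {
            availableSegments.add(segment);
            availableSegments.notifyAll();
        }
    }
}
{noformat}

The key point this sketch makes is that the requester can only make progress if some other thread recycles a segment, so the hang suggests the segments are held downstream even though the UI reports free segments overall.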

The relevant stack trace is:

{noformat}
"grpc-default-executor-0" #138 daemon prio=5 os_prio=0 tid=0x00007fedb0163800 nid=0x30b7f in Object.wait() [0x00007fedb4f90000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at (C/C++) 0x00007fef201c7dae (Unknown Source)
	at (C/C++) 0x00007fef1f2aea07 (Unknown Source)
	at (C/C++) 0x00007fef1f241cd3 (Unknown Source)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x00000000f6d56450> (a java.util.ArrayDeque)
	at org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestMemorySegment(LocalBufferPool.java:247)
	- locked <0x00000000f6d56450> (a java.util.ArrayDeque)
	at org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestBufferBuilderBlocking(LocalBufferPool.java:204)
	at org.apache.flink.runtime.io.network.api.writer.RecordWriter.requestNewBufferBuilder(RecordWriter.java:213)
	at org.apache.flink.runtime.io.network.api.writer.RecordWriter.sendToTarget(RecordWriter.java:144)
	at org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:107)
	at org.apache.flink.runtime.operators.shipping.OutputCollector.collect(OutputCollector.java:65)
	at org.apache.flink.runtime.operators.util.metrics.CountingCollector.collect(CountingCollector.java:35)
	at org.apache.beam.runners.flink.translation.functions.FlinkExecutableStagePruningFunction.flatMap(FlinkExecutableStagePruningFunction.java:42)
	at org.apache.beam.runners.flink.translation.functions.FlinkExecutableStagePruningFunction.flatMap(FlinkExecutableStagePruningFunction.java:26)
	at org.apache.flink.runtime.operators.chaining.ChainedFlatMapDriver.collect(ChainedFlatMapDriver.java:80)
	at org.apache.flink.runtime.operators.util.metrics.CountingCollector.collect(CountingCollector.java:35)
	at org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction$MyDataReceiver.accept(FlinkExecutableStageFunction.java:230)
	- locked <0x00000000f6a60bd0> (a java.lang.Object)
	at org.apache.beam.sdk.fn.data.BeamFnDataInboundObserver.accept(BeamFnDataInboundObserver.java:81)
	at org.apache.beam.sdk.fn.data.BeamFnDataInboundObserver.accept(BeamFnDataInboundObserver.java:32)
	at org.apache.beam.sdk.fn.data.BeamFnDataGrpcMultiplexer$InboundObserver.onNext(BeamFnDataGrpcMultiplexer.java:139)
	at org.apache.beam.sdk.fn.data.BeamFnDataGrpcMultiplexer$InboundObserver.onNext(BeamFnDataGrpcMultiplexer.java:125)
	at org.apache.beam.vendor.grpc.v1.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:248)
	at org.apache.beam.vendor.grpc.v1.io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33)
	at org.apache.beam.vendor.grpc.v1.io.grpc.Contexts$ContextualizedServerCallListener.onMessage(Contexts.java:76)
	at org.apache.beam.vendor.grpc.v1.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:263)
	at org.apache.beam.vendor.grpc.v1.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:683)
	at org.apache.beam.vendor.grpc.v1.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at org.apache.beam.vendor.grpc.v1.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
{noformat}

 

The full stack trace and logs are attached.
Please take a look and let me know if more information is needed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)