You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by "Mehar Simhadri (mesimhad)" <me...@cisco.com> on 2018/08/20 04:19:11 UTC

Flink threads cpu busy/stuck at NoFetchingInput.require (Versions 1.2 and 1.4)

Hi Flink Community,

In our flink deployments, we see some flink threads are cpu busy/stuck after few hours with the below stack

"Sink:AggregationSink (2/4)" #567 daemon prio=5 os_prio=0 tid=0x00007f901dc97000 nid=0x254 runnable [0x00007f8fe017f000]
java.lang.Thread.State: RUNNABLE
at org.apache.flink.api.java.typeutils.runtime.DataInputViewStream.read(DataInputViewStream.java:70)
at com.esotericsoftware.kryo.io.Input.fill(Input.java:146)
at org.apache.flink.api.java.typeutils.runtime.NoFetchingInput.require(NoFetchingInput.java:76)
at com.esotericsoftware.kryo.io.Input.readVarInt(Input.java:355)
at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:109)
at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:641)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:752)
at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:236)
at org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.deserialize(StreamElementSerializer.java:187)
at org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.deserialize(StreamElementSerializer.java:40)
at org.apache.flink.runtime.plugable.NonReusingDeserializationDelegate.read(NonReusingDeserializationDelegate.java:55)
at org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.getNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:109)
at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:147)
at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:63)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:261)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:656)
at java.lang.Thread.run(Thread.java:745)

Few observations from the stack, while it is stuck.
SpillingAdaptiveSpanningRecordDeserializer.getNextRecord
nonSpanningRemaining = 13

All these 13 bytes were read even before reaching com.esotericsoftware.kryo.io.Input.readVarInt

We are wondering is this a serialization bug or memory segment corruption?

Any pointers on how to debug further will be much appreciated.

Regards,
Mehar