You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "Aljoscha Krettek (JIRA)" <ji...@apache.org> on 2017/04/03 12:13:42 UTC
[jira] [Updated] (FLINK-4193) Task manager JVM crashes while
deploying cancelling jobs
[ https://issues.apache.org/jira/browse/FLINK-4193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aljoscha Krettek updated FLINK-4193:
------------------------------------
Component/s: (was: Streaming)
State Backends, Checkpointing
> Task manager JVM crashes while deploying cancelling jobs
> --------------------------------------------------------
>
> Key: FLINK-4193
> URL: https://issues.apache.org/jira/browse/FLINK-4193
> Project: Flink
> Issue Type: Bug
> Components: Distributed Coordination, State Backends, Checkpointing
> Reporter: Gyula Fora
> Priority: Critical
>
> We have observed several TM crashes while deploying larger stateful streaming jobs that use the RocksDB state backend.
> As the JVMs crash the logs don't show anything but I have uploaded all the info I have got from the standard output.
> This indicates some GC and possibly some RocksDB issues underneath but we could not really figure out much more.
> GC segfault
> https://gist.github.com/gyfora/9e56d4a0d4fc285a8d838e1b281ae125
> Other crashes (maybe rocks related)
> https://gist.github.com/gyfora/525c67c747873f0ff2ff2ed1682efefa
> https://gist.github.com/gyfora/b93611fde87b1f2516eeaf6bfbe8d818
> The third link shows 2 issues that happened in parallel...
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)