Posted to issues@flink.apache.org by "Gary Yao (JIRA)" <ji...@apache.org> on 2018/04/17 14:37:00 UTC

[jira] [Reopened] (FLINK-8900) YARN FinalStatus always shows as KILLED with Flip-6

     [ https://issues.apache.org/jira/browse/FLINK-8900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Yao reopened FLINK-8900:
-----------------------------

When submitting in non-detached mode, the problem still surfaces. In detached mode the status is set correctly.

Command used to submit:
{noformat}
HADOOP_CLASSPATH=`hadoop classpath` bin/flink run -m yarn-cluster -yjm 2048 -ytm 2048  ./examples/streaming/WordCount.jar
{noformat}

State and FinalStatus are both KILLED.



> YARN FinalStatus always shows as KILLED with Flip-6
> ---------------------------------------------------
>
>                 Key: FLINK-8900
>                 URL: https://issues.apache.org/jira/browse/FLINK-8900
>             Project: Flink
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.5.0, 1.6.0
>            Reporter: Nico Kruber
>            Assignee: Till Rohrmann
>            Priority: Blocker
>              Labels: flip-6
>             Fix For: 1.5.0
>
>
> Whenever I run a simple word count like this one on YARN with Flip-6 enabled,
> {code}
> ./bin/flink run -m yarn-cluster -yjm 768 -ytm 3072 -ys 2 -p 20 -c org.apache.flink.streaming.examples.wordcount.WordCount ./examples/streaming/WordCount.jar --input /usr/share/doc/rsync-3.0.6/COPYING
> {code}
> it shows up as {{KILLED}} in the {{State}} and {{FinalStatus}} columns even though the program ran successfully, as in the following log (irrespective of whether FLINK-8899 occurs):
> {code}
> 2018-03-08 16:48:39,049 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Job Streaming WordCount (11a794d2f5dc2955d8015625ec300c20) switched from state RUNNING to FINISHED.
> 2018-03-08 16:48:39,050 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Stopping checkpoint coordinator for job 11a794d2f5dc2955d8015625ec300c20
> 2018-03-08 16:48:39,050 INFO  org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore  - Shutting down
> 2018-03-08 16:48:39,078 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher      - Job 11a794d2f5dc2955d8015625ec300c20 reached globally terminal state FINISHED.
> 2018-03-08 16:48:39,151 INFO  org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager  - Register TaskManager e58efd886429e8f080815ea74ddfa734 at the SlotManager.
> 2018-03-08 16:48:39,221 INFO  org.apache.flink.runtime.jobmaster.JobMaster                  - Stopping the JobMaster for job Streaming WordCount(11a794d2f5dc2955d8015625ec300c20).
> 2018-03-08 16:48:39,270 INFO  org.apache.flink.runtime.jobmaster.JobMaster                  - Close ResourceManager connection 43f725adaee14987d3ff99380701f52f: JobManager is shutting down..
> 2018-03-08 16:48:39,270 INFO  org.apache.flink.yarn.YarnResourceManager                     - Disconnect job manager 00000000000000000000000000000000@akka.tcp://flink@ip-172-31-7-0.eu-west-1.compute.internal:34281/user/jobmanager_0 for job 11a794d2f5dc2955d8015625ec300c20 from the resource manager.
> 2018-03-08 16:48:39,349 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPool          - Suspending SlotPool.
> 2018-03-08 16:48:39,349 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPool          - Stopping SlotPool.
> 2018-03-08 16:48:39,349 INFO  org.apache.flink.runtime.jobmaster.JobManagerRunner           - JobManagerRunner already shutdown.
> 2018-03-08 16:48:39,775 INFO  org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager  - Register TaskManager 4e1fb6c8f95685e24b6a4cb4b71ffb92 at the SlotManager.
> 2018-03-08 16:48:39,846 INFO  org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager  - Register TaskManager b5bce0bdfa7fbb0f4a0905cc3ee1c233 at the SlotManager.
> 2018-03-08 16:48:39,876 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
> 2018-03-08 16:48:39,910 INFO  org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager  - Register TaskManager a35b0690fdc6ec38bbcbe18a965000fd at the SlotManager.
> 2018-03-08 16:48:39,942 INFO  org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager  - Register TaskManager 5175cabe428bea19230ac056ff2a17bb at the SlotManager.
> 2018-03-08 16:48:39,974 INFO  org.apache.flink.runtime.blob.BlobServer                      - Stopped BLOB server at 0.0.0.0:46511
> 2018-03-08 16:48:39,975 INFO  org.apache.flink.runtime.blob.TransientBlobCache              - Shutting down BLOB cache
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)