Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/05/01 04:28:00 UTC

[jira] [Commented] (FLINK-8900) YARN FinalStatus always shows as KILLED with Flip-6

    [ https://issues.apache.org/jira/browse/FLINK-8900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459437#comment-16459437 ] 

ASF GitHub Bot commented on FLINK-8900:
---------------------------------------

Github user GJL commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5944#discussion_r185164034
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/dispatcher/MiniDispatcher.java ---
    @@ -109,7 +119,11 @@ public MiniDispatcher(
     
     		if (executionMode == ClusterEntrypoint.ExecutionMode.NORMAL) {
     			// terminate the MiniDispatcher once we served the first JobResult successfully
    -			jobResultFuture.whenComplete((JobResult ignored, Throwable throwable) -> shutDown());
    +			jobResultFuture.whenComplete((JobResult result, Throwable throwable) -> {
    +				ApplicationStatus status = result.getSerializedThrowable().isPresent() ?
    +						ApplicationStatus.FAILED : ApplicationStatus.SUCCEEDED;
    +				jobTerminationFuture.complete(status);
    --- End diff --
    
    I think the functional way would be:
    
    ```
    				jobTerminationFuture.complete(result.getSerializedThrowable()
    					.map(serializedThrowable -> ApplicationStatus.FAILED)
    					.orElse(ApplicationStatus.SUCCEEDED));
    ```
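
    For reference, a minimal standalone sketch of why the two forms are
    interchangeable. It does not use Flink classes: the nested
    ApplicationStatus enum and the Optional<String> argument are stand-ins
    (assumptions) for Flink's ApplicationStatus and for the value returned by
    result.getSerializedThrowable(), and the names StatusMappingSketch,
    viaTernary and viaMap are made up for illustration.

    ```java
    import java.util.Optional;

    final class StatusMappingSketch {

        // Stand-in for Flink's ApplicationStatus; only the two values used in the diff.
        enum ApplicationStatus { SUCCEEDED, FAILED }

        // Form used in the diff: ternary on isPresent().
        static ApplicationStatus viaTernary(Optional<String> failure) {
            return failure.isPresent() ? ApplicationStatus.FAILED : ApplicationStatus.SUCCEEDED;
        }

        // Form suggested in the review: map any present value to FAILED,
        // fall back to SUCCEEDED when the Optional is empty.
        static ApplicationStatus viaMap(Optional<String> failure) {
            return failure
                .map(ignored -> ApplicationStatus.FAILED)
                .orElse(ApplicationStatus.SUCCEEDED);
        }

        public static void main(String[] args) {
            Optional<String> none = Optional.empty();
            Optional<String> some = Optional.of("job failed");
            System.out.println(viaTernary(none) + " " + viaMap(none)); // SUCCEEDED SUCCEEDED
            System.out.println(viaTernary(some) + " " + viaMap(some)); // FAILED FAILED
        }
    }
    ```

    Both methods return the same status for every input, so the choice
    between them is purely stylistic.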


> YARN FinalStatus always shows as KILLED with Flip-6
> ---------------------------------------------------
>
>                 Key: FLINK-8900
>                 URL: https://issues.apache.org/jira/browse/FLINK-8900
>             Project: Flink
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.5.0, 1.6.0
>            Reporter: Nico Kruber
>            Assignee: Gary Yao
>            Priority: Blocker
>              Labels: flip-6
>             Fix For: 1.5.0
>
>
> Whenever I run a simple word count like this one on YARN with Flip-6 enabled,
> {code}
> ./bin/flink run -m yarn-cluster -yjm 768 -ytm 3072 -ys 2 -p 20 -c org.apache.flink.streaming.examples.wordcount.WordCount ./examples/streaming/WordCount.jar --input /usr/share/doc/rsync-3.0.6/COPYING
> {code}
> it shows up as {{KILLED}} in the {{State}} and {{FinalStatus}} columns even though the program ran successfully, as in this log (irrespective of whether FLINK-8899 occurs):
> {code}
> 2018-03-08 16:48:39,049 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Job Streaming WordCount (11a794d2f5dc2955d8015625ec300c20) switched from state RUNNING to FINISHED.
> 2018-03-08 16:48:39,050 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Stopping checkpoint coordinator for job 11a794d2f5dc2955d8015625ec300c20
> 2018-03-08 16:48:39,050 INFO  org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore  - Shutting down
> 2018-03-08 16:48:39,078 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher      - Job 11a794d2f5dc2955d8015625ec300c20 reached globally terminal state FINISHED.
> 2018-03-08 16:48:39,151 INFO  org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager  - Register TaskManager e58efd886429e8f080815ea74ddfa734 at the SlotManager.
> 2018-03-08 16:48:39,221 INFO  org.apache.flink.runtime.jobmaster.JobMaster                  - Stopping the JobMaster for job Streaming WordCount(11a794d2f5dc2955d8015625ec300c20).
> 2018-03-08 16:48:39,270 INFO  org.apache.flink.runtime.jobmaster.JobMaster                  - Close ResourceManager connection 43f725adaee14987d3ff99380701f52f: JobManager is shutting down..
> 2018-03-08 16:48:39,270 INFO  org.apache.flink.yarn.YarnResourceManager                     - Disconnect job manager 00000000000000000000000000000000@akka.tcp://flink@ip-172-31-7-0.eu-west-1.compute.internal:34281/user/jobmanager_0 for job 11a794d2f5dc2955d8015625ec300c20 from the resource manager.
> 2018-03-08 16:48:39,349 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPool          - Suspending SlotPool.
> 2018-03-08 16:48:39,349 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPool          - Stopping SlotPool.
> 2018-03-08 16:48:39,349 INFO  org.apache.flink.runtime.jobmaster.JobManagerRunner           - JobManagerRunner already shutdown.
> 2018-03-08 16:48:39,775 INFO  org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager  - Register TaskManager 4e1fb6c8f95685e24b6a4cb4b71ffb92 at the SlotManager.
> 2018-03-08 16:48:39,846 INFO  org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager  - Register TaskManager b5bce0bdfa7fbb0f4a0905cc3ee1c233 at the SlotManager.
> 2018-03-08 16:48:39,876 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
> 2018-03-08 16:48:39,910 INFO  org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager  - Register TaskManager a35b0690fdc6ec38bbcbe18a965000fd at the SlotManager.
> 2018-03-08 16:48:39,942 INFO  org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager  - Register TaskManager 5175cabe428bea19230ac056ff2a17bb at the SlotManager.
> 2018-03-08 16:48:39,974 INFO  org.apache.flink.runtime.blob.BlobServer                      - Stopped BLOB server at 0.0.0.0:46511
> 2018-03-08 16:48:39,975 INFO  org.apache.flink.runtime.blob.TransientBlobCache              - Shutting down BLOB cache
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)