Posted to dev@oozie.apache.org by "Diana Carroll (JIRA)" <ji...@apache.org> on 2015/08/07 16:01:46 UTC

[jira] [Commented] (OOZIE-2326) oozie/yarn/spark: active container remains after failed job

    [ https://issues.apache.org/jira/browse/OOZIE-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661858#comment-14661858 ] 

Diana Carroll commented on OOZIE-2326:
--------------------------------------

Actually, with further testing, this seems to occur (sometimes? all the time?) even when the local Spark job succeeds.

> oozie/yarn/spark: active container remains after failed job
> -----------------------------------------------------------
>
>                 Key: OOZIE-2326
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2326
>             Project: Oozie
>          Issue Type: Bug
>          Components: workflow
>    Affects Versions: 4.1.0
>         Environment: pseudo-distributed (single VM), CentOS 6.6, CDH 5.4.3
>            Reporter: Diana Carroll
>         Attachments: container-logs.txt, ooziejob-logs.txt, yarnbug1.png, yarnbug2.png
>
>
> Issue occurs when I launch a Spark job (local mode) that fails.  (My example failed because I tried to read a non-existent file.)  When this occurs, the job fails, and YARN ends up in a weird state: the RM shows the launch job has completed... but a container for the job is still live on the slave node.  Because I'm running in pseudo-distributed mode, this totally hangs my cluster: no other jobs can run because there are only resources for a single container, and that container is running the dead Oozie launcher.
> If I wait long enough, YARN will eventually time out and release the container and start accepting new jobs.  But until then I'm dead in the water.
> Attaching screenshots that show the state right after running the failed job:
> the RM shows no jobs running
> the node shows one container running
> Also attaching a log file for the oozie job and the container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)