Posted to dev@oozie.apache.org by "Diana Carroll (JIRA)" <ji...@apache.org> on 2015/08/07 15:46:45 UTC
[jira] [Updated] (OOZIE-2326) oozie/yarn/spark: active container remains after failed job
[ https://issues.apache.org/jira/browse/OOZIE-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Diana Carroll updated OOZIE-2326:
---------------------------------
Attachment: yarnbug1.png
ooziejob-logs.txt
yarnbug2.png
container-logs.txt
> oozie/yarn/spark: active container remains after failed job
> -----------------------------------------------------------
>
> Key: OOZIE-2326
> URL: https://issues.apache.org/jira/browse/OOZIE-2326
> Project: Oozie
> Issue Type: Bug
> Components: workflow
> Affects Versions: 4.1.0
> Environment: pseudo-distributed (single VM), CentOS 6.6, CDH 5.4.3
> Reporter: Diana Carroll
> Attachments: container-logs.txt, ooziejob-logs.txt, yarnbug1.png, yarnbug2.png
>
>
> Issue occurs when I launch a Spark job (local mode) that fails. (My example failed because I tried to read a non-existent file.) When this occurs, the job fails, and YARN ends up in a weird state: the ResourceManager shows the launcher job has completed...but a container for the job is still live on the slave node. Because I'm running in pseudo-distributed mode, this completely hangs my cluster: no other jobs can run because there are only resources for a single container, and that container is running the dead Oozie launcher.
> If I wait long enough, YARN will eventually time out and release the container and start accepting new jobs. But until then I'm dead in the water.
> Attaching screenshots that show the state right after running the failed job:
> the RM shows no jobs running
> the node shows one container running
> Also attaching a log file for the oozie job and the container.
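> For reference, a minimal sketch of the kind of workflow involved: an Oozie Spark action running in local mode whose job fails at runtime. All names, paths, and the application class here are illustrative assumptions, not taken from the actual failing job or the attached logs.

```xml
<!-- Hypothetical workflow.xml: a Spark action in local mode. -->
<!-- If the Spark application throws (e.g. reading a missing file), -->
<!-- the action takes the error transition, but per this report the -->
<!-- launcher's YARN container may linger on the node afterwards. -->
<workflow-app xmlns="uri:oozie:workflow:0.5" name="spark-local-wf">
    <start to="spark-node"/>
    <action name="spark-node">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>local[*]</master>
            <name>FailingSparkJob</name>
            <class>com.example.ReadMissingFile</class>
            <jar>${nameNode}/user/example/lib/example.jar</jar>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Spark action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```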
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)