Posted to issues@hbase.apache.org by "Jonathan Hsieh (JIRA)" <ji...@apache.org> on 2014/12/15 12:56:13 UTC

[jira] [Commented] (HBASE-12691) sweep job needs to exit non-zero if job fails for any reason.

    [ https://issues.apache.org/jira/browse/HBASE-12691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246589#comment-14246589 ] 

Jonathan Hsieh commented on HBASE-12691:
----------------------------------------

Before (the job fails, but the build still finishes with SUCCESS):

{code}
2014-12-15 00:08:12,385 INFO  [main] mapreduce.Job: Job job_1412751131866_0006 failed with state FAILED due to: Application application_1412751131866_0006 failed 2 times due to AM Container for appattempt_1412751131866_0006_000002 exited with  exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: 
org.apache.hadoop.util.Shell$ExitCodeException: 
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:511)
	at org.apache.hadoop.util.Shell.run(Shell.java:424)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:656)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)


Container exited with a non-zero exit code 1
.Failing this attempt.. Failing the application.
2014-12-15 00:08:12,437 INFO  [main] mapreduce.Job: Counters: 0
2014-12-15 00:08:12,468 INFO  [main] zookeeper.ZooKeeper: Session: 0x14a2c192a110424 closed
2014-12-15 00:08:12,468 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
Finished: SUCCESS
{code}

After (the job fails, and the build is marked as FAILURE):

{code}
2014-12-15 03:45:47,111 INFO  [main] mapreduce.Job: Job job_1412751131866_0008 failed with state FAILED due to: Application application_1412751131866_0008 failed 2 times due to AM Container for appattempt_1412751131866_0008_000002 exited with  exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: 
org.apache.hadoop.util.Shell$ExitCodeException: 
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:511)
	at org.apache.hadoop.util.Shell.run(Shell.java:424)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:656)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)


Container exited with a non-zero exit code 1
.Failing this attempt.. Failing the application.
2014-12-15 03:45:47,163 INFO  [main] mapreduce.Job: Counters: 0
Job Failed
2014-12-15 03:45:47,210 INFO  [main] zookeeper.ZooKeeper: Session: 0x14a4dc35dac0017 closed
2014-12-15 03:45:47,210 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
Build step 'Execute shell' marked build as failure
Finished: FAILURE
{code}


> sweep job needs to exit non-zero if job fails for any reason.
> -------------------------------------------------------------
>
>                 Key: HBASE-12691
>                 URL: https://issues.apache.org/jira/browse/HBASE-12691
>             Project: HBase
>          Issue Type: Sub-task
>          Components: mob
>    Affects Versions: hbase-11339
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: hbase-11339
>
>         Attachments: hbase-12691.patch
>
>
> When building up automated testing, I noticed that the sweep job would not "fail" because it exited 0 even if the job failed.  This adds proper exit hygiene, returning non-zero exit codes on failure events, so that we can rely upon the job in automation.
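
For illustration only, here is a minimal sketch of the exit-code pattern the description asks for. It is not the attached patch. The class name SweepExitCode, the helper run, and the specific codes 1 and 2 are assumptions made up for this example, and a Callable<Boolean> stands in for submitting the MapReduce job and waiting for it to finish. The "Job Failed" message mirrors the line that shows up in the "after" log.

```java
import java.util.concurrent.Callable;

/**
 * Hypothetical sketch: turn a job outcome into a process exit code so that
 * a calling shell step (for example a CI build) can detect failure.
 */
public class SweepExitCode {

    /** Runs the job and returns 0 on success, non-zero on any failure. */
    public static int run(Callable<Boolean> job) {
        try {
            // Treat a null result the same as a failed job.
            boolean succeeded = Boolean.TRUE.equals(job.call());
            if (!succeeded) {
                System.err.println("Job Failed");
                return 1; // assumed code for "job ran but reported failure"
            }
            return 0;
        } catch (Exception e) {
            System.err.println("Job Failed: " + e);
            return 2; // assumed code for "job threw before completing"
        }
    }

    public static void main(String[] args) {
        // Stand-in job: succeeds unless the first argument is "fail".
        Callable<Boolean> job = () -> args.length == 0 || !"fail".equals(args[0]);

        // Propagate the result to the OS instead of always exiting 0.
        System.exit(run(job));
    }
}
```

Under these assumptions, running the class with the argument "fail" exits with status 1, so an "Execute shell" build step would be marked as a failure instead of finishing with SUCCESS.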



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)