Posted to common-dev@hadoop.apache.org by "Yuri Pradkin (JIRA)" <ji...@apache.org> on 2008/03/21 21:51:24 UTC

[jira] Created: (HADOOP-3068) hadoop streaming tasks hang for when stream.non.zero.exit.is.failure==true and reduce processes exit with non zero status

hadoop streaming tasks hang for when stream.non.zero.exit.is.failure==true and reduce processes exit with non zero status
-------------------------------------------------------------------------------------------------------------------------

                 Key: HADOOP-3068
                 URL: https://issues.apache.org/jira/browse/HADOOP-3068
             Project: Hadoop Core
          Issue Type: Bug
          Components: contrib/streaming
         Environment: Java(TM) SE Runtime Environment (build 1.6.0_04-b12); Hadoop version 0.17.0-dev, r639662
            Reporter: Yuri Pradkin
             Fix For: 0.17.0


When I set *stream.non.zero.exit.is.failure* to true and run a streaming job whose reducers exit with a non-zero status, those tasks fail but then hang, apparently waiting for something.

...
2008-03-21 13:33:53,715 INFO org.apache.hadoop.streaming.PipeMapRed: R/W/S=65501/1/0 in:334=65501/196 [rec/s] out:0=1/196 [rec/s]
2008-03-21 13:33:53,719 INFO org.apache.hadoop.streaming.PipeMapRed: mapRedFinished
2008-03-21 13:34:11,228 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=65536/2
2008-03-21 13:34:11,235 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed.waitOutputThreads(): subprocess exitted with code 1
2008-03-21 13:34:11,235 INFO org.apache.hadoop.streaming.PipeMapRed: MRErrorThread done
2008-03-21 13:34:11,238 INFO org.apache.hadoop.streaming.PipeMapRed: MROutputThread done
2008-03-21 13:34:11,245 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:331)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:475)
        at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:110)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:399)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2113)

After that, the task still shows up with status Running but just hangs there. If all tasks get into this state, the whole cluster hangs.
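
For readers unfamiliar with the option, here is a minimal sketch of the policy it is meant to enforce: wait for the child process, and treat a non-zero exit status as a task failure only when the flag is set. This is an illustration in Python, not the PipeMapRed code; the names run_reducer, FAILING_REDUCER, and non_zero_exit_is_failure are hypothetical, and the child command is a stand-in for a real streaming reducer.

```python
import subprocess
import sys

# Hypothetical stand-in for a streaming reducer:
# consume all of stdin, then exit with status 1.
FAILING_REDUCER = [sys.executable, "-c", "import sys; sys.stdin.read(); sys.exit(1)"]


def run_reducer(cmd, records, non_zero_exit_is_failure):
    """Feed records to a child process, wait for it, and apply the exit-status policy."""
    proc = subprocess.run(cmd, input=records, capture_output=True, text=True)
    if proc.returncode != 0 and non_zero_exit_is_failure:
        # Assumed behavior: the error propagates and the task ends as failed.
        raise RuntimeError(f"subprocess failed with code {proc.returncode}")
    # With the flag off, the exit status is reported but not treated as a failure.
    return proc.returncode
```

With the flag off, run_reducer(FAILING_REDUCER, "key\tvalue\n", False) simply returns the child's exit status. With the flag on, the same call raises, and the expected outcome is a task that terminates as failed rather than one that stays in the Running state as described above.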

BTW, may I suggest that we make *stream.non.zero.exit.is.failure* default to true after this is fixed?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Resolved: (HADOOP-3068) hadoop streaming tasks hang for when stream.non.zero.exit.is.failure==true and reduce processes exit with non zero status

Posted by "Rick Cox (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rick Cox resolved HADOOP-3068.
------------------------------

    Resolution: Duplicate

I haven't been able to recreate this since HADOOP-3039 was fixed.

[jira] Updated: (HADOOP-3068) hadoop streaming tasks hang for when stream.non.zero.exit.is.failure==true and reduce processes exit with non zero status

Posted by "Robert Chansler (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Chansler updated HADOOP-3068:
------------------------------------

    Fix Version/s:     (was: 0.17.0)
