Posted to hdfs-user@hadoop.apache.org by Matt Kennedy <st...@gmail.com> on 2012/08/21 21:15:25 UTC

Map Reduce "Child Error" task failure

I'm encountering a sporadic error while running MapReduce jobs; it
shows up in the console output as follows:

12/08/21 14:56:05 INFO mapred.JobClient: Task Id :
attempt_201208211430_0001_m_003538_0, Status : FAILED
java.lang.Throwable: Child Error
	at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.io.IOException: Task process exit with nonzero status of 126.
	at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)

12/08/21 14:56:05 WARN mapred.JobClient: Error reading task
outputhttp://<hostname_removed>:50060/tasklog?plaintext=true&attemptid=attempt_201208211430_0001_m_003538_0&filter=stdout
12/08/21 14:56:05 WARN mapred.JobClient: Error reading task
outputhttp://<hostname_removed>:50060/tasklog?plaintext=true&attemptid=attempt_201208211430_0001_m_003538_0&filter=stderr

The conditions look exactly like those described in:
https://issues.apache.org/jira/browse/MAPREDUCE-4003

Unfortunately, that issue is marked as closed with a fix version of
Apache Hadoop 1.0.3, but 1.0.3 is exactly the version I'm hitting
this on.

There does seem to be a correlation between the frequency of these
errors and the number of concurrent map tasks being executed; however,
the hardware resources on the cluster do not appear to be anywhere near
their limits. I'm assuming there is a misadjusted knob somewhere that is
causing this, but I haven't found it yet.
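
To test the concurrency angle, I've been temporarily lowering the map
slots per node in mapred-site.xml; roughly like this (the value of 4 is
just an example for my hardware, not a recommendation):

  <!-- mapred-site.xml on each TaskTracker; example value only -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>

(If I remember right, changing that requires restarting the TaskTrackers.)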

I did find this discussion
(https://groups.google.com/a/cloudera.org/d/topic/cdh-user/NlhvHapf3pk/discussion)
on the CDH users list describing exactly the same problem, and the advice
there was to increase the value of the mapred.child.ulimit setting.
However, I initially had this value unset, which, if my research is
correct, should mean it is unlimited. I then set it to 3 GB (3x my
setting for mapred.map.child.java.opts), which did not resolve the
problem. Finally, out of frustration, I just added a zero at the end, so
the value is now 31457280 (the setting is in KB), i.e. 30 GB. I'm still
having the problem.
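
For reference, the relevant part of my mapred-site.xml now looks roughly
like this (the 1 GB heap just reflects the 3x ratio I mentioned; treat
the exact numbers as illustrative):

  <property>
    <name>mapred.map.child.java.opts</name>
    <value>-Xmx1024m</value>
  </property>
  <property>
    <!-- virtual memory limit for the child process, in KB (31457280 KB = 30 GB) -->
    <name>mapred.child.ulimit</name>
    <value>31457280</value>
  </property>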

Is anybody else seeing this issue, or does anyone have an idea for a
workaround? Right now my workaround is to raise the number of task
failures allowed before a tasktracker is blacklisted, but this has the
unintended side effect of taking a very long time to evict tasktrackers
that are legitimately broken. If this error is indicative of some other
configuration problem, I'd like to try to resolve it.
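
Concretely, the blacklisting thresholds I've bumped for that workaround
are the ones below; if I've read the 1.0.3 defaults correctly these are
the right property names, and 20 is just the stopgap value I'm using:

  <property>
    <!-- task failures on a single tracker before that job stops using it -->
    <name>mapred.max.tracker.failures</name>
    <value>20</value>
  </property>
  <property>
    <!-- per-job blacklistings of a tracker before it is flagged across jobs -->
    <name>mapred.max.tracker.blacklists</name>
    <value>20</value>
  </property>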

Ideas? Or should I re-open the JIRA?

Thank you for your time,
Matt

RE: Map Reduce "Child Error" task failure

Posted by "Joshi, Shrinivas" <Sh...@amd.com>.
Hi Matt,

You are most probably seeing this: https://issues.apache.org/jira/browse/MAPREDUCE-2374

There is a single-line fix for this issue; see the latest patch attached to the JIRA entry above.

-Shrinivas



