Posted to mapreduce-user@hadoop.apache.org by Ben Kim <be...@gmail.com> on 2012/07/27 11:38:32 UTC

Re: Error reading task output

Hi
I'm having a similar problem, so I'll continue on this thread to describe
my issue.

I ran an MR job that takes 70 GB of input and creates 1098 mappers and 100
reducers (on a 9-node Hadoop cluster),
but the job fails and 4 datanodes die after a few minutes (the processes are
still running, but the master recognizes them as dead).
When I investigate the job, it looks like 20 mappers fail with these errors:

ProcfsBasedProcessTree: java.io.IOException: Cannot run program "getconf":
> java.io.IOException: error=11, Resource temporarily unavailable
> ..
> OutOfMemoryError: unable to create new native thread
> ..
> # There is insufficient memory for the Java Runtime Environment to continue.
> # Cannot create GC thread. Out of system resources.


Reducers also fail because they aren't able to retrieve the failed mappers'
output.
I'm guessing that somehow the JVMs hit a memory limit, so the tasktrackers
and datanodes can't create new threads and die.

But given my lack of experience with Hadoop, I don't know what is actually
causing this, and of course I don't have an answer yet.

Here are some *configurations*:
HADOOP_HEAPSIZE=4096
HADOOP_NAMENODE_OPTS = .. -Xmx2g ..
HADOOP_DATANODE_OPTS = .. -Xmx4g ..
HADOOP_JOBTRACKER_OPTS = .. -Xmx4g ..

dfs.datanode.max.xcievers = 60000
mapred.child.java.opts = -Xmx400m
mapred.tasktracker.map.tasks.maximum = 14
mapred.tasktracker.reduce.tasks.maximum = 14
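
Back-of-envelope, per node (just my rough math from the settings above,
assuming all task slots are in use):

  14 map slots + 14 reduce slots = 28 child JVMs
  28 JVMs x 400 MB (mapred.child.java.opts) ~= 11.2 GB of task heap
  plus the 4 GB datanode heap and the tasktracker JVM on top of that

so each slave could be committing well over 15 GB before thread stacks and
native overhead are counted. I don't know whether that is the actual
problem; it's just how I arrived at my memory guess.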

I've also attached the *logs*.

If anyone knows the answer, please let me know.
I would appreciate any help on this.

Best regards,
Ben

On Fri, Jun 15, 2012 at 1:05 PM, Harsh J <ha...@cloudera.com> wrote:

> Do you ship a lot of dist-cache files or perhaps have a bad
> mapred.child.java.opts parameter?
>
> On Fri, Jun 15, 2012 at 1:39 AM, Shamshad Ansari <sa...@apixio.com>
> wrote:
> > Hi All,
> > When I run hadoop jobs, I observe the following errors. Also, I notice
> that
> > data node dies every time  the job is initiated.
> >
> > Does any one know what may be causing this and how to solve this?
> >
> > ======================
> >
> > 12/06/14 19:57:17 INFO input.FileInputFormat: Total input paths to process : 1
> > 12/06/14 19:57:17 INFO mapred.JobClient: Running job: job_201206141136_0002
> > 12/06/14 19:57:18 INFO mapred.JobClient:  map 0% reduce 0%
> > 12/06/14 19:57:27 INFO mapred.JobClient: Task Id : attempt_201206141136_0002_m_000001_0, Status : FAILED
> > java.lang.Throwable: Child Error
> >         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
> > Caused by: java.io.IOException: Task process exit with nonzero status of 1.
> >         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
> >
> > 12/06/14 19:57:27 WARN mapred.JobClient: Error reading task outputhttp://node1:50060/tasklog?plaintext=true&attemptid=attempt_201206141136_0002_m_000001_0&filter=stdout
> > 12/06/14 19:57:27 WARN mapred.JobClient: Error reading task outputhttp://node1:50060/tasklog?plaintext=true&attemptid=attempt_201206141136_0002_m_000001_0&filter=stderr
> > 12/06/14 19:57:33 INFO mapred.JobClient: Task Id : attempt_201206141136_0002_r_000002_0, Status : FAILED
> > java.lang.Throwable: Child Error
> >         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
> > Caused by: java.io.IOException: Task process exit with nonzero status of 1.
> >         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
> >
> > 12/06/14 19:57:33 WARN mapred.JobClient: Error reading task outputhttp://node1:50060/tasklog?plaintext=true&attemptid=attempt_201206141136_0002_r_000002_0&filter=stdout
> > 12/06/14 19:57:33 WARN mapred.JobClient: Error reading task outputhttp://node1:50060/tasklog?plaintext=true&attemptid=attempt_201206141136_0002_r_000002_0&filter=stderr
> > ^C
> > hadoop@ip-10-174-87-251:~/apixio-pipeline/pipeline-trigger$ 12/06/14 19:57:27 WARN mapred.JobClient: Error reading task outputhttp://node1:50060/tasklog?plaintext=true&attemptid=attempt_201206141136_0002_m_000001_0&filter=stdout
> >
> > Thank you,
> > --Shamshad
> >
>
>
>
> --
> Harsh J
>



-- 

*Benjamin Kim*
*benkimkimben at gmail*

Re: Error reading task output

Posted by Ben Kim <be...@gmail.com>.
Bejoy,
Thanks a lot for your response. You are right: the problem was a
misconfigured nproc limit in the OS.
Originally, my limits.conf file was something like this:

* hard    nofile    1000000
* soft    nofile    1000000
* hard    nproc     320000
* soft    nproc     320000

but for some reason Linux hadn't applied the * wildcard for nproc
(nofile was applied correctly). It's very odd, but I changed it to the
following:

*   hard    nofile    1000000
*   soft    nofile    1000000
root soft nproc 320000
root hard nproc 320000
hadoop soft nproc 320000
hadoop hard nproc 320000


and the problem is solved!
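
(In case anyone else runs into this: after editing limits.conf, you can
check that the new limit is actually picked up by opening a fresh login
session for the hadoop user, e.g. something like

su - hadoop -c 'ulimit -u'

which should print 320000 once pam_limits applies the new entries. The
daemons also need to be restarted from a fresh login to inherit it.)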

Thanks again for your help!

Ben


On Fri, Jul 27, 2012 at 9:03 PM, Bejoy Ks <be...@gmail.com> wrote:

> Hi Ben
>
> This error happens when the mapreduce job spawns more processes than
> the underlying OS allows. You need to increase the nproc value if it is
> still at the default.
>
> You can get the current value on Linux with
> ulimit -u
> The default is 1024, I believe. Check it for the user that runs the
> mapreduce tasks; on a non-security-enabled cluster that is mapred.
>
> You need to raise it to a large value, e.g.
> mapred soft nproc 10000
> mapred hard nproc 10000
>
> If you are running a security-enabled cluster, this value should be
> raised for the user who submits the job.
>
> Regards
> Bejoy KS
>



-- 

*Benjamin Kim*
*benkimkimben at gmail*

Re: Error reading task output

Posted by Bejoy Ks <be...@gmail.com>.
Hi Ben

This error happens when the mapreduce job spawns more processes than
the underlying OS allows. You need to increase the nproc value if it is
still at the default.

You can get the current value on Linux with
ulimit -u
The default is 1024, I believe. Check it for the user that runs the
mapreduce tasks; on a non-security-enabled cluster that is mapred.

You need to raise it to a large value, e.g.
mapred soft nproc 10000
mapred hard nproc 10000

If you are running a security-enabled cluster, this value should be
raised for the user who submits the job.
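
For reference, the nproc lines above go in /etc/security/limits.conf. To
double check what a given user currently gets, something like this should
work (forcing a shell in case the account has nologin):

su -s /bin/bash -c 'ulimit -u' mapred

The new limit only applies to sessions started after the change, so the
tasktracker and datanode daemons need a restart to pick it up.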

Regards
Bejoy KS