Posted to common-user@hadoop.apache.org by 姚吉龙 <ge...@gmail.com> on 2013/04/17 11:42:00 UTC

How to improve performance of this cluster

Hi everyone

We have a cluster of 31 datanodes and 1 namenode, each with an 8-core CPU and
8G of RAM.
I am studying how to improve the performance of this cluster. For now we use
a 100G data file as the test case.
When I increased the number of reducers from 100 to 200, I did not see much
improvement: the run time only went from 23m52s to 19m44s. Besides, two
failed tasks appeared during this run:

java.lang.Throwable: Child Error
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.io.IOException: Task process exit with nonzero status of 126.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)

Here is my conf in mapred-site.xml: [image: inline image 1]

1. Could anyone help with the failed tasks? Why would this happen?
2. How can I speed up this job further?
Any suggestions are welcome.


BRs
Geelong

-- 
From Good To Great

Re: How to improve performance of this cluster

Posted by Bejoy Ks <be...@gmail.com>.
Hi Geelong

Let me just put in my thoughts here

You have 8G of RAM per node, but you have 8+8 = 16 slots with a task JVM size
of 1G. This means that if all slots are in use at the same time, the tasks
need 16G while only 8G is available, hence a high chance of OOM errors.

When you decide on the number of slots, you need to account for the memory
used by the OS, the Hadoop daemons, etc. Only the remaining memory should be
divided among the task slots.
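
To make the arithmetic above concrete, here is a minimal sketch in plain Java
(no Hadoop dependency). The 8G node size, the 1G task JVM size and the 8+8
slot count come from this thread. The 2G reserved for the OS and the Hadoop
daemons is only an assumed figure for illustration, and the class and method
names are made up for this example.

```java
// Rough memory budget for task slots on one worker node (illustrative only).
final class SlotBudget {

    /**
     * Largest number of task slots whose heaps fit in the memory left over
     * after the OS and the Hadoop daemons have taken their share.
     */
    static int maxSlots(int nodeRamMb, int reservedMb, int taskHeapMb) {
        if (taskHeapMb <= 0) {
            throw new IllegalArgumentException("taskHeapMb must be positive");
        }
        int availableMb = Math.max(0, nodeRamMb - reservedMb);
        return availableMb / taskHeapMb; // integer division rounds down
    }

    /** Worst-case heap demand when every map and reduce slot is busy at once. */
    static int peakTaskHeapMb(int mapSlots, int reduceSlots, int taskHeapMb) {
        return (mapSlots + reduceSlots) * taskHeapMb;
    }

    public static void main(String[] args) {
        int nodeRamMb = 8 * 1024;  // 8G per node, from the thread
        int taskHeapMb = 1024;     // 1G task JVM, from the thread
        int reservedMb = 2 * 1024; // ASSUMPTION: 2G for OS + Hadoop daemons

        System.out.println("Peak demand with 8+8 slots: "
                + peakTaskHeapMb(8, 8, taskHeapMb) + " MB of " + nodeRamMb + " MB");
        System.out.println("Slots that fit: "
                + maxSlots(nodeRamMb, reservedMb, taskHeapMb));
    }
}
```

With these assumed numbers about 6 slots fit per node. How to split them
between map and reduce slots depends on the job, and a JVM usually needs
somewhat more memory than its heap size, so treat the result as an upper
bound.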

Increasing the number of reduce tasks alone won't give much of a performance
improvement. In MR, sort and shuffle is the most expensive phase, so try
doing your tuning there. Some things I can think of:
1. Use map output compression
2. Use a combiner if possible
3. Reduce spills by adjusting io.sort.mb, io.sort.factor, etc.
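
As a rough sketch of points 1 and 3, the plain-Java snippet below prints
property entries that could be pasted into mapred-site.xml. The property
names are the Hadoop 1.x ones (the TaskRunner class in the stack trace
suggests that generation). The values, the choice of DefaultCodec, and the
class and method names are assumptions for illustration, not tested
recommendations for this cluster. The combiner from point 2 is set in the job
code rather than in the config, so it is not shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Prints candidate shuffle-related overrides as mapred-site.xml entries.
final class ShuffleSettings {

    /** Hadoop 1.x property names; every value here is a placeholder to tune. */
    static Map<String, String> candidates() {
        Map<String, String> props = new LinkedHashMap<>();
        // Point 1: compress intermediate map output before it is shuffled.
        props.put("mapred.compress.map.output", "true");
        // ASSUMPTION: DefaultCodec chosen only because it needs no native library.
        props.put("mapred.map.output.compression.codec",
                "org.apache.hadoop.io.compress.DefaultCodec");
        // Point 3: a larger sort buffer means fewer spills; it must fit in the task heap.
        props.put("io.sort.mb", "200");
        // Point 3: merge more spill files per pass.
        props.put("io.sort.factor", "50");
        return props;
    }

    /** Renders each entry as a <property> element (values are not XML-escaped). */
    static String toPropertyXml(Map<String, String> props) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append("<property>\n")
              .append("  <name>").append(e.getKey()).append("</name>\n")
              .append("  <value>").append(e.getValue()).append("</value>\n")
              .append("</property>\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toPropertyXml(candidates()));
    }
}
```

Whether these values help depends on the job; comparing the spilled-records
counter before and after a change is one way to check.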

Apart from this, if you are running custom code, controlling/filtering the
data volume in the early stages of a multi-stage MR job could bring a
considerable performance improvement.



