Posted to common-user@hadoop.apache.org by stephen mulcahy <st...@deri.org> on 2009/06/10 16:39:19 UTC

Hadoop benchmarking

Hi,

I'm currently doing some testing of different configurations using the 
Hadoop Sort as follows,

bin/hadoop jar hadoop-*-examples.jar randomwriter 
-Dtest.randomwrite.total_bytes=107374182400 /benchmark100

bin/hadoop jar hadoop-*-examples.jar sort /benchmark100 rand-sort
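As a sanity check on the figure above, the test.randomwrite.total_bytes value of 107374182400 is exactly 100 GiB:

```shell
# 100 GiB expressed in bytes: 100 * 1024^3
echo $((100 * 1024 * 1024 * 1024))   # prints 107374182400
```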

The only changes I've made from the standard config are the following in 
conf/mapred-site.xml

    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx1024M</value>
    </property>

    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>8</value>
    </property>

    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>4</value>
    </property>

I'm running this on 4 systems, each with 8 processor cores and 4 
separate disks.

Is there anything else I should change to stress memory more? The 
systems in question have 16GB of memory, but the most that's getting used 
during a run of this benchmark is about 2GB (and most of that seems to 
be OS caching).

Thanks,

-stephen

-- 
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie    http://webstar.deri.ie    http://sindice.com

Re: Hadoop benchmarking

Posted by Matei Zaharia <ma...@cloudera.com>.
Owen, one problem with Arun's slide deck is that while it lists the
parameters that matter, it doesn't list suggested values for them. Do you
have any guide about that? In particular, the only places I know that talk
about how to set these parameters are
http://www.cloudera.com/blog/2009/03/30/configuration-parameters-what-can-you-just-ignore/
and http://wiki.apache.org/hadoop/FAQ#3.

On Wed, Jun 10, 2009 at 12:14 PM, Owen O'Malley <om...@apache.org> wrote:

> Take a look at Arun's slide deck on Hadoop performance:
>
> http://bit.ly/EDCg3
>
> It is important to get io.sort.mb large enough, and io.sort.factor should
> be closer to 100 than 10. I'd also use large block sizes to reduce the
> number of maps. Please see the deck for other important factors.
>
> -- Owen
>

Re: Hadoop benchmarking

Posted by Owen O'Malley <om...@apache.org>.
Take a look at Arun's slide deck on Hadoop performance:

http://bit.ly/EDCg3

It is important to get io.sort.mb large enough, and io.sort.factor
should be closer to 100 than 10. I'd also use large block sizes
to reduce the number of maps. Please see the deck for other important
factors.

-- Owen
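For concreteness, the parameters Owen mentions could be set along the following lines. The values below are illustrative only, not recommendations from this thread; io.sort.mb and io.sort.factor go in conf/mapred-site.xml, while the block size is an HDFS setting (conf/hdfs-site.xml in the split-config layout):

```xml
    <!-- Illustrative values only; tune for your own hardware. -->
    <property>
      <name>io.sort.mb</name>
      <!-- map-side sort buffer in MB; the default is 100 -->
      <value>200</value>
    </property>

    <property>
      <name>io.sort.factor</name>
      <!-- number of streams merged at once; the default is 10 -->
      <value>100</value>
    </property>

    <property>
      <name>dfs.block.size</name>
      <!-- 128 MB blocks, up from the 64 MB default, to reduce map count -->
      <value>134217728</value>
    </property>
```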

Re: Hadoop benchmarking

Posted by Aaron Kimball <aa...@cloudera.com>.
Hi Stephen,

That will set the maximum heap allowable, but doesn't tell Hadoop's internal
systems necessarily to take advantage of it. There's a number of other
settings that adjust performance. At Cloudera we have a config tool that
generates Hadoop configurations with reasonable first-approximation values
for your cluster -- check out http://my.cloudera.com and look at the
hadoop-site.xml it generates. If you start from there you might find a
better parameter space to explore. Please share back your findings -- we'd
love to tweak the tool even more with some external feedback :)

- Aaron


On Wed, Jun 10, 2009 at 7:39 AM, stephen mulcahy
<st...@deri.org>wrote:

> Hi,
>
> I'm currently doing some testing of different configurations using the
> Hadoop Sort as follows,
>
> bin/hadoop jar hadoop-*-examples.jar randomwriter
> -Dtest.randomwrite.total_bytes=107374182400 /benchmark100
>
> bin/hadoop jar hadoop-*-examples.jar sort /benchmark100 rand-sort
>
> The only changes I've made from the standard config are the following in
> conf/mapred-site.xml
>
>   <property>
>     <name>mapred.child.java.opts</name>
>     <value>-Xmx1024M</value>
>   </property>
>
>   <property>
>     <name>mapred.tasktracker.map.tasks.maximum</name>
>     <value>8</value>
>   </property>
>
>   <property>
>     <name>mapred.tasktracker.reduce.tasks.maximum</name>
>     <value>4</value>
>   </property>
>
> I'm running this on 4 systems, each with 8 processor cores and 4 separate
> disks.
>
> Is there anything else I should change to stress memory more? The systems
> in question have 16GB of memory, but the most that's getting used during a
> run of this benchmark is about 2GB (and most of that seems to be OS
> caching).
>
> Thanks,
>
> -stephen
>
> --
> Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
> NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
> http://di2.deri.ie    http://webstar.deri.ie    http://sindice.com
>