Posted to common-user@hadoop.apache.org by stephen mulcahy <st...@deri.org> on 2009/06/10 16:39:19 UTC
Hadoop benchmarking
Hi,
I'm currently doing some testing of different configurations using the
Hadoop Sort as follows,
bin/hadoop jar hadoop-*-examples.jar randomwriter
-Dtest.randomwrite.total_bytes=107374182400 /benchmark100
bin/hadoop jar hadoop-*-examples.jar sort /benchmark100 rand-sort
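(For reference, the test.randomwrite.total_bytes value above is exactly 100 GiB, i.e. 100 * 1024^3 bytes — hence the /benchmark100 directory name. A quick check:)

```python
# test.randomwrite.total_bytes from the randomwriter command above,
# expressed as 100 GiB (100 * 1024^3 bytes)
total_bytes = 100 * 1024 ** 3
print(total_bytes)  # 107374182400
```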
The only changes I've made from the standard config are the following in
conf/mapred-site.xml
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx1024M</value>
</property>
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>8</value>
</property>
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>4</value>
</property>
I'm running this on 4 systems, each with 8 processor cores and 4
separate disks.
Is there anything else I should change to stress memory more? The
systems in question have 16GB of memory, but the most that's getting used
during a run of this benchmark is about 2GB (and most of that seems to
be OS caching).
Thanks,
-stephen
--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie http://webstar.deri.ie http://sindice.com
Re: Hadoop benchmarking
Posted by Matei Zaharia <ma...@cloudera.com>.
Owen, one problem with Arun's slide deck is that while it lists the
parameters that matter, it doesn't list suggested values for them. Do you
have any guide about that? In particular, the only places I know that talk
about how to set these parameters are
http://www.cloudera.com/blog/2009/03/30/configuration-parameters-what-can-you-just-ignore/ and
http://wiki.apache.org/hadoop/FAQ#3.
On Wed, Jun 10, 2009 at 12:14 PM, Owen O'Malley <om...@apache.org> wrote:
> Take a look at Arun's slide deck on Hadoop performance:
>
> http://bit.ly/EDCg3
>
> It is important to get io.sort.mb large enough, and io.sort.factor should
> be closer to 100 than the default of 10. I'd also use large block sizes to
> reduce the number of maps. Please see the deck for other important factors.
>
> -- Owen
>
Re: Hadoop benchmarking
Posted by Owen O'Malley <om...@apache.org>.
Take a look at Arun's slide deck on Hadoop performance:
http://bit.ly/EDCg3
It is important to get io.sort.mb large enough, and io.sort.factor
should be closer to 100 than the default of 10. I'd also use large
block sizes to reduce the number of maps. Please see the deck for
other important factors.
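(A concrete sketch of these settings, with illustrative values that are not tuned recommendations from this thread: io.sort.mb and io.sort.factor belong in conf/mapred-site.xml, while the block size belongs in conf/hdfs-site.xml. Note that io.sort.mb must fit inside the task heap set by mapred.child.java.opts.)

```xml
<!-- conf/mapred-site.xml: sort buffer and merge factor (illustrative values) -->
<property>
  <name>io.sort.mb</name>
  <value>200</value>      <!-- default is 100; must fit within the task heap -->
</property>
<property>
  <name>io.sort.factor</name>
  <value>100</value>      <!-- default is 10 -->
</property>

<!-- conf/hdfs-site.xml: larger blocks mean fewer map tasks (illustrative) -->
<property>
  <name>dfs.block.size</name>
  <value>268435456</value> <!-- 256 MB, vs. the old 64 MB default -->
</property>
```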
-- Owen
Re: Hadoop benchmarking
Posted by Aaron Kimball <aa...@cloudera.com>.
Hi Stephen,
That will set the maximum allowable heap, but it doesn't necessarily tell
Hadoop's internal systems to take advantage of it. There are a number of
other settings that affect performance. At Cloudera we have a config tool that
generates Hadoop configurations with reasonable first-approximation values
for your cluster -- check out http://my.cloudera.com and look at the
hadoop-site.xml it generates. If you start from there you might find a
better parameter space to explore. Please share back your findings -- we'd
love to tweak the tool even more with some external feedback :)
- Aaron
On Wed, Jun 10, 2009 at 7:39 AM, stephen mulcahy
<st...@deri.org> wrote:
> Hi,
>
> I'm currently doing some testing of different configurations using the
> Hadoop Sort as follows,
>
> bin/hadoop jar hadoop-*-examples.jar randomwriter
> -Dtest.randomwrite.total_bytes=107374182400 /benchmark100
>
> bin/hadoop jar hadoop-*-examples.jar sort /benchmark100 rand-sort
>
> The only changes I've made from the standard config are the following in
> conf/mapred-site.xml
>
> <property>
> <name>mapred.child.java.opts</name>
> <value>-Xmx1024M</value>
> </property>
>
> <property>
> <name>mapred.tasktracker.map.tasks.maximum</name>
> <value>8</value>
> </property>
>
> <property>
> <name>mapred.tasktracker.reduce.tasks.maximum</name>
> <value>4</value>
> </property>
>
> I'm running this on 4 systems, each with 8 processor cores and 4 separate
> disks.
>
> Is there anything else I should change to stress memory more? The systems
> in question have 16GB of memory but the most that's getting used during a
> run of this benchmark is about 2GB (and most of that seems to be OS
> caching).
>
> Thanks,
>
> -stephen
>
> --
> Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
> NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
> http://di2.deri.ie http://webstar.deri.ie http://sindice.com
>