Posted to hdfs-user@hadoop.apache.org by Gaurav Dasgupta <gd...@gmail.com> on 2012/08/28 09:01:40 UTC

Suggestions/Info required regarding Hadoop Benchmarking

Hi Users,

I have a 12-node CDH3 cluster on which I am planning to run some benchmark
tests. My main intention is to run the benchmarks first with the default
Hadoop configuration, then analyze the results and tune the Hadoop
configuration parameters accordingly to improve the performance of my cluster.

Can someone suggest which Hadoop metrics are the most important to observe
during benchmarking?
I have also seen the ratio of average map task execution time to average
reduce task execution time recorded for various benchmarks. How significant
is that information for judging cluster performance? How can such ratios
help me analyze and tune the Hadoop cluster for better performance?
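For reference, here is a minimal sketch of how that ratio is computed. It
assumes that per-task execution times (in seconds) have already been collected
by some other means, for example from the JobTracker web UI or the job
history. The function name `avg_map_reduce_ratio` and the sample timings are
hypothetical placeholders, not measurements from any cluster:

```python
from statistics import mean


def avg_map_reduce_ratio(map_secs, reduce_secs):
    """Return (avg_map, avg_reduce, avg_map / avg_reduce).

    map_secs / reduce_secs: per-task execution times in seconds. How these
    are gathered (web UI, job history, logs) is an assumption left outside
    this sketch.
    """
    if not map_secs or not reduce_secs:
        raise ValueError("need at least one map and one reduce task time")
    avg_map = mean(map_secs)
    avg_reduce = mean(reduce_secs)
    return avg_map, avg_reduce, avg_map / avg_reduce


if __name__ == "__main__":
    # Hypothetical placeholder timings for illustration only.
    maps = [30.0, 34.0, 32.0]
    reduces = [120.0, 128.0]
    avg_m, avg_r, ratio = avg_map_reduce_ratio(maps, reduces)
    print(f"avg map {avg_m:.1f}s, avg reduce {avg_r:.1f}s, ratio {ratio:.3f}")
```

Comparing this ratio across runs of the same benchmark, before and after a
configuration change, indicates whether the change mainly affected the map
phase or the reduce phase.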

So far, I have run the following benchmarks without tuning the cluster
(i.e., with the default Hadoop configuration):

   - Sort
   - WordCount
   - TeraSort
   - TestDFSIO

Please also suggest which other benchmarks I should run, especially those in
"hadoop-test.jar" in the $HADOOP_HOME directory, and what each of those jobs
is used for.

Thanks,
Gaurav Dasgupta