Posted to common-user@hadoop.apache.org by Omar Schiaratura <sc...@libero.it> on 2008/05/27 15:09:32 UTC

performance test

Hi all, I ran some tests on a 66-node cluster (each node has 4 GB of RAM and two dual-core Opteron CPUs) with Hadoop 0.6.2.
The algorithm I used for the test is a version of BLAST (a bioinformatics algorithm for pattern matching on protein sequences), launched from a Python
program that uses the Hadoop MapReduce API.
In a previous test on only 16 nodes the algorithm scaled well, but with more machines I got only a low speed-up.
Is the result 
Does anyone know how to change the environment and configuration to get better performance on a large cluster?

Thanks

I used the following configuration:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>ostro02:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>ostro04:9001</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/mnt/hd/hdfs/nn</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/mnt/hd/hdfs/data</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/mnt/hd/hdfs/system</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/mnt/hd/hdfs/tmp</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>dfs.block.size</name>
    <value>157286400</value>  <!-- 150 MB (150 * 1024 * 1024 bytes) -->
  </property>
  <property>
    <name>fs.inmemory.size.mb</name>
    <value>75</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>655360</value>  <!-- 640 KB (640 * 1024 bytes) -->
  </property>
</configuration>
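To make the configuration easier to reason about, here is a small sketch (Python, like the driver program) that reads the site file with the standard `xml.etree.ElementTree` module and derives the cluster-wide task capacity (nodes × per-node slot maximum) along with the block and buffer sizes. It assumes that every node runs a TaskTracker; `read_site_config` and `cluster_slots` are hypothetical helpers for illustration, not part of Hadoop, and the embedded XML is trimmed to the properties used.

```python
import xml.etree.ElementTree as ET

# Subset of the hadoop-site.xml above (only the properties used below).
SITE_XML = """<configuration>
  <property><name>mapred.tasktracker.map.tasks.maximum</name><value>4</value></property>
  <property><name>mapred.tasktracker.reduce.tasks.maximum</name><value>4</value></property>
  <property><name>dfs.block.size</name><value>157286400</value></property>
  <property><name>io.file.buffer.size</name><value>655360</value></property>
</configuration>"""


def read_site_config(xml_text):
    """Return {name: value} for every <property> in a Hadoop site file (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


def cluster_slots(conf, n_nodes):
    """Total concurrent (map, reduce) slots, assuming every node runs a TaskTracker."""
    maps = int(conf["mapred.tasktracker.map.tasks.maximum"])
    reduces = int(conf["mapred.tasktracker.reduce.tasks.maximum"])
    return n_nodes * maps, n_nodes * reduces


conf = read_site_config(SITE_XML)
print(cluster_slots(conf, 66))                     # map and reduce slots on 66 nodes
print(int(conf["dfs.block.size"]) / (1024 * 1024))  # block size in MB
print(int(conf["io.file.buffer.size"]) / 1024)      # buffer size in KB
```

With 4 map and 4 reduce slots per node, a node can be asked to run up to 8 tasks on its 4 cores whenever maps and reduces overlap, which is worth keeping in mind when tuning these two values.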

I parsed the JobTracker log file and got the following results:
JOB_ID             N_NODES  DIR       N_TASKS  ELAPSED_TIME(s)  TASK_INPUT_SIZE(MB)  MEAN_TASK_TIME(s)  STDEV_TASK_TIME
200805161411_0004  66       nt_input  272      50.0             79                   18.5
200805161416_0004  62       nt_input  272      49.4             79                   18.4
200805161425_0002  55       nt_input  272      58.0             79                   18.5
200805161425_0004  54       nt_input  272      49.0             79                   18.5
200805161444_0001  51       nt_input  272      59.5             79                   18.7
200805161444_0002  51       nt_input  272      59.2             79                   18.4
200805161444_0003  51       nt_input  272      57.7             79                   18.4
200805161444_0004  51       nt_input  272      59.3             79                   18.6
200805161450_0004  47       nt_input  272      60.2             79                   18.4
200805161457_0004  44       nt_input  272      62.0             79                   18.5
200805161504_0004  40       nt_input  272      60.7             79                   18.7
200805161511_0004  36       nt_input  272      67.0             79                   18.7
200805161518_0004  33       nt_input  272      62.4             79                   18.7
200805161526_0004  30       nt_input  272      73.3             79                   18.6
200805161534_0004  26       nt_input  272      79.7             79                   18.6
200805161616_0004  22       nt_input  272      91.7             79                   18.7
200805161625_0004  19       nt_input  272      98.5             79                   18.8
200805161759_0004  15       nt_input  272      118.0            79                   18.9
200805161812_0004  12       nt_input  272      136.0            79                   18.9
200805161826_0004  8        nt_input  272      188.6            79                   18.8
200805161844_0004  4        nt_input  272      337.7            79                   18.9
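The table makes it possible to put numbers on the low speed-up. The sketch below computes speed-up and parallel efficiency relative to the 4-node run and compares each measurement with a simple lower bound: the number of map-task "waves", ceil(272 / (nodes × 4 slots)), times the mean task time. The wave model is an assumption that ignores job setup, task scheduling and any reduce phase; `waves` and `report` are hypothetical helpers, and only a subset of the rows is copied in.

```python
import math

# (nodes, elapsed s, mean map-task s), copied from selected rows of the table above.
RUNS = [
    (4, 337.7, 18.9),
    (8, 188.6, 18.8),
    (15, 118.0, 18.9),
    (33, 62.4, 18.7),
    (36, 67.0, 18.7),
    (51, 59.5, 18.7),
    (66, 50.0, 18.5),
]
N_TASKS = 272        # map tasks per job, from the table
SLOTS_PER_NODE = 4   # mapred.tasktracker.map.tasks.maximum from the config


def waves(n_nodes):
    """Rounds of map tasks needed if each slot runs one task at a time (assumed model)."""
    return math.ceil(N_TASKS / (n_nodes * SLOTS_PER_NODE))


def report(runs):
    """Speed-up and efficiency relative to the first (smallest) run, plus the wave bound."""
    base_nodes, base_time, _ = runs[0]
    rows = []
    for nodes, elapsed, mean_task in runs:
        speedup = base_time / elapsed
        efficiency = speedup / (nodes / base_nodes)
        bound = waves(nodes) * mean_task  # ignores startup and scheduling overhead
        rows.append((nodes, speedup, efficiency, waves(nodes), bound, elapsed))
    return rows


for nodes, s, e, w, bound, measured in report(RUNS):
    print(f"{nodes:3d} nodes  speed-up {s:5.2f}  efficiency {e:4.2f}  "
          f"waves {w:2d}  bound {bound:6.1f} s  measured {measured:6.1f} s")
```

Under this assumed model every cluster size from 36 to 66 nodes needs two waves of map tasks, which is consistent with the elapsed time staying between about 49 and 67 s in that range while the mean task time stays flat at about 18.5 s. 68 nodes (272 map slots) would be the smallest size that fits all 272 tasks into one wave. The gap between the bound and the measurement (13 s at 66 nodes) is where fixed per-job overhead would show up.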