
Posted to common-dev@hadoop.apache.org by "Doug Cutting (JIRA)" <ji...@apache.org> on 2006/04/21 19:29:06 UTC

[jira] Resolved: (HADOOP-72) hadoop doesn't take advantage of distributed computing in TestDFSIO

     [ http://issues.apache.org/jira/browse/HADOOP-72?page=all ]
     
Doug Cutting resolved HADOOP-72:
--------------------------------

    Fix Version: 0.2
     Resolution: Won't Fix

This was caused by a misunderstanding.

> hadoop doesn't take advantage of distributed computing in TestDFSIO
> -------------------------------------------------------------------
>
>          Key: HADOOP-72
>          URL: http://issues.apache.org/jira/browse/HADOOP-72
>      Project: Hadoop
>         Type: Test

>   Components: dfs, fs, mapred
>  Environment: 200 node cluster
>     Reporter: Konstantin Shvachko
>      Fix For: 0.2
>  Attachments: TestDFSIO.java, TestDFSIO_results.log, TestDFSIO_results_200_node_cluster.log, TestDFSIO_results_sequential.log
>
> TestDFSIO runs N map tasks, each either writing to or reading from a separate file of the same size, 
> and collects statistical information on its performance. 
> The reducer further calculates the overall statistics for all maps. 
> It outputs the following data:
> - read or write test
> - date and time the test finished   
> - number of files
> - total number of bytes processed
> - overall throughput in MB/sec
> - average IO rate in MB/sec per file
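The two summary statistics listed above can be sketched as follows. This is a hypothetical illustration, not the code in the attached TestDFSIO.java: the class name DfsIoStats, the MapResult record, and the formulas (overall throughput as total megabytes divided by the summed per-map times; average IO rate as the mean of each file's own MB/sec rate) are assumptions made for the sketch. It needs Java 16 or later for the record.

```java
import java.util.List;

/**
 * Hypothetical sketch of the two statistics a TestDFSIO-style reducer
 * reports. Names and formulas are illustrative assumptions.
 */
public class DfsIoStats {
    /** Result of one map task: bytes read or written and elapsed time in ms. */
    public record MapResult(long bytes, long millis) {}

    private static final double MB = 1024.0 * 1024.0;

    /** Overall throughput: total megabytes divided by the summed map times. */
    public static double throughputMbPerSec(List<MapResult> results) {
        long totalBytes = 0;
        long totalMillis = 0;
        for (MapResult r : results) {
            totalBytes += r.bytes();
            totalMillis += r.millis();
        }
        return (totalBytes / MB) / (totalMillis / 1000.0);
    }

    /** Average IO rate: the mean of each file's own MB/sec rate. */
    public static double averageIoRateMbPerSec(List<MapResult> results) {
        double sumOfRates = 0.0;
        for (MapResult r : results) {
            sumOfRates += (r.bytes() / MB) / (r.millis() / 1000.0);
        }
        return sumOfRates / results.size();
    }

    public static void main(String[] args) {
        // Illustrative inputs only: 320 MB in 40 s and 320 MB in 80 s.
        List<MapResult> results = List.of(
            new MapResult(320L * 1024 * 1024, 40_000),
            new MapResult(320L * 1024 * 1024, 80_000));
        System.out.println(throughputMbPerSec(results));    // 640 MB / 120 s
        System.out.println(averageIoRateMbPerSec(results)); // mean of 8 and 4
    }
}
```

Under these assumptions both figures are built from per-map times rather than the job's wall-clock time, so they describe how fast each map performed its IO, not whether the maps actually ran concurrently. That question is answered by comparing elapsed times across runs.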
> __Results__
> I ran 7 iterations of the test one after another on a cluster of ~200 nodes. 
> The file size is the same in all cases: 320 MB.
> The number of files tried is 1,2,4,8,16,32,64.
> The log file with statistics is attached.
> It looks like we don't have any distributed computing here at all.
> The total execution time increases proportionally to the total size of data both for writes and reads.
> Another observation is that the read IO rate is only slightly higher than the write rate.
> For comparison, I attach time measurements for the same IO operations performed on the same cluster, but sequentially in a simple loop.
> This is the summary:
> Files   map/red time   sequential time
>     1             49                34
>     2             86                69
>     4            158               131
>     8            299               266
>    16            569               532
>    32           1131                 -
>    64           2218                 -
> This doesn't look good, unless there is something wrong with my test (attached) or the cluster settings.
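One quick way to read the summary table is to divide each map/reduce time by the number of files. The sketch below copies the numbers from the table (time units as reported there); the class name ScalingCheck is hypothetical. If the maps had run in parallel across the ~200 nodes, total time would stay close to the single-file time and the time per file would shrink roughly as 1/N. Instead it stays nearly flat, going from 49 for one file to about 34.7 for 64 files, which is consistent with the maps effectively running one after another.

```java
/**
 * Hypothetical worked arithmetic on the summary table: time per file for
 * the map/reduce runs and, where measured, the map/red-to-sequential ratio.
 */
public class ScalingCheck {
    // Values copied from the summary table; time units as reported there.
    static final int[] FILES = {1, 2, 4, 8, 16, 32, 64};
    static final double[] MAPRED = {49, 86, 158, 299, 569, 1131, 2218};
    static final double[] SEQUENTIAL = {34, 69, 131, 266, 532}; // none for 32, 64

    /** Map/reduce time divided by the number of files in that run. */
    public static double timePerFile(int i) {
        return MAPRED[i] / FILES[i];
    }

    public static void main(String[] args) {
        for (int i = 0; i < FILES.length; i++) {
            // Only the first five runs have a sequential measurement.
            String ratio = i < SEQUENTIAL.length
                ? String.format("%.2f", MAPRED[i] / SEQUENTIAL[i])
                : "-";
            System.out.printf("%2d files: %.1f per file, map/red vs sequential %s%n",
                FILES[i], timePerFile(i), ratio);
        }
    }
}
```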

-- 
This message is automatically generated by JIRA.