Posted to common-user@hadoop.apache.org by Kunsheng Chen <ke...@yahoo.com> on 2009/11/30 22:16:26 UTC

Scaling inference on Hadoop DFS

Hi everyone,


Currently I have a MapReduce program that sorts input records and map-reduces them into output records, with priority information attached to each record. So far the program runs on 1 master node and 3 datanodes.
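For reference, the job structure is roughly like the sketch below. This is a minimal sketch only: the record format (tab-separated lines, first field as sort key) and the priority rule (position within a key group) are placeholder assumptions, not my actual code. The shuffle phase does the sorting by key; the reducer attaches the priority.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PrioritySort {

    public static class SortMapper
            extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
                throws IOException, InterruptedException {
            // Placeholder assumption: the first tab-separated field is the sort key.
            String[] fields = line.toString().split("\t", 2);
            ctx.write(new Text(fields[0]), line);
        }
    }

    public static class PriorityReducer
            extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> records, Context ctx)
                throws IOException, InterruptedException {
            int priority = 0;
            for (Text record : records) {
                // Placeholder priority rule: position within the sorted key group.
                ctx.write(new Text("priority=" + priority++), record);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "priority-sort");
        job.setJarByClass(PrioritySort.class);
        job.setMapperClass(SortMapper.class);
        job.setReducerClass(PriorityReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}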

I measured something like the following:

--------------------------------------
number of records:    1,000,000
time to process:      100 seconds
input size:           20 MB
number of datanodes:  3
--------------------------------------


I am wondering whether I can assume linear scaling, i.e. "given 2,000,000 records" the program would finish in "200 seconds"?
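To make that assumption explicit, a back-of-the-envelope model is: total time = fixed job overhead + records x per-record cost. The fixed overhead (job setup, JVM startup, task scheduling) is why small Hadoop jobs rarely scale perfectly linearly. The 20-second overhead in the sketch below is a made-up illustration, not a measurement:

public class ScalingEstimate {
    public static void main(String[] args) {
        double records = 1000000;   // measured: number of input records
        double seconds = 100;       // measured: wall-clock time
        double overhead = 20;       // assumed fixed cost in seconds (job setup,
                                    // JVM startup, scheduling) -- hypothetical

        // Per-record cost implied by the single measurement above.
        double perRecord = (seconds - overhead) / records;

        // Extrapolate to 2,000,000 records under this overhead model.
        double estimate = overhead + 2000000 * perRecord;
        System.out.printf("estimated time: %.0f seconds%n", estimate);
    }
}

Pure linear scaling predicts 200 seconds; with the assumed 20-second fixed overhead, the model predicts about 180 seconds instead, so the extrapolation would need measurements at several input sizes to pin down the overhead term.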

Any feedback on the feasibility of this kind of scalability assumption would be helpful, as it is important to the analysis in my master's thesis.

Any ideas are much appreciated!

Thanks,

-Kun