Posted to common-user@hadoop.apache.org by Kunsheng Chen <ke...@yahoo.com> on 2009/11/30 22:16:26 UTC
Scaling inference on Hadoop DFS
Hi everyone,
Currently I have a MapReduce program that sorts input records and Map-Reduces them into output records, attaching priority information to each. So far the program is running on 1 master node and 3 datanodes.
And I got measurements something like the following:
--------------------------------------
number of records: 1000000 records
time to process: 100 seconds
input bytes : 20MB
number of datanodes: 3
-------------------------------------
I am wondering if I could make an assumption like: "given 2000000 records, the program could finish in 200 seconds"?
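As a rough sketch of that extrapolation (my own back-of-envelope, not from the measurements above): a purely linear guess doubles the time along with the input, but a Hadoop job also carries a roughly fixed startup/scheduling overhead, so the doubled input usually finishes in somewhat less than double the time. The 20-second overhead below is a hypothetical value for illustration only.

```python
def predict_time(records, throughput_records_per_s, overhead_s=0.0):
    """Predict job time as fixed overhead plus records / throughput."""
    return overhead_s + records / throughput_records_per_s

# Measured: 1,000,000 records in 100 s on 3 datanodes.
# Hypothetically assume 20 s of that is fixed job startup overhead.
overhead = 20.0
throughput = 1_000_000 / (100.0 - overhead)  # records/s of actual processing

naive = 2 * 100.0  # pure linear extrapolation: 200 s
with_overhead = predict_time(2_000_000, throughput, overhead)  # 20 + 160 = 180 s

print(naive, with_overhead)
```

Whether the linear assumption holds in practice depends on whether the job is dominated by map/reduce work (scales with input) or by overheads like task startup and the shuffle/sort phase, so measuring a few input sizes and fitting a line would give a more defensible number for a thesis.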
Any feedback on the feasibility of such scalability would be helpful, as it is important to the analysis in my master's thesis.
Any idea is well appreciated!
Thanks,
-Kun