Posted to common-user@hadoop.apache.org by Pedro Vivancos <pe...@vocali.net> on 2009/02/27 19:51:49 UTC

How to improve my map & reduce application

Dear friends,

I am new to Hadoop, and I should say up front that I just want to use it as a
map & reduce framework.

I've developed an application to run on a server with 8 CPUs, and everything
seems to work properly except for performance: the job doesn't use all of the
available CPU power.
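Could this be related to the number of task slots? As far as I understand, the
number of map tasks that run concurrently on each node is capped by a property
like this in the Hadoop configuration (the value shown is just an example, not
what I actually have set):

```xml
<!-- conf/hadoop-site.xml: illustrative value only -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value>
</property>
```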

I'm trying to process 200,000 documents, extracting annotations from each
document (first and last names, in the mapper) and merging them in the reduce
task (if I find a first name and a last name together, I emit a full name).
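In plain Java, the merging idea is roughly this (a simplified sketch; the
"FIRST:"/"LAST:" annotation format is only illustrative, not what I actually
use):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NameMerger {
    // Pair every first name with every last name found among one
    // document's annotations. Annotation prefixes are hypothetical.
    public static List<String> merge(List<String> annotations) {
        List<String> firsts = new ArrayList<>();
        List<String> lasts = new ArrayList<>();
        for (String a : annotations) {
            if (a.startsWith("FIRST:")) {
                firsts.add(a.substring(6));   // strip "FIRST:"
            } else if (a.startsWith("LAST:")) {
                lasts.add(a.substring(5));    // strip "LAST:"
            }
        }
        List<String> names = new ArrayList<>();
        for (String first : firsts) {
            for (String last : lasts) {
                names.add(first + " " + last);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(
            merge(Arrays.asList("FIRST:Pedro", "LAST:Vivancos")));
        // → [Pedro Vivancos]
    }
}
```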

I've written my own record reader because I want the URI of each document I
process, so that record reader produces the URI as the key and the document
content as the value. Here is the most important method (in my opinion):

I also have to say that I'm not running the application through the
bin/hadoop script but with the java command directly, because I wasn't able
to get the script to work.
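For reference, I understand the usual way to launch a job is something like
this (the jar and class names here are made up, not my real ones):

```
bin/hadoop jar my-job.jar com.example.MyJobDriver /input/path /output/path
```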

So, could you help me make use of all my CPU power?

Thanks in advance.
Pedro