Posted to common-user@hadoop.apache.org by John Clarke <cl...@gmail.com> on 2009/09/10 18:07:38 UTC

MapRunnable app questions

Hi there,

I have three questions:

1) I have written a MapReduce app that implements MapRunnable, because we
need to control the threads ourselves and share information between them.

What number of map tasks should I specify in my conf file? Should it be the
same as the number of nodes?



2) When we run this application, the % complete for the map tasks seems to be
tied to the inputs read. As such, a task can report 100% complete even while
its MapRunnable is still working through its threads. Is this normal?
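
To make the situation concrete, here is a stripped-down sketch of what I believe is going on, in plain Java with no Hadoop classes. The class and method names (ProgressSketch, simulate) are made up for illustration and are not our real code; the assumption is simply that "progress" counts records read, while the actual work happens later on a thread pool.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ProgressSketch {

    /**
     * Reads every record and hands it to a worker pool, the way a threaded
     * runner might. Returns three numbers:
     *   [0] percent of input read once the read loop has finished
     *   [1] records actually processed at that same moment
     *   [2] records processed after all workers have finished
     */
    public static int[] simulate(List<String> records, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        // The gate stands in for slow per-record work: no worker can
        // finish a record until the read loop is completely done.
        CountDownLatch gate = new CountDownLatch(1);
        AtomicInteger processed = new AtomicInteger();

        int read = 0;
        for (String record : records) {
            read++; // this is all that an input-based progress figure sees
            pool.submit(() -> {
                try {
                    gate.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                processed.incrementAndGet();
            });
        }

        int percentRead = records.isEmpty() ? 100 : 100 * read / records.size();
        int processedAtHandOff = processed.get();

        gate.countDown();   // now let the workers run
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);

        return new int[] {percentRead, processedAtHandOff, processed.get()};
    }

    public static void main(String[] args) throws InterruptedException {
        int[] r = simulate(Arrays.asList("a", "b", "c", "d"), 2);
        System.out.println("read: " + r[0] + "%, processed at that point: "
                + r[1] + ", processed at the end: " + r[2]);
    }
}
```

In this sketch the input is 100% read while nothing has been processed yet, which matches what we see in the job tracker.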



3) Currently we have a simple Reducer that aggregates some totals and
outputs both the totals and the results from each MapRunnable thread. The
problem is that if the job fails for some reason, no output is available
unless at least one MapRunnable task has completed and the Reducer has
started collecting and writing to the DFS.

If we were to have no Reducers, would the output from the MapRunnable be
written to the DFS straight away? Of course we would lose the ability to
aggregate totals in the job, but we could calculate these afterwards from the
MapRunnable's output.
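
For reference, the no-Reducer setup I am asking about would look roughly like the sketch below, using the old org.apache.hadoop.mapred API. MapOnlyJob and EchoRunner are placeholder names (EchoRunner is a trivial stand-in that just passes lines through, not our real threaded runner), and the input and output paths are taken from the command line. It needs the Hadoop jars on the classpath, and I have not verified how it behaves on failure, which is really the question.

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapRunnable;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public class MapOnlyJob {

    // Placeholder MapRunnable: copies each input line to the output unchanged.
    public static class EchoRunner
            implements MapRunnable<LongWritable, Text, LongWritable, Text> {

        public void configure(JobConf job) {
            // nothing to configure in this stand-in
        }

        public void run(RecordReader<LongWritable, Text> input,
                        OutputCollector<LongWritable, Text> output,
                        Reporter reporter) throws IOException {
            LongWritable key = input.createKey();
            Text value = input.createValue();
            while (input.next(key, value)) {
                output.collect(key, value);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(MapOnlyJob.class);
        conf.setJobName("map-only-sketch");

        conf.setMapRunnerClass(EchoRunner.class);
        conf.setNumReduceTasks(0); // no Reducers: the case asked about above

        // Default TextInputFormat yields LongWritable offsets and Text lines.
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
```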


Thanks,
John