Posted to common-dev@hadoop.apache.org by preethi ganeshan <pr...@gmail.com> on 2013/03/23 19:30:58 UTC

the part of the intermediate output fed to a reducer

Hey all,
I am working on a project that schedules data-local reduce tasks. However, I
wanted to know if there is a way, using MapTask.java, to keep track of the
inputs and the size of the input to every reducer. In other words, what code do
I add to get the size of the intermediate output that is fed to a reduce
task before the reduce task begins?

Thank you in advance.

Re: the part of the intermediate output fed to a reducer

Posted by Harsh J <ha...@cloudera.com>.

Hi,

On Sun, Mar 24, 2013 at 12:00 AM, preethi ganeshan
<pr...@gmail.com> wrote:
> Hey all,
> I am working on a project that schedules data-local reduce tasks.

Great, are you planning to contribute it upstream too? See
https://issues.apache.org/jira/browse/MAPREDUCE-199. I'm also hoping
you're working on trunk and not the maintenance branch branch-1, which
is well behind where MR is today.

> However, I wanted to know if there is a way, using MapTask.java, to keep track of the
> inputs and the size of the input to every reducer. In other words, what code do
> I add to get the size of the intermediate output that is fed to a reduce
> task before the reduce task begins?

Change the thinking here a bit: a map does not feed a reduce (i.e. it's
not a push). A reduce consumes a map's output after the map completes (the
map task JVM may have terminated, for all it cares). Upon a map's completion,
its counters are available centrally (i.e. at the ApplicationMaster),
which the reduce task can poll for sizes (it may already be doing
this).
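
To make the pull model concrete, here is a minimal standalone sketch in
plain Java. It is not Hadoop code and does not touch MapTask.java: the class
ReducerInputSizeSketch and its methods reportMapCompletion and
bytesAvailableFor are hypothetical names made up for illustration. The only
part borrowed from Hadoop is the partition formula, which mirrors the
default HashPartitioner. Sizes are raw UTF-8 byte counts, so they ignore
serialization framing and compression, and String.hashCode() stands in for
whatever the real key type uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReducerInputSizeSketch {

    // Hypothetical stand-in for central bookkeeping:
    // completed map task id -> bytes written per partition.
    private final Map<String, long[]> completedMaps = new HashMap<>();

    // Same arithmetic as Hadoop's default HashPartitioner,
    // applied here to a plain String key (an assumption).
    public static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Tally the raw UTF-8 size of each {key, value} record under the
    // partition (i.e. reducer) that its key maps to.
    public static long[] bytesPerPartition(List<String[]> records, int numReduceTasks) {
        long[] totals = new long[numReduceTasks];
        for (String[] kv : records) {
            int p = partitionFor(kv[0], numReduceTasks);
            totals[p] += kv[0].getBytes(StandardCharsets.UTF_8).length
                       + kv[1].getBytes(StandardCharsets.UTF_8).length;
        }
        return totals;
    }

    // Recorded once, when a map finishes; nothing is pushed to a reducer.
    public void reportMapCompletion(String mapId, long[] partitionBytes) {
        completedMaps.put(mapId, partitionBytes.clone());
    }

    // What a reducer (or a scheduler) would poll: the bytes waiting for
    // it across all maps that have completed so far.
    public long bytesAvailableFor(int reducer) {
        long sum = 0;
        for (long[] sizes : completedMaps.values()) {
            sum += sizes[reducer];
        }
        return sum;
    }

    public static void main(String[] args) {
        List<String[]> mapOutput = Arrays.asList(
            new String[] {"a", "1"},
            new String[] {"b", "22"},
            new String[] {"c", "333"});

        ReducerInputSizeSketch central = new ReducerInputSizeSketch();
        central.reportMapCompletion("map_0", bytesPerPartition(mapOutput, 2));

        for (int r = 0; r < 2; r++) {
            System.out.println("reducer " + r + ": "
                + central.bytesAvailableFor(r) + " bytes");
        }
        // prints:
        // reducer 0: 3 bytes
        // reducer 1: 6 bytes
    }
}
```

The point of the split between reportMapCompletion and bytesAvailableFor is
that the sizes are published by the map side and read later by whoever
needs them, which is the shape a data-local reduce scheduler would rely on.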

--
Harsh J