Posted to common-user@hadoop.apache.org by Jason Venner <ja...@attributor.com> on 2007/12/04 20:55:27 UTC

Question about reduce copy speed.

Our machines are on a GigE switch. With SCP for compressed data, I see 
50MB/sec transfer rates.

When my reduce is running, the status page shows the following for the
incomplete reduces:

reduce > copy (643 of 789 at 0.12 MB/s) >
reduce > copy (656 of 789 at 0.12 MB/s) >
reduce > copy (644 of 789 at 0.12 MB/s) >
reduce > copy (644 of 789 at 0.12 MB/s) >
reduce > copy (656 of 789 at 0.12 MB/s) >
reduce > copy (656 of 789 at 0.12 MB/s) >
reduce > copy (643 of 789 at 0.12 MB/s) >
reduce > copy (623 of 789 at 0.12 MB/s) >
reduce > copy (621 of 789 at 0.12 MB/s) >

Is that the actual transfer rate between machines, or is that a 
misleading number?
We are running with replication set to 3.
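To compare these per-task figures with the 50MB/sec SCP number, it helps to pull the numbers out of the status strings. Below is a minimal Python sketch. It assumes the status text always has the exact `reduce > copy (N of M at R MB/s)` shape shown above. `parse_copy_status` and `CopyStatus` are hypothetical helpers, not part of Hadoop.

```python
import re
from typing import NamedTuple, Optional

# Assumed format, taken from the status page lines above:
#   "reduce > copy (643 of 789 at 0.12 MB/s) >"
_COPY_RE = re.compile(r"reduce > copy \((\d+) of (\d+) at ([\d.]+) MB/s\)")


class CopyStatus(NamedTuple):
    copied: int       # map outputs this reduce has fetched so far
    total: int        # map outputs this reduce must fetch in total
    rate_mb_s: float  # copy rate reported for this one reduce task


def parse_copy_status(line: str) -> Optional[CopyStatus]:
    """Return the parsed copy status, or None if the line doesn't match."""
    m = _COPY_RE.search(line)
    if m is None:
        return None
    return CopyStatus(int(m.group(1)), int(m.group(2)), float(m.group(3)))


if __name__ == "__main__":
    lines = [
        "reduce > copy (643 of 789 at 0.12 MB/s) >",
        "reduce > copy (656 of 789 at 0.12 MB/s) >",
    ]
    for status in map(parse_copy_status, lines):
        print(status)
```

Lines that don't match (for example, a reduce already in its sort phase) come back as `None`, so they can be filtered out before summing rates.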


Re: Question about reduce copy speed.

Posted by Doug Cutting <cu...@apache.org>.
Jason Venner wrote:
> When my reduce is running, the status page shows the following for the
> incomplete reduces:
> 
> reduce > copy (643 of 789 at 0.12 MB/s) >

Reducers cannot copy any faster than mappers can generate output. When
all maps are complete, how long does it take before copying is complete?
If that delay is small, then copying is keeping up with map output.
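The check described above can be written down directly: take the time the last map finished and the time each reduce finished its copy phase, then look at the gap. This is a minimal sketch. It assumes you already have those timestamps from wherever you record task events (for example, reading them off the job's web pages). The 60-second threshold is an arbitrary assumption, and `copy_lag` and `copy_keeps_up` are hypothetical helpers.

```python
from datetime import datetime, timedelta
from typing import Iterable


def copy_lag(last_map_done: datetime,
             copy_done_times: Iterable[datetime]) -> timedelta:
    """Gap between the last map finishing and the slowest reduce
    finishing its copy phase. Raises ValueError if no times are given."""
    return max(copy_done_times) - last_map_done


def copy_keeps_up(lag: timedelta,
                  threshold: timedelta = timedelta(seconds=60)) -> bool:
    # A small lag means the copy phase was mostly waiting on map output,
    # so a low per-task MB/s reflects the map output rate, not the network.
    return lag <= threshold


if __name__ == "__main__":
    # Hypothetical timestamps, for illustration only.
    last_map = datetime(2007, 12, 4, 20, 0, 0)
    copies = [last_map + timedelta(seconds=s) for s in (5, 12, 40)]
    lag = copy_lag(last_map, copies)
    print(lag, copy_keeps_up(lag))
```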

> Is that the actual transfer rate between machines, or is that a 
> misleading number?

It's the rate at which a given reduce task is able to fetch map output. If
you're running multiple reduce tasks per node, then that node's aggregate
rate will be higher. As mentioned above, it's limited by the rate at which
maps generate output. And copying competes with map input for disk and
network bandwidth.
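Because the status figure is per reduce task, a node's fetch rate is roughly the sum over the reduce tasks it runs, and that sum can't outrun the map output destined for those reduces. This is a back-of-the-envelope Python sketch. The `min` bound is a simplification assumed here: it ignores the disk and network contention with map input mentioned above. The helper names and every number passed in are hypothetical.

```python
from typing import Sequence


def node_copy_rate(per_task_rates_mb_s: Sequence[float]) -> float:
    """Aggregate copy rate of one node: the sum of its reduce tasks' rates."""
    return sum(per_task_rates_mb_s)


def copy_rate_bound(map_output_for_node_mb_s: float, link_mb_s: float) -> float:
    """Simplified upper bound on a node's copy rate: it can neither outrun
    the map output headed to this node's reduces nor exceed the link speed."""
    return min(map_output_for_node_mb_s, link_mb_s)


if __name__ == "__main__":
    # Hypothetical node running four reduces, each reporting 0.12 MB/s.
    print(f"{node_copy_rate([0.12] * 4):.2f} MB/s")
```

If the aggregate sits near the map-output bound rather than the link speed, the low per-task number is expected and not a network problem.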

Doug