Posted to hdfs-user@hadoop.apache.org by Brandon <bm...@upstreamsoftware.com> on 2012/10/01 19:53:32 UTC
Reduce Copy Speed
What speed do people typically see for the copy during a reduce?
From the tasktracker, here is an average:
reduce > copy (500 of 504 at 1.52 MB/s) >
We have seen it range from 0.5 to 4 MB/s.
That seems a bit slow.
Does anyone else have other benchmark numbers to share?
Re: Reduce Copy Speed
Posted by Harsh J <ha...@cloudera.com>.
Hi Brandon,
On Mon, Oct 1, 2012 at 11:23 PM, Brandon <bm...@upstreamsoftware.com> wrote:
> What speed do people typically see for the copy during a reduce?
It varies due to a few factors. But Hadoop 2.x has highly improved
Netty-based shuffle transfers that you can use for faster and more
reliable copies.
> From tasktracker here is an average on:
> reduce > copy (500 of 504 at 1.52 MB/s) >
>
> We have seen it range from .5 to 4 MB/s.
> That seems a bit slow.
Slow compared to what exactly? How many concurrent reducers fetch at
the same time from a single machine? And what is your slowstart
threshold? Raising it makes reducers wait for many more maps than the
default 5% to finish before they begin pulling data from other
tasktrackers. That leads to continuous large transfers rather than
small, frequent transfers of a few tasks at a time, saving resources
and improving speed.
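As a rough sketch, the two knobs above correspond to these MRv1
properties in mapred-site.xml; the values shown are illustrative only,
not recommendations, so tune them for your own cluster:

```xml
<!-- mapred-site.xml (MRv1) - illustrative values only -->
<property>
  <!-- Fraction of map tasks that must complete before reducers start
       fetching. Default is 0.05 (5%); raising it favors fewer, larger
       copy transfers over many small ones. -->
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>0.80</value>
</property>
<property>
  <!-- Number of parallel fetch threads each reducer uses during the
       copy phase. Default is 5. -->
  <name>mapred.reduce.parallel.copies</name>
  <value>10</value>
</property>
```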
> Does anyone else have other benchmark numbers to share?
>
--
Harsh J