Posted to user@whirr.apache.org by Olivier Grisel <ol...@ensta.org> on 2011/01/14 00:34:48 UTC

EC2 vs RackSpace for CPU bound and IO bound MapReduce jobs?

Hi all,

Has anyone tried to compare the speed / $ ratio of EC2 and Rackspace
for CPU-bound and IO-bound Hadoop MapReduce jobs?

I have the impression that IO on EC2 is not very good: the duration of
a "distcp" run between S3 and HDFS (on local disk), in either direction
and on the same amount of data, can vary a lot from one run to another.
Is this also the case on the Rackspace cloud?
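
To make the two quantities above concrete, here is a rough sketch of how
they could be computed from timed runs. Everything in it is an assumption
for illustration: the function names are hypothetical, the cost model is
simple pro-rata hourly billing (real providers may round up to full
hours), and the numbers are made up, not measurements.

```python
import statistics


def throughput_per_dollar(gigabytes, seconds, hourly_price_usd, nodes):
    """GB processed per dollar, assuming pro-rata hourly billing (a simplification)."""
    cost = hourly_price_usd * nodes * (seconds / 3600.0)
    return gigabytes / cost


def coefficient_of_variation(durations):
    """Sample standard deviation divided by the mean; needs at least two runs."""
    return statistics.stdev(durations) / statistics.mean(durations)


# Hypothetical durations (seconds) of repeated copies of the same data set.
durations = [100.0, 120.0, 80.0]
print(coefficient_of_variation(durations))

# Hypothetical job: 10 GB in one hour on 4 nodes at $0.50 per node-hour.
print(throughput_per_dollar(10.0, 3600.0, 0.5, 4))
```

A higher coefficient of variation would mean less predictable IO, and
computing both figures the same way on each provider would give a
like-for-like comparison.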

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

Re: EC2 vs RackSpace for CPU bound and IO bound MapReduce jobs?

Posted by Tom White <to...@gmail.com>.
Hi Olivier,

This would be an interesting experiment to run! There are benchmarks
in Whirr that make this straightforward; see
https://cwiki.apache.org/confluence/display/WHIRR/Running+Benchmarks.

Cheers,
Tom
