Posted to common-dev@hadoop.apache.org by Tom White <to...@gmail.com> on 2008/11/22 10:26:25 UTC

Google Terasort Benchmark

From the Google Blog,
http://googleblog.blogspot.com/2008/11/sorting-1pb-with-mapreduce.html

"We are excited to announce we were able to sort 1TB (stored on the
Google File System as 10 billion 100-byte records in uncompressed text
files) on 1,000 computers in 68 seconds. By comparison, the previous
1TB sorting record [using Hadoop] is 209 seconds on 910 computers."

Something for the Hadoop community to aim for: a threefold performance increase.

Tom

Re: Google Terasort Benchmark

Posted by Patrick McCormack <pn...@yahoo.com>.
I reckon it's all about spindles. I took a quick look at the pretty detailed hardware config that Owen released with the Hadoop benchmark, and it was run on nodes with 4 SATA drives. The Google blog hints at 12 disks per node (the number of disks per node was only given for their 1PB experiment). Google got a 3 times performance increase with 3 times the number of disks.

Patrick.
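
A rough back-of-the-envelope check of the spindle argument, as a minimal Python sketch. The run times, node counts, and record sizes are the figures quoted in this thread. The 12 disks per node for Google's 1TB run is an assumption (the blog only gives that number for the 1PB experiment), and the function name is made up for illustration. The result is sorted bytes per second of wall-clock time per disk, not raw disk I/O, since each byte is read and written more than once during a sort.

```python
# 10 billion 100-byte records = 1e12 bytes (the 1TB input from the quote).
TOTAL_BYTES = 10_000_000_000 * 100


def per_disk_mb_per_s(seconds, nodes, disks_per_node):
    """Sorted megabytes per second of elapsed time, per disk (hypothetical helper)."""
    return TOTAL_BYTES / seconds / nodes / disks_per_node / 1e6


# Hadoop record: 209 s on 910 nodes with 4 SATA drives each.
hadoop = per_disk_mb_per_s(seconds=209, nodes=910, disks_per_node=4)

# Google run: 68 s on 1,000 nodes; 12 disks per node is ASSUMED from the 1PB setup.
google = per_disk_mb_per_s(seconds=68, nodes=1000, disks_per_node=12)

# Overall wall-clock speedup, the "threefold" figure.
speedup = 209 / 68

print(f"speedup:         {speedup:.2f}x")
print(f"Hadoop per disk: {hadoop:.2f} MB/s")
print(f"Google per disk: {google:.2f} MB/s")
```

If the 12-disk assumption holds, the two per-disk figures come out close to each other, which is consistent with the idea that most of the gap is down to disk count rather than software.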




________________________________
From: Tom White <to...@gmail.com>
To: core-user@hadoop.apache.org; core-dev@hadoop.apache.org
Sent: Saturday, November 22, 2008 1:26:25 AM
Subject: Google Terasort Benchmark

From the Google Blog,
http://googleblog.blogspot.com/2008/11/sorting-1pb-with-mapreduce.html

"We are excited to announce we were able to sort 1TB (stored on the
Google File System as 10 billion 100-byte records in uncompressed text
files) on 1,000 computers in 68 seconds. By comparison, the previous
1TB sorting record [using Hadoop] is 209 seconds on 910 computers."

Something for the Hadoop community to aim for: a threefold performance increase.

Tom



      
