Posted to common-dev@hadoop.apache.org by Tom White <to...@gmail.com> on 2008/11/22 10:26:25 UTC
Google Terasort Benchmark
From the Google Blog,
http://googleblog.blogspot.com/2008/11/sorting-1pb-with-mapreduce.html
"We are excited to announce we were able to sort 1TB (stored on the
Google File System as 10 billion 100-byte records in uncompressed text
files) on 1,000 computers in 68 seconds. By comparison, the previous
1TB sorting record [using Hadoop] is 209 seconds on 910 computers."
Something for the Hadoop community to aim for: a threefold performance increase.
Tom
Re: Google Terasort Benchmark
Posted by Patrick McCormack <pn...@yahoo.com>.
I reckon it's all about spindles. I took a quick look at the fairly detailed hardware config that Owen published with the Hadoop benchmark, and it was run on nodes with 4 SATA drives, while the Google blog hints at 12 disks per node (the disk and node counts were only given for their 1PB experiment). Google got a threefold performance increase with three times the number of disks.
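A quick back-of-envelope check supports the spindle theory. Assuming each node does roughly one read pass and one write pass over its share of the data (ignoring shuffle traffic, which is an assumption), the implied per-disk throughput comes out nearly identical for the two runs:

```python
# Rough estimate of per-disk throughput implied by each published result.
# Assumes 2 I/O passes per node (read input + write output) and ignores
# shuffle traffic; the 12 disks/node figure for Google is a guess from the blog.

TB = 10**12  # bytes

def per_disk_mb_s(total_bytes, nodes, seconds, disks_per_node):
    per_node = total_bytes / nodes            # bytes each node is responsible for
    passes = 2                                # read input once, write output once
    mb_s_per_node = passes * per_node / seconds / 1e6
    return mb_s_per_node / disks_per_node

google = per_disk_mb_s(TB, 1000, 68, 12)     # ~2.5 MB/s per disk
hadoop = per_disk_mb_s(TB, 910, 209, 4)      # ~2.6 MB/s per disk

print(f"Google: {google:.1f} MB/s per disk")
print(f"Hadoop: {hadoop:.1f} MB/s per disk")
```

Under these assumptions both clusters sustained roughly the same throughput per spindle, which is consistent with the idea that the speedup scales with disk count rather than with anything fundamentally different in the software.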
Patrick.
________________________________
From: Tom White <to...@gmail.com>
To: core-user@hadoop.apache.org; core-dev@hadoop.apache.org
Sent: Saturday, November 22, 2008 1:26:25 AM
Subject: Google Terasort Benchmark
From the Google Blog,
http://googleblog.blogspot.com/2008/11/sorting-1pb-with-mapreduce.html
"We are excited to announce we were able to sort 1TB (stored on the
Google File System as 10 billion 100-byte records in uncompressed text
files) on 1,000 computers in 68 seconds. By comparison, the previous
1TB sorting record [using Hadoop] is 209 seconds on 910 computers."
Something for the Hadoop community to aim for: a threefold performance increase.
Tom