Posted to general@hadoop.apache.org by Arun C Murthy <ac...@yahoo-inc.com> on 2010/08/02 20:47:12 UTC

Re: Data World Record Falls as Computer Scientists Break Terabyte Sort Barrier

The UCSD results are very impressive, especially given their hardware  
budget.

I may be wrong, but I'm pretty sure there were no Hadoop-based entries
this year - I know we at Yahoo! didn't enter.

A couple of points:
# The Indy category is a benchmark for sorting fixed-length records, not a
_general_ sort benchmark like Daytona.
# Our _best_ result missed the deadline by a whisker last year, but we
eventually did a 100 TB sort in 95 mins and a 1000 TB (1 PB) sort in 975
mins (16.25 hrs) - which works out to just over 1.0 TB/min, nearly twice
as fast as the record attributed to us.
(http://developer.yahoo.net/blogs/hadoop/2009/05/hadoop_sorts_a_petabyte_in_162.html)
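[Editorial note: the throughput figures quoted above can be sanity-checked with a short back-of-the-envelope script, using only the numbers given in this message.]

```python
# Back-of-the-envelope check of the 2009 Yahoo! sort throughput figures.
pb_sort_tb = 1000        # 1 PB expressed in TB
pb_sort_minutes = 975    # reported petabyte sort time (16.25 hrs)

tb100_sort_tb = 100
tb100_sort_minutes = 95  # reported 100 TB sort time

pb_rate = pb_sort_tb / pb_sort_minutes          # TB per minute
tb100_rate = tb100_sort_tb / tb100_sort_minutes

print(f"1 PB sort:   {pb_rate:.2f} TB/min")     # just over 1.0 TB/min
print(f"100 TB sort: {tb100_rate:.2f} TB/min")
```

Both runs come out slightly above 1.0 TB/min, matching the claim in the message.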

Arun

On Aug 2, 2010, at 10:34 AM, Abhishek Verma wrote:

> Hi Maxim,
>
> Hadoop was not involved. You can find more details here:
> http://sortbenchmark.org/tritonsort_2010_May_15.pdf
> and all the records and their information here: http://sortbenchmark.org/
>
> -Abhishek.
>
> On Mon, Aug 2, 2010 at 9:52 AM, Maxim Veksler <ma...@vekslers.org>  
> wrote:
>
>> Hi,
>>
>> Does anyone know if Hadoop is involved? And if so, what is the
>> configuration of such a cluster?
>>
>> http://ucsdnews.ucsd.edu/newsrel/science/07-27DataWorld.asp
>>
>>
>> Thank you,
>> Maxim.
>>


Re: Data World Record Falls as Computer Scientists Break Terabyte Sort Barrier

Posted by Jeff Hammerbacher <ha...@cloudera.com>.
You might recognize the name of the person who filed HDFS-347 from the press
release about the new sorting world record...

On Tue, Aug 3, 2010 at 8:02 PM, Ted Yu <yu...@gmail.com> wrote:

> Maybe this would give us a little more incentive to resolve HDFS-347
>
> Cheers
>

Re: Data World Record Falls as Computer Scientists Break Terabyte Sort Barrier

Posted by Ted Yu <yu...@gmail.com>.
Maybe this would give us a little more incentive to resolve HDFS-347

Cheers

On Mon, Aug 2, 2010 at 1:50 PM, Abhishek Verma <ve...@gmail.com> wrote:

> It shows how further behind Hadoop is in terms of performance. Are there
> people working on finding the bottlenecks and making it more efficient? Are
> there any JIRA issues related to this?
>
> -Abhishek.

Re: Data World Record Falls as Computer Scientists Break Terabyte Sort Barrier

Posted by Abhishek Verma <ve...@gmail.com>.
It shows how far behind Hadoop is in terms of performance. Are there
people working on finding the bottlenecks and making it more efficient? Are
there any JIRA issues related to this?

-Abhishek.

On Mon, Aug 2, 2010 at 11:47 AM, Arun C Murthy <ac...@yahoo-inc.com> wrote:

> The UCSD results are very impressive, especially given their hardware
> budget.
>
> I may be wrong, but I'm pretty sure there were no Hadoop-based entries this
> year - I know we at Yahoo! didn't enter.
>
> A couple of points:
> # The Indy category is a benchmark for sorting fixed-length records, not a
> _general_ sort benchmark like Daytona.
> # Our _best_ result missed the deadline by a whisker last year, but we
> eventually did a 100 TB sort in 95 mins and a 1000 TB (1 PB) sort in 975
> mins (16.25 hrs) - which works out to just over 1.0 TB/min, nearly twice
> as fast as the record attributed to us.
> (http://developer.yahoo.net/blogs/hadoop/2009/05/hadoop_sorts_a_petabyte_in_162.html)
>
> Arun
>