Posted to common-user@hadoop.apache.org by Bryan Duxbury <br...@rapleaf.com> on 2009/04/14 17:52:27 UTC

Re: fyi: A Comparison of Approaches to Large-Scale Data Analysis: MapReduce vs. DBMS Benchmarks

I thought it a conspicuous omission not to discuss the cost of the
various approaches. Hadoop is free, though you have to spend
developer time; how much does Vertica cost on 100 nodes?

-Bryan

On Apr 14, 2009, at 7:16 AM, Guilherme Germoglio wrote:

> (Hadoop is used in the benchmarks)
>
> http://database.cs.brown.edu/sigmod09/
>
> There is currently considerable enthusiasm around the MapReduce
> (MR) paradigm for large-scale data analysis [17]. Although the
> basic control flow of this framework has existed in parallel SQL
> database management systems (DBMS) for over 20 years, some
> have called MR a dramatically new computing model [8, 17]. In
> this paper, we describe and compare both paradigms. Furthermore,
> we evaluate both kinds of systems in terms of performance and
> development complexity. To this end, we define a benchmark
> consisting of a collection of tasks that we have run on an open source
> version of MR as well as on two parallel DBMSs. For each task,
> we measure each system’s performance for various degrees of
> parallelism on a cluster of 100 nodes. Our results reveal some
> interesting trade-offs. Although the process to load data into and tune
> the execution of parallel DBMSs took much longer than the MR
> system, the observed performance of these DBMSs was strikingly
> better. We speculate about the causes of the dramatic performance
> difference and consider implementation concepts that future
> systems should take from both kinds of architectures.
>
>
> -- 
> Guilherme
>
> msn: guigermoglio@hotmail.com
> homepage: http://germoglio.googlepages.com