Posted to common-user@hadoop.apache.org by Jimmy Lin <ji...@umd.edu> on 2009/08/31 15:21:53 UTC
Talk in DC area: MapReduce and Parallel DBMSs: A Comparison of Approaches
to Large-Scale Data Analysis
Dear Hadoopers,
For those of you in the DC area, you might be interested in this talk at
the University of Maryland this week...
Best,
Jimmy
------
MapReduce and Parallel DBMSs: A Comparison of Approaches to Large-Scale
Data Analysis
Complete info at http://tinyurl.com/knh83k
Andy Pavlo (Brown University)
(http://www.cs.brown.edu/~pavlo/)
Thursday, September 3, 2009
4pm, AVW 3258
(Directions: http://www.umiacs.umd.edu/about/directions.htm)
= Abstract
The MapReduce (MR) paradigm has been heralded as a revolutionary new
platform for large-scale, massively parallel data access. Some
proponents claim that the extreme scalability of MR will relegate
relational database management systems (DBMSs) to the status of legacy
technology. In this talk, however, we discuss results from our
recent benchmark study suggesting that using MR systems to
perform tasks that are best suited for DBMSs yields less than
satisfactory results. This leads us to conclude that MR is more akin to
an Extract-Transform-Load (ETL) system than a DBMS, as it can quickly
load and analyze large amounts of data in an ad hoc manner. As
such, it is complementary to DBMS technology, rather than a competitor.
We also discuss the various differences in the architectural decisions
of MR systems and database systems, and provide insight into how the two
systems should complement one another.
= About the Speaker
Andrew Pavlo is a third-year Computer Science PhD student in Brown
University's Data Management Group under the guidance of Dr. Stanley Zdonik.