Posted to common-user@hadoop.apache.org by Jimmy Lin <ji...@umd.edu> on 2009/08/31 15:21:53 UTC

Talk in DC area: MapReduce and Parallel DBMSs: A Comparison of Approaches to Large-Scale Data Analysis

Dear Hadoopers,

For those of you in the DC area, you might be interested in this talk at 
the University of Maryland this week...

Best,
Jimmy

------

MapReduce and Parallel DBMSs: A Comparison of Approaches to Large-Scale 
Data Analysis

Complete info at http://tinyurl.com/knh83k

Andy Pavlo (Brown University)
(http://www.cs.brown.edu/~pavlo/)

Thursday, September 3, 2009
4pm, AVW 3258
(Directions: http://www.umiacs.umd.edu/about/directions.htm)

= Abstract

The MapReduce (MR) paradigm has been heralded as a revolutionary new 
platform for large-scale, massively parallel data access. Some 
proponents claim that the extreme scalability of MR will relegate 
relational database management systems (DBMSs) to the status of legacy 
technology. In this talk, however, we discuss the results from our 
recent benchmark study, which suggest that using MR systems to 
perform tasks that are best suited for DBMSs yields less than 
satisfactory results. This leads us to conclude that MR is more akin to 
an Extract-Transform-Load (ETL) system than a DBMS, as it is quickly 
able to load and analyze large amounts of data in an ad hoc manner. As 
such, it is complementary to DBMS technology, rather than a competitor. 
We also discuss the various differences in the architectural decisions 
of MR systems and database systems, and provide insight on how the two 
systems should complement one another.

= About the Speaker

Andrew Pavlo is a third-year Computer Science PhD student in Brown 
University's Data Management Group under the guidance of Dr. Stanley Zdonik.