Posted to general@hadoop.apache.org by "benjamin.cotton@lehman.com" <Be...@lehman.com> on 2009/09/04 18:16:56 UTC

drivers to bridge familiar SQL queries to Hadoop MapReduce internals?

I am brand new to Hadoop and have a very newbie question: Is it a
Hadoop community priority to build drivers (or layers of drivers) that
bridge simple, familiar SQL queries to Hadoop MapReduce internals,
freeing the application query developer from having to learn
Hadoop-specific technologies, APIs, and tactics?

E.g., for the initial example in "Hadoop: The Definitive Guide", I would
like to STILL just be able to write

select avg(weatherStationTable.airTemp), max(weatherStationTable.airTemp)
from weatherStationTable
group by weatherStationTable.year

and depend on some driver (or layer of drivers) to translate that familiar
SQL relational query into a Hadoop MapReduce job that is deployed across
HDFS (or another Hadoop-specific data hosting layer), executes in
Hadoop, and returns my result.

Is the notion of this potential capability off the mark relative to current
Hadoop community development priorities?

Re: drivers to bridge familiar SQL queries to Hadoop MapReduce internals?

Posted by Philip Zeyliger <ph...@cloudera.com>.

Hi Benjamin,

This is actually very much on the mark.

Take a look at the Hive project -- http://hadoop.apache.org/hive/ --
there is also a video at http://www.cloudera.com/hadoop-training-hive-introduction.
Hive is a SQL-like interface, developed initially at Facebook, for
exactly that. Pig is also working on something similar -- see
http://issues.apache.org/jira/browse/PIG-824.
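
To give a rough feel for what such a layer has to generate from the
avg/max query in the original message, here is a minimal single-process
sketch in plain Python. It is not Hive's actual implementation and uses
no Hadoop API; the function names (map_record, reduce_group, run_query)
and the sample rows are made up for illustration. The map step emits a
(year, airTemp) pair per row, the shuffle groups values by year, and the
reduce step computes the aggregates for each group.

```python
from collections import defaultdict


def map_record(record):
    """Map phase: emit a (year, airTemp) pair for one input row."""
    return record["year"], record["airTemp"]


def reduce_group(year, temps):
    """Reduce phase: compute avg and max over one year's temperatures."""
    return year, sum(temps) / len(temps), max(temps)


def run_query(records):
    """Simulate map -> shuffle -> reduce in a single process."""
    groups = defaultdict(list)
    for record in records:
        year, temp = map_record(record)
        groups[year].append(temp)  # shuffle: collect values by key
    # One reduce call per distinct key, in sorted key order.
    return [reduce_group(year, temps) for year, temps in sorted(groups.items())]


# Hypothetical sample rows standing in for weatherStationTable.
rows = [
    {"year": 1949, "airTemp": 11.0},
    {"year": 1949, "airTemp": 7.0},
    {"year": 1950, "airTemp": 2.0},
    {"year": 1950, "airTemp": 4.0},
]
result = run_query(rows)
```

Each tuple in result holds (year, average airTemp, maximum airTemp). A
real job would run the map and reduce steps on many machines, which is
the part a SQL layer hides from the query author.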

Cheers,

-- Philip



On Fri, Sep 4, 2009 at 9:16 AM,
benjamin.cotton@lehman.com <Be...@lehman.com> wrote:
>
> I am brand new to Hadoop and have a very newbie question: Is it a Hadoop
> community priority to build drivers (or layers of drivers) that
> bridge simple, familiar SQL queries to Hadoop MapReduce internals,
> freeing the application query developer from having to learn
> Hadoop-specific technologies, APIs, and tactics?
>
> E.g., for the initial example in "Hadoop: The Definitive Guide", I would like
> to STILL just be able to write
>
> select avg(weatherStationTable.airTemp), max(weatherStationTable.airTemp)
> from weatherStationTable
> group by weatherStationTable.year
>
> and depend on some driver (or layer of drivers) to translate that familiar SQL
> relational query into a Hadoop MapReduce job that is deployed across HDFS
> (or another Hadoop-specific data hosting layer), executes in Hadoop, and
> returns my result.
>
> Is the notion of this potential capability off the mark relative to current Hadoop
> community development priorities?
>