Posted to general@hadoop.apache.org by "benjamin.cotton@lehman.com" <Be...@lehman.com> on 2009/09/04 18:16:56 UTC
drivers to bridge familiar SQL queries to Hadoop MapReduce internals?
I am brand new to Hadoop and have a very newbie question: Is it a
Hadoop community priority to build drivers (or layers of drivers) that
will help bridge simple, familiar SQL queries to Hadoop MapReduce
internals - liberating the application query developer from having to
necessarily learn Hadoop-specific technologies, APIs, and tactics?
E.g. in the "Hadoop - The Definitive Guide" initial example, I would
like to STILL just be able to write
Select avg(weatherStationTable.airTemp), max(weatherStationTable.airTemp)
from weatherStationTable
group by weatherStationTable.year
and depend on some Driver (or layer of Drivers) to bridge that familiar
SQL relational query to a Hadoop MapReduce job that is deployed across
the HDFS (or other Hadoop-specific data hosting layer) to execute in
Hadoop and return my result.
Is the notion of this potential capability off the mark re: current
Hadoop community development priorities?
Re: drivers to bridge familiar SQL queries to Hadoop MapReduce
internals?
Posted by Philip Zeyliger <ph...@cloudera.com>.
Hi Benjamin,
This is actually very much on the mark.
Take a look at the Hive project -- http://hadoop.apache.org/hive/ ,
also video at http://www.cloudera.com/hadoop-training-hive-introduction.
Hive is a SQL-like interface developed initially at Facebook for
exactly that. Pig is also working on something similar -- see
http://issues.apache.org/jira/browse/PIG-824.
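To make the idea concrete, here is a minimal Python sketch of how a GROUP BY
query like the one above could be expressed as map, shuffle, and reduce steps.
It is an illustration only, not how Hive or Pig actually plan or run queries.
The sample rows and the function names (map_phase, shuffle, reduce_phase,
run_query) are hypothetical, and everything runs in a single process rather
than on a cluster.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for weatherStationTable: (year, airTemp).
ROWS = [(1949, 111.0), (1949, 78.0), (1950, 0.0), (1950, 22.0), (1950, -11.0)]


def map_phase(rows):
    """Emit (key, value) pairs; the key is the GROUP BY column (year)."""
    for year, air_temp in rows:
        yield year, air_temp


def shuffle(pairs):
    """Collect all values that share a key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Compute avg(airTemp) and max(airTemp) for each year."""
    return {
        year: (sum(temps) / len(temps), max(temps))
        for year, temps in sorted(groups.items())
    }


def run_query(rows):
    """Chain the three steps; returns {year: (avg_temp, max_temp)}."""
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    for year, (avg_temp, max_temp) in run_query(ROWS).items():
        print(year, avg_temp, max_temp)
```

A SQL-like layer would, roughly speaking, generate the equivalent of these
steps from the query text, so the person writing the query never has to write
them by hand.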
Cheers,
-- Philip