Posted to common-user@hadoop.apache.org by Ashish Thusoo <at...@facebook.com> on 2008/07/09 09:47:27 UTC

FW: [jira] Updated: (HADOOP-3601) Hive as a contrib project

Hi Folks,

We recently opened a JIRA to bring Hive into the open source fold, with
the aim of contributing back to Hadoop, which has made large-scale data
processing so much easier for us at Facebook. We have also uploaded a
small tutorial as part of that JIRA that gives a flavor of the
capabilities the system has. We would love to get feedback on this, so
please check out the described functionality and post any comments,
criticisms, wish lists, etc. on the JIRA at

https://issues.apache.org/jira/browse/HADOOP-3601

We are planning an initial release of Hive as a contrib project in the
0.19 version of Hadoop and are really excited about the open source
possibilities that it can enable, especially in the data warehousing/ETL
space. So please stay tuned to the JIRA for future updates on Hive.

Thanks,
Ashish for Hive@Facebook

-----Original Message-----
From: Ashish Thusoo (JIRA) [mailto:jira@apache.org] 
Sent: Tuesday, July 08, 2008 4:15 PM
To: Ashish Thusoo
Subject: [jira] Updated: (HADOOP-3601) Hive as a contrib project


     [ https://issues.apache.org/jira/browse/HADOOP-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo updated HADOOP-3601:
----------------------------------

    Attachment: HiveTutorial.pdf

Tutorial on the capabilities of Hive. This is a PDF of internal
documentation and contains query, DML and DDL examples as well as an
overview of the system. A formal language spec, architecture documents
and roadmaps will follow. This document gives an initial preview of the
system and will hopefully seed a lot of interesting discussion and
questions around it.

> Hive as a contrib project
> -------------------------
>
>                 Key: HADOOP-3601
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3601
>             Project: Hadoop Core
>          Issue Type: New Feature
>    Affects Versions: 0.17.0
>            Reporter: Joydeep Sen Sarma
>            Priority: Minor
>         Attachments: HiveTutorial.pdf
>
>   Original Estimate: 1080h
>  Remaining Estimate: 1080h
>
> Hive is a data warehouse built on top of flat files (stored primarily
> in HDFS). It includes:
> - Data Organization into Tables with logical and hash partitioning
> - A Metastore to store metadata about Tables/Partitions etc
> - A SQL like query language over object data stored in Tables
> - DDL commands to define and load external data into tables
>
> Hive's query language is executed using Hadoop map-reduce as the
> execution engine. Queries can use either single stage or multi-stage
> map-reduce. Hive has a native format for tables - but can handle any
> data set (for example json/thrift/xml) using an IO library framework.
>
> Hive uses Antlr for query parsing, Apache JEXL for expression
> evaluation and may use Apache Derby as an embedded database for
> MetaStore. Antlr has a BSD license and should be compatible with Apache
> license.
>
> We are currently thinking of contributing to the 0.17 branch as a
> contrib project (since that is the version under which it will get
> tested internally) - but looking for advice on the best release path.
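
A rough illustration of two ideas from the description above, hash
partitioning of table rows and a grouped count run as a single map-reduce
stage. This is only a sketch in plain Python and not Hive code: the function
names (hash_partition, map_phase, reduce_phase, count_by), the sample rows
and the choice of crc32 as the hash are all assumptions made for the example.

```python
from collections import defaultdict
from zlib import crc32


def hash_partition(rows, key, num_buckets):
    """Assign each row to a bucket by hashing one column (hypothetical helper)."""
    buckets = defaultdict(list)
    for row in rows:
        # crc32 is used instead of hash() so bucket ids are stable across runs.
        bucket_id = crc32(str(row[key]).encode("utf-8")) % num_buckets
        buckets[bucket_id].append(row)
    return buckets


def map_phase(rows, group_col):
    """Map step: emit one (group key, 1) pair per input row."""
    for row in rows:
        yield row[group_col], 1


def reduce_phase(pairs):
    """Reduce step: sum the emitted values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def count_by(rows, group_col):
    """Single-stage analogue of a SQL-like 'count rows per group' query."""
    return reduce_phase(map_phase(rows, group_col))


# Made-up sample data, for illustration only.
rows = [
    {"user": "a", "country": "US"},
    {"user": "b", "country": "IN"},
    {"user": "c", "country": "US"},
]

if __name__ == "__main__":
    print(count_by(rows, "country"))
    print({b: len(r) for b, r in hash_partition(rows, "user", 4).items()})
```

A query that needed, say, a join followed by an aggregation would chain
several such map and reduce steps, which is what the description calls
multi-stage map-reduce.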

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Re: FW: [jira] Updated: (HADOOP-3601) Hive as a contrib project

Posted by tim robertson <ti...@gmail.com>.

Thanks Ashish, I am happy to build and run it from svn/cvs and try
loading data, querying, etc. whenever you have something.

Cheers

Tim

On Wed, Jul 9, 2008 at 8:46 PM, Ashish Thusoo <at...@facebook.com> wrote:

> Hi Tim,
>
> Point well taken. We are trying to get this out as soon as possible.
> Thanks for the offer to help us test things out. We will get
> something out to you (an early version) as soon as we have a logical
> feature checkpoint.
>
> Cheers,
> Ashish

RE: FW: [jira] Updated: (HADOOP-3601) Hive as a contrib project

Posted by Ashish Thusoo <at...@facebook.com>.

Hi Tim,

Point well taken. We are trying to get this out as soon as possible.
Thanks for the offer to help us test things out. We will get
something out to you (an early version) as soon as we have a logical
feature checkpoint.

Cheers,
Ashish 

-----Original Message-----
From: tim robertson [mailto:timrobertson100@gmail.com] 
Sent: Wednesday, July 09, 2008 1:25 AM
To: core-user@hadoop.apache.org
Subject: Re: FW: [jira] Updated: (HADOOP-3601) Hive as a contrib project

Hi Ashish

I am very excited to try this, having been evaluating Hadoop, HBase,
Cascading etc. recently to process 100 millions of Biodiversity records
(expecting billions soon), with a view to data mining (species
that are critically endangered and observed outside of protected areas
within the last 2 years).  All open access to Biodiversity information.
It is difficult to comment on the paper, as it looks to offer
most of what I am looking for, but without running it, it's
difficult...

If you would like a tester, I would happily fill this role and offer
sample code and input files which could go into "getting started" guides
on wiki etc.

Cheers,

Tim






Re: FW: [jira] Updated: (HADOOP-3601) Hive as a contrib project

Posted by tim robertson <ti...@gmail.com>.

Hi Ashish

I am very excited to try this, having been evaluating Hadoop, HBase,
Cascading etc. recently to process 100 millions of Biodiversity records
(expecting billions soon), with a view to data mining (species
that are critically endangered and observed outside of protected areas
within the last 2 years).  All open access to Biodiversity information.  It
is difficult to comment on the paper, as it looks to offer most
of what I am looking for, but without running it, it's difficult...

If you would like a tester, I would happily fill this role and offer sample
code and input files which could go into "getting started" guides on wiki
etc.

Cheers,

Tim




