Posted to user@hive.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2010/10/13 22:07:54 UTC
HBase as input AND output?
Hi,
I was wondering how I can query data stored in HBase and remembered Hive's HBase
integration:
http://wiki.apache.org/hadoop/Hive/HBaseIntegration
After watching John Sichi's video
(http://developer.yahoo.com/blogs/hadoop/posts/2010/04/hundreds_of_hadoop_fans_at_the/),
I have a better idea of what functionality this integration provides, but
I still have some questions.
Would it be correct to say that Hive-HBase integration makes the following data
flow possible:
0) Hive or Files ==> Custom HQL statement that aggregates data ==> HBase
1) HBase ==> Custom HQL statement that aggregates data ==> HBase
2) HBase ==> Custom HQL statement that aggregates data ==> output (console?)
Of the above, 1) is what I'm wondering the most about right now.
In other words, it seems to me that Hive may be able to look at *just* data
stored in HBase *without* the typical data/files in HDFS that Hive normally runs
its MR jobs against.
Is this correct?
Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/
Re: HBase as input AND output?
Posted by John Sichi <js...@facebook.com>.
If your query only accesses HBase tables, then yes, Hive does not access any source data directly from HDFS (although of course it may put intermediate results in HDFS, e.g. for the result of a join).
However, if your query does something like join an HBase table with a native Hive table, then it will read data from both HBase and HDFS.
Likewise, on the write side, it depends whether your INSERT targets an HBase table vs a native Hive table.
The read and write sides are independent.
JVS
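A minimal HiveQL sketch of flow 1) from the original question (HBase ==> aggregating query ==> HBase) might look like the following. The storage handler class and the "hbase.columns.mapping" / "hbase.table.name" properties come from the Hive HBase integration; every table, column, and column-family name here (hbase_events, hbase_user_totals, "events", "user_totals", d, t) is a hypothetical placeholder, and the exact DDL accepted may differ between Hive versions.

```sql
-- Assumption: an HBase table named "events" already exists with a column
-- family "d". EXTERNAL tells Hive to map onto it rather than create it.
CREATE EXTERNAL TABLE hbase_events(rowkey string, user_id string, num_bytes bigint)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,d:user_id,d:num_bytes")
TBLPROPERTIES ("hbase.table.name" = "events");

-- Assumption: the target is a new HBase table "user_totals" with a column
-- family "t"; user_id becomes the HBase row key.
CREATE TABLE hbase_user_totals(user_id string, total_bytes bigint)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,t:total_bytes")
TBLPROPERTIES ("hbase.table.name" = "user_totals");

-- Read side: HBase only. Write side: HBase only.
-- Hive may still stage intermediate results in HDFS while the job runs.
INSERT OVERWRITE TABLE hbase_user_totals
SELECT user_id, SUM(num_bytes)
FROM hbase_events
GROUP BY user_id;
```

Swapping the FROM clause for a native Hive table would give flow 0), and dropping the INSERT line so that the SELECT prints to the console would give flow 2). The read and write sides can be changed independently of each other.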
On Oct 13, 2010, at 2:24 PM, Otis Gospodnetic wrote:
> But, when using Hive *with* HBase, does Hive require any such data to be present
> in the HDFS?
> In other words, when using Hive with HBase, one really uses only Hive's ability
> to translate a Hive QL statement to a set of MR jobs (and read from/write to
> HBase) and execute them against only data stored in HBase. Is this correct?
Re: HBase as input AND output?
Posted by Otis Gospodnetic <ot...@yahoo.com>.
Thanks Tim.
(and sorry for the duplicate email - need to fix my Hive email filter)
Just to clarify one bit, though.
When using Hive without HBase one has data stored in the appropriate directories
on HDFS and runs MR jobs against those data.
But, when using Hive *with* HBase, does Hive require any such data to be present
in the HDFS?
In other words, when using Hive with HBase, one really uses only Hive's ability
to translate a Hive QL statement to a set of MR jobs (and read from/write to
HBase) and execute them against only data stored in HBase. Is this correct?
Thanks,
Otis
----- Original Message ----
> From: Tim Robertson <ti...@gmail.com>
> To: user@hive.apache.org
> Sent: Wed, October 13, 2010 4:45:31 PM
> Subject: Re: HBase as input AND output?
>
> That's right. Hive can use an HBase table as an input format to the
> hive query regardless of output format, and can also write the output
> to an HBase table regardless of the input format. You can also
> supposedly do a join in Hive that uses 1 side of the join from an
> HBase table, and the other side a text file, which is very powerful.
> I haven't done it myself, but intend to shortly.
Re: HBase as input AND output?
Posted by Tim Robertson <ti...@gmail.com>.
That's right. Hive can use an HBase table as an input format to the
hive query regardless of output format, and can also write the output
to an HBase table regardless of the input format. You can also
supposedly do a join in Hive that takes one side from an HBase
table and the other from a text file, which is very powerful.
I haven't done it myself, but intend to shortly.
HTH,
Tim