Posted to user@hive.apache.org by SingoWong <si...@gmail.com> on 2010/05/19 11:27:57 UTC

How to use Hive for HBase

Hi,

I'm confused about Hive and HBase.
HBase is a database and Hive is a warehouse. If I want to run statistics
and analysis on the data from the warehouse, and my source data is
stored in HBase, should I move my data from HBase to Hive?

Thanks & Regards,
Singo

Re: How to use Hive for HBase

Posted by SingoWong <si...@gmail.com>.

Thanks John, I know how to do that.

Regards,
Singo

On Thu, May 20, 2010 at 4:11 AM, John Sichi <js...@facebook.com> wrote:

> Currently you need to tell Hive about the column information (what names to
> use in Hive, and how they map into colfamily:colname in HBase) as part of
> your CREATE EXTERNAL TABLE statement.
>
> We could support some kind of default mapping in Hive for CREATE EXTERNAL
> TABLE, but that might not get what you want correctly.  Instead, you can
> write a Java utility to read HBase metadata and construct a CREATE EXTERNAL
> TABLE string exactly the way you want it.
>
> JVS

RE: How to use Hive for HBase

Posted by John Sichi <js...@facebook.com>.

Currently you need to tell Hive about the column information (what names to use in Hive, and how they map into colfamily:colname in HBase) as part of your CREATE EXTERNAL TABLE statement.

We could support some kind of default mapping in Hive for CREATE EXTERNAL TABLE, but that might not get what you want correctly.  Instead, you can write a Java utility to read HBase metadata and construct a CREATE EXTERNAL TABLE string exactly the way you want it. 

JVS
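
A minimal sketch of the kind of utility John describes, written in Python rather than Java to keep it short. It only builds the CREATE EXTERNAL TABLE string; the column list is passed in by hand here, whereas a real utility would fill it in from HBase (note that HBase table metadata records column families, not individual qualifiers). The function name, table names, and column names are hypothetical. The STORED BY / hbase.columns.mapping / hbase.table.name syntax follows the Hive HBase integration documentation and may differ between Hive versions, so treat it as an assumption to check against your release.

```python
def build_create_external_table(hive_table, hbase_table, key_column, columns):
    """Build a Hive CREATE EXTERNAL TABLE statement for an existing HBase table.

    key_column: (hive_name, hive_type) for the HBase row key.
    columns:    list of (hive_name, hive_type, "colfamily:colname") tuples.
    All names are supplied by the caller; nothing is read from HBase here.
    """
    hive_cols = [f"{key_column[0]} {key_column[1]}"]
    mapping = [":key"]  # ":key" maps the first Hive column to the HBase row key
    for name, hive_type, hbase_col in columns:
        if ":" not in hbase_col:
            raise ValueError(f"expected colfamily:colname, got {hbase_col!r}")
        hive_cols.append(f"{name} {hive_type}")
        mapping.append(hbase_col)

    column_list = ", ".join(hive_cols)
    mapping_str = ",".join(mapping)
    return "\n".join([
        f"CREATE EXTERNAL TABLE {hive_table} ({column_list})",
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'",
        f'WITH SERDEPROPERTIES ("hbase.columns.mapping" = "{mapping_str}")',
        f'TBLPROPERTIES ("hbase.table.name" = "{hbase_table}")',
    ])


if __name__ == "__main__":
    # Hypothetical HBase table "events" with one column family "d".
    print(build_create_external_table(
        "hive_events", "events",
        ("rowkey", "string"),
        [("user_id", "string", "d:uid"), ("clicks", "int", "d:clicks")],
    ))
```

The printed statement can then be pasted into the Hive CLI. Generating it this way keeps the choice of Hive column names and types under your control, which is the point John makes about a default mapping possibly not giving you what you want.
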
________________________________________
From: Ray Duong [ray.duong@gmail.com]
Sent: Wednesday, May 19, 2010 1:02 PM
To: hive-user@hadoop.apache.org
Subject: Re: How to use Hive for HBase

Hi John,

Is there an easy way to dump the HBase data into Hive (via HBase export) and have Hive read it without knowing all the column qualifiers?

Thanks,
-ray

Re: How to use Hive for HBase

Posted by Ray Duong <ra...@gmail.com>.

Hi John,

Is there an easy way to dump the HBase data into Hive (via HBase export)
and have Hive read it without knowing all the column qualifiers?

Thanks,
-ray

On Wed, May 19, 2010 at 11:10 AM, John Sichi <js...@facebook.com> wrote:

> It's the usual tradeoff.
>
> One approach is ETL (pump the data from HBase into Hive and then analyze it
> there).  The benefit is that once the data is in Hive, queries against it
> will typically run faster (since Hive is optimized for warehousing).  The
> drawback is staleness:  you won't be querying the very latest data.
>
> The other approach is direct queries against the latest data in HBase:
>  up-to-date data, but slower query performance (and adding load to your
> HBase cluster).
>
> You may consider using both approaches:  do ETL, and for most queries, run
> against the Hive data, but when you need the latest, hit HBase.
>
> JVS

RE: How to use Hive for HBase

Posted by John Sichi <js...@facebook.com>.

It's the usual tradeoff.  

One approach is ETL (pump the data from HBase into Hive and then analyze it there).  The benefit is that once the data is in Hive, queries against it will typically run faster (since Hive is optimized for warehousing).  The drawback is staleness:  you won't be querying the very latest data.

The other approach is direct queries against the latest data in HBase:  up-to-date data, but slower query performance (and adding load to your HBase cluster).

You may consider using both approaches:  do ETL, and for most queries, run against the Hive data, but when you need the latest, hit HBase.

JVS
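
The combined approach John suggests could be sketched roughly as follows, in Python. This is only an illustration under stated assumptions: `events_snapshot` stands for a native Hive table refreshed by ETL, `events_hbase` for a Hive table mapped onto HBase, and both function names are hypothetical. The routing rule (use the snapshot unless it is older than the caller can tolerate) is one possible policy, not something prescribed in the thread.

```python
from datetime import datetime, timedelta


def etl_statement(warehouse_table, live_table):
    """HiveQL that copies the current HBase-backed data into a native Hive table."""
    return f"INSERT OVERWRITE TABLE {warehouse_table} SELECT * FROM {live_table}"


def choose_table(last_etl_finished, max_staleness, now,
                 warehouse_table="events_snapshot", live_table="events_hbase"):
    """Pick the table to query.

    Returns the warehouse copy when it is fresh enough for the caller,
    otherwise the HBase-backed table (slower, and adds load to HBase).
    """
    if now - last_etl_finished <= max_staleness:
        return warehouse_table
    return live_table


if __name__ == "__main__":
    now = datetime(2010, 5, 19, 12, 0, 0)
    last_etl = now - timedelta(hours=3)
    # A daily report tolerates six hours of staleness; a live dashboard does not.
    print(choose_table(last_etl, timedelta(hours=6), now))
    print(choose_table(last_etl, timedelta(minutes=5), now))
    print(etl_statement("events_snapshot", "events_hbase"))
```

Most queries would end up on the snapshot table, and only the ones that pass a tight staleness bound would hit HBase directly.
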
