
Posted to user@hive.apache.org by Yang <te...@gmail.com> on 2014/07/24 23:03:42 UTC

does the HBase-Hive integration support using HBase index (primary key or secondary index) in the JOIN implementation?

If I join a table backed by a text file with a table backed by HBase, and
the latter is very large, is Hive smart enough to use the HBase table's
index for the join? Or does it run this as a regular MapReduce job, in
which each table is scanned in full, the rows are bucketed on the join
keys, and the matching rows are found in the reducer?


Thanks,
Yang
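To make the contrast in the question concrete, here is a minimal Python sketch of the "regular" reduce-side join it describes: both inputs are scanned in full, rows are grouped by join key (the shuffle), and the reducer pairs up left and right rows per key. Plain lists stand in for the text-file table and the HBase table; `reduce_side_join` and the sample data are hypothetical, and this illustrates the general technique rather than Hive's actual join operator.

```python
from collections import defaultdict

def reduce_side_join(left_rows, right_rows):
    """Simulate a MapReduce equi-join on the first field of each row."""
    # Shuffle target: join key -> (left values, right values).
    shuffled = defaultdict(lambda: ([], []))
    rows_scanned = 0
    for key, value in left_rows:        # map phase: full scan of the small table
        shuffled[key][0].append(value)
        rows_scanned += 1
    for key, value in right_rows:       # map phase: full scan of the large table
        shuffled[key][1].append(value)
        rows_scanned += 1
    joined = []
    for key in sorted(shuffled):        # reducers receive keys in sorted order
        lefts, rights = shuffled[key]
        for left in lefts:              # emit every left/right pairing for the key
            for right in rights:
                joined.append((key, left, right))
    return joined, rows_scanned

text_table = [("u2", "click"), ("u7", "view")]
hbase_table = [(f"u{i}", f"profile{i}") for i in range(1000)]
rows, scanned = reduce_side_join(text_table, hbase_table)
print(rows)     # [('u2', 'click', 'profile2'), ('u7', 'view', 'profile7')]
print(scanned)  # 1002
```

The scan count is the point: every row of the large table is read even though only two of them match.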

Re: does the HBase-Hive integration support using HBase index (primary key or secondary index) in the JOIN implementation?

Posted by Yang <te...@gmail.com>.

I found this, which seems relevant:
http://hortonworks.com/blog/hbase-via-hive-part-1/


"From a performance perspective, there are things Hive can do today (i.e.,
not dependent on data types) to take advantage of HBase. There’s also
the possibility of an HBase-aware Hive to make use of HBase tables as
intermediate storage location (HIVE-3565
<https://issues.apache.org/jira/browse/HIVE-3565>), facilitating map-side
joins against dimension tables loaded into HBase. Hive could make use of
HBase’s natural indexed structure (HIVE-3634
<https://issues.apache.org/jira/browse/HIVE-3634>, HIVE-3727
<https://issues.apache.org/jira/browse/HIVE-3727>), potentially saving huge
scans. Currently, the user doesn’t have (any?) control over the scans which
are executed. Configuration on a per-job, or at least per-table basis
should be enabled (HIVE-1233
<https://issues.apache.org/jira/browse/HIVE-1233>). That would enable
an HBase-savvy user to provide Hive with hints regarding how it should
interact with HBase. Support for simple split sampling of HBase tables
(HIVE-3399 <https://issues.apache.org/jira/browse/HIVE-3399>) could also be
easily done because HBase manages table partitions already."
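The map-side join idea in the quoted post can be sketched as a broadcast hash join: the small dimension table is made available to every mapper as a key-to-value map, and each mapper streams its own split of the large table and probes that map once per row, so there is no shuffle or reduce phase. In this hypothetical Python sketch a `dict` stands in for the dimension table (whether held in memory or, as the post suggests, served from HBase by row key); `map_side_join` and the data are illustrative assumptions, not the HIVE-3565 design.

```python
def map_side_join(dimension_rows, fact_splits):
    """Broadcast hash join: every mapper probes the same small dimension table."""
    # Stand-in for the broadcast dimension table (in memory or keyed in HBase).
    dim = dict(dimension_rows)
    output = []
    for split in fact_splits:           # each split is one mapper's input
        for key, fact in split:
            match = dim.get(key)        # point probe by key; no shuffle needed
            if match is not None:       # inner join: unmatched facts are dropped
                output.append((key, fact, match))
    return output

dimension = [("US", "United States"), ("FR", "France")]
splits = [[("US", 10), ("DE", 3)], [("FR", 7), ("US", 2)]]
result = map_side_join(dimension, splits)
print(result)
# [('US', 10, 'United States'), ('FR', 7, 'France'), ('US', 2, 'United States')]
```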


Re: does the HBase-Hive integration support using HBase index (primary key or secondary index) in the JOIN implementation?

Posted by Juan Martin Pampliega <jp...@gmail.com>.

The following article, about using Klout's Brickhouse library to access an
HBase table as a map keyed by row key, might be useful:
http://brickhouseconfessions.wordpress.com/2013/08/06/squash-the-long-tail-with-brickhouses-hbase-udfs/
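The "HBase table as a map" approach turns the join into per-row point lookups by key instead of a scan of the large table. Below is a minimal Python sketch under stated assumptions: `FakeHTable` is a hypothetical in-memory stand-in for an HBase table (a real lookup would go through an HBase client `Get`), `lookup_join` is an illustrative name rather than Brickhouse's API, and `functools.lru_cache` provides a local cache so repeated keys do not trigger repeated lookups.

```python
from functools import lru_cache

class FakeHTable:
    """Hypothetical stand-in for an HBase table: a map from row key to value."""
    def __init__(self, rows):
        self._rows = dict(rows)
        self.gets = 0                   # number of point lookups actually served

    def get(self, row_key):
        self.gets += 1
        return self._rows.get(row_key)

def lookup_join(small_rows, table, cache_size=1024):
    """Join by probing the big table per key instead of scanning it."""
    @lru_cache(maxsize=cache_size)      # repeated keys are answered locally
    def cached_get(key):
        return table.get(key)

    joined = []
    for key, value in small_rows:
        match = cached_get(key)
        if match is not None:           # inner join: skip keys missing from the table
            joined.append((key, value, match))
    return joined

big = FakeHTable((f"u{i}", f"profile{i}") for i in range(100_000))
events = [("u2", "click"), ("u7", "view"), ("u2", "buy"), ("nobody", "view")]
joined = lookup_join(events, big)
print(joined)
# [('u2', 'click', 'profile2'), ('u7', 'view', 'profile7'), ('u2', 'buy', 'profile2')]
print(big.gets)  # 3: one get each for u2, u7 and nobody; the second u2 is cached
```

This pays off when the keyed side is large and the probing side is small; with many probe rows, per-row round trips can cost more than a scan.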
On Jul 24, 2014 8:56 PM, "Andrew Mains" <an...@kontagent.com> wrote:

> Agreed; as far as I can tell, there is currently no support for this.
>
> This JIRA (https://issues.apache.org/jira/browse/HIVE-3727, referenced in
> http://hortonworks.com/blog/hbase-via-hive-part-1/) seems relevant, but
> there's no recent work on it, and I imagine the attached patch is out of
> date with trunk. Perhaps it's worth resurrecting?
>
> Andrew

Re: does the HBase-Hive integration support using HBase index (primary key or secondary index) in the JOIN implementation?

Posted by Andrew Mains <an...@kontagent.com>.

Agreed; as far as I can tell, there is currently no support for this.

This JIRA (https://issues.apache.org/jira/browse/HIVE-3727, referenced
in http://hortonworks.com/blog/hbase-via-hive-part-1/) seems relevant,
but there's no recent work on it, and I imagine the attached patch is
out of date with trunk. Perhaps it's worth resurrecting?

Andrew

On 7/24/14, 4:45 PM, java8964 wrote:
> I don't think the HBase-Hive integration is smart enough to use the
> index that already exists in HBase, though it may depend on the version
> you are using.
>
> In my experience, there is a lot of room for improvement in the
> HBase-Hive integration, especially around "push down" of logic into the
> HBase engine.
>
> Yong

RE: does the HBase-Hive integration support using HBase index (primary key or secondary index) in the JOIN implementation?

Posted by java8964 <ja...@hotmail.com>.

I don't think the HBase-Hive integration is smart enough to use the index that already exists in HBase, though it may depend on the version you are using.
In my experience, there is a lot of room for improvement in the HBase-Hive integration, especially around "push down" of logic into the HBase engine.
Yong
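What "push down" buys can be illustrated with a small Python sketch. HBase keeps rows sorted by row key, so a predicate such as `start <= key < stop` can become the start and stop rows of a scan, and only the matching slice is read; without push-down, every row is read and filtered afterwards. `SortedTable`, `scan_then_filter`, and `pushed_down_scan` are hypothetical names, with sorted Python lists and `bisect` standing in for HBase's sorted storage; this is not the Hive HBase storage handler's code.

```python
import bisect

class SortedTable:
    """Hypothetical stand-in for an HBase table: rows kept sorted by row key."""
    def __init__(self, rows):
        rows = sorted(rows)
        self.keys = [k for k, _ in rows]
        self.values = [v for _, v in rows]

def scan_then_filter(table, start, stop):
    """No push-down: read every row, then filter on the key afterwards."""
    hits = [(k, v) for k, v in zip(table.keys, table.values) if start <= k < stop]
    return hits, len(table.keys)        # rows read = the whole table

def pushed_down_scan(table, start, stop):
    """Push-down: turn 'start <= key < stop' into scan bounds on sorted keys."""
    lo = bisect.bisect_left(table.keys, start)  # like a scan's start row (inclusive)
    hi = bisect.bisect_left(table.keys, stop)   # like a scan's stop row (exclusive)
    hits = list(zip(table.keys[lo:hi], table.values[lo:hi]))
    return hits, hi - lo                # rows read = only the matching slice

table = SortedTable((f"row{i:05d}", i) for i in range(100_000))
a, read_a = scan_then_filter(table, "row00100", "row00105")
b, read_b = pushed_down_scan(table, "row00100", "row00105")
assert a == b                           # same answer either way
print(read_a, read_b)  # 100000 5
```

Zero-padding the numeric part keeps lexicographic key order equal to numeric order, which is what makes the range bounds meaningful.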
