Posted to user@hive.apache.org by David Koch <og...@googlemail.com> on 2012/12/06 19:56:27 UTC

Mapping existing HBase table with many columns to Hive.

Hello,

How can I map an HBase table with the following layout to Hive using the
"CREATE EXTERNAL TABLE" command from the shell (or in another programmatic
way)?

The HBase table's layout is as follows:
Rowkey: 16 bytes, a UUID with the "-" removed and the 32 hex chars
converted into two 8-byte longs.
Columns (qualifiers): timestamps, i.e. the bytes of a long, converted using
Hadoop's Bytes.toBytes(long). There can be many of these in a single row.
Values: the bytes of a Java string.
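
For illustration, here is a minimal sketch of how a row like this could have
been written, assuming the layout above (the UUID and timestamp are generated
on the fly and purely hypothetical):

import java.util.UUID;
import org.apache.hadoop.hbase.util.Bytes;

public class HitsRowSketch {
    public static void main(String[] args) {
        // Row key: the UUID's two 8-byte longs, concatenated into 16 bytes.
        UUID id = UUID.randomUUID();
        byte[] rowKey = Bytes.add(Bytes.toBytes(id.getMostSignificantBits()),
                                  Bytes.toBytes(id.getLeastSignificantBits()));

        // Column qualifier: the raw 8 bytes of a long timestamp.
        byte[] qualifier = Bytes.toBytes(System.currentTimeMillis());

        // Value: the bytes of a Java string.
        byte[] value = Bytes.toBytes("blahaha");

        System.out.println(rowKey.length + "-byte key, "
                + qualifier.length + "-byte qualifier, value="
                + Bytes.toString(value));
    }
}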

I am unsure of which datatypes to use. I am pretty sure there is no way I
can sensibly map the row key to anything other than "binary", but maybe the
columns, which are longs, and the values, which are strings, can be mapped
to their corresponding Hive datatypes.

I include an extract of what a row looks like in the HBase shell below:

Thank you,

/David

hbase(main):009:0> scan "hits"
ROW                                                      COLUMN+CELL

\x00\x00\x06\xB1H\x89N\xC3\xA5\x83\x0F\xDD\x1E\xAE&\xDC
 column=t:\x00\x00\x01;2\xE6Q\x06, timestamp=1267737987733, value=blahaha
\x00\x00\x06\xB1H\x89N\xC3\xA5\x83\x0F\xDD\x1E\xAE&\xDC
 column=t:\x00\x00\x01;2\xE6\xFB@, timestamp=1354012104967, value=testtest

Re: Mapping existing HBase table with many columns to Hive.

Posted by "kulkarni.swarnim@gmail.com" <ku...@gmail.com>.
The data type corresponds to the type of the data that the qualifier holds,
not the data type of the qualifier itself. So, with the above query, for the
column value map<string,string>, the type map<string,string> means that the
qualifiers mapped to keys are of type "string" and the underlying values are
of type "string". The patch will just allow Hive to read data that is stored
under a binary qualifier like yours and, once the mapping is done, treat the
binary qualifier as a "string" as well, since that is the only type Hive
allows for map keys.
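
To make that concrete, here is a small sketch using the qualifier from your
scan output; it shows that the map key Hive exposes is nothing more than the
raw bytes of a long (the printed value follows directly from the bytes):

import org.apache.hadoop.hbase.util.Bytes;

public class QualifierKeySketch {
    public static void main(String[] args) {
        // The qualifier t:\x00\x00\x01;2\xE6Q\x06 from the scan, written
        // out byte by byte (';' = 0x3B, '2' = 0x32, 'Q' = 0x51).
        byte[] qualifier = {0x00, 0x00, 0x01, 0x3B, 0x32, (byte) 0xE6, 0x51, 0x06};

        // Read back as a long, the bytes are the original timestamp.
        System.out.println(Bytes.toLong(qualifier)); // prints 1353768653062

        // Hive's map<string,string> instead reinterprets the same eight
        // bytes as a string key, which is why the keys look garbled.
    }
}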

Hope that helps.


On Sun, Dec 9, 2012 at 6:03 PM, David Koch <og...@googlemail.com> wrote:

> Hello,
>
> I tried the shell command which Swarnim kindly provided, and it allows me
> to map an existing HBase table into Hive. However, since my qualifiers are
> longs but the map only accepts strings as keys, the result is garbled. Even
> with the suggested patch, which allows binary keys, the resulting datatype
> in Hive would be binary rather than long, making it hard to query from the
> shell. It seems there is no API for this for now, right?
>
> Currently, is there any way to map HBase byte[] to Hive datatypes?
>
> The assumption is that all byte[] were generated using Hadoop's
> Bytes.toBytes(<type>) method and that all row keys, all qualifiers, and all
> values each share the same data type (for example: row keys are ints,
> qualifiers are longs, and values are strings).
>
> Thank you,
>
> /David
>
>
> On Thu, Dec 6, 2012 at 9:23 PM, David Koch <og...@googlemail.com> wrote:
>
>> Hello Swarnim,
>>
>> Thank you for your answer. I will try the options you pointed out.
>>
>> /David
>>
>>
>> On Thu, Dec 6, 2012 at 9:10 PM, kulkarni.swarnim@gmail.com <
>> kulkarni.swarnim@gmail.com> wrote:
>>
>>> map
>>
>>
>>
>


-- 
Swarnim

Re: Mapping existing HBase table with many columns to Hive.

Posted by David Koch <og...@googlemail.com>.
Hello,

I tried the shell command which Swarnim kindly provided, and it allows me to
map an existing HBase table into Hive. However, since my qualifiers are longs
but the map only accepts strings as keys, the result is garbled. Even with
the suggested patch, which allows binary keys, the resulting datatype in Hive
would be binary rather than long, making it hard to query from the shell. It
seems there is no API for this for now, right?

Currently, is there any way to map HBase byte[] to Hive datatypes?

The assumption is that all byte[] were generated using Hadoop's
Bytes.toBytes(<type>) method and that all row keys, all qualifiers, and all
values each share the same data type (for example: row keys are ints,
qualifiers are longs, and values are strings).

Thank you,

/David


On Thu, Dec 6, 2012 at 9:23 PM, David Koch <og...@googlemail.com> wrote:

> Hello Swarnim,
>
> Thank you for your answer. I will try the options you pointed out.
>
> /David
>
>
> On Thu, Dec 6, 2012 at 9:10 PM, kulkarni.swarnim@gmail.com <
> kulkarni.swarnim@gmail.com> wrote:
>
>> map
>
>
>

Re: Mapping existing HBase table with many columns to Hive.

Posted by David Koch <og...@googlemail.com>.
Hello Swarnim,

Thank you for your answer. I will try the options you pointed out.

/David


On Thu, Dec 6, 2012 at 9:10 PM, kulkarni.swarnim@gmail.com <
kulkarni.swarnim@gmail.com> wrote:

> map

Re: Mapping existing HBase table with many columns to Hive.

Posted by "kulkarni.swarnim@gmail.com" <ku...@gmail.com>.
Hi David,

First of all, your columns are not "long". They are binary as well. As Hive
currently stands, there is no support for binary qualifiers. However, I
recently submitted a patch for that [1]. Feel free to give it a shot and let
me know if you see any issues. With that patch, you can directly give your
qualifiers to Hive as they look here (\x00\x00\x01;2\xE6Q\x06).

Until then, the only option you have is to use a map type covering all the
columns under the column family "t". An example of how to do that would be:


CREATE EXTERNAL TABLE hbase_table_1(key int, value map<string,string>)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,t:")
TBLPROPERTIES("hbase.table.name" = "some_existing_table");


Also, as far as your key goes, it is a composite key. There is an existing
patch to support that as well [2].


Hope that helps.


[1] https://issues.apache.org/jira/browse/HIVE-3553
[2] https://issues.apache.org/jira/browse/HIVE-2599


On Thu, Dec 6, 2012 at 12:56 PM, David Koch <og...@googlemail.com> wrote:

> Hello,
>
> How can I map an HBase table with the following layout to Hive using the
> "CREATE EXTERNAL TABLE" command from the shell (or in another programmatic
> way)?
>
> The HBase table's layout is as follows:
> Rowkey: 16 bytes, a UUID with the "-" removed and the 32 hex chars
> converted into two 8-byte longs.
> Columns (qualifiers): timestamps, i.e. the bytes of a long, converted using
> Hadoop's Bytes.toBytes(long). There can be many of these in a single row.
> Values: the bytes of a Java string.
>
> I am unsure of which datatypes to use. I am pretty sure there is no way I
> can sensibly map the row key to anything other than "binary", but maybe the
> columns, which are longs, and the values, which are strings, can be mapped
> to their corresponding Hive datatypes.
>
> I include an extract of what a row looks like in the HBase shell below:
>
> Thank you,
>
> /David
>
> hbase(main):009:0> scan "hits"
> ROW                                                    COLUMN+CELL
>
> \x00\x00\x06\xB1H\x89N\xC3\xA5\x83\x0F\xDD\x1E\xAE&\xDC
>  column=t:\x00\x00\x01;2\xE6Q\x06, timestamp=1267737987733, value=blahaha
> \x00\x00\x06\xB1H\x89N\xC3\xA5\x83\x0F\xDD\x1E\xAE&\xDC
>  column=t:\x00\x00\x01;2\xE6\xFB@, timestamp=1354012104967,
> value=testtest
>



-- 
Swarnim