Posted to user@hive.apache.org by Ray Duong <ra...@gmail.com> on 2010/06/15 20:57:05 UTC

Hive-Hbase with large number of columns

Hi,

I'm trying to map an HBase table in Hive that contains a large number of
columns.  Since HBase is designed for wide tables, does the Hive/HBase
integration have any limit on the number of columns it can map in one
table?  I seem to hit a limit at 10 columns.

Thanks,
-ray

create external table hbase_t1
(
key string,
f1_a string,
f2_a string,
f1_b string,
f2_b string,
...
...
f1_m string,
f2_m string,

 )
 STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
 WITH SERDEPROPERTIES ("hbase.columns.mapping" =
":key,f1:a,f2:a,f1:b,f2:b,f1:c,f2:c,f1:d,f2:d,f1:e,f2:e,f1:f,f2:f,f1:g,f2:g,f1:h,f2:h,f1:i,f2:i,f1:j,f2:j,f1:k,f2:k,f1:l,f2:l,f1:m,f2:m"
)
 TBLPROPERTIES("hbase.table.name" = "t1");

Error Message:

FAILED: Error in metadata: javax.jdo.JDODataStoreException: Put request
failed : INSERT INTO `SERDE_PARAMS` (`PARAM_VALUE`,`SERDE_ID`,`PARAM_KEY`)
VALUES (?,?,?)
NestedThrowables:
org.datanucleus.store.mapped.exceptions.MappedDatastoreException: INSERT
INTO `SERDE_PARAMS` (`PARAM_VALUE`,`SERDE_ID`,`PARAM_KEY`) VALUES (?,?,?)
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.DDLTask

Re: Hive-Hbase with large number of columns

Posted by Ray Duong <ra...@gmail.com>.
Thanks, guys,

I found a workaround for the size limitation.  MySQL appears to support
VARCHAR columns of up to 65,535 bytes, so I manually modified the column
without applying the patch and it no longer hits the size limit.
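
For reference, a sketch of the kind of manual change involved, assuming a
MySQL metastore with the standard Hive schema (the width shown here is
illustrative, not the exact value used):

 -- widen the non-indexed SERDE parameter value column
 ALTER TABLE SERDE_PARAMS MODIFY PARAM_VALUE VARCHAR(20000);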

-ray



On Wed, Jun 16, 2010 at 8:02 PM, Carl Steinbach <ca...@cloudera.com> wrote:

> I updated the patch:
> http://issues.apache.org/jira/secure/attachment/12447307/HIVE-1364.2.patch.txt
>
> Thanks.
>
> Carl
>
>
> On Wed, Jun 16, 2010 at 7:31 PM, John Sichi <js...@facebook.com> wrote:
>
>> Looks like that patch has a bug.  It should not be changing
>> PARTITIONS.PART_NAME, which is an indexed column.  Try again, undoing that
>> line of the patch first.
>>
>> JVS
>> ________________________________________
>> From: Ray Duong [ray.duong@gmail.com]
>> Sent: Wednesday, June 16, 2010 5:24 PM
>> To: hive-user@hadoop.apache.org
>> Subject: Re: Hive-Hbase with large number of columns
>>
>> Hi,
>>
>> I applied the patch HIVE-1364 and rebuilt the metastore.  I was able to
>> create an external table in Hive for a large number of columns ( upto
>> 4000bytes).
>>
>> Now when I tried to drop the external table I get the following error
>> message.  Is there another file that I need to modify in order to drop the
>> table?
>>
>> hive> drop table hbase_test;
>>
>> FAILED: Error in metadata: javax.jdo.JDODataStoreException: Error(s) were
>> found while auto-creating/validating the datastore for classes. The errors
>> are printed in the log, and are attached to this exception.
>> NestedThrowables:
>> com.mysql.jdbc.exceptions.MySQLSyntaxErrorException: Specified key was too
>> long; max key length is 767 bytes
>> FAILED: Execution Error, return code 1 from
>> org.apache.hadoop.hive.ql.exec.DDLTask
>>
>> Thanks,
>> -ray
>>
>>
>> On Tue, Jun 15, 2010 at 4:05 PM, John Sichi <jsichi@facebook.com<mailto:
>> jsichi@facebook.com>> wrote:
>> Column-level properties are attractive for other reasons, but I don't
>> think we should do it as a workaround for underlying limits.  I've noted in
>> JIRA that I think a LOB would be more appropriate here.
>>
>> Note that while you're waiting for a permanent resolution, you can use
>> ALTER TABLE on your metastore DBMS to widen the precision.
>>
>> JVS
>>
>> On Jun 15, 2010, at 3:37 PM, Ray Duong wrote:
>>
>> Yes, I think I might have to do that. I was trying to avoid multiple Hbase
>> scans with one big table.
>>
>> BTW, would it better to store the column SERDE properties for each column
>> versus at the table level to avoid the 767 or 4000 byte limitation?
>>
>> Thanks again,
>> -ray
>>
>>
>> On Tue, Jun 15, 2010 at 2:42 PM, Edward Capriolo <edlinuxguru@gmail.com
>> <ma...@gmail.com>> wrote:
>>
>>
>> On Tue, Jun 15, 2010 at 5:04 PM, Ray Duong <ray.duong@gmail.com<mailto:
>> ray.duong@gmail.com>> wrote:
>> Thanks for all the help.
>>
>> -ray
>>
>>
>> On Tue, Jun 15, 2010 at 1:26 PM, Carl Steinbach <carl@cloudera.com
>> <ma...@cloudera.com>> wrote:
>> Hi Ray,
>>
>> 4000 bytes is the maximum VARCHAR size allowed on Oracle 9i/10g/11g. As
>> far as I know this is the smallest maximum VARCHAR size out of the databases
>> we currently try to support (MySQL, Oracle, Derby, etc).
>>
>> Carl
>>
>>
>> On Tue, Jun 15, 2010 at 1:15 PM, Ray Duong <ray.duong@gmail.com<mailto:
>> ray.duong@gmail.com>> wrote:
>> Thank John/Carl,
>>
>> Yep, there seems to be a limit on the 767 byte size.  So I see the patch
>> HIVE-1364 to set it to 4000 bytes.  I'm using Db-derby, do you know if there
>> is a limit beyond 4000 bytes?
>>
>> -ray
>>
>> Error:
>> Caused by: ERROR 22001: A truncation error was encountered trying to
>> shrink VARCHAR
>> 'segment:ITC_10#ITC_1001,segment:CITC_10#ITC_1001,segment:ITC&' to length
>> 767.
>>
>>
>>
>>
>> On Tue, Jun 15, 2010 at 12:26 PM, Carl Steinbach <carl@cloudera.com
>> <ma...@cloudera.com>> wrote:
>> Hi Ray,
>>
>> There is currently a 767 byte size limit on SERDEPROPERTIES values (see
>> http://issues.apache.org/jira/browse/HIVE-1364). It's possible that
>> you're bumping into this limitation (assuming you abbreviated the column
>> names in your example).
>>
>>
>> On Tue, Jun 15, 2010 at 12:03 PM, John Sichi <jsichi@facebook.com<mailto:
>> jsichi@facebook.com>> wrote:
>> That exception is coming from the metastore (trying to write the table
>> definition).  Could you dig down into the Hive logs to see if you can get
>> the underlying cause?
>>
>> You can get the logs to spew on console by adding "-hiveconf
>> hive.root.logger=DEBUG,console" to your Hive CLI invocation.
>>
>> JVS
>>
>> On Jun 15, 2010, at 11:57 AM, Ray Duong wrote:
>>
>> Hi,
>>
>> I'm trying to map a Hbase table in Hive that contains large number of
>> columns.  Since Hbase is designed to be a wide table, does Hive/Hbase
>> integration have any set limitation on the number of columns it can map in
>> one table?  I seem to hit a limit at 10 columns.
>>
>> Thanks,
>> -ray
>>
>> create external table hbase_t1
>> (
>> key string,
>> f1_a string,
>> f2_a string,
>> f1_b string,
>> f2_b string,
>> ...
>> ...
>> f1_m string,
>> f2_m string,
>>
>>  )
>>  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>>  WITH SERDEPROPERTIES ("hbase.columns.mapping" =
>> ":key,f1:a,f2:a,f1:b,f2:b,f1:c,f2:c,f1:d,f2:d,f1:e,f2:e,f1:f,f2:f,f1:g,f2:g,f1:h,f2:h,f1:i,f2:i,f1:j,f2:j,f1:k,f2:k,f1:l,f2:l,f1:m,f2:m"
>> )
>>  TBLPROPERTIES("hbase.table.name" = "t1");
>>
>> Error Message:
>>
>> FAILED: Error in metadata: javax.jdo.JDODataStoreException: Put request
>> failed : INSERT INTO `SERDE_PARAMS` (`PARAM_VALUE`,`SERDE_ID`,`PARAM_KEY`)
>> VALUES (?,?,?)
>> NestedThrowables:
>> org.datanucleus.store.mapped.exceptions.MappedDatastoreException: INSERT
>> INTO `SERDE_PARAMS` (`PARAM_VALUE`,`SERDE_ID`,`PARAM_KEY`) VALUES (?,?,?)
>> FAILED: Execution Error, return code 1 from
>> org.apache.hadoop.hive.ql.exec.DDLTask
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> You have probably thought of this, but in the short term you can create
>> two vertically partitioned tables and do a 1 to 1 join on their key.
>> Edward
>>
>>
>>
>>
>>
>

Re: Hive-Hbase with large number of columns

Posted by Carl Steinbach <ca...@cloudera.com>.
I updated the patch:
http://issues.apache.org/jira/secure/attachment/12447307/HIVE-1364.2.patch.txt

Thanks.

Carl

On Wed, Jun 16, 2010 at 7:31 PM, John Sichi <js...@facebook.com> wrote:

> Looks like that patch has a bug.  It should not be changing
> PARTITIONS.PART_NAME, which is an indexed column.  Try again, undoing that
> line of the patch first.
>
> JVS
> ________________________________________
> From: Ray Duong [ray.duong@gmail.com]
> Sent: Wednesday, June 16, 2010 5:24 PM
> To: hive-user@hadoop.apache.org
> Subject: Re: Hive-Hbase with large number of columns
>
> Hi,
>
> I applied the patch HIVE-1364 and rebuilt the metastore.  I was able to
> create an external table in Hive for a large number of columns ( upto
> 4000bytes).
>
> Now when I tried to drop the external table I get the following error
> message.  Is there another file that I need to modify in order to drop the
> table?
>
> hive> drop table hbase_test;
>
> FAILED: Error in metadata: javax.jdo.JDODataStoreException: Error(s) were
> found while auto-creating/validating the datastore for classes. The errors
> are printed in the log, and are attached to this exception.
> NestedThrowables:
> com.mysql.jdbc.exceptions.MySQLSyntaxErrorException: Specified key was too
> long; max key length is 767 bytes
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.DDLTask
>
> Thanks,
> -ray
>
>
> On Tue, Jun 15, 2010 at 4:05 PM, John Sichi <jsichi@facebook.com<mailto:
> jsichi@facebook.com>> wrote:
> Column-level properties are attractive for other reasons, but I don't think
> we should do it as a workaround for underlying limits.  I've noted in JIRA
> that I think a LOB would be more appropriate here.
>
> Note that while you're waiting for a permanent resolution, you can use
> ALTER TABLE on your metastore DBMS to widen the precision.
>
> JVS
>
> On Jun 15, 2010, at 3:37 PM, Ray Duong wrote:
>
> Yes, I think I might have to do that. I was trying to avoid multiple Hbase
> scans with one big table.
>
> BTW, would it better to store the column SERDE properties for each column
> versus at the table level to avoid the 767 or 4000 byte limitation?
>
> Thanks again,
> -ray
>
>
> On Tue, Jun 15, 2010 at 2:42 PM, Edward Capriolo <edlinuxguru@gmail.com
> <ma...@gmail.com>> wrote:
>
>
> On Tue, Jun 15, 2010 at 5:04 PM, Ray Duong <ray.duong@gmail.com<mailto:
> ray.duong@gmail.com>> wrote:
> Thanks for all the help.
>
> -ray
>
>
> On Tue, Jun 15, 2010 at 1:26 PM, Carl Steinbach <carl@cloudera.com<mailto:
> carl@cloudera.com>> wrote:
> Hi Ray,
>
> 4000 bytes is the maximum VARCHAR size allowed on Oracle 9i/10g/11g. As far
> as I know this is the smallest maximum VARCHAR size out of the databases we
> currently try to support (MySQL, Oracle, Derby, etc).
>
> Carl
>
>
> On Tue, Jun 15, 2010 at 1:15 PM, Ray Duong <ray.duong@gmail.com<mailto:
> ray.duong@gmail.com>> wrote:
> Thank John/Carl,
>
> Yep, there seems to be a limit on the 767 byte size.  So I see the patch
> HIVE-1364 to set it to 4000 bytes.  I'm using Db-derby, do you know if there
> is a limit beyond 4000 bytes?
>
> -ray
>
> Error:
> Caused by: ERROR 22001: A truncation error was encountered trying to shrink
> VARCHAR 'segment:ITC_10#ITC_1001,segment:CITC_10#ITC_1001,segment:ITC&' to
> length 767.
>
>
>
>
> On Tue, Jun 15, 2010 at 12:26 PM, Carl Steinbach <carl@cloudera.com
> <ma...@cloudera.com>> wrote:
> Hi Ray,
>
> There is currently a 767 byte size limit on SERDEPROPERTIES values (see
> http://issues.apache.org/jira/browse/HIVE-1364). It's possible that you're
> bumping into this limitation (assuming you abbreviated the column names in
> your example).
>
>
> On Tue, Jun 15, 2010 at 12:03 PM, John Sichi <jsichi@facebook.com<mailto:
> jsichi@facebook.com>> wrote:
> That exception is coming from the metastore (trying to write the table
> definition).  Could you dig down into the Hive logs to see if you can get
> the underlying cause?
>
> You can get the logs to spew on console by adding "-hiveconf
> hive.root.logger=DEBUG,console" to your Hive CLI invocation.
>
> JVS
>
> On Jun 15, 2010, at 11:57 AM, Ray Duong wrote:
>
> Hi,
>
> I'm trying to map a Hbase table in Hive that contains large number of
> columns.  Since Hbase is designed to be a wide table, does Hive/Hbase
> integration have any set limitation on the number of columns it can map in
> one table?  I seem to hit a limit at 10 columns.
>
> Thanks,
> -ray
>
> create external table hbase_t1
> (
> key string,
> f1_a string,
> f2_a string,
> f1_b string,
> f2_b string,
> ...
> ...
> f1_m string,
> f2_m string,
>
>  )
>  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>  WITH SERDEPROPERTIES ("hbase.columns.mapping" =
> ":key,f1:a,f2:a,f1:b,f2:b,f1:c,f2:c,f1:d,f2:d,f1:e,f2:e,f1:f,f2:f,f1:g,f2:g,f1:h,f2:h,f1:i,f2:i,f1:j,f2:j,f1:k,f2:k,f1:l,f2:l,f1:m,f2:m"
> )
>  TBLPROPERTIES("hbase.table.name" = "t1");
>
> Error Message:
>
> FAILED: Error in metadata: javax.jdo.JDODataStoreException: Put request
> failed : INSERT INTO `SERDE_PARAMS` (`PARAM_VALUE`,`SERDE_ID`,`PARAM_KEY`)
> VALUES (?,?,?)
> NestedThrowables:
> org.datanucleus.store.mapped.exceptions.MappedDatastoreException: INSERT
> INTO `SERDE_PARAMS` (`PARAM_VALUE`,`SERDE_ID`,`PARAM_KEY`) VALUES (?,?,?)
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.DDLTask
>
>
>
>
>
>
>
>
>
> You have probably thought of this, but in the short term you can create two
> vertically partitioned tables and do a 1 to 1 join on their key.
> Edward
>
>
>
>
>

RE: Hive-Hbase with large number of columns

Posted by John Sichi <js...@facebook.com>.
Looks like that patch has a bug.  It should not be changing PARTITIONS.PART_NAME, which is an indexed column.  Try again, undoing that line of the patch first.
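
MySQL's InnoDB storage engine caps index key parts at 767 bytes, which is
why widening an indexed column such as PARTITIONS.PART_NAME trips the
"max key length is 767 bytes" error quoted below, while widening the
non-indexed SERDE_PARAMS.PARAM_VALUE does not.  A sketch of putting that
one column back, assuming a MySQL metastore with the default schema:

 ALTER TABLE PARTITIONS MODIFY PART_NAME VARCHAR(767);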

JVS
________________________________________
From: Ray Duong [ray.duong@gmail.com]
Sent: Wednesday, June 16, 2010 5:24 PM
To: hive-user@hadoop.apache.org
Subject: Re: Hive-Hbase with large number of columns

Hi,

I applied the patch HIVE-1364 and rebuilt the metastore.  I was able to create an external table in Hive with a large number of columns (up to 4,000 bytes).

Now when I try to drop the external table I get the following error message.  Is there another file that I need to modify in order to drop the table?

hive> drop table hbase_test;

FAILED: Error in metadata: javax.jdo.JDODataStoreException: Error(s) were found while auto-creating/validating the datastore for classes. The errors are printed in the log, and are attached to this exception.
NestedThrowables:
com.mysql.jdbc.exceptions.MySQLSyntaxErrorException: Specified key was too long; max key length is 767 bytes
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

Thanks,
-ray


On Tue, Jun 15, 2010 at 4:05 PM, John Sichi <js...@facebook.com>> wrote:
Column-level properties are attractive for other reasons, but I don't think we should do it as a workaround for underlying limits.  I've noted in JIRA that I think a LOB would be more appropriate here.

Note that while you're waiting for a permanent resolution, you can use ALTER TABLE on your metastore DBMS to widen the precision.

JVS

On Jun 15, 2010, at 3:37 PM, Ray Duong wrote:

Yes, I think I might have to do that. I was trying to avoid multiple Hbase scans with one big table.

BTW, would it be better to store the column SERDE properties for each column versus at the table level to avoid the 767 or 4000 byte limitation?

Thanks again,
-ray


On Tue, Jun 15, 2010 at 2:42 PM, Edward Capriolo <ed...@gmail.com>> wrote:


On Tue, Jun 15, 2010 at 5:04 PM, Ray Duong <ra...@gmail.com>> wrote:
Thanks for all the help.

-ray


On Tue, Jun 15, 2010 at 1:26 PM, Carl Steinbach <ca...@cloudera.com>> wrote:
Hi Ray,

4000 bytes is the maximum VARCHAR size allowed on Oracle 9i/10g/11g. As far as I know this is the smallest maximum VARCHAR size out of the databases we currently try to support (MySQL, Oracle, Derby, etc).

Carl


On Tue, Jun 15, 2010 at 1:15 PM, Ray Duong <ra...@gmail.com>> wrote:
Thank John/Carl,

Yep, there seems to be a limit on the 767 byte size.  So I see the patch HIVE-1364 to set it to 4000 bytes.  I'm using Db-derby, do you know if there is a limit beyond 4000 bytes?

-ray

Error:
Caused by: ERROR 22001: A truncation error was encountered trying to shrink VARCHAR 'segment:ITC_10#ITC_1001,segment:CITC_10#ITC_1001,segment:ITC&' to length 767.




On Tue, Jun 15, 2010 at 12:26 PM, Carl Steinbach <ca...@cloudera.com>> wrote:
Hi Ray,

There is currently a 767 byte size limit on SERDEPROPERTIES values (see http://issues.apache.org/jira/browse/HIVE-1364). It's possible that you're bumping into this limitation (assuming you abbreviated the column names in your example).


On Tue, Jun 15, 2010 at 12:03 PM, John Sichi <js...@facebook.com>> wrote:
That exception is coming from the metastore (trying to write the table definition).  Could you dig down into the Hive logs to see if you can get the underlying cause?

You can get the logs to spew on console by adding "-hiveconf hive.root.logger=DEBUG,console" to your Hive CLI invocation.

JVS

On Jun 15, 2010, at 11:57 AM, Ray Duong wrote:

Hi,

I'm trying to map a Hbase table in Hive that contains large number of columns.  Since Hbase is designed to be a wide table, does Hive/Hbase integration have any set limitation on the number of columns it can map in one table?  I seem to hit a limit at 10 columns.

Thanks,
-ray

create external table hbase_t1
(
key string,
f1_a string,
f2_a string,
f1_b string,
f2_b string,
...
...
f1_m string,
f2_m string,

 )
 STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
 WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,f1:a,f2:a,f1:b,f2:b,f1:c,f2:c,f1:d,f2:d,f1:e,f2:e,f1:f,f2:f,f1:g,f2:g,f1:h,f2:h,f1:i,f2:i,f1:j,f2:j,f1:k,f2:k,f1:l,f2:l,f1:m,f2:m" )
 TBLPROPERTIES("hbase.table.name" = "t1");

Error Message:

FAILED: Error in metadata: javax.jdo.JDODataStoreException: Put request failed : INSERT INTO `SERDE_PARAMS` (`PARAM_VALUE`,`SERDE_ID`,`PARAM_KEY`) VALUES (?,?,?)
NestedThrowables:
org.datanucleus.store.mapped.exceptions.MappedDatastoreException: INSERT INTO `SERDE_PARAMS` (`PARAM_VALUE`,`SERDE_ID`,`PARAM_KEY`) VALUES (?,?,?)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask









You have probably thought of this, but in the short term you can create two vertically partitioned tables and do a 1 to 1 join on their key.
Edward





Re: Hive-Hbase with large number of columns

Posted by Ray Duong <ra...@gmail.com>.
Hi,

I applied the patch HIVE-1364 and rebuilt the metastore.  I was able to
create an external table in Hive with a large number of columns (up to
4,000 bytes).

Now when I try to drop the external table I get the following error
message.  Is there another file that I need to modify in order to drop the
table?

hive> drop table hbase_test;

FAILED: Error in metadata: javax.jdo.JDODataStoreException: Error(s) were
found while auto-creating/validating the datastore for classes. The errors
are printed in the log, and are attached to this exception.
NestedThrowables:
com.mysql.jdbc.exceptions.MySQLSyntaxErrorException: Specified key was too
long; max key length is 767 bytes
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.DDLTask

Thanks,
-ray


On Tue, Jun 15, 2010 at 4:05 PM, John Sichi <js...@facebook.com> wrote:

> Column-level properties are attractive for other reasons, but I don't think
> we should do it as a workaround for underlying limits.  I've noted in JIRA
> that I think a LOB would be more appropriate here.
>
> Note that while you're waiting for a permanent resolution, you can use
> ALTER TABLE on your metastore DBMS to widen the precision.
>
> JVS
>
> On Jun 15, 2010, at 3:37 PM, Ray Duong wrote:
>
> Yes, I think I might have to do that. I was trying to avoid multiple Hbase
> scans with one big table.
>
> BTW, would it better to store the column SERDE properties for each column
> versus at the table level to avoid the 767 or 4000 byte limitation?
>
> Thanks again,
> -ray
>
>
> On Tue, Jun 15, 2010 at 2:42 PM, Edward Capriolo <ed...@gmail.com>wrote:
>
>>
>>
>> On Tue, Jun 15, 2010 at 5:04 PM, Ray Duong <ra...@gmail.com> wrote:
>>
>>> Thanks for all the help.
>>>
>>> -ray
>>>
>>>
>>> On Tue, Jun 15, 2010 at 1:26 PM, Carl Steinbach <ca...@cloudera.com>wrote:
>>>
>>>> Hi Ray,
>>>>
>>>> 4000 bytes is the maximum VARCHAR size allowed on Oracle 9i/10g/11g. As
>>>> far as I know this is the smallest maximum VARCHAR size out of the databases
>>>> we currently try to support (MySQL, Oracle, Derby, etc).
>>>>
>>>> Carl
>>>>
>>>>
>>>> On Tue, Jun 15, 2010 at 1:15 PM, Ray Duong <ra...@gmail.com> wrote:
>>>>
>>>>> Thank John/Carl,
>>>>>
>>>>> Yep, there seems to be a limit on the 767 byte size.  So I see the
>>>>> patch HIVE-1364 to set it to 4000 bytes.  I'm using Db-derby, do you know if
>>>>> there is a limit beyond 4000 bytes?
>>>>>
>>>>> -ray
>>>>>
>>>>> Error:
>>>>> Caused by: ERROR 22001: A truncation error was encountered trying to
>>>>> shrink VARCHAR
>>>>> 'segment:ITC_10#ITC_1001,segment:CITC_10#ITC_1001,segment:ITC&' to length
>>>>> 767.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Jun 15, 2010 at 12:26 PM, Carl Steinbach <ca...@cloudera.com>wrote:
>>>>>
>>>>>> Hi Ray,
>>>>>>
>>>>>> There is currently a 767 byte size limit on SERDEPROPERTIES values
>>>>>> (see http://issues.apache.org/jira/browse/HIVE-1364). It's possible
>>>>>> that you're bumping into this limitation (assuming you abbreviated the
>>>>>> column names in your example).
>>>>>>
>>>>>>
>>>>>> On Tue, Jun 15, 2010 at 12:03 PM, John Sichi <js...@facebook.com>wrote:
>>>>>>
>>>>>>> That exception is coming from the metastore (trying to write the
>>>>>>> table definition).  Could you dig down into the Hive logs to see if you can
>>>>>>> get the underlying cause?
>>>>>>>
>>>>>>>  You can get the logs to spew on console by adding "-hiveconf
>>>>>>> hive.root.logger=DEBUG,console" to your Hive CLI invocation.
>>>>>>>
>>>>>>> JVS
>>>>>>>
>>>>>>> On Jun 15, 2010, at 11:57 AM, Ray Duong wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm trying to map a Hbase table in Hive that contains large number of
>>>>>>> columns.  Since Hbase is designed to be a wide table, does Hive/Hbase
>>>>>>> integration have any set limitation on the number of columns it can map in
>>>>>>> one table?  I seem to hit a limit at 10 columns.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> -ray
>>>>>>>
>>>>>>> create external table hbase_t1
>>>>>>> (
>>>>>>> key string,
>>>>>>> f1_a string,
>>>>>>> f2_a string,
>>>>>>> f1_b string,
>>>>>>> f2_b string,
>>>>>>> ...
>>>>>>> ...
>>>>>>> f1_m string,
>>>>>>> f2_m string,
>>>>>>>
>>>>>>>  )
>>>>>>>  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>>>>>>>  WITH SERDEPROPERTIES ("hbase.columns.mapping" =
>>>>>>> ":key,f1:a,f2:a,f1:b,f2:b,f1:c,f2:c,f1:d,f2:d,f1:e,f2:e,f1:f,f2:f,f1:g,f2:g,f1:h,f2:h,f1:i,f2:i,f1:j,f2:j,f1:k,f2:k,f1:l,f2:l,f1:m,f2:m"
>>>>>>> )
>>>>>>>  TBLPROPERTIES("hbase.table.name" = "t1");
>>>>>>>
>>>>>>> Error Message:
>>>>>>>
>>>>>>> FAILED: Error in metadata: javax.jdo.JDODataStoreException: Put
>>>>>>> request failed : INSERT INTO `SERDE_PARAMS`
>>>>>>> (`PARAM_VALUE`,`SERDE_ID`,`PARAM_KEY`) VALUES (?,?,?)
>>>>>>> NestedThrowables:
>>>>>>> org.datanucleus.store.mapped.exceptions.MappedDatastoreException:
>>>>>>> INSERT INTO `SERDE_PARAMS` (`PARAM_VALUE`,`SERDE_ID`,`PARAM_KEY`) VALUES
>>>>>>> (?,?,?)
>>>>>>> FAILED: Execution Error, return code 1 from
>>>>>>> org.apache.hadoop.hive.ql.exec.DDLTask
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>> You have probably thought of this, but in the short term you can create
>> two vertically partitioned tables and do a 1 to 1 join on their key.
>> Edward
>>
>>
>
>

Re: Hive-Hbase with large number of columns

Posted by John Sichi <js...@facebook.com>.
Column-level properties are attractive for other reasons, but I don't think we should do it as a workaround for underlying limits.  I've noted in JIRA that I think a LOB would be more appropriate here.

Note that while you're waiting for a permanent resolution, you can use ALTER TABLE on your metastore DBMS to widen the precision.
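
As a sketch, assuming a Derby metastore (the 4,000-byte width matches the
figure discussed below and is illustrative):

 ALTER TABLE SERDE_PARAMS ALTER COLUMN PARAM_VALUE SET DATA TYPE VARCHAR(4000);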

JVS

On Jun 15, 2010, at 3:37 PM, Ray Duong wrote:

Yes, I think I might have to do that. I was trying to avoid multiple Hbase scans with one big table.

BTW, would it be better to store the column SERDE properties for each column versus at the table level to avoid the 767 or 4000 byte limitation?

Thanks again,
-ray


On Tue, Jun 15, 2010 at 2:42 PM, Edward Capriolo <ed...@gmail.com>> wrote:


On Tue, Jun 15, 2010 at 5:04 PM, Ray Duong <ra...@gmail.com>> wrote:
Thanks for all the help.

-ray


On Tue, Jun 15, 2010 at 1:26 PM, Carl Steinbach <ca...@cloudera.com>> wrote:
Hi Ray,

4000 bytes is the maximum VARCHAR size allowed on Oracle 9i/10g/11g. As far as I know this is the smallest maximum VARCHAR size out of the databases we currently try to support (MySQL, Oracle, Derby, etc).

Carl


On Tue, Jun 15, 2010 at 1:15 PM, Ray Duong <ra...@gmail.com>> wrote:
Thank John/Carl,

Yep, there seems to be a limit on the 767 byte size.  So I see the patch HIVE-1364 to set it to 4000 bytes.  I'm using Db-derby, do you know if there is a limit beyond 4000 bytes?

-ray

Error:
Caused by: ERROR 22001: A truncation error was encountered trying to shrink VARCHAR 'segment:ITC_10#ITC_1001,segment:CITC_10#ITC_1001,segment:ITC&' to length 767.




On Tue, Jun 15, 2010 at 12:26 PM, Carl Steinbach <ca...@cloudera.com>> wrote:
Hi Ray,

There is currently a 767 byte size limit on SERDEPROPERTIES values (see http://issues.apache.org/jira/browse/HIVE-1364). It's possible that you're bumping into this limitation (assuming you abbreviated the column names in your example).


On Tue, Jun 15, 2010 at 12:03 PM, John Sichi <js...@facebook.com>> wrote:
That exception is coming from the metastore (trying to write the table definition).  Could you dig down into the Hive logs to see if you can get the underlying cause?

You can get the logs to spew on console by adding "-hiveconf hive.root.logger=DEBUG,console" to your Hive CLI invocation.

JVS

On Jun 15, 2010, at 11:57 AM, Ray Duong wrote:

Hi,

I'm trying to map a Hbase table in Hive that contains large number of columns.  Since Hbase is designed to be a wide table, does Hive/Hbase integration have any set limitation on the number of columns it can map in one table?  I seem to hit a limit at 10 columns.

Thanks,
-ray

create external table hbase_t1
(
key string,
f1_a string,
f2_a string,
f1_b string,
f2_b string,
...
...
f1_m string,
f2_m string,

 )
 STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
 WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,f1:a,f2:a,f1:b,f2:b,f1:c,f2:c,f1:d,f2:d,f1:e,f2:e,f1:f,f2:f,f1:g,f2:g,f1:h,f2:h,f1:i,f2:i,f1:j,f2:j,f1:k,f2:k,f1:l,f2:l,f1:m,f2:m" )
 TBLPROPERTIES("hbase.table.name" = "t1");

Error Message:

FAILED: Error in metadata: javax.jdo.JDODataStoreException: Put request failed : INSERT INTO `SERDE_PARAMS` (`PARAM_VALUE`,`SERDE_ID`,`PARAM_KEY`) VALUES (?,?,?)
NestedThrowables:
org.datanucleus.store.mapped.exceptions.MappedDatastoreException: INSERT INTO `SERDE_PARAMS` (`PARAM_VALUE`,`SERDE_ID`,`PARAM_KEY`) VALUES (?,?,?)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask









You have probably thought of this, but in the short term you can create two vertically partitioned tables and do a 1 to 1 join on their key.
Edward




Re: Hive-Hbase with large number of columns

Posted by Ray Duong <ra...@gmail.com>.
Yes, I think I might have to do that. I was trying to avoid multiple Hbase
scans with one big table.

BTW, would it be better to store the SERDE properties per column rather
than at the table level, to avoid the 767- or 4,000-byte limitation?

Thanks again,
-ray


On Tue, Jun 15, 2010 at 2:42 PM, Edward Capriolo <ed...@gmail.com>wrote:

>
>
> On Tue, Jun 15, 2010 at 5:04 PM, Ray Duong <ra...@gmail.com> wrote:
>
>> Thanks for all the help.
>>
>> -ray
>>
>>
>> On Tue, Jun 15, 2010 at 1:26 PM, Carl Steinbach <ca...@cloudera.com>wrote:
>>
>>> Hi Ray,
>>>
>>> 4000 bytes is the maximum VARCHAR size allowed on Oracle 9i/10g/11g. As
>>> far as I know this is the smallest maximum VARCHAR size out of the databases
>>> we currently try to support (MySQL, Oracle, Derby, etc).
>>>
>>> Carl
>>>
>>>
>>> On Tue, Jun 15, 2010 at 1:15 PM, Ray Duong <ra...@gmail.com> wrote:
>>>
>>>> Thank John/Carl,
>>>>
>>>> Yep, there seems to be a limit on the 767 byte size.  So I see the patch
>>>> HIVE-1364 to set it to 4000 bytes.  I'm using Db-derby, do you know if there
>>>> is a limit beyond 4000 bytes?
>>>>
>>>> -ray
>>>>
>>>> Error:
>>>> Caused by: ERROR 22001: A truncation error was encountered trying to
>>>> shrink VARCHAR
>>>> 'segment:ITC_10#ITC_1001,segment:CITC_10#ITC_1001,segment:ITC&' to length
>>>> 767.
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Jun 15, 2010 at 12:26 PM, Carl Steinbach <ca...@cloudera.com>wrote:
>>>>
>>>>> Hi Ray,
>>>>>
>>>>> There is currently a 767 byte size limit on SERDEPROPERTIES values
>>>>> (see http://issues.apache.org/jira/browse/HIVE-1364). It's possible
>>>>> that you're bumping into this limitation (assuming you abbreviated the
>>>>> column names in your example).
>>>>>
>>>>>
>>>>> On Tue, Jun 15, 2010 at 12:03 PM, John Sichi <js...@facebook.com>wrote:
>>>>>
>>>>>> That exception is coming from the metastore (trying to write the table
>>>>>> definition).  Could you dig down into the Hive logs to see if you can get
>>>>>> the underlying cause?
>>>>>>
>>>>>>  You can get the logs to spew on console by adding "-hiveconf
>>>>>> hive.root.logger=DEBUG,console" to your Hive CLI invocation.
>>>>>>
>>>>>> JVS
>>>>>>
>>>>>> On Jun 15, 2010, at 11:57 AM, Ray Duong wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm trying to map a Hbase table in Hive that contains large number of
>>>>>> columns.  Since Hbase is designed to be a wide table, does Hive/Hbase
>>>>>> integration have any set limitation on the number of columns it can map in
>>>>>> one table?  I seem to hit a limit at 10 columns.
>>>>>>
>>>>>> Thanks,
>>>>>> -ray
>>>>>>
>>>>>> create external table hbase_t1
>>>>>> (
>>>>>> key string,
>>>>>> f1_a string,
>>>>>> f2_a string,
>>>>>> f1_b string,
>>>>>> f2_b string,
>>>>>> ...
>>>>>> ...
>>>>>> f1_m string,
>>>>>> f2_m string,
>>>>>>
>>>>>>  )
>>>>>>  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>>>>>>  WITH SERDEPROPERTIES ("hbase.columns.mapping" =
>>>>>> ":key,f1:a,f2:a,f1:b,f2:b,f1:c,f2:c,f1:d,f2:d,f1:e,f2:e,f1:f,f2:f,f1:g,f2:g,f1:h,f2:h,f1:i,f2:i,f1:j,f2:j,f1:k,f2:k,f1:l,f2:l,f1:m,f2:m"
>>>>>> )
>>>>>>  TBLPROPERTIES("hbase.table.name" = "t1");
>>>>>>
>>>>>> Error Message:
>>>>>>
>>>>>> FAILED: Error in metadata: javax.jdo.JDODataStoreException: Put
>>>>>> request failed : INSERT INTO `SERDE_PARAMS`
>>>>>> (`PARAM_VALUE`,`SERDE_ID`,`PARAM_KEY`) VALUES (?,?,?)
>>>>>> NestedThrowables:
>>>>>> org.datanucleus.store.mapped.exceptions.MappedDatastoreException:
>>>>>> INSERT INTO `SERDE_PARAMS` (`PARAM_VALUE`,`SERDE_ID`,`PARAM_KEY`) VALUES
>>>>>> (?,?,?)
>>>>>> FAILED: Execution Error, return code 1 from
>>>>>> org.apache.hadoop.hive.ql.exec.DDLTask
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
> You have probably thought of this, but in the short term you can create two
> vertically partitioned tables and do a 1 to 1 join on their key.
> Edward
>
>

Re: Hive-Hbase with large number of columns

Posted by Edward Capriolo <ed...@gmail.com>.
On Tue, Jun 15, 2010 at 5:04 PM, Ray Duong <ra...@gmail.com> wrote:

> Thanks for all the help.
>
> -ray
>
>
> On Tue, Jun 15, 2010 at 1:26 PM, Carl Steinbach <ca...@cloudera.com> wrote:
>
>> Hi Ray,
>>
>> 4000 bytes is the maximum VARCHAR size allowed on Oracle 9i/10g/11g. As
>> far as I know this is the smallest maximum VARCHAR size out of the databases
>> we currently try to support (MySQL, Oracle, Derby, etc).
>>
>> Carl
>>
>>
>> On Tue, Jun 15, 2010 at 1:15 PM, Ray Duong <ra...@gmail.com> wrote:
>>
>>> Thank John/Carl,
>>>
>>> Yep, there seems to be a limit on the 767 byte size.  So I see the patch
>>> HIVE-1364 to set it to 4000 bytes.  I'm using Db-derby, do you know if there
>>> is a limit beyond 4000 bytes?
>>>
>>> -ray
>>>
>>> Error:
>>> Caused by: ERROR 22001: A truncation error was encountered trying to
>>> shrink VARCHAR
>>> 'segment:ITC_10#ITC_1001,segment:CITC_10#ITC_1001,segment:ITC&' to length
>>> 767.
>>>
>>>
>>>
>>>
>>> On Tue, Jun 15, 2010 at 12:26 PM, Carl Steinbach <ca...@cloudera.com>wrote:
>>>
>>>> Hi Ray,
>>>>
>>>> There is currently a 767 byte size limit on SERDEPROPERTIES values (see
>>>> http://issues.apache.org/jira/browse/HIVE-1364). It's possible that
>>>> you're bumping into this limitation (assuming you abbreviated the column
>>>> names in your example).
>>>>
>>>>
>>>> On Tue, Jun 15, 2010 at 12:03 PM, John Sichi <js...@facebook.com>wrote:
>>>>
>>>>> That exception is coming from the metastore (trying to write the table
>>>>> definition).  Could you dig down into the Hive logs to see if you can get
>>>>> the underlying cause?
>>>>>
>>>>>  You can get the logs to spew on console by adding "-hiveconf
>>>>> hive.root.logger=DEBUG,console" to your Hive CLI invocation.
>>>>>
>>>>> JVS
>>>>>
>>>>> On Jun 15, 2010, at 11:57 AM, Ray Duong wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I'm trying to map a Hbase table in Hive that contains large number of
>>>>> columns.  Since Hbase is designed to be a wide table, does Hive/Hbase
>>>>> integration have any set limitation on the number of columns it can map in
>>>>> one table?  I seem to hit a limit at 10 columns.
>>>>>
>>>>> Thanks,
>>>>> -ray
>>>>>
>>>>> create external table hbase_t1
>>>>> (
>>>>> key string,
>>>>> f1_a string,
>>>>> f2_a string,
>>>>> f1_b string,
>>>>> f2_b string,
>>>>> ...
>>>>> ...
>>>>> f1_m string,
>>>>> f2_m string,
>>>>>
>>>>>  )
>>>>>  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>>>>>  WITH SERDEPROPERTIES ("hbase.columns.mapping" =
>>>>> ":key,f1:a,f2:a,f1:b,f2:b,f1:c,f2:c,f1:d,f2:d,f1:e,f2:e,f1:f,f2:f,f1:g,f2:g,f1:h,f2:h,f1:i,f2:i,f1:j,f2:j,f1:k,f2:k,f1:l,f2:l,f1:m,f2:m"
>>>>> )
>>>>>  TBLPROPERTIES("hbase.table.name" = "t1");
>>>>>
>>>>> Error Message:
>>>>>
>>>>> FAILED: Error in metadata: javax.jdo.JDODataStoreException: Put request
>>>>> failed : INSERT INTO `SERDE_PARAMS` (`PARAM_VALUE`,`SERDE_ID`,`PARAM_KEY`)
>>>>> VALUES (?,?,?)
>>>>> NestedThrowables:
>>>>> org.datanucleus.store.mapped.exceptions.MappedDatastoreException:
>>>>> INSERT INTO `SERDE_PARAMS` (`PARAM_VALUE`,`SERDE_ID`,`PARAM_KEY`) VALUES
>>>>> (?,?,?)
>>>>> FAILED: Execution Error, return code 1 from
>>>>> org.apache.hadoop.hive.ql.exec.DDLTask
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
You have probably thought of this, but in the short term you can create two
vertically partitioned tables and do a one-to-one join on their key.
Edward
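
A minimal sketch of that layout, reusing the HBase table t1 from the
original post (the column subsets here are hypothetical):

 create external table hbase_t1_f1 (key string, a string, b string)
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,f1:a,f1:b")
  TBLPROPERTIES("hbase.table.name" = "t1");

 create external table hbase_t1_f2 (key string, a string, b string)
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,f2:a,f2:b")
  TBLPROPERTIES("hbase.table.name" = "t1");

 -- one-to-one join back on the row key
 select x.key, x.a as f1_a, y.a as f2_a
 from hbase_t1_f1 x join hbase_t1_f2 y on (x.key = y.key);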

Re: Hive-Hbase with large number of columns

Posted by Ray Duong <ra...@gmail.com>.
Thanks for all the help.

-ray

On Tue, Jun 15, 2010 at 1:26 PM, Carl Steinbach <ca...@cloudera.com> wrote:

> Hi Ray,
>
> 4000 bytes is the maximum VARCHAR size allowed on Oracle 9i/10g/11g. As far
> as I know this is the smallest maximum VARCHAR size out of the databases we
> currently try to support (MySQL, Oracle, Derby, etc).
>
> Carl
>
>
> On Tue, Jun 15, 2010 at 1:15 PM, Ray Duong <ra...@gmail.com> wrote:
>
>> Thank John/Carl,
>>
>> Yep, there seems to be a limit on the 767 byte size.  So I see the patch
>> HIVE-1364 to set it to 4000 bytes.  I'm using Db-derby, do you know if there
>> is a limit beyond 4000 bytes?
>>
>> -ray
>>
>> Error:
>> Caused by: ERROR 22001: A truncation error was encountered trying to
>> shrink VARCHAR
>> 'segment:ITC_10#ITC_1001,segment:CITC_10#ITC_1001,segment:ITC&' to length
>> 767.
>>
>>
>>
>>
>> On Tue, Jun 15, 2010 at 12:26 PM, Carl Steinbach <ca...@cloudera.com>wrote:
>>
>>> Hi Ray,
>>>
>>> There is currently a 767 byte size limit on SERDEPROPERTIES values (see
>>> http://issues.apache.org/jira/browse/HIVE-1364). It's possible that
>>> you're bumping into this limitation (assuming you abbreviated the column
>>> names in your example).
>>>
>>>
>>> On Tue, Jun 15, 2010 at 12:03 PM, John Sichi <js...@facebook.com>wrote:
>>>
>>>> That exception is coming from the metastore (trying to write the table
>>>> definition).  Could you dig down into the Hive logs to see if you can get
>>>> the underlying cause?
>>>>
>>>>  You can get the logs to spew on console by adding "-hiveconf
>>>> hive.root.logger=DEBUG,console" to your Hive CLI invocation.
>>>>
>>>> JVS
>>>>
>>>> On Jun 15, 2010, at 11:57 AM, Ray Duong wrote:
>>>>
>>>> Hi,
>>>>
>>>> I'm trying to map a Hbase table in Hive that contains large number of
>>>> columns.  Since Hbase is designed to be a wide table, does Hive/Hbase
>>>> integration have any set limitation on the number of columns it can map in
>>>> one table?  I seem to hit a limit at 10 columns.
>>>>
>>>> Thanks,
>>>> -ray
>>>>
>>>> create external table hbase_t1
>>>> (
>>>> key string,
>>>> f1_a string,
>>>> f2_a string,
>>>> f1_b string,
>>>> f2_b string,
>>>> ...
>>>> ...
>>>> f1_m string,
>>>> f2_m string,
>>>>
>>>>  )
>>>>  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>>>>  WITH SERDEPROPERTIES ("hbase.columns.mapping" =
>>>> ":key,f1:a,f2:a,f1:b,f2:b,f1:c,f2:c,f1:d,f2:d,f1:e,f2:e,f1:f,f2:f,f1:g,f2:g,f1:h,f2:h,f1:i,f2:i,f1:j,f2:j,f1:k,f2:k,f1:l,f2:l,f1:m,f2:m"
>>>> )
>>>>  TBLPROPERTIES("hbase.table.name" = "t1");
>>>>
>>>> Error Message:
>>>>
>>>> FAILED: Error in metadata: javax.jdo.JDODataStoreException: Put request
>>>> failed : INSERT INTO `SERDE_PARAMS` (`PARAM_VALUE`,`SERDE_ID`,`PARAM_KEY`)
>>>> VALUES (?,?,?)
>>>> NestedThrowables:
>>>> org.datanucleus.store.mapped.exceptions.MappedDatastoreException: INSERT
>>>> INTO `SERDE_PARAMS` (`PARAM_VALUE`,`SERDE_ID`,`PARAM_KEY`) VALUES (?,?,?)
>>>> FAILED: Execution Error, return code 1 from
>>>> org.apache.hadoop.hive.ql.exec.DDLTask
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>

Re: Hive-Hbase with large number of columns

Posted by Carl Steinbach <ca...@cloudera.com>.
Hi Ray,

4000 bytes is the maximum VARCHAR size allowed on Oracle 9i/10g/11g. As far
as I know this is the smallest maximum VARCHAR size out of the databases we
currently try to support (MySQL, Oracle, Derby, etc).

Carl

On Tue, Jun 15, 2010 at 1:15 PM, Ray Duong <ra...@gmail.com> wrote:

> Thank John/Carl,
>
> Yep, there seems to be a limit on the 767 byte size.  So I see the patch
> HIVE-1364 to set it to 4000 bytes.  I'm using Db-derby, do you know if there
> is a limit beyond 4000 bytes?
>
> -ray
>
> Error:
> Caused by: ERROR 22001: A truncation error was encountered trying to shrink
> VARCHAR 'segment:ITC_10#ITC_1001,segment:CITC_10#ITC_1001,segment:ITC&' to
> length 767.
>
>
>
>
> On Tue, Jun 15, 2010 at 12:26 PM, Carl Steinbach <ca...@cloudera.com>wrote:
>
>> Hi Ray,
>>
>> There is currently a 767 byte size limit on SERDEPROPERTIES values (see
>> http://issues.apache.org/jira/browse/HIVE-1364). It's possible that
>> you're bumping into this limitation (assuming you abbreviated the column
>> names in your example).
>>
>>
>> On Tue, Jun 15, 2010 at 12:03 PM, John Sichi <js...@facebook.com> wrote:
>>
>>> That exception is coming from the metastore (trying to write the table
>>> definition).  Could you dig down into the Hive logs to see if you can get
>>> the underlying cause?
>>>
>>>  You can get the logs to spew on console by adding "-hiveconf
>>> hive.root.logger=DEBUG,console" to your Hive CLI invocation.
>>>
>>> JVS
>>>
>>> On Jun 15, 2010, at 11:57 AM, Ray Duong wrote:
>>>
>>> Hi,
>>>
>>> I'm trying to map a Hbase table in Hive that contains large number of
>>> columns.  Since Hbase is designed to be a wide table, does Hive/Hbase
>>> integration have any set limitation on the number of columns it can map in
>>> one table?  I seem to hit a limit at 10 columns.
>>>
>>> Thanks,
>>> -ray
>>>
>>> create external table hbase_t1
>>> (
>>> key string,
>>> f1_a string,
>>> f2_a string,
>>> f1_b string,
>>> f2_b string,
>>> ...
>>> ...
>>> f1_m string,
>>> f2_m string,
>>>
>>>  )
>>>  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>>>  WITH SERDEPROPERTIES ("hbase.columns.mapping" =
>>> ":key,f1:a,f2:a,f1:b,f2:b,f1:c,f2:c,f1:d,f2:d,f1:e,f2:e,f1:f,f2:f,f1:g,f2:g,f1:h,f2:h,f1:i,f2:i,f1:j,f2:j,f1:k,f2:k,f1:l,f2:l,f1:m,f2:m"
>>> )
>>>  TBLPROPERTIES("hbase.table.name" = "t1");
>>>
>>> Error Message:
>>>
>>> FAILED: Error in metadata: javax.jdo.JDODataStoreException: Put request
>>> failed : INSERT INTO `SERDE_PARAMS` (`PARAM_VALUE`,`SERDE_ID`,`PARAM_KEY`)
>>> VALUES (?,?,?)
>>> NestedThrowables:
>>> org.datanucleus.store.mapped.exceptions.MappedDatastoreException: INSERT
>>> INTO `SERDE_PARAMS` (`PARAM_VALUE`,`SERDE_ID`,`PARAM_KEY`) VALUES (?,?,?)
>>> FAILED: Execution Error, return code 1 from
>>> org.apache.hadoop.hive.ql.exec.DDLTask
>>>
>>>
>>>
>>>
>>>
>>
>

Re: Hive-Hbase with large number of columns

Posted by Ray Duong <ra...@gmail.com>.
Thanks, John/Carl,

Yep, there does seem to be a 767-byte limit.  I see the patch HIVE-1364
raises it to 4,000 bytes.  I'm using Derby; do you know if there is a
limit beyond 4,000 bytes?

-ray

Error:
Caused by: ERROR 22001: A truncation error was encountered trying to shrink
VARCHAR 'segment:ITC_10#ITC_1001,segment:CITC_10#ITC_1001,segment:ITC&' to
length 767.



On Tue, Jun 15, 2010 at 12:26 PM, Carl Steinbach <ca...@cloudera.com> wrote:

> Hi Ray,
>
> There is currently a 767 byte size limit on SERDEPROPERTIES values (see
> http://issues.apache.org/jira/browse/HIVE-1364). It's possible that you're
> bumping into this limitation (assuming you abbreviated the column names in
> your example).
>
>
> On Tue, Jun 15, 2010 at 12:03 PM, John Sichi <js...@facebook.com> wrote:
>
>> That exception is coming from the metastore (trying to write the table
>> definition).  Could you dig down into the Hive logs to see if you can get
>> the underlying cause?
>>
>> You can get the logs to spew on console by adding "-hiveconf
>> hive.root.logger=DEBUG,console" to your Hive CLI invocation.
>>
>> JVS
>>
>> On Jun 15, 2010, at 11:57 AM, Ray Duong wrote:
>>
>> Hi,
>>
>> I'm trying to map a Hbase table in Hive that contains large number of
>> columns.  Since Hbase is designed to be a wide table, does Hive/Hbase
>> integration have any set limitation on the number of columns it can map in
>> one table?  I seem to hit a limit at 10 columns.
>>
>> Thanks,
>> -ray
>>
>> create external table hbase_t1
>> (
>> key string,
>> f1_a string,
>> f2_a string,
>> f1_b string,
>> f2_b string,
>> ...
>> ...
>> f1_m string,
>> f2_m string,
>>
>>  )
>>  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>>  WITH SERDEPROPERTIES ("hbase.columns.mapping" =
>> ":key,f1:a,f2:a,f1:b,f2:b,f1:c,f2:c,f1:d,f2:d,f1:e,f2:e,f1:f,f2:f,f1:g,f2:g,f1:h,f2:h,f1:i,f2:i,f1:j,f2:j,f1:k,f2:k,f1:l,f2:l,f1:m,f2:m"
>> )
>>  TBLPROPERTIES("hbase.table.name" = "t1");
>>
>> Error Message:
>>
>> FAILED: Error in metadata: javax.jdo.JDODataStoreException: Put request
>> failed : INSERT INTO `SERDE_PARAMS` (`PARAM_VALUE`,`SERDE_ID`,`PARAM_KEY`)
>> VALUES (?,?,?)
>> NestedThrowables:
>> org.datanucleus.store.mapped.exceptions.MappedDatastoreException: INSERT
>> INTO `SERDE_PARAMS` (`PARAM_VALUE`,`SERDE_ID`,`PARAM_KEY`) VALUES (?,?,?)
>> FAILED: Execution Error, return code 1 from
>> org.apache.hadoop.hive.ql.exec.DDLTask
>>
>>
>>
>>
>>
>

Re: Hive-Hbase with large number of columns

Posted by Carl Steinbach <ca...@cloudera.com>.
Hi Ray,

There is currently a 767 byte size limit on SERDEPROPERTIES values (see
http://issues.apache.org/jira/browse/HIVE-1364). It's possible that you're
bumping into this limitation (assuming you abbreviated the column names in
your example).
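
As a rough illustration: with abbreviated entries like f1:a the mapping
string stays tiny, but with real qualifiers such as
segment:ITC_10#ITC_1001 (about 24 bytes including the separating comma),
the 767-byte ceiling is reached after roughly 767 / 24, i.e. around 30
mapping entries.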

On Tue, Jun 15, 2010 at 12:03 PM, John Sichi <js...@facebook.com> wrote:

> That exception is coming from the metastore (trying to write the table
> definition).  Could you dig down into the Hive logs to see if you can get
> the underlying cause?
>
> You can get the logs to spew on console by adding "-hiveconf
> hive.root.logger=DEBUG,console" to your Hive CLI invocation.
>
> JVS
>
> On Jun 15, 2010, at 11:57 AM, Ray Duong wrote:
>
> Hi,
>
> I'm trying to map a Hbase table in Hive that contains large number of
> columns.  Since Hbase is designed to be a wide table, does Hive/Hbase
> integration have any set limitation on the number of columns it can map in
> one table?  I seem to hit a limit at 10 columns.
>
> Thanks,
> -ray
>
> create external table hbase_t1
> (
> key string,
> f1_a string,
> f2_a string,
> f1_b string,
> f2_b string,
> ...
> ...
> f1_m string,
> f2_m string,
>
>  )
>  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>  WITH SERDEPROPERTIES ("hbase.columns.mapping" =
> ":key,f1:a,f2:a,f1:b,f2:b,f1:c,f2:c,f1:d,f2:d,f1:e,f2:e,f1:f,f2:f,f1:g,f2:g,f1:h,f2:h,f1:i,f2:i,f1:j,f2:j,f1:k,f2:k,f1:l,f2:l,f1:m,f2:m"
> )
>  TBLPROPERTIES("hbase.table.name" = "t1");
>
> Error Message:
>
> FAILED: Error in metadata: javax.jdo.JDODataStoreException: Put request
> failed : INSERT INTO `SERDE_PARAMS` (`PARAM_VALUE`,`SERDE_ID`,`PARAM_KEY`)
> VALUES (?,?,?)
> NestedThrowables:
> org.datanucleus.store.mapped.exceptions.MappedDatastoreException: INSERT
> INTO `SERDE_PARAMS` (`PARAM_VALUE`,`SERDE_ID`,`PARAM_KEY`) VALUES (?,?,?)
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.DDLTask
>
>
>
>
>

Re: Hive-Hbase with large number of columns

Posted by John Sichi <js...@facebook.com>.
That exception is coming from the metastore (trying to write the table definition).  Could you dig down into the Hive logs to see if you can get the underlying cause?

You can get the logs to spew on console by adding "-hiveconf hive.root.logger=DEBUG,console" to your Hive CLI invocation.
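
For example, a typical invocation would look something like:

 hive -hiveconf hive.root.logger=DEBUG,console

and then re-running the CREATE TABLE from that session should surface the
underlying metastore exception on the console.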

JVS

On Jun 15, 2010, at 11:57 AM, Ray Duong wrote:

Hi,

I'm trying to map a Hbase table in Hive that contains large number of columns.  Since Hbase is designed to be a wide table, does Hive/Hbase integration have any set limitation on the number of columns it can map in one table?  I seem to hit a limit at 10 columns.

Thanks,
-ray

create external table hbase_t1
(
key string,
f1_a string,
f2_a string,
f1_b string,
f2_b string,
...
...
f1_m string,
f2_m string,

 )
 STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
 WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,f1:a,f2:a,f1:b,f2:b,f1:c,f2:c,f1:d,f2:d,f1:e,f2:e,f1:f,f2:f,f1:g,f2:g,f1:h,f2:h,f1:i,f2:i,f1:j,f2:j,f1:k,f2:k,f1:l,f2:l,f1:m,f2:m" )
 TBLPROPERTIES("hbase.table.name" = "t1");

Error Message:

FAILED: Error in metadata: javax.jdo.JDODataStoreException: Put request failed : INSERT INTO `SERDE_PARAMS` (`PARAM_VALUE`,`SERDE_ID`,`PARAM_KEY`) VALUES (?,?,?)
NestedThrowables:
org.datanucleus.store.mapped.exceptions.MappedDatastoreException: INSERT INTO `SERDE_PARAMS` (`PARAM_VALUE`,`SERDE_ID`,`PARAM_KEY`) VALUES (?,?,?)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask