You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by "Connell, Chuck" <Ch...@nuance.com> on 2012/10/08 22:37:09 UTC

Any advice about complex Hive tables?

(Follow up to the thread below...)

I have a complex Hive table -- many fields, many nested structs. Hive fails to create the table at all. I can't even start to load data or query the data. Anyone else run into this? It seems to be a showstopper to using Hive for us, which is a shame since everything else about Hive is just what we want.

Any suggestions? Workarounds?

Thanks,
Chuck


________________________________________
From: Connell, Chuck
Sent: Thursday, October 04, 2012 4:31 PM
To: user@hive.apache.org
Subject: RE: Limit to columns or nesting of Hive table?

The issue apparently is not just the number of levels of nesting. I just created a Hive table with 20 levels of structs within each other. It created fine. This is more levels than the table that was failing for me. The failing table had many more fields throughout the levels.

Chuck



-----Original Message-----
From: Connell, Chuck [mailto:Chuck.Connell@nuance.com]
Sent: Thursday, October 04, 2012 12:09 PM
To: user@hive.apache.org
Subject: RE: Limit to columns or nesting of Hive table?

Thanks. So is the nesting limit 10 now? Does your 2nd paragraph mean that this limit cannot easily be raised?

Chuck

-----Original Message-----
From: Edward Capriolo [mailto:edlinuxguru@gmail.com]
Sent: Thursday, October 04, 2012 11:57 AM
To: user@hive.apache.org
Subject: Re: Limit to columns or nesting of Hive table?

There is an open jira ticket on this. There is a hard coded limit but it could be raised with some mostly minor code changes.

One of the bigger problems is that hive stores the definition of a column in JDBC "column" and for some databases larger nested structs can case issues.

Edward

On Thu, Oct 4, 2012 at 11:48 AM, Connell, Chuck <Ch...@nuance.com> wrote:
> I am trying to create a large Hive table, with many columns and deeply
> nested structs. It is failing with java.lang.ArrayIndexOutOfBoundsException:
> 10.
>
>
>
> Before I spend a lot of time debugging my table declaration, is there
> some limit here I should know about? Max number of columns? Max depth
> of struct nesting?
>
>
>
> Thanks,
>
> Chuck
>
>

Re: Any advice about complex Hive tables?

Posted by Sadananda Hegde <sa...@gmail.com>.
Thanks, Chuck.

On Sat, Oct 13, 2012 at 7:50 PM, Connell, Chuck <Ch...@nuance.com>wrote:

>  You can get the jar from the "Downloads" link on the serde's home page.
> It is on the right side of the page.
>
>
>  ------------------------------
> *From:* Sadananda Hegde [saduhegde@gmail.com]
> *Sent:* Saturday, October 13, 2012 8:31 PM
> *To:* user@hive.apache.org
>
> *Subject:* Re: Any advice about complex Hive tables?
>
>   Thanks, Chuck.
>
>
> But I am getting the following error when I execute the CREATE TABLE statement.
>
>  'FAILED: Error in metadata: Cannot validate serde: org.openx.data.jsonserde.JsonSerDe"
>
>
> Am I supposed to down load / add any jar file to hive server?
>
> Thanks for your help.
>
> Sadu
>
>  On Fri, Oct 12, 2012 at 9:17 AM, Connell, Chuck <Chuck.Connell@nuance.com
> > wrote:
>
>>  Sadu,****
>>
>> ** **
>>
>> I am using JSON as the input format, with the JSON SerDe from
>> https://github.com/rcongiu/Hive-JSON-Serde. ****
>>
>> ** **
>>
>> A sample JSON record is:  (in actual use each JSON record must be on one
>> line only).****
>>
>> ** **
>>
>> {****
>>
>> "field1":"hello",****
>>
>> "field2":123456,****
>>
>> "field3":1234.5678,****
>>
>> "field4":true,****
>>
>> "field5":{"field5a":"embedded 1", "field5b":44, "field5c":4.44,
>> "field5d":false, "field5e":[12,13,14]},****
>>
>> "field6":[2, 3, 4, 5, 6],****
>>
>> "field7":[4.33, 5.33, 6.33],****
>>
>> "field8":["one", "two", "three"],****
>>
>> "field9":[[1,2,3], [4,5], [6,7,8]],****
>>
>> "field10":[["smith","jones"], ["bob", "bill"]],****
>>
>> "field11":[{"f11a":"one", "f11b":"two"}, {"f11a":"three", "f11b":"four"}]
>> ****
>>
>> }****
>>
>> ** **
>>
>> My table definition is:  (ignore the fact that the fields are listed out
>> of order, this does not matter)****
>>
>> ** **
>>
>> CREATE TABLE tt1****
>>
>> (****
>>
>> field8 ARRAY<STRING>,****
>>
>> field9 ARRAY<ARRAY<INT>>,****
>>
>> field2 INT,****
>>
>> field3 DOUBLE,****
>>
>> field1 STRING,****
>>
>> field6 ARRAY<INT>,****
>>
>> field7 ARRAY<DOUBLE>,****
>>
>> field4 BOOLEAN,****
>>
>> field5 STRUCT****
>>
>> <** **
>>
>> field5d:BOOLEAN,****
>>
>> field5e:ARRAY<INT>,****
>>
>> field5a:STRING,****
>>
>> field5b:INT,****
>>
>> field5c:DOUBLE****
>>
>> >,****
>>
>> field10 ARRAY<ARRAY<STRING>>,****
>>
>> field11 ARRAY<STRUCT****
>>
>> <** **
>>
>> f11a:STRING,****
>>
>> f11b:STRING****
>>
>> >>** **
>>
>> )****
>>
>> ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'****
>>
>> WITH SERDEPROPERTIES ("ignore.malformed.json" = "true")****
>>
>> STORED AS TEXTFILE;****
>>
>> ** **
>>
>> This small sample actually works great!  The problem is when I try to
>> scale up to larger more complex tables.****
>>
>> ** **
>>
>> (Note the latest version of this SerDe has a bug related to number
>> formats. You should use 1.1.3 and use only INTs and DOUBLEs.)****
>>
>> ** **
>>
>> Chuck****
>>
>> ** **
>>
>> ** **
>>
>> *From:* Sadananda Hegde [mailto:saduhegde@gmail.com]
>> *Sent:* Thursday, October 11, 2012 11:47 PM
>> *To:* user@hive.apache.org
>> *Subject:* Re: Any advice about complex Hive tables?****
>>
>> ** **
>>
>> Hi Chuck,****
>>
>>  ****
>>
>> I have a similar complex hive tables with many fields and some are nested
>> like array of structs (but only upto 3 levels). How did you define you ROW
>> FORMAT as to separate the items? The COLLECTION ITEMS TERMINATED BY clause
>> works only for the first level.How did you handle level 2 , 3, etc? Is it
>> through SERDE FORMATs?  Could you
>>  share your CREATE TABLE statement?  I am having problem correctly
>> defining my DDL to load the data file correctly.****
>>
>>  ****
>>
>> Much appreciated.****
>>
>>  ****
>>
>> Thanks,****
>>
>> Sadu****
>>
>> On Mon, Oct 8, 2012 at 3:37 PM, Connell, Chuck <Ch...@nuance.com>
>> wrote:****
>>
>> (Follow up to the thread below...)
>>
>> I have a complex Hive table -- many fields, many nested structs. Hive
>> fails to create the table at all. I can't even start to load data or query
>> the data. Anyone else run into this? It seems to be a showstopper to using
>> Hive for us, which is a shame since everything else about Hive is just what
>> we want.
>>
>> Any suggestions? Workarounds?
>>
>> Thanks,
>> Chuck
>>
>>
>> ________________________________________
>> From: Connell, Chuck
>> Sent: Thursday, October 04, 2012 4:31 PM
>> To: user@hive.apache.org
>> Subject: RE: Limit to columns or nesting of Hive table?
>>
>> The issue apparently is not just the number of levels of nesting. I just
>> created a Hive table with 20 levels of structs within each other. It
>> created fine. This is more levels than the table that was failing for me.
>> The failing table had many more fields throughout the levels.
>>
>> Chuck
>>
>>
>>
>> -----Original Message-----
>> From: Connell, Chuck [mailto:Chuck.Connell@nuance.com]
>> Sent: Thursday, October 04, 2012 12:09 PM
>> To: user@hive.apache.org
>> Subject: RE: Limit to columns or nesting of Hive table?
>>
>> Thanks. So is the nesting limit 10 now? Does your 2nd paragraph mean that
>> this limit cannot easily be raised?
>>
>> Chuck
>>
>> -----Original Message-----
>> From: Edward Capriolo [mailto:edlinuxguru@gmail.com]
>> Sent: Thursday, October 04, 2012 11:57 AM
>> To: user@hive.apache.org
>> Subject: Re: Limit to columns or nesting of Hive table?
>>
>> There is an open jira ticket on this. There is a hard coded limit but it
>> could be raised with some mostly minor code changes.
>>
>> One of the bigger problems is that hive stores the definition of a column
>> in JDBC "column" and for some databases larger nested structs can case
>> issues.
>>
>> Edward
>>
>> On Thu, Oct 4, 2012 at 11:48 AM, Connell, Chuck <Ch...@nuance.com>
>> wrote:
>> > I am trying to create a large Hive table, with many columns and deeply
>> > nested structs. It is failing with
>> java.lang.ArrayIndexOutOfBoundsException:
>> > 10.
>> >
>> >
>> >
>> > Before I spend a lot of time debugging my table declaration, is there
>> > some limit here I should know about? Max number of columns? Max depth
>> > of struct nesting?
>> >
>> >
>> >
>> > Thanks,
>> >
>> > Chuck
>> >
>> >****
>>
>> ** **
>>
>
>

RE: Any advice about complex Hive tables?

Posted by "Connell, Chuck" <Ch...@nuance.com>.
You can get the jar from the "Downloads" link on the serde's home page. It is on the right side of the page.


________________________________
From: Sadananda Hegde [saduhegde@gmail.com]
Sent: Saturday, October 13, 2012 8:31 PM
To: user@hive.apache.org
Subject: Re: Any advice about complex Hive tables?

Thanks, Chuck.


But I am getting the following error when I execute the CREATE TABLE statement.

 'FAILED: Error in metadata: Cannot validate serde: org.openx.data.jsonserde.JsonSerDe"


Am I supposed to down load / add any jar file to hive server?

Thanks for your help.

Sadu

On Fri, Oct 12, 2012 at 9:17 AM, Connell, Chuck <Ch...@nuance.com>> wrote:
Sadu,

I am using JSON as the input format, with the JSON SerDe from https://github.com/rcongiu/Hive-JSON-Serde.

A sample JSON record is:  (in actual use each JSON record must be on one line only).

{
"field1":"hello",
"field2":123456,
"field3":1234.5678,
"field4":true,
"field5":{"field5a":"embedded 1", "field5b":44, "field5c":4.44, "field5d":false, "field5e":[12,13,14]},
"field6":[2, 3, 4, 5, 6],
"field7":[4.33, 5.33, 6.33],
"field8":["one", "two", "three"],
"field9":[[1,2,3], [4,5], [6,7,8]],
"field10":[["smith","jones"], ["bob", "bill"]],
"field11":[{"f11a":"one", "f11b":"two"}, {"f11a":"three", "f11b":"four"}]
}

My table definition is:  (ignore the fact that the fields are listed out of order, this does not matter)

CREATE TABLE tt1
(
field8 ARRAY<STRING>,
field9 ARRAY<ARRAY<INT>>,
field2 INT,
field3 DOUBLE,
field1 STRING,
field6 ARRAY<INT>,
field7 ARRAY<DOUBLE>,
field4 BOOLEAN,
field5 STRUCT
<
field5d:BOOLEAN,
field5e:ARRAY<INT>,
field5a:STRING,
field5b:INT,
field5c:DOUBLE
>,
field10 ARRAY<ARRAY<STRING>>,
field11 ARRAY<STRUCT
<
f11a:STRING,
f11b:STRING
>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ("ignore.malformed.json" = "true")
STORED AS TEXTFILE;

This small sample actually works great!  The problem is when I try to scale up to larger more complex tables.

(Note the latest version of this SerDe has a bug related to number formats. You should use 1.1.3 and use only INTs and DOUBLEs.)

Chuck


From: Sadananda Hegde [mailto:saduhegde@gmail.com<ma...@gmail.com>]
Sent: Thursday, October 11, 2012 11:47 PM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: Re: Any advice about complex Hive tables?

Hi Chuck,

I have a similar complex hive tables with many fields and some are nested like array of structs (but only upto 3 levels). How did you define you ROW FORMAT as to separate the items? The COLLECTION ITEMS TERMINATED BY clause works only for the first level.How did you handle level 2 , 3, etc? Is it through SERDE FORMATs?  Could you
 share your CREATE TABLE statement?  I am having problem correctly defining my DDL to load the data file correctly.

Much appreciated.

Thanks,
Sadu
On Mon, Oct 8, 2012 at 3:37 PM, Connell, Chuck <Ch...@nuance.com>> wrote:
(Follow up to the thread below...)

I have a complex Hive table -- many fields, many nested structs. Hive fails to create the table at all. I can't even start to load data or query the data. Anyone else run into this? It seems to be a showstopper to using Hive for us, which is a shame since everything else about Hive is just what we want.

Any suggestions? Workarounds?

Thanks,
Chuck


________________________________________
From: Connell, Chuck
Sent: Thursday, October 04, 2012 4:31 PM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: RE: Limit to columns or nesting of Hive table?

The issue apparently is not just the number of levels of nesting. I just created a Hive table with 20 levels of structs within each other. It created fine. This is more levels than the table that was failing for me. The failing table had many more fields throughout the levels.

Chuck



-----Original Message-----
From: Connell, Chuck [mailto:Chuck.Connell@nuance.com<ma...@nuance.com>]
Sent: Thursday, October 04, 2012 12:09 PM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: RE: Limit to columns or nesting of Hive table?

Thanks. So is the nesting limit 10 now? Does your 2nd paragraph mean that this limit cannot easily be raised?

Chuck

-----Original Message-----
From: Edward Capriolo [mailto:edlinuxguru@gmail.com<ma...@gmail.com>]
Sent: Thursday, October 04, 2012 11:57 AM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: Re: Limit to columns or nesting of Hive table?

There is an open jira ticket on this. There is a hard coded limit but it could be raised with some mostly minor code changes.

One of the bigger problems is that hive stores the definition of a column in JDBC "column" and for some databases larger nested structs can case issues.

Edward

On Thu, Oct 4, 2012 at 11:48 AM, Connell, Chuck <Ch...@nuance.com>> wrote:
> I am trying to create a large Hive table, with many columns and deeply
> nested structs. It is failing with java.lang.ArrayIndexOutOfBoundsException:
> 10.
>
>
>
> Before I spend a lot of time debugging my table declaration, is there
> some limit here I should know about? Max number of columns? Max depth
> of struct nesting?
>
>
>
> Thanks,
>
> Chuck
>
>



RE: Any advice about complex Hive tables?

Posted by "Connell, Chuck" <Ch...@nuance.com>.
Put the serde jar in some place like /usr/lib/hive/extra_libs, then start Hive as "hive --auxpath /usr/lib/hive/extra_libs"

Chuck


________________________________
From: Sadananda Hegde [saduhegde@gmail.com]
Sent: Saturday, October 13, 2012 8:31 PM
To: user@hive.apache.org
Subject: Re: Any advice about complex Hive tables?

Thanks, Chuck.


But I am getting the following error when I execute the CREATE TABLE statement.

 'FAILED: Error in metadata: Cannot validate serde: org.openx.data.jsonserde.JsonSerDe"


Am I supposed to down load / add any jar file to hive server?

Thanks for your help.

Sadu

On Fri, Oct 12, 2012 at 9:17 AM, Connell, Chuck <Ch...@nuance.com>> wrote:
Sadu,

I am using JSON as the input format, with the JSON SerDe from https://github.com/rcongiu/Hive-JSON-Serde.

A sample JSON record is:  (in actual use each JSON record must be on one line only).

{
"field1":"hello",
"field2":123456,
"field3":1234.5678,
"field4":true,
"field5":{"field5a":"embedded 1", "field5b":44, "field5c":4.44, "field5d":false, "field5e":[12,13,14]},
"field6":[2, 3, 4, 5, 6],
"field7":[4.33, 5.33, 6.33],
"field8":["one", "two", "three"],
"field9":[[1,2,3], [4,5], [6,7,8]],
"field10":[["smith","jones"], ["bob", "bill"]],
"field11":[{"f11a":"one", "f11b":"two"}, {"f11a":"three", "f11b":"four"}]
}

My table definition is:  (ignore the fact that the fields are listed out of order, this does not matter)

CREATE TABLE tt1
(
field8 ARRAY<STRING>,
field9 ARRAY<ARRAY<INT>>,
field2 INT,
field3 DOUBLE,
field1 STRING,
field6 ARRAY<INT>,
field7 ARRAY<DOUBLE>,
field4 BOOLEAN,
field5 STRUCT
<
field5d:BOOLEAN,
field5e:ARRAY<INT>,
field5a:STRING,
field5b:INT,
field5c:DOUBLE
>,
field10 ARRAY<ARRAY<STRING>>,
field11 ARRAY<STRUCT
<
f11a:STRING,
f11b:STRING
>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ("ignore.malformed.json" = "true")
STORED AS TEXTFILE;

This small sample actually works great!  The problem is when I try to scale up to larger more complex tables.

(Note the latest version of this SerDe has a bug related to number formats. You should use 1.1.3 and use only INTs and DOUBLEs.)

Chuck


From: Sadananda Hegde [mailto:saduhegde@gmail.com<ma...@gmail.com>]
Sent: Thursday, October 11, 2012 11:47 PM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: Re: Any advice about complex Hive tables?

Hi Chuck,

I have a similar complex hive tables with many fields and some are nested like array of structs (but only upto 3 levels). How did you define you ROW FORMAT as to separate the items? The COLLECTION ITEMS TERMINATED BY clause works only for the first level.How did you handle level 2 , 3, etc? Is it through SERDE FORMATs?  Could you
 share your CREATE TABLE statement?  I am having problem correctly defining my DDL to load the data file correctly.

Much appreciated.

Thanks,
Sadu
On Mon, Oct 8, 2012 at 3:37 PM, Connell, Chuck <Ch...@nuance.com>> wrote:
(Follow up to the thread below...)

I have a complex Hive table -- many fields, many nested structs. Hive fails to create the table at all. I can't even start to load data or query the data. Anyone else run into this? It seems to be a showstopper to using Hive for us, which is a shame since everything else about Hive is just what we want.

Any suggestions? Workarounds?

Thanks,
Chuck


________________________________________
From: Connell, Chuck
Sent: Thursday, October 04, 2012 4:31 PM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: RE: Limit to columns or nesting of Hive table?

The issue apparently is not just the number of levels of nesting. I just created a Hive table with 20 levels of structs within each other. It created fine. This is more levels than the table that was failing for me. The failing table had many more fields throughout the levels.

Chuck



-----Original Message-----
From: Connell, Chuck [mailto:Chuck.Connell@nuance.com<ma...@nuance.com>]
Sent: Thursday, October 04, 2012 12:09 PM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: RE: Limit to columns or nesting of Hive table?

Thanks. So is the nesting limit 10 now? Does your 2nd paragraph mean that this limit cannot easily be raised?

Chuck

-----Original Message-----
From: Edward Capriolo [mailto:edlinuxguru@gmail.com<ma...@gmail.com>]
Sent: Thursday, October 04, 2012 11:57 AM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: Re: Limit to columns or nesting of Hive table?

There is an open jira ticket on this. There is a hard coded limit but it could be raised with some mostly minor code changes.

One of the bigger problems is that hive stores the definition of a column in JDBC "column" and for some databases larger nested structs can case issues.

Edward

On Thu, Oct 4, 2012 at 11:48 AM, Connell, Chuck <Ch...@nuance.com>> wrote:
> I am trying to create a large Hive table, with many columns and deeply
> nested structs. It is failing with java.lang.ArrayIndexOutOfBoundsException:
> 10.
>
>
>
> Before I spend a lot of time debugging my table declaration, is there
> some limit here I should know about? Max number of columns? Max depth
> of struct nesting?
>
>
>
> Thanks,
>
> Chuck
>
>



Re: Any advice about complex Hive tables?

Posted by Sadananda Hegde <sa...@gmail.com>.
Thanks, Chuck.


But I am getting the following error when I execute the CREATE TABLE statement.

 'FAILED: Error in metadata: Cannot validate serde:
org.openx.data.jsonserde.JsonSerDe"


Am I supposed to down load / add any jar file to hive server?

Thanks for your help.

Sadu

On Fri, Oct 12, 2012 at 9:17 AM, Connell, Chuck <Ch...@nuance.com>wrote:

>  Sadu,****
>
> ** **
>
> I am using JSON as the input format, with the JSON SerDe from
> https://github.com/rcongiu/Hive-JSON-Serde. ****
>
> ** **
>
> A sample JSON record is:  (in actual use each JSON record must be on one
> line only).****
>
> ** **
>
> {****
>
> "field1":"hello",****
>
> "field2":123456,****
>
> "field3":1234.5678,****
>
> "field4":true,****
>
> "field5":{"field5a":"embedded 1", "field5b":44, "field5c":4.44,
> "field5d":false, "field5e":[12,13,14]},****
>
> "field6":[2, 3, 4, 5, 6],****
>
> "field7":[4.33, 5.33, 6.33],****
>
> "field8":["one", "two", "three"],****
>
> "field9":[[1,2,3], [4,5], [6,7,8]],****
>
> "field10":[["smith","jones"], ["bob", "bill"]],****
>
> "field11":[{"f11a":"one", "f11b":"two"}, {"f11a":"three", "f11b":"four"}]*
> ***
>
> }****
>
> ** **
>
> My table definition is:  (ignore the fact that the fields are listed out
> of order, this does not matter)****
>
> ** **
>
> CREATE TABLE tt1****
>
> (****
>
> field8 ARRAY<STRING>,****
>
> field9 ARRAY<ARRAY<INT>>,****
>
> field2 INT,****
>
> field3 DOUBLE,****
>
> field1 STRING,****
>
> field6 ARRAY<INT>,****
>
> field7 ARRAY<DOUBLE>,****
>
> field4 BOOLEAN,****
>
> field5 STRUCT****
>
> <** **
>
> field5d:BOOLEAN,****
>
> field5e:ARRAY<INT>,****
>
> field5a:STRING,****
>
> field5b:INT,****
>
> field5c:DOUBLE****
>
> >,****
>
> field10 ARRAY<ARRAY<STRING>>,****
>
> field11 ARRAY<STRUCT****
>
> <** **
>
> f11a:STRING,****
>
> f11b:STRING****
>
> >>** **
>
> )****
>
> ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'****
>
> WITH SERDEPROPERTIES ("ignore.malformed.json" = "true")****
>
> STORED AS TEXTFILE;****
>
> ** **
>
> This small sample actually works great!  The problem is when I try to
> scale up to larger more complex tables.****
>
> ** **
>
> (Note the latest version of this SerDe has a bug related to number
> formats. You should use 1.1.3 and use only INTs and DOUBLEs.)****
>
> ** **
>
> Chuck****
>
> ** **
>
> ** **
>
> *From:* Sadananda Hegde [mailto:saduhegde@gmail.com]
> *Sent:* Thursday, October 11, 2012 11:47 PM
> *To:* user@hive.apache.org
> *Subject:* Re: Any advice about complex Hive tables?****
>
> ** **
>
> Hi Chuck,****
>
>  ****
>
> I have a similar complex hive tables with many fields and some are nested
> like array of structs (but only upto 3 levels). How did you define you ROW
> FORMAT as to separate the items? The COLLECTION ITEMS TERMINATED BY clause
> works only for the first level.How did you handle level 2 , 3, etc? Is it
> through SERDE FORMATs?  Could you
>  share your CREATE TABLE statement?  I am having problem correctly
> defining my DDL to load the data file correctly.****
>
>  ****
>
> Much appreciated.****
>
>  ****
>
> Thanks,****
>
> Sadu****
>
> On Mon, Oct 8, 2012 at 3:37 PM, Connell, Chuck <Ch...@nuance.com>
> wrote:****
>
> (Follow up to the thread below...)
>
> I have a complex Hive table -- many fields, many nested structs. Hive
> fails to create the table at all. I can't even start to load data or query
> the data. Anyone else run into this? It seems to be a showstopper to using
> Hive for us, which is a shame since everything else about Hive is just what
> we want.
>
> Any suggestions? Workarounds?
>
> Thanks,
> Chuck
>
>
> ________________________________________
> From: Connell, Chuck
> Sent: Thursday, October 04, 2012 4:31 PM
> To: user@hive.apache.org
> Subject: RE: Limit to columns or nesting of Hive table?
>
> The issue apparently is not just the number of levels of nesting. I just
> created a Hive table with 20 levels of structs within each other. It
> created fine. This is more levels than the table that was failing for me.
> The failing table had many more fields throughout the levels.
>
> Chuck
>
>
>
> -----Original Message-----
> From: Connell, Chuck [mailto:Chuck.Connell@nuance.com]
> Sent: Thursday, October 04, 2012 12:09 PM
> To: user@hive.apache.org
> Subject: RE: Limit to columns or nesting of Hive table?
>
> Thanks. So is the nesting limit 10 now? Does your 2nd paragraph mean that
> this limit cannot easily be raised?
>
> Chuck
>
> -----Original Message-----
> From: Edward Capriolo [mailto:edlinuxguru@gmail.com]
> Sent: Thursday, October 04, 2012 11:57 AM
> To: user@hive.apache.org
> Subject: Re: Limit to columns or nesting of Hive table?
>
> There is an open jira ticket on this. There is a hard coded limit but it
> could be raised with some mostly minor code changes.
>
> One of the bigger problems is that hive stores the definition of a column
> in JDBC "column" and for some databases larger nested structs can case
> issues.
>
> Edward
>
> On Thu, Oct 4, 2012 at 11:48 AM, Connell, Chuck <Ch...@nuance.com>
> wrote:
> > I am trying to create a large Hive table, with many columns and deeply
> > nested structs. It is failing with
> java.lang.ArrayIndexOutOfBoundsException:
> > 10.
> >
> >
> >
> > Before I spend a lot of time debugging my table declaration, is there
> > some limit here I should know about? Max number of columns? Max depth
> > of struct nesting?
> >
> >
> >
> > Thanks,
> >
> > Chuck
> >
> >****
>
> ** **
>

RE: Any advice about complex Hive tables?

Posted by "Connell, Chuck" <Ch...@nuance.com>.
Sadu,

I am using JSON as the input format, with the JSON SerDe from https://github.com/rcongiu/Hive-JSON-Serde.

A sample JSON record is:  (in actual use each JSON record must be on one line only).

{
"field1":"hello",
"field2":123456,
"field3":1234.5678,
"field4":true,
"field5":{"field5a":"embedded 1", "field5b":44, "field5c":4.44, "field5d":false, "field5e":[12,13,14]},
"field6":[2, 3, 4, 5, 6],
"field7":[4.33, 5.33, 6.33],
"field8":["one", "two", "three"],
"field9":[[1,2,3], [4,5], [6,7,8]],
"field10":[["smith","jones"], ["bob", "bill"]],
"field11":[{"f11a":"one", "f11b":"two"}, {"f11a":"three", "f11b":"four"}]
}

My table definition is:  (ignore the fact that the fields are listed out of order, this does not matter)

CREATE TABLE tt1
(
field8 ARRAY<STRING>,
field9 ARRAY<ARRAY<INT>>,
field2 INT,
field3 DOUBLE,
field1 STRING,
field6 ARRAY<INT>,
field7 ARRAY<DOUBLE>,
field4 BOOLEAN,
field5 STRUCT
<
field5d:BOOLEAN,
field5e:ARRAY<INT>,
field5a:STRING,
field5b:INT,
field5c:DOUBLE
>,
field10 ARRAY<ARRAY<STRING>>,
field11 ARRAY<STRUCT
<
f11a:STRING,
f11b:STRING
>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ("ignore.malformed.json" = "true")
STORED AS TEXTFILE;

This small sample actually works great!  The problem is when I try to scale up to larger more complex tables.

(Note the latest version of this SerDe has a bug related to number formats. You should use 1.1.3 and use only INTs and DOUBLEs.)

Chuck


From: Sadananda Hegde [mailto:saduhegde@gmail.com]
Sent: Thursday, October 11, 2012 11:47 PM
To: user@hive.apache.org
Subject: Re: Any advice about complex Hive tables?

Hi Chuck,

I have a similar complex hive tables with many fields and some are nested like array of structs (but only upto 3 levels). How did you define you ROW FORMAT as to separate the items? The COLLECTION ITEMS TERMINATED BY clause works only for the first level.How did you handle level 2 , 3, etc? Is it through SERDE FORMATs?  Could you
 share your CREATE TABLE statement?  I am having problem correctly defining my DDL to load the data file correctly.

Much appreciated.

Thanks,
Sadu
On Mon, Oct 8, 2012 at 3:37 PM, Connell, Chuck <Ch...@nuance.com>> wrote:
(Follow up to the thread below...)

I have a complex Hive table -- many fields, many nested structs. Hive fails to create the table at all. I can't even start to load data or query the data. Anyone else run into this? It seems to be a showstopper to using Hive for us, which is a shame since everything else about Hive is just what we want.

Any suggestions? Workarounds?

Thanks,
Chuck


________________________________________
From: Connell, Chuck
Sent: Thursday, October 04, 2012 4:31 PM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: RE: Limit to columns or nesting of Hive table?

The issue apparently is not just the number of levels of nesting. I just created a Hive table with 20 levels of structs within each other. It created fine. This is more levels than the table that was failing for me. The failing table had many more fields throughout the levels.

Chuck



-----Original Message-----
From: Connell, Chuck [mailto:Chuck.Connell@nuance.com<ma...@nuance.com>]
Sent: Thursday, October 04, 2012 12:09 PM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: RE: Limit to columns or nesting of Hive table?

Thanks. So is the nesting limit 10 now? Does your 2nd paragraph mean that this limit cannot easily be raised?

Chuck

-----Original Message-----
From: Edward Capriolo [mailto:edlinuxguru@gmail.com<ma...@gmail.com>]
Sent: Thursday, October 04, 2012 11:57 AM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: Re: Limit to columns or nesting of Hive table?

There is an open jira ticket on this. There is a hard coded limit but it could be raised with some mostly minor code changes.

One of the bigger problems is that hive stores the definition of a column in JDBC "column" and for some databases larger nested structs can case issues.

Edward

On Thu, Oct 4, 2012 at 11:48 AM, Connell, Chuck <Ch...@nuance.com>> wrote:
> I am trying to create a large Hive table, with many columns and deeply
> nested structs. It is failing with java.lang.ArrayIndexOutOfBoundsException:
> 10.
>
>
>
> Before I spend a lot of time debugging my table declaration, is there
> some limit here I should know about? Max number of columns? Max depth
> of struct nesting?
>
>
>
> Thanks,
>
> Chuck
>
>


Re: Any advice about complex Hive tables?

Posted by Sadananda Hegde <sa...@gmail.com>.
Hi Chuck,

I have a similar complex hive tables with many fields and some are nested
like array of structs (but only upto 3 levels). How did you define you ROW
FORMAT as to separate the items? The COLLECTION ITEMS TERMINATED BY clause
works only for the first level.How did you handle level 2 , 3, etc? Is it
through SERDE FORMATs?  Could you
 share your CREATE TABLE statement?  I am having problem correctly defining
my DDL to load the data file correctly.

Much appreciated.

Thanks,
Sadu
On Mon, Oct 8, 2012 at 3:37 PM, Connell, Chuck <Ch...@nuance.com>wrote:

> (Follow up to the thread below...)
>
> I have a complex Hive table -- many fields, many nested structs. Hive
> fails to create the table at all. I can't even start to load data or query
> the data. Anyone else run into this? It seems to be a showstopper to using
> Hive for us, which is a shame since everything else about Hive is just what
> we want.
>
> Any suggestions? Workarounds?
>
> Thanks,
> Chuck
>
>
> ________________________________________
> From: Connell, Chuck
> Sent: Thursday, October 04, 2012 4:31 PM
> To: user@hive.apache.org
> Subject: RE: Limit to columns or nesting of Hive table?
>
> The issue apparently is not just the number of levels of nesting. I just
> created a Hive table with 20 levels of structs within each other. It
> created fine. This is more levels than the table that was failing for me.
> The failing table had many more fields throughout the levels.
>
> Chuck
>
>
>
> -----Original Message-----
> From: Connell, Chuck [mailto:Chuck.Connell@nuance.com]
> Sent: Thursday, October 04, 2012 12:09 PM
> To: user@hive.apache.org
> Subject: RE: Limit to columns or nesting of Hive table?
>
> Thanks. So is the nesting limit 10 now? Does your 2nd paragraph mean that
> this limit cannot easily be raised?
>
> Chuck
>
> -----Original Message-----
> From: Edward Capriolo [mailto:edlinuxguru@gmail.com]
> Sent: Thursday, October 04, 2012 11:57 AM
> To: user@hive.apache.org
> Subject: Re: Limit to columns or nesting of Hive table?
>
> There is an open jira ticket on this. There is a hard coded limit but it
> could be raised with some mostly minor code changes.
>
> One of the bigger problems is that hive stores the definition of a column
> in JDBC "column" and for some databases larger nested structs can case
> issues.
>
> Edward
>
> On Thu, Oct 4, 2012 at 11:48 AM, Connell, Chuck <Ch...@nuance.com>
> wrote:
> > I am trying to create a large Hive table, with many columns and deeply
> > nested structs. It is failing with
> java.lang.ArrayIndexOutOfBoundsException:
> > 10.
> >
> >
> >
> > Before I spend a lot of time debugging my table declaration, is there
> > some limit here I should know about? Max number of columns? Max depth
> > of struct nesting?
> >
> >
> >
> > Thanks,
> >
> > Chuck
> >
> >
>