Posted to user@hive.apache.org by Dilip Joseph <di...@gmail.com> on 2010/03/22 21:50:06 UTC
support for arrays, maps, structs while writing output of custom
reduce script to table
Hello,
Does Hive currently support arrays, maps, structs while using custom
reduce/map scripts? 'myreduce.py' in the example below produces an
array of structs delimited by \2s and \3s.
CREATE TABLE SS (
a INT,
b INT,
vals ARRAY<STRUCT<x:INT, y:STRING>>
);
FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
INSERT OVERWRITE TABLE SS
REDUCE *
USING 'myreduce.py'
AS
(a,b, vals)
;
However, the query is failing with the following error message, even
before the script is executed:
FAILED: Error in semantic analysis: line 2:27 Cannot insert into
target table because column number/types are different SS: Cannot
convert column 2 from string to array<struct<x:int,y:string>>.
I saw a discussion about this in
http://www.mail-archive.com/hive-user@hadoop.apache.org/msg00160.html,
dated over a year ago. Just wondering if there have been any updates.
Thanks,
Dilip
Re: support for arrays, maps, structs while writing output of custom
reduce script to table
Posted by Dilip Joseph <di...@gmail.com>.
Opened JIRA https://issues.apache.org/jira/browse/HIVE-1271
Dilip
On Mon, Mar 22, 2010 at 3:26 PM, Zheng Shao <zs...@gmail.com> wrote:
> Great!
>
> This is a bug. Hive field names should be case-insensitive. Can you
> open a JIRA for that?
>
> Zheng
--
_________________________________________
Dilip Antony Joseph
http://www.marydilip.info
Re: support for arrays, maps, structs while writing output of custom
reduce script to table
Posted by Zheng Shao <zs...@gmail.com>.
Great!
This is a bug. Hive field names should be case-insensitive. Can you
open a JIRA for that?
Zheng
On Mon, Mar 22, 2010 at 2:43 PM, Dilip Joseph
<di...@gmail.com> wrote:
> Thanks Zheng, that worked.
>
> It appears that the type information is converted to lower case before
> comparison. The following statements where "userId" is used as a
> field name failed.
>
> hive> CREATE TABLE SS (
> > a INT,
> > b INT,
> > vals ARRAY<STRUCT<userId:INT, y:STRING>>
> > );
> OK
> Time taken: 0.309 seconds
> hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
> > INSERT OVERWRITE TABLE SS
> > REDUCE *
> > USING 'myreduce.py'
> > AS
> > (a INT,
> > b INT,
> > vals ARRAY<STRUCT<userId:INT, y:STRING>>
> > )
> > ;
> FAILED: Error in semantic analysis: line 2:27 Cannot insert into
> target table because column number/types are different SS: Cannot
> convert column 2 from array<struct<userId:int,y:string>> to
> array<struct<userid:int,y:string>>.
>
> The same queries worked fine after changing "userId" to "userid".
>
> Dilip
--
Yours,
Zheng
Re: support for arrays, maps, structs while writing output of custom
reduce script to table
Posted by Dilip Joseph <di...@gmail.com>.
Thanks Zheng, that worked.
It appears that the type information is converted to lower case before
comparison. The following statements where "userId" is used as a
field name failed.
hive> CREATE TABLE SS (
> a INT,
> b INT,
> vals ARRAY<STRUCT<userId:INT, y:STRING>>
> );
OK
Time taken: 0.309 seconds
hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
> INSERT OVERWRITE TABLE SS
> REDUCE *
> USING 'myreduce.py'
> AS
> (a INT,
> b INT,
> vals ARRAY<STRUCT<userId:INT, y:STRING>>
> )
> ;
FAILED: Error in semantic analysis: line 2:27 Cannot insert into
target table because column number/types are different SS: Cannot
convert column 2 from array<struct<userId:int,y:string>> to
array<struct<userid:int,y:string>>.
The same queries worked fine after changing "userId" to "userid".
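The mismatch described above can be modelled in a few lines of Python. This is only a sketch of the observed behaviour, not Hive's actual code: the function names are hypothetical, and the assumption (taken from the error message) is that the table's type string is lower-cased while the type declared after "AS" keeps the case of its field names.

```python
# Hypothetical model of the comparison; this is NOT Hive's implementation.

def table_type(declared):
    """Assumption: the type stored for the table is lower-cased."""
    return declared.lower()


def strict_match(script_type, declared_table_type):
    """Case-sensitive comparison, as the error message suggests."""
    return script_type == table_type(declared_table_type)


def case_insensitive_match(script_type, declared_table_type):
    """Comparison that ignores case on both sides."""
    return script_type.lower() == table_type(declared_table_type)


script = "array<struct<userId:int,y:string>>"
print(strict_match(script, script))            # mixed-case field name fails
print(case_insensitive_match(script, script))  # passes once case is ignored
```

Under this model, writing the field name as "userid" on both sides makes the strict comparison succeed, which matches the workaround.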
Dilip
On Mon, Mar 22, 2010 at 2:20 PM, Zheng Shao <zs...@gmail.com> wrote:
> From 0.5 (probably), we can add type information to the column names after "AS".
> Note that the first level separator should be TAB, and the second
> separator should be ^B (and then ^C, etc)
>
>> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
>> INSERT OVERWRITE TABLE SS
>> REDUCE *
>> USING 'myreduce.py'
>> AS
>> (a INT, b INT, vals ARRAY<STRUCT<x:INT, y:STRING>>)
>> ;
--
_________________________________________
Dilip Antony Joseph
http://www.marydilip.info
Re: support for arrays, maps, structs while writing output of custom
reduce script to table
Posted by Zheng Shao <zs...@gmail.com>.
From 0.5 (probably), we can add type information to the column names after "AS".
Note that the first level separator should be TAB, and the second
separator should be ^B (and then ^C, etc)
> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
> INSERT OVERWRITE TABLE SS
> REDUCE *
> USING 'myreduce.py'
> AS
> (a INT, b INT, vals ARRAY<STRUCT<x:INT, y:STRING>>)
> ;
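For illustration, here is a minimal sketch of how a script such as 'myreduce.py' might lay out its output with those separators so that it matches (a INT, b INT, vals ARRAY<STRUCT<x:INT, y:STRING>>). The input layout (tab-separated a, b, x, y, with rows for the same a and b arriving together) and the helper names are assumptions made for the example; the real script and the columns of srcTable are not shown in this thread.

```python
from itertools import groupby

FIELD_SEP = "\t"      # first level: between top-level columns
ITEM_SEP = "\x02"     # second level (^B): between array elements
MEMBER_SEP = "\x03"   # third level (^C): between struct fields


def format_row(a, b, vals):
    """Encode one output row; vals is a list of (x, y) pairs, one per struct."""
    encoded_vals = ITEM_SEP.join(MEMBER_SEP.join((str(x), y)) for x, y in vals)
    return FIELD_SEP.join((str(a), str(b), encoded_vals))


def reduce_lines(lines):
    """Group consecutive input lines that share (a, b) into one output row.

    Assumed (hypothetical) input layout: tab-separated a, b, x, y.
    """
    parsed = (line.rstrip("\n").split("\t") for line in lines)
    for (a, b), group in groupby(parsed, key=lambda cols: (cols[0], cols[1])):
        yield format_row(a, b, [(cols[2], cols[3]) for cols in group])

# In an actual script the rows would be read from sys.stdin and each
# result written to sys.stdout followed by a newline.
```

Each emitted line then has three TAB-separated columns, and the third column holds the structs joined by ^B, with x and y inside each struct joined by ^C.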
On Mon, Mar 22, 2010 at 1:50 PM, Dilip Joseph
<di...@gmail.com> wrote:
> Hello,
>
> Does Hive currently support arrays, maps, structs while using custom
> reduce/map scripts? 'myreduce.py' in the example below produces an
> array of structs delimited by \2s and \3s.
>
> CREATE TABLE SS (
> a INT,
> b INT,
> vals ARRAY<STRUCT<x:INT, y:STRING>>
> );
>
> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
> INSERT OVERWRITE TABLE SS
> REDUCE *
> USING 'myreduce.py'
> AS
> (a,b, vals)
> ;
>
> However, the query is failing with the following error message, even
> before the script is executed:
>
> FAILED: Error in semantic analysis: line 2:27 Cannot insert into
> target table because column number/types are different SS: Cannot
> convert column 2 from string to array<struct<x:int,y:string>>.
>
> I saw a discussion about this in
> http://www.mail-archive.com/hive-user@hadoop.apache.org/msg00160.html,
> dated over a year ago. Just wondering if there have been any updates.
>
> Thanks,
>
> Dilip
>
--
Yours,
Zheng