Posted to user@hive.apache.org by Dilip Joseph <di...@gmail.com> on 2010/03/22 21:50:06 UTC

support for arrays, maps, structs while writing output of custom reduce script to table

Hello,

Does Hive currently support arrays, maps, and structs when using custom
reduce/map scripts? 'myreduce.py' in the example below produces an
array of structs delimited by \2 and \3 characters.

CREATE TABLE SS (
                    a INT,
                    b INT,
                    vals ARRAY<STRUCT<x:INT, y:STRING>>
                );

FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
    INSERT OVERWRITE TABLE SS
    REDUCE *
        USING 'myreduce.py'
        AS
                (a,b, vals)
        ;

However, the query is failing with the following error message, even
before the script is executed:

FAILED: Error in semantic analysis: line 2:27 Cannot insert into
target table because column number/types are different SS: Cannot
convert column 2 from string to array<struct<x:int,y:string>>.

I saw a discussion about this in
http://www.mail-archive.com/hive-user@hadoop.apache.org/msg00160.html,
dated over a year ago.  Just wondering if there have been any updates.

Thanks,

Dilip

Re: support for arrays, maps, structs while writing output of custom reduce script to table

Posted by Dilip Joseph <di...@gmail.com>.

Opened JIRA https://issues.apache.org/jira/browse/HIVE-1271

Dilip

On Mon, Mar 22, 2010 at 3:26 PM, Zheng Shao <zs...@gmail.com> wrote:
> Great!
>
> This is a bug. Hive field names should be case-insensitive. Can you
> open a JIRA for that?
>
> Zheng



-- 
_________________________________________
Dilip Antony Joseph
http://www.marydilip.info

Re: support for arrays, maps, structs while writing output of custom reduce script to table

Posted by Zheng Shao <zs...@gmail.com>.

Great!

This is a bug. Hive field names should be case-insensitive. Can you
open a JIRA for that?

Zheng
On Mon, Mar 22, 2010 at 2:43 PM, Dilip Joseph
<di...@gmail.com> wrote:
> Thanks Zheng,  That worked.
>
> It appears that the type information is converted to lower case before
> comparison.  The following statements where "userId" is used as a
> field name failed.
>
> hive> CREATE TABLE SS (
>    >                     a INT,
>    >                     b INT,
>    >                     vals ARRAY<STRUCT<userId:INT, y:STRING>>
>    >                 );
> OK
> Time taken: 0.309 seconds
> hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
>    >     INSERT OVERWRITE TABLE SS
>    >     REDUCE *
>    >         USING 'myreduce.py'
>    >         AS
>    >                     (a INT,
>    >                     b INT,
>    >                     vals ARRAY<STRUCT<userId:INT, y:STRING>>
>    >                     )
>    >         ;
> FAILED: Error in semantic analysis: line 2:27 Cannot insert into
> target table because column number/types are different SS: Cannot
> convert column 2 from array<struct<userId:int,y:string>> to
> array<struct<userid:int,y:string>>.
>
> The same queries worked fine after changing "userId" to "userid".
>
> Dilip



-- 
Yours,
Zheng

Re: support for arrays, maps, structs while writing output of custom reduce script to table

Posted by Dilip Joseph <di...@gmail.com>.

Thanks Zheng,  That worked.

It appears that the table's type information is converted to lower case
before comparison, while the type given after "AS" is not.  The following
statements, where "userId" is used as a field name, failed.

hive> CREATE TABLE SS (
    >                     a INT,
    >                     b INT,
    >                     vals ARRAY<STRUCT<userId:INT, y:STRING>>
    >                 );
OK
Time taken: 0.309 seconds
hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
    >     INSERT OVERWRITE TABLE SS
    >     REDUCE *
    >         USING 'myreduce.py'
    >         AS
    >                     (a INT,
    >                     b INT,
    >                     vals ARRAY<STRUCT<userId:INT, y:STRING>>
    >                     )
    >         ;
FAILED: Error in semantic analysis: line 2:27 Cannot insert into
target table because column number/types are different SS: Cannot
convert column 2 from array<struct<userId:int,y:string>> to
array<struct<userid:int,y:string>>.

The same queries worked fine after changing "userId" to "userid".
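
To make the mismatch concrete, here is a toy model in Python. It is not
Hive's actual code: the two function names are hypothetical, and the type
strings are modeled on the error message above. It only illustrates why an
exact comparison fails once one side has been lowercased, and why a
case-insensitive comparison would succeed.

```python
def matches_as_reported(script_type, table_type):
    """Toy model of the observed behaviour: exact string comparison where
    the table's stored type has already been lowercased."""
    return script_type == table_type.lower()


def matches_ignoring_case(script_type, table_type):
    """Toy model of the expected behaviour: compare case-insensitively."""
    return script_type.lower() == table_type.lower()


# Type strings modeled on the error message in this thread.
script_type = "array<struct<userId:int,y:string>>"  # from the AS clause
table_type = "array<struct<userId:int,y:string>>"   # as written in CREATE TABLE

print(matches_as_reported(script_type, table_type))    # False
print(matches_ignoring_case(script_type, table_type))  # True
```

Until the comparison is fixed, using all-lowercase field names on both
sides avoids the problem.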

Dilip

On Mon, Mar 22, 2010 at 2:20 PM, Zheng Shao <zs...@gmail.com> wrote:
> From 0.5 (probably), we can add type information to the column names after "AS".
> Note that the first level separator should be TAB, and the second
> separator should be ^B (and then ^C, etc)
>
>> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
>>    INSERT OVERWRITE TABLE SS
>>    REDUCE *
>>        USING 'myreduce.py'
>>        AS
>>                (a INT, b INT, vals ARRAY<STRUCT<x:INT, y:STRING>>)
>>        ;



-- 
_________________________________________
Dilip Antony Joseph
http://www.marydilip.info

Re: support for arrays, maps, structs while writing output of custom reduce script to table

Posted by Zheng Shao <zs...@gmail.com>.

From 0.5 (probably), we can add type information to the column names after "AS".
Note that the first-level separator should be TAB, and the second-level
separator should be ^B (and then ^C, etc.)

> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
>    INSERT OVERWRITE TABLE SS
>    REDUCE *
>        USING 'myreduce.py'
>        AS
>                (a INT, b INT, vals ARRAY<STRUCT<x:INT, y:STRING>>)
>        ;
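
As a rough sketch of the separator layout described above, a script such
as 'myreduce.py' could build each output line as follows. The contents of
the real script are not shown in this thread, so the function name
format_row and the sample values are assumptions made for illustration
only. Columns are joined with TAB, array elements with ^B (\x02), and the
fields of each struct with ^C (\x03).

```python
import sys

FIELD_SEP = "\t"     # first level: between the columns a, b, vals
ITEM_SEP = "\x02"    # second level (^B): between array elements
STRUCT_SEP = "\x03"  # third level (^C): between the fields of one struct


def format_row(a, b, vals):
    """Serialize one row; vals is a list of (x, y) pairs for
    ARRAY<STRUCT<x:INT, y:STRING>>. Hypothetical helper."""
    encoded_vals = ITEM_SEP.join(
        STRUCT_SEP.join((str(x), y)) for x, y in vals
    )
    return FIELD_SEP.join((str(a), str(b), encoded_vals))


if __name__ == "__main__":
    # Made-up sample data; a real reducer would derive rows from sys.stdin.
    sys.stdout.write(format_row(1, 2, [(10, "foo"), (20, "bar")]) + "\n")
```

The STRING values themselves must not contain any of the separator
characters, since this sketch does no escaping.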


On Mon, Mar 22, 2010 at 1:50 PM, Dilip Joseph
<di...@gmail.com> wrote:
> Hello,
>
> Does Hive currently support arrays, maps, structs while using custom
> reduce/map scripts? 'myreduce.py' in the example below produces an
> array of structs delimited by \2s and \3s.
>
> CREATE TABLE SS (
>                    a INT,
>                    b INT,
>                    vals ARRAY<STRUCT<x:INT, y:STRING>>
>                );
>
> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
>    INSERT OVERWRITE TABLE SS
>    REDUCE *
>        USING 'myreduce.py'
>        AS
>                (a,b, vals)
>        ;
>
> However, the query is failing with the following error message, even
> before the script is executed:
>
> FAILED: Error in semantic analysis: line 2:27 Cannot insert into
> target table because column number/types are different SS: Cannot
> convert column 2 from string to array<struct<x:int,y:string>>.
>
> I saw a discussion about this in
> http://www.mail-archive.com/hive-user@hadoop.apache.org/msg00160.html,
> dated over a year ago.  Just wondering if there have been any updates.
>
> Thanks,
>
> Dilip
>



-- 
Yours,
Zheng