Posted to user@hive.apache.org by rohan monga <mo...@gmail.com> on 2014/01/14 00:22:30 UTC

casting complex data types for outputs of custom scripts

Hi,

I have a table that is of the following format

create table t1 ( f1 int, f2 array<struct<a1:int, a2:int>> );

Now I have a custom script that does some computation and generates
the value for f2
like so

from (
    from randomtable r
    map r.g1, r.g2, r.g3
    using '/bin/cat' as g1, g2, g3
    cluster by g1 ) m
    insert overwrite table t1
    reduce m.g1, m.g2, m.g3
    using 'python customScript.py' as ( f1 , f2 );

However, f2 is not being loaded properly into t1; it comes up broken or
null. What should I do so that f2 is loaded as an array of structs?


Thanks,

--
Rohan Monga

RE: casting complex data types for outputs of custom scripts

Posted by "Bogala, Chandra Reddy" <Ch...@gs.com>.
Would it be possible to share the Python script that does the conversion?

Thanks,
Chandra

-----Original Message-----
From: rohan monga [mailto:monga.rohan@gmail.com] 
Sent: Monday, January 20, 2014 6:08 AM
To: user@hive.apache.org
Subject: Re: casting complex data types for outputs of custom scripts

Sorry for the delayed response.

Yes, the Python script follows that delimiter pattern.

--
Rohan Monga


On Tue, Jan 14, 2014 at 4:31 PM, Stephen Sprague <sp...@gmail.com> wrote:
> @OP - first thing I'd ask is: does your Python script obey the
> ^A, ^B, ^C, ^D etc. nesting delimiter pattern? Given that your create
> table does not specify delimiters, those are the defaults. NB: ^A ==
> control-A == \001
>
> Cheers,
> Stephen.
>
>
> On Tue, Jan 14, 2014 at 3:11 PM, Andre Araujo <ar...@pythian.com> wrote:
>>
>> I had a similar issue in the past when trying to cast an empty array
>> to array<bigint>. By default Hive assumes it's an array<string>.
>> I don't think there's currently a Hive syntax to cast values to
>> complex data types. If there's one, I'd love to know what it is :)
>>

Re: casting complex data types for outputs of custom scripts

Posted by rohan monga <mo...@gmail.com>.
Sorry for the delayed response.

Yes, the Python script follows that delimiter pattern.
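
One way to check that claim is to decode a sample of the script's f2 output and see whether it parses cleanly. A minimal sketch, assuming ^B (\002) separates array elements and ^C (\003) separates the a1 and a2 fields of each struct; decode_f2 is a hypothetical helper, not part of customScript.py:

```python
# Sketch only: the helper name and both separators are assumptions for
# illustration, not taken from the thread's customScript.py.
ELEMENT_SEP = "\x02"   # ^B, assumed separator between array elements
STRUCT_SEP = "\x03"    # ^C, assumed separator between struct fields


def decode_f2(text):
    """Parse 'a1^Ca2^Ba1^Ca2' into [{'a1': int, 'a2': int}, ...].

    Raises ValueError if an element does not hold exactly two integers.
    """
    if text == "":
        return []  # an empty string is treated as an empty array
    result = []
    for element in text.split(ELEMENT_SEP):
        parts = element.split(STRUCT_SEP)
        if len(parts) != 2:
            raise ValueError("expected 2 struct fields, got %d" % len(parts))
        result.append({"a1": int(parts[0]), "a2": int(parts[1])})
    return result
```

If a line such as "1,2;3,4" raises ValueError here, the script is probably emitting some other separator than the one Hive expects.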

--
Rohan Monga



Re: casting complex data types for outputs of custom scripts

Posted by Stephen Sprague <sp...@gmail.com>.
@OP - first thing I'd ask is: does your Python script obey the ^A, ^B, ^C, ^D
etc. nesting delimiter pattern? Given that your create table does not
specify delimiters, those are the defaults. NB: ^A == control-A == \001
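
As a rough illustration of that pattern, here is a sketch of how a script might write one (f1, f2) row. It assumes ^B (\002) between array elements and ^C (\003) between struct fields, and a tab between the two top-level columns, since Hive's documentation describes transform script output as tab-separated by default; swap COLUMN_SEP for \001 if the output row format says otherwise. encode_f2 and encode_row are made-up names, not taken from customScript.py:

```python
# Sketch only: helper names and separator choices are assumptions,
# not taken from the thread's customScript.py.
import sys

COLUMN_SEP = "\t"      # assumed separator between top-level columns
ELEMENT_SEP = "\x02"   # ^B, assumed separator between array elements
STRUCT_SEP = "\x03"    # ^C, assumed separator between struct fields


def encode_f2(pairs):
    """Serialize [(a1, a2), ...] as a1^Ca2^Ba1^Ca2..."""
    return ELEMENT_SEP.join(
        STRUCT_SEP.join(str(value) for value in pair) for pair in pairs
    )


def encode_row(f1, pairs):
    """Build one output line (without newline) for the (f1, f2) columns."""
    return COLUMN_SEP.join([str(f1), encode_f2(pairs)])


if __name__ == "__main__":
    # Example row: f1 = 7, f2 = [{a1:1, a2:2}, {a1:3, a2:4}]
    sys.stdout.write(encode_row(7, [(1, 2), (3, 4)]) + "\n")
```

Whether Hive then loads f2 as an array of structs may also depend on the types declared in the AS clause, so treat the separators above as a starting point to verify rather than a confirmed fix.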

Cheers,
Stephen.



Re: casting complex data types for outputs of custom scripts

Posted by Andre Araujo <ar...@pythian.com>.
I had a similar issue in the past when trying to cast an empty array to
array<bigint>. By default Hive assumes it's an array<string>.
I don't think there's currently a Hive syntax to cast values to complex
data types. If there's one, I'd love to know what it is :)





-- 
André Araújo
Big Data Consultant/Solutions Architect
The Pythian Group - Australia - www.pythian.com

Office (calls from within Australia): 1300 366 021 x1270
Office (international): +61 2 8016 7000  x270 *OR* +1 613 565 8696   x1270
Mobile: +61 410 323 559
Fax: +61 2 9805 0544
IM: pythianaraujo @ AIM/MSN/Y! or araujo@pythian.com @ GTalk

“Success is not about standing at the top, it's the steps you leave behind.”
— Iker Pou (rock climber)
