Posted to user@hive.apache.org by rohan monga <mo...@gmail.com> on 2014/01/14 00:22:30 UTC
casting complex data types for outputs of custom scripts
Hi,
I have a table with the following definition:
create table t1 ( f1 int, f2 array<struct<a1:int, a2:int>> );
Now I have a custom script that does some computation and generates
the value for f2, like so:
from (
  from randomtable r
  map r.g1, r.g2, r.g3
  using '/bin/cat' as g1, g2, g3
  cluster by g1 ) m
insert overwrite table t1
reduce m.g1, m.g2, m.g3
using 'python customScript.py' as ( f1 , f2 );
However, f2 is not being loaded properly into t1; it comes up broken or
null. What should I do so that f2 is loaded as an array of structs?
Thanks,
--
Rohan Monga
RE: casting complex data types for outputs of custom scripts
Posted by "Bogala, Chandra Reddy" <Ch...@gs.com>.
Would it be possible to share the Python script that does the conversion?
Thanks,
Chandra
-----Original Message-----
From: rohan monga [mailto:monga.rohan@gmail.com]
Sent: Monday, January 20, 2014 6:08 AM
To: user@hive.apache.org
Subject: Re: casting complex data types for outputs of custom scripts
Sorry for the delayed response.
Yes, the Python script follows that delimiter pattern.
--
Rohan Monga
On Tue, Jan 14, 2014 at 4:31 PM, Stephen Sprague <sp...@gmail.com> wrote:
> @OP - first thing I'd ask is: does your Python script obey the
> ^A, ^B, ^C, ^D etc. nesting delimiter pattern? Given that your create
> table does not specify delimiters, those are the defaults. N.B. ^A ==
> control-A == \001
>
> Cheers,
> Stephen.
>
>
> On Tue, Jan 14, 2014 at 3:11 PM, Andre Araujo <ar...@pythian.com> wrote:
>>
>> I had a similar issue in the past when trying to cast an empty array
>> to array<bigint>. By default Hive assumes it's an array<string>.
>> I don't think there's currently a Hive syntax to cast values to
>> complex data types. If there's one, I'd love to know what it is :)
>>
>>
>> --
>> André Araújo
>> Big Data Consultant/Solutions Architect The Pythian Group - Australia
>> - www.pythian.com
>
Re: casting complex data types for outputs of custom scripts
Posted by rohan monga <mo...@gmail.com>.
Sorry for the delayed response.
Yes, the Python script follows that delimiter pattern.
--
Rohan Monga
Re: casting complex data types for outputs of custom scripts
Posted by Stephen Sprague <sp...@gmail.com>.
@OP - first thing I'd ask is: does your Python script obey the ^A, ^B, ^C, ^D
etc. nesting delimiter pattern? Given that your create table does not
specify delimiters, those are the defaults. N.B. ^A == control-A == \001
Cheers,
Stephen.
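
For readers following along, here is a minimal sketch of what obeying that
delimiter pattern could look like inside a reduce script. It is not the
poster's customScript.py (which was never shared); the helper names
encode_struct_array and encode_row are made up for illustration. It assumes
the script's output columns are tab-separated (the usual default for
transform/map/reduce scripts), that array elements are separated by ^B
(\002), and that the fields of a struct nested inside the array are
separated by ^C (\003).

```python
# Sketch only: assumes Hive's default text delimiters apply to the script output.
FIELD_DELIM = "\t"     # between top-level output columns (assumed script default)
ITEM_DELIM = "\x02"    # ^B: between elements of an array
MEMBER_DELIM = "\x03"  # ^C: between fields of a struct nested in that array


def encode_struct_array(pairs):
    """Serialize [(a1, a2), ...] as text for array<struct<a1:int,a2:int>>."""
    return ITEM_DELIM.join(
        MEMBER_DELIM.join(str(value) for value in pair) for pair in pairs
    )


def encode_row(f1, pairs):
    """Build one output line (without the newline): f1, then the encoded f2."""
    return FIELD_DELIM.join([str(f1), encode_struct_array(pairs)])
```

A script would write encode_row(7, [(1, 2), (3, 4)]) followed by a newline
to stdout for each output row. Note that columns named in the AS clause
without types are treated as strings, so it may also be necessary to declare
them, e.g. as ( f1 int, f2 array<struct<a1:int, a2:int>> ); whether that is
enough on a given Hive version is something to verify rather than assume.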
Re: casting complex data types for outputs of custom scripts
Posted by Andre Araujo <ar...@pythian.com>.
I had a similar issue in the past when trying to cast an empty array to
array<bigint>. By default Hive assumes it's an array<string>.
I don't think there's currently a Hive syntax to cast values to complex
data types. If there's one, I'd love to know what it is :)
--
André Araújo
Big Data Consultant/Solutions Architect
The Pythian Group - Australia - www.pythian.com