You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by anusha Mangina <an...@gmail.com> on 2014/09/03 22:47:52 UTC
Collect_set() of non-primitive types
I have a table defined as:
CREATE TABLE foo (
id INT,
start_time STRING,
name STRING,
value STRING,
type STRING
)
The 'name', 'value' and 'type' fields actually describe another object, and
I'd like to turn these into STRUCTs, something like:
CREATE TABLE bar (
id INT,
start_time STRING,
object STRUCT<name: STRING,
value: STRING,
type: STRING>
)
However, I'd also like to create a rollup table containing the most recent
result for a given 'id' field. There can be multiple different 'object'
structs for a given combination of 'id' and 'start_time', so I thought I'd
turn this into an array of structs. My ideal rollup table would look like:
CREATE TABLE rollup (
id INT,
start_time STRING,
objects ARRAY<STRUCT<name: STRING,
value: STRING,
type: STRING>>
)
However, I can't do this because the following query fails:
INSERT OVERWRITE TABLE rollup
SELECT id, start_time, collect_set(object)
FROM bar
GROUP BY id, start_time
Here's the error I get:
FAILED: UDFArgumentTypeException Only primitive type arguments are accepted
but struct<name:string,value:string,type:string> was passed as parameter 1.
Is there any way I can do this?
Re: Collect_set() of non-primitive types
Posted by Nishant Kelkar <ni...@gmail.com>.
Geez, too many mistakes for the day :P
I meant the following above
*CREATE TABLE* rollup_new *AS*
SELECT id, start_time, collect_set(concat_ws(",", object.name
<http://objects.name/>, object.value, object.type)) AS product_details
FROM bar
GROUP BY id, start_time;
The change is in the table creation query, since an INSERT OVERWRITE would
expect a struct and get a string instead, thus throwing errors right?
On Wed, Sep 3, 2014 at 2:06 PM, Nishant Kelkar <ni...@gmail.com>
wrote:
> Sorry, I meant the following in my example:
>
> INSERT OVERWRITE TABLE rollup
> SELECT id, start_time, collect_set(concat_ws(",", object.name <http://objects.name/>, object.value, object.type)) AS product_details
> FROM bar
> GROUP BY id, start_time;
>
> "object" instead of "object*s*", as you have in your example :)
>
> Best Regards,
> Nishant Kelkar
>
>
> On Wed, Sep 3, 2014 at 2:03 PM, Nishant Kelkar <ni...@gmail.com>
> wrote:
>
>> I don't know of anything like what you want atleast until Hive 0.11.
>>
>> However, you could try something like this:
>>
>> INSERT OVERWRITE TABLE rollup
>> SELECT id, start_time, collect_set(concat_ws(",", objects.name, objects.value, objects.type)) AS product_details
>> FROM bar
>> GROUP BY id, start_time;
>>
>>
>> It's a bit hacky, but it does the trick. It basically concats all the
>> objects you care about in a string, and then collects a set of that. Then,
>> when you want to extract a field from the "product_details" array, just do
>> this (say you want to extract the very first product's name in each array):
>>
>> SELECT SPLIT(a[0].product_details)[0] AS first_name FROM rollup a;
>>
>>
>> Hope that helps!
>>
>> Best Regards,
>> Nishant Kelkar
>>
>>
>>
>> On Wed, Sep 3, 2014 at 1:47 PM, anusha Mangina <an...@gmail.com>
>> wrote:
>>
>>> I have a table defined as:
>>>
>>> CREATE TABLE foo (
>>> id INT,
>>> start_time STRING,
>>> name STRING,
>>> value STRING,
>>> type STRING
>>> )
>>>
>>> The 'name', 'value' and 'type' fields actually describe another object, and
>>> I'd like to turn these into STRUCTs, something like:
>>>
>>> CREATE TABLE bar (
>>> id INT,
>>> start_time STRING,
>>> object STRUCT<name: STRING,
>>> value: STRING,
>>> type: STRING>
>>> )
>>>
>>> However, I'd also like to create a rollup table containing the most recent
>>> result for a given 'id' field. There can be multiple different 'object'
>>> structs for a given combination of 'id' and 'start_time', so I thought I'd
>>> turn this into an array of structs. My ideal rollup table would look like:
>>>
>>> CREATE TABLE rollup (
>>> id INT,
>>> start_time STRING,
>>> objects ARRAY<STRUCT<name: STRING,
>>> value: STRING,
>>> type: STRING>>
>>> )
>>>
>>> However, I can't do this because the following query fails:
>>>
>>> INSERT OVERWRITE TABLE rollup
>>> SELECT id, start_time, collect_set(object)
>>> FROM bar
>>> GROUP BY id, start_time
>>>
>>> Here's the error I get:
>>>
>>> FAILED: UDFArgumentTypeException Only primitive type arguments are accepted
>>> but struct<name:string,value:string,type:string> was passed as parameter 1.
>>>
>>> Is there any way I can do this?
>>>
>>>
>>
>
Re: Collect_set() of non-primitive types
Posted by Nishant Kelkar <ni...@gmail.com>.
Sorry, I meant the following in my example:
INSERT OVERWRITE TABLE rollup
SELECT id, start_time, collect_set(concat_ws(",", object.name
<http://objects.name/>, object.value, object.type)) AS product_details
FROM bar
GROUP BY id, start_time;
"object" instead of "object*s*", as you have in your example :)
Best Regards,
Nishant Kelkar
On Wed, Sep 3, 2014 at 2:03 PM, Nishant Kelkar <ni...@gmail.com>
wrote:
> I don't know of anything like what you want atleast until Hive 0.11.
>
> However, you could try something like this:
>
> INSERT OVERWRITE TABLE rollup
> SELECT id, start_time, collect_set(concat_ws(",", objects.name, objects.value, objects.type)) AS product_details
> FROM bar
> GROUP BY id, start_time;
>
>
> It's a bit hacky, but it does the trick. It basically concats all the
> objects you care about in a string, and then collects a set of that. Then,
> when you want to extract a field from the "product_details" array, just do
> this (say you want to extract the very first product's name in each array):
>
> SELECT SPLIT(a[0].product_details)[0] AS first_name FROM rollup a;
>
>
> Hope that helps!
>
> Best Regards,
> Nishant Kelkar
>
>
>
> On Wed, Sep 3, 2014 at 1:47 PM, anusha Mangina <an...@gmail.com>
> wrote:
>
>> I have a table defined as:
>>
>> CREATE TABLE foo (
>> id INT,
>> start_time STRING,
>> name STRING,
>> value STRING,
>> type STRING
>> )
>>
>> The 'name', 'value' and 'type' fields actually describe another object, and
>> I'd like to turn these into STRUCTs, something like:
>>
>> CREATE TABLE bar (
>> id INT,
>> start_time STRING,
>> object STRUCT<name: STRING,
>> value: STRING,
>> type: STRING>
>> )
>>
>> However, I'd also like to create a rollup table containing the most recent
>> result for a given 'id' field. There can be multiple different 'object'
>> structs for a given combination of 'id' and 'start_time', so I thought I'd
>> turn this into an array of structs. My ideal rollup table would look like:
>>
>> CREATE TABLE rollup (
>> id INT,
>> start_time STRING,
>> objects ARRAY<STRUCT<name: STRING,
>> value: STRING,
>> type: STRING>>
>> )
>>
>> However, I can't do this because the following query fails:
>>
>> INSERT OVERWRITE TABLE rollup
>> SELECT id, start_time, collect_set(object)
>> FROM bar
>> GROUP BY id, start_time
>>
>> Here's the error I get:
>>
>> FAILED: UDFArgumentTypeException Only primitive type arguments are accepted
>> but struct<name:string,value:string,type:string> was passed as parameter 1.
>>
>> Is there any way I can do this?
>>
>>
>
Re: Collect_set() of non-primitive types
Posted by Nishant Kelkar <ni...@gmail.com>.
I don't know of anything like what you want atleast until Hive 0.11.
However, you could try something like this:
INSERT OVERWRITE TABLE rollup
SELECT id, start_time, collect_set(concat_ws(",", objects.name,
objects.value, objects.type)) AS product_details
FROM bar
GROUP BY id, start_time;
It's a bit hacky, but it does the trick. It basically concats all the
objects you care about in a string, and then collects a set of that. Then,
when you want to extract a field from the "product_details" array, just do
this (say you want to extract the very first product's name in each array):
SELECT SPLIT(a[0].product_details)[0] AS first_name FROM rollup a;
Hope that helps!
Best Regards,
Nishant Kelkar
On Wed, Sep 3, 2014 at 1:47 PM, anusha Mangina <an...@gmail.com>
wrote:
> I have a table defined as:
>
> CREATE TABLE foo (
> id INT,
> start_time STRING,
> name STRING,
> value STRING,
> type STRING
> )
>
> The 'name', 'value' and 'type' fields actually describe another object, and
> I'd like to turn these into STRUCTs, something like:
>
> CREATE TABLE bar (
> id INT,
> start_time STRING,
> object STRUCT<name: STRING,
> value: STRING,
> type: STRING>
> )
>
> However, I'd also like to create a rollup table containing the most recent
> result for a given 'id' field. There can be multiple different 'object'
> structs for a given combination of 'id' and 'start_time', so I thought I'd
> turn this into an array of structs. My ideal rollup table would look like:
>
> CREATE TABLE rollup (
> id INT,
> start_time STRING,
> objects ARRAY<STRUCT<name: STRING,
> value: STRING,
> type: STRING>>
> )
>
> However, I can't do this because the following query fails:
>
> INSERT OVERWRITE TABLE rollup
> SELECT id, start_time, collect_set(object)
> FROM bar
> GROUP BY id, start_time
>
> Here's the error I get:
>
> FAILED: UDFArgumentTypeException Only primitive type arguments are accepted
> but struct<name:string,value:string,type:string> was passed as parameter 1.
>
> Is there any way I can do this?
>
>