You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by anusha Mangina <an...@gmail.com> on 2014/09/03 22:47:52 UTC

Collect_set() of non-primitive types

I have a table defined as:

CREATE TABLE foo (
  id INT,
  start_time STRING,
  name STRING,
  value STRING,
  type STRING
)

The 'name', 'value' and 'type' fields actually describe another object, and
I'd like to turn these into STRUCTs, something like:

CREATE TABLE bar (
  id INT,
  start_time STRING,
  object STRUCT<name: STRING,
    value: STRING,
    type: STRING>
)

However, I'd also like to create a rollup table containing the most recent
result for a given 'id' field. There can be multiple different 'object'
structs for a given combination of 'id' and 'start_time', so I thought I'd
turn this into an array of structs. My ideal rollup table would look like:

CREATE TABLE rollup (
  id INT,
  start_time STRING,
  objects ARRAY<STRUCT<name: STRING,
    value: STRING,
    type: STRING>>
)

However, I can't do this because the following query fails:

INSERT OVERWRITE TABLE rollup
SELECT id, start_time, collect_set(object)
FROM bar
GROUP BY id, start_time

Here's the error I get:

FAILED: UDFArgumentTypeException Only primitive type arguments are accepted
but struct<name:string,value:string,type:string> was passed as parameter 1.

Is there any way I can do this?

Re: Collect_set() of non-primitive types

Posted by Nishant Kelkar <ni...@gmail.com>.
Geez, too many mistakes for the day :P

I meant the following above

*CREATE TABLE* rollup_new *AS*
SELECT id, start_time, collect_set(concat_ws(",", object.name
<http://objects.name/>, object.value, object.type)) AS product_details
FROM bar
GROUP BY id, start_time;

The change is in the table creation query, since an INSERT OVERWRITE would
expect a struct and get a string instead, thus throwing errors right?



On Wed, Sep 3, 2014 at 2:06 PM, Nishant Kelkar <ni...@gmail.com>
wrote:

> Sorry, I meant the following in my example:
>
> INSERT OVERWRITE TABLE rollup
> SELECT id, start_time, collect_set(concat_ws(",", object.name <http://objects.name/>, object.value, object.type)) AS product_details
> FROM bar
> GROUP BY id, start_time;
>
> "object" instead of "object*s*", as you have in your example :)
>
> Best Regards,
> Nishant Kelkar
>
>
> On Wed, Sep 3, 2014 at 2:03 PM, Nishant Kelkar <ni...@gmail.com>
> wrote:
>
>> I don't know of anything like what you want atleast until Hive 0.11.
>>
>> However, you could try something like this:
>>
>> INSERT OVERWRITE TABLE rollup
>> SELECT id, start_time, collect_set(concat_ws(",", objects.name, objects.value, objects.type)) AS product_details
>> FROM bar
>> GROUP BY id, start_time;
>>
>>
>> It's a bit hacky, but it does the trick. It basically concats all the
>> objects you care about in a string, and then collects a set of that. Then,
>> when you want to extract a field from the "product_details" array, just do
>> this (say you want to extract the very first product's name in each array):
>>
>> SELECT SPLIT(a[0].product_details)[0] AS first_name FROM rollup a;
>>
>>
>> Hope that helps!
>>
>> Best Regards,
>> Nishant Kelkar
>>
>>
>>
>> On Wed, Sep 3, 2014 at 1:47 PM, anusha Mangina <an...@gmail.com>
>> wrote:
>>
>>> I have a table defined as:
>>>
>>> CREATE TABLE foo (
>>>   id INT,
>>>   start_time STRING,
>>>   name STRING,
>>>   value STRING,
>>>   type STRING
>>> )
>>>
>>> The 'name', 'value' and 'type' fields actually describe another object, and
>>> I'd like to turn these into STRUCTs, something like:
>>>
>>> CREATE TABLE bar (
>>>   id INT,
>>>   start_time STRING,
>>>   object STRUCT<name: STRING,
>>>     value: STRING,
>>>     type: STRING>
>>> )
>>>
>>> However, I'd also like to create a rollup table containing the most recent
>>> result for a given 'id' field. There can be multiple different 'object'
>>> structs for a given combination of 'id' and 'start_time', so I thought I'd
>>> turn this into an array of structs. My ideal rollup table would look like:
>>>
>>> CREATE TABLE rollup (
>>>   id INT,
>>>   start_time STRING,
>>>   objects ARRAY<STRUCT<name: STRING,
>>>     value: STRING,
>>>     type: STRING>>
>>> )
>>>
>>> However, I can't do this because the following query fails:
>>>
>>> INSERT OVERWRITE TABLE rollup
>>> SELECT id, start_time, collect_set(object)
>>> FROM bar
>>> GROUP BY id, start_time
>>>
>>> Here's the error I get:
>>>
>>> FAILED: UDFArgumentTypeException Only primitive type arguments are accepted
>>> but struct<name:string,value:string,type:string> was passed as parameter 1.
>>>
>>> Is there any way I can do this?
>>>
>>>
>>
>

Re: Collect_set() of non-primitive types

Posted by Nishant Kelkar <ni...@gmail.com>.
Sorry, I meant the following in my example:

INSERT OVERWRITE TABLE rollup
SELECT id, start_time, collect_set(concat_ws(",", object.name
<http://objects.name/>, object.value, object.type)) AS product_details
FROM bar
GROUP BY id, start_time;

"object" instead of "object*s*", as you have in your example :)

Best Regards,
Nishant Kelkar


On Wed, Sep 3, 2014 at 2:03 PM, Nishant Kelkar <ni...@gmail.com>
wrote:

> I don't know of anything like what you want atleast until Hive 0.11.
>
> However, you could try something like this:
>
> INSERT OVERWRITE TABLE rollup
> SELECT id, start_time, collect_set(concat_ws(",", objects.name, objects.value, objects.type)) AS product_details
> FROM bar
> GROUP BY id, start_time;
>
>
> It's a bit hacky, but it does the trick. It basically concats all the
> objects you care about in a string, and then collects a set of that. Then,
> when you want to extract a field from the "product_details" array, just do
> this (say you want to extract the very first product's name in each array):
>
> SELECT SPLIT(a[0].product_details)[0] AS first_name FROM rollup a;
>
>
> Hope that helps!
>
> Best Regards,
> Nishant Kelkar
>
>
>
> On Wed, Sep 3, 2014 at 1:47 PM, anusha Mangina <an...@gmail.com>
> wrote:
>
>> I have a table defined as:
>>
>> CREATE TABLE foo (
>>   id INT,
>>   start_time STRING,
>>   name STRING,
>>   value STRING,
>>   type STRING
>> )
>>
>> The 'name', 'value' and 'type' fields actually describe another object, and
>> I'd like to turn these into STRUCTs, something like:
>>
>> CREATE TABLE bar (
>>   id INT,
>>   start_time STRING,
>>   object STRUCT<name: STRING,
>>     value: STRING,
>>     type: STRING>
>> )
>>
>> However, I'd also like to create a rollup table containing the most recent
>> result for a given 'id' field. There can be multiple different 'object'
>> structs for a given combination of 'id' and 'start_time', so I thought I'd
>> turn this into an array of structs. My ideal rollup table would look like:
>>
>> CREATE TABLE rollup (
>>   id INT,
>>   start_time STRING,
>>   objects ARRAY<STRUCT<name: STRING,
>>     value: STRING,
>>     type: STRING>>
>> )
>>
>> However, I can't do this because the following query fails:
>>
>> INSERT OVERWRITE TABLE rollup
>> SELECT id, start_time, collect_set(object)
>> FROM bar
>> GROUP BY id, start_time
>>
>> Here's the error I get:
>>
>> FAILED: UDFArgumentTypeException Only primitive type arguments are accepted
>> but struct<name:string,value:string,type:string> was passed as parameter 1.
>>
>> Is there any way I can do this?
>>
>>
>

Re: Collect_set() of non-primitive types

Posted by Nishant Kelkar <ni...@gmail.com>.
I don't know of anything like what you want atleast until Hive 0.11.

However, you could try something like this:

INSERT OVERWRITE TABLE rollup
SELECT id, start_time, collect_set(concat_ws(",", objects.name,
objects.value, objects.type)) AS product_details
FROM bar
GROUP BY id, start_time;


It's a bit hacky, but it does the trick. It basically concats all the
objects you care about in a string, and then collects a set of that. Then,
when you want to extract a field from the "product_details" array, just do
this (say you want to extract the very first product's name in each array):

SELECT SPLIT(a[0].product_details)[0] AS first_name FROM rollup a;


Hope that helps!

Best Regards,
Nishant Kelkar



On Wed, Sep 3, 2014 at 1:47 PM, anusha Mangina <an...@gmail.com>
wrote:

> I have a table defined as:
>
> CREATE TABLE foo (
>   id INT,
>   start_time STRING,
>   name STRING,
>   value STRING,
>   type STRING
> )
>
> The 'name', 'value' and 'type' fields actually describe another object, and
> I'd like to turn these into STRUCTs, something like:
>
> CREATE TABLE bar (
>   id INT,
>   start_time STRING,
>   object STRUCT<name: STRING,
>     value: STRING,
>     type: STRING>
> )
>
> However, I'd also like to create a rollup table containing the most recent
> result for a given 'id' field. There can be multiple different 'object'
> structs for a given combination of 'id' and 'start_time', so I thought I'd
> turn this into an array of structs. My ideal rollup table would look like:
>
> CREATE TABLE rollup (
>   id INT,
>   start_time STRING,
>   objects ARRAY<STRUCT<name: STRING,
>     value: STRING,
>     type: STRING>>
> )
>
> However, I can't do this because the following query fails:
>
> INSERT OVERWRITE TABLE rollup
> SELECT id, start_time, collect_set(object)
> FROM bar
> GROUP BY id, start_time
>
> Here's the error I get:
>
> FAILED: UDFArgumentTypeException Only primitive type arguments are accepted
> but struct<name:string,value:string,type:string> was passed as parameter 1.
>
> Is there any way I can do this?
>
>