Posted to user@impala.apache.org by Fawze Abujaber <fa...@gmail.com> on 2018/07/30 08:21:28 UTC

How to replace collect_set Hive function in Impala?

Hi everyone!

Could anybody tell me how I can replace the Hive collect_set function in Impala?

The query looks like this:

select
  col1,
  collect_set(distinct col2)
from dpi_parquet_gzip
group by
  col1

Thanks a lot!

-- 
Take Care
Fawze Abujaber

Re: How to replace collect_set Hive function in Impala?

Posted by Greg Rahn <gr...@gmail.com>.
If returning a delimited string instead of an array works for you, see group_concat():
https://impala.apache.org/docs/build/html/topics/impala_group_concat.html
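A minimal sketch of that suggestion, reusing the table and column names from the original question. It assumes col2 is a STRING (or is cast to one) and that a single delimited string per group is an acceptable substitute for Hive's array. The '|' separator and the col2_values alias are illustrative choices only; if the separator is omitted, Impala joins values with a comma followed by a space.

```sql
-- Impala: one row per col1, with the distinct col2 values joined into a string.
-- DISTINCT gives the de-duplication that collect_set() provides in Hive.
-- The order of values inside the string is not guaranteed.
select
  col1,
  group_concat(distinct cast(col2 as string), '|') as col2_values
from dpi_parquet_gzip
group by
  col1;
```

The result is a scalar string, so a consumer that needs the individual values has to split it on the chosen separator, and the separator should be a character that cannot occur in col2.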
On Mon, Jul 30, 2018 at 3:18 PM Zoltan Borok-Nagy
<bo...@cloudera.com> wrote:
>
> Hi Fawze,
>
> In Impala, only scalar types are allowed in the select list because Impala always produces result sets with all scalar values, i.e. simple tables.
> The collect_set() function in Hive returns an array, and Impala cannot put an array into a single cell of an output table.
>
> If you want to write files that contain complex data I'm afraid you'll need Hive.
>
> BR,
>     Zoltan

Re: How to replace collect_set Hive function in Impala?

Posted by Zoltan Borok-Nagy <bo...@cloudera.com>.
Hi Fawze,

In Impala, only scalar types are allowed in the select list because Impala
always produces result sets with all scalar values, i.e. simple tables.
The collect_set() function in Hive returns an array, and Impala cannot put
an array into a single cell of an output table.

If you want to write files that contain complex data I'm afraid you'll need
Hive.

BR,
    Zoltan
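A rough sketch of the Hive route described above, for the case where the array itself has to be stored. The target table name dpi_col2_sets and the column alias col2_set are hypothetical. It assumes the result is written as Parquet so that Impala can read the array column afterwards.

```sql
-- Hive: materialize the sets as an ARRAY column in a Parquet table.
-- collect_set() already drops duplicates within each group.
create table dpi_col2_sets stored as parquet as
select
  col1,
  collect_set(col2) as col2_set
from dpi_parquet_gzip
group by
  col1;

-- Impala: the array cannot be returned in a single cell, but it can be
-- unnested by referencing the column in the FROM clause. Each array
-- element comes back as its own row through the ITEM pseudocolumn.
select
  t.col1,
  s.item as col2_value
from dpi_col2_sets t, t.col2_set s;
```

In Impala the new table may need an INVALIDATE METADATA or REFRESH before it becomes visible, since it was created from Hive.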



