Posted to user@pig.apache.org by Raviv M-G <ra...@post.harvard.edu> on 2010/07/06 22:31:05 UTC

GROUP_CONCAT function

Hi all,

Is there a way to use the built-in functions of Pig (or has someone
already written a UDF) to create a similar result to SQL's
GROUP_CONCAT?

The idea is that I have a long list of book ISBN numbers and author names:

123    John Doe
123    Jane Doe

and I would like to be able to group by the ISBN number and then
concatenate the author names for export in the format:

123    John Doe; Jane Doe

Thanks,
Raviv

Re: GROUP_CONCAT function

Posted by hc busy <hc...@gmail.com>.
Yeah, you can definitely accomplish that in a UDF. It would take one
parameter, which is a bag, and perform string concatenation on the members
of the bag. The UDF would act like a reducer applied to the bag of outputs
from the mapper. (The mapper could do other things, like putting quotes
around the names:

123 "John Doe", "Doe, John"

)
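
A rough sketch of the concatenation step, in case it helps. This is plain
Java with no Pig dependency: the class name BagConcat and the method join
are made up for illustration, and a java.util.List stands in for the bag.
A real UDF would wrap the same loop in Pig's UDF interface, which is not
shown here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper, not part of Pig: the join logic a
// GROUP_CONCAT-style UDF might run on the members of one bag.
final class BagConcat {
    private BagConcat() {}

    // Joins the members of one "bag" with the given delimiter, skipping nulls.
    static String join(List<String> bag, String delimiter) {
        StringJoiner joiner = new StringJoiner(delimiter);
        for (String member : bag) {
            if (member != null) {
                joiner.add(member);
            }
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        List<String> authors = Arrays.asList("John Doe", "Jane Doe");
        // Prints the ISBN, a tab, then the joined author names.
        System.out.println("123\t" + join(authors, "; "));
    }
}
```

Keep in mind that a Pig bag is unordered, so the order of the names in the
output is not guaranteed unless you sort the bag first.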



Re: GROUP_CONCAT function

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
There isn't one that I am aware of, but it'd be trivial to write. Take a
look at the StringConcat builtin, which does something similar (but for
tuples, and without delimiters).
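
To make the expected result concrete, here is a small sketch of the
group-then-concatenate behaviour from the original question. It is plain
Java, not Pig code: GroupConcatSketch and groupConcat are hypothetical
names, and each input row is assumed to be an {isbn, author} pair.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not Pig code: group rows by their first
// field, then join the second fields of each group with a delimiter.
final class GroupConcatSketch {
    private GroupConcatSketch() {}

    static Map<String, String> groupConcat(List<String[]> rows, String delimiter) {
        // LinkedHashMap keeps keys in first-seen order for readable output.
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String[] row : rows) {
            groups.computeIfAbsent(row[0], key -> new ArrayList<>()).add(row[1]);
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : groups.entrySet()) {
            result.put(entry.getKey(), String.join(delimiter, entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"123", "John Doe"});
        rows.add(new String[] {"123", "Jane Doe"});
        // Prints one line per ISBN: the key, a tab, then the joined names.
        groupConcat(rows, "; ").forEach(
            (isbn, authors) -> System.out.println(isbn + "\t" + authors));
    }
}
```

In Pig itself the grouping would come from GROUP ... BY, and only the
joining step would live in the UDF.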

-D



