Posted to user@pig.apache.org by Matt Tanquary <ma...@gmail.com> on 2010/12/21 17:31:36 UTC

Concat on bags?

This set results from a JOIN:

(04f4c2fd-8be2-41c3-b045-283de80909ba,1966,2L)
(04f4c2fd-8be2-41c3-b045-283de80909ba,3845,2L)

Using Pig, I group this and get:

(669a4b47-d3c3-4950-9ec0-f1e24064d9d9,{(669a4b47-d3c3-4950-9ec0-f1e24064d9d9,1634,2L),(669a4b47-d3c3-4950-9ec0-f1e24064d9d9,1966,2L)})

After FOREACH...GENERATE:

({(1966),(3845)},{(2L),(2L)})

What I want to do is derive:

(1966|3845,2L)

The trouble is that everything is bagged up from the group and I'm not sure
how to unbag for the output so I can do things like apply CONCAT, UNIQUE on
the fields, etc. I have tried nested FOREACH statements, but I can't seem to
drill down far enough to de-reference the values the way I'd like.

Is this a job for a UDF, or is there anything in Pig Latin that I can do to
accomplish this task?

Thanks!
-M@

Re: Concat on bags?

Posted by Daniel Dai <ji...@yahoo-inc.com>.
You will need a UDF to concat bag items.

Daniel
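
For illustration only, here is a minimal sketch of what such a UDF might look
like if written in Python and run through Pig's Jython support (which assumes
a Pig release that ships it). It is not code from this thread. The file name
bagutils.py and the function names join_bag and join_distinct are
hypothetical. The sketch assumes Pig hands a bag to the function as a list of
tuples and that the value of interest is the first field of each tuple.

```python
# Hypothetical file name: bagutils.py
# Assumption: Pig passes a bag to a Jython UDF as a list of tuples.

try:
    outputSchema  # expected to be provided by Pig when registered via Jython
except NameError:
    # Fallback so the file also runs as plain Python, outside Pig.
    def outputSchema(schema):
        def decorator(func):
            return func
        return decorator


@outputSchema("joined:chararray")
def join_bag(bag, delimiter="|"):
    """Join the first field of every tuple in a bag into one string."""
    if bag is None:
        return None
    return delimiter.join([str(t[0]) for t in bag])


@outputSchema("joined:chararray")
def join_distinct(bag, delimiter="|"):
    """Like join_bag, but keep only the first occurrence of each value."""
    if bag is None:
        return None
    seen = set()
    parts = []
    for t in bag:
        value = str(t[0])
        if value not in seen:
            seen.add(value)
            parts.append(value)
    return delimiter.join(parts)
```

In a Pig script, a file like this would typically be registered with something
like REGISTER 'bagutils.py' USING jython AS bagutils; and then called inside
FOREACH ... GENERATE on each bag, for example join_bag on the bag holding 1966
and 3845 and join_distinct on the bag holding the repeated count. Pig does not
guarantee the order of tuples in a bag, so the order of the joined values may
vary unless the bag is sorted first.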
