Posted to user@pig.apache.org by Benjamin Juhn <be...@gmail.com> on 2012/07/08 19:57:47 UTC

Accessing nested bags & tuples

Hi There,

I'm trying to concat the first tag string with type string for all records.  Could someone advise on syntax?
records: {meta:(type: chararray, tags_bag: {t: (tags_tuple: (tags: chararray))}

Thanks,
Ben

Re: Accessing nested bags & tuples

Posted by Abhinav Neelam <ab...@gmail.com>.
Alex: your solution won't work because the field 'type' isn't accessible
within tags_bag.

Ben: I think you're saying you want to concat the 'tags' value in the *first*
tags_tuple in tags_bag?
Just to clarify, bags are unordered, so asking for the 'first' element in a
bag doesn't make sense unless you do a LIMIT immediately after an ORDER BY,
or use the TOP function. Either way, you need to order your bag first to
impose the idea of 'first element' on to it.

You could do something like this:

proc = foreach records {
    ordered_bag = ORDER tags_bag BY tags_tuple;
    first_tuple_bag = LIMIT ordered_bag 1;
    generate FLATTEN(first_tuple_bag) as tag_first, type, tags_bag;
}
out = foreach proc generate CONCAT(tag_first.$0, type) as tag_first_type, type, tags_bag;

This will produce an extra field containing the CONCAT-ed value of the first
tags_bag tuple and type, but the bag tags_bag itself remains unaltered. If
you actually want to modify tags_bag so that its 'first' (by whatever
ordering) tuple contains the CONCAT-ed value, that's trickier. I don't
know why you'd want such a thing, but a UDF would probably be the quickest
way to do it. If Pig supported a nested UNION, I'd have a way of doing this in
'native' Pig, but as it doesn't, I can't think of a non-UDF approach
straight away.
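
For illustration only, here is a rough sketch of what such a UDF might look
like as a Jython script. Nothing in it comes from the thread: the function name
concat_first_tag is hypothetical, 'first' is assumed to mean the smallest tag
(mirroring the ORDER ... LIMIT 1 step above), nulls are not handled, and it
assumes Pig hands the bag to Jython as a list of nested tuples and supplies the
outputSchema decorator itself.

```python
# Sketch of a Jython UDF for Pig. All names here are illustrative assumptions.
try:
    outputSchema  # expected to be supplied by Pig's Jython script engine
except NameError:
    def outputSchema(schema):
        # Fallback no-op decorator so the file also loads under plain Python.
        def wrap(func):
            return func
        return wrap


@outputSchema("tags_bag: {t: (tags_tuple: (tags: chararray))}")
def concat_first_tag(type_str, tags_bag):
    """Return a copy of tags_bag whose smallest tag has type_str appended."""
    if not tags_bag:
        return tags_bag
    # Bags are unordered, so "first" is defined here as the minimum tag.
    # Each bag element is assumed to look like ((tag,),): t -> tags_tuple -> tags.
    ordered = sorted(tags_bag, key=lambda t: t[0][0])
    first_tag = ordered[0][0][0]
    # Same argument order as CONCAT(tag_first.$0, type): tag first, then type.
    return [((first_tag + type_str,),)] + ordered[1:]
```

If this were saved under a hypothetical file name such as tag_udfs.py, it could
presumably be registered with REGISTER ... USING jython and called as
concat_first_tag(type, tags_bag) inside a foreach ... generate.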

Thanks,
Abhinav



Re: Accessing nested bags & tuples

Posted by Alex Rovner <al...@gmail.com>.
Nested for each might work:

FOR EACH record {

  FOR EACH tags_bag {
    result = CONCAT(type, tags);
  }
}