Posted to user@pig.apache.org by Pete Warden <pe...@petewarden.com> on 2011/10/15 07:53:53 UTC

Converting an inner bag to an outer bag/relation?

Newbie question - I have an inner bag of tuples that I'd like to convert
into an outer bag/relation, and I'm struggling to figure out how.
For example, if I have
({(1,2),(3,4),(5,6)})
({(7,8),(9,10)})
I'd like it to become
(1,2)
(3,4)
(5,6)
(7,8)
(9,10)
The motivation behind that is a Cassandra field that contains a packed,
variable-length data structure, a bit like a CSV string encoding multiple
rows of data.
I can convert the raw chararray into an inner bag of tuples, but I need to
'explode' it to work with it properly.

I'm open to "don't do that, here's why it's a dumb idea", but it feels like
I'm missing an operator that could be used to implement this. I have a
partially working solution using streaming, but the presence of newlines in
the chararray makes that approach tough. Any advice much appreciated.

cheers,
           Pete

Re: Converting an inner bag to an outer bag/relation?

Posted by Jeremy Hanna <je...@gmail.com>.
One of the reasons we wrote pygmalion was to make it easier to work with tabular data: extracting values (with FromCassandraBag) by specified column names. I'm not sure it fits your use case, but it seemed worth mentioning. Note that it doesn't work as easily with dynamic column names.
https://github.com/jeromatron/pygmalion/

On Oct 15, 2011, at 12:58 AM, Pete Warden wrote:

> Never mind, it looks like the FLATTEN operator should do the trick. I'd only
> seen it with tuples, didn't realize it did what I needed with inner bags
> until I RTFM-ed again.

Re: Converting an inner bag to an outer bag/relation?

Posted by Pete Warden <pe...@petewarden.com>.
Never mind, it looks like the FLATTEN operator should do the trick. I'd only
seen it with tuples, didn't realize it did what I needed with inner bags
until I RTFM-ed again.
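
A minimal sketch of the FLATTEN approach, using the example data from the
original question. The file name 'bags.txt', the relation names, and the
field names (pairs, a, b) are all assumptions made up for illustration; in
the real case the bag would come from whatever converts the Cassandra
chararray rather than from a text file.

```pig
-- Hypothetical input file 'bags.txt', one bag per line, e.g.:
--   {(1,2),(3,4),(5,6)}
--   {(7,8),(9,10)}
rows = LOAD 'bags.txt' AS (pairs: bag{t: tuple(a: int, b: int)});

-- FLATTEN on a bag emits one output tuple per tuple inside the bag,
-- so each inner (a, b) pair becomes its own row in the relation.
exploded = FOREACH rows GENERATE FLATTEN(pairs) AS (a, b);

DUMP exploded;
```

If the relation has other fields alongside the bag, FLATTEN repeats them for
every tuple it emits, and a row whose bag is empty produces no output rows,
so it is worth checking for empty bags if those rows matter.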
