Posted to user@pig.apache.org by Eli Finkelshteyn <ie...@gmail.com> on 2012/02/09 19:26:39 UTC

Flatten a Bag on One Line?

This is probably easy, but my Pig Latin is rusty, and I don't seem to be 
able to find an answer on Google. If I have a record of the form:

     98812   3       {(48567859),(15996334),(15897772)}

How can I flatten that bag to leave all members on a single row, i.e.:

     98812    3    48567859    15996334    15897772

Cheers,
Eli

Re: Flatten a Bag on One Line?

Posted by Eli Finkelshteyn <ie...@gmail.com>.
Hey Folks,
Sorry it took so long to get back on this. The function I wound up using 
is really simple:

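@outputSchema("t:tuple()")
def bagToTuple(bag):
    # Iterate over the bag's tuples and collect the first field of each
    # into a single tuple that FLATTEN can spread across one row.
    if bag is None:
        return None
    return tuple(item[0] for item in bag)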

In Pig, you get what I wanted by applying that function to the bag and 
then flattening the returned tuple, for example:

-- myfuncs.py is an illustrative file name for wherever the UDF is saved.
REGISTER 'myfuncs.py' USING jython AS myfuncs;
flattened_line = FOREACH line_with_bag GENERATE something,
    something_else, FLATTEN(myfuncs.bagToTuple(some_bag));
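To check the UDF's logic without a cluster, here is a plain-Python sketch that runs bagToTuple on the record from the original question. It rests on a few assumptions: the `outputSchema` stub only stands in for the decorator Pig supplies to Jython UDFs, the bag is modeled as a list of tuples, and `flatten_row` is a hypothetical, simplified model of what FLATTEN does to a tuple-valued column. None of these are Pig APIs.

```python
# Stand-in for the outputSchema decorator Pig provides to Jython UDFs
# (assumption: outside Pig we only need to record the schema string).
def outputSchema(schema):
    def decorate(func):
        func.outputSchema = schema
        return func
    return decorate


@outputSchema("t:tuple()")
def bagToTuple(bag):
    # Keep the first field of every tuple in the bag, as a single tuple.
    return tuple(item[0] for item in bag)


def flatten_row(fields):
    # Hypothetical model of FLATTEN on a tuple column: the tuple's fields
    # are spliced into the enclosing row in place of the column itself.
    row = []
    for value in fields:
        if isinstance(value, tuple):
            row.extend(value)
        else:
            row.append(value)
    return tuple(row)


record = (98812, 3, [(48567859,), (15996334,), (15897772,)])
something, something_else, some_bag = record
print(flatten_row((something, something_else, bagToTuple(some_bag))))
# (98812, 3, 48567859, 15996334, 15897772)
```

Because the bag is first collapsed into one tuple, flattening it yields a single row instead of one row per bag element.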

Thejas, I created a JIRA for this here 
<https://issues.apache.org/jira/browse/PIG-2529>. This is the first one 
I've ever made, so please excuse me if I messed anything up in the format.

Cheers,
Eli

On 2/10/12 7:07 PM, Thejas Nair wrote:
> Pig doesn't have a piggybank for python udfs, but it makes sense to 
> create one.
> Please attach your udf to a new jira, and we can figure out where to put
> it.
>
> -Thejas


Re: Flatten a Bag on One Line?

Posted by Thejas Nair <th...@hortonworks.com>.
Pig doesn't have a piggybank for Python UDFs, but it makes sense to 
create one.
Please attach your UDF to a new JIRA, and we can figure out where to put it.

-Thejas


On 2/10/12 1:14 PM, Eli Finkelshteyn wrote:
> I was going to do this as a python udf, but haven't had a chance yet
> since other stuff I was working on took priority. As soon as I do write
> it, I'll be sure to upload it here. On a related note: is there a
> piggybank for python udfs I could contribute it to for posterity?
>
> Eli


Re: Flatten a Bag on One Line?

Posted by Eli Finkelshteyn <ie...@gmail.com>.
I was going to do this as a Python UDF, but haven't had a chance yet 
since other stuff I was working on took priority. As soon as I do write 
it, I'll be sure to upload it here. On a related note: is there a 
piggybank for Python UDFs I could contribute it to for posterity?

Eli

On 2/10/12 11:09 AM, pablomar wrote:
> what about something like this?
> (typing on the phone, forgive any mistake)
>
> public class Flat extends EvalFunc<Tuple>
> {
> public Tuple exec(Tuple input) throws IOException
> {
> try
> {
> List<Object>  list = new LinkedList<Object>();
> DataBag bag = (DataBag)input.get(0);
> Iterator it = bag.iterator();
> while(it.hasNext())
> {
> Tuple t = (Tuple)it.next();
> if(t != null&&  t.size()>0)
> list.add(t.get(0));
> }
>
> TupleFactory fac = TupleFactory.getInstance();
> return fac.newTuple(list);
> }
> catch....


Re: Flatten a Bag on One Line?

Posted by pablomar <pa...@gmail.com>.
what about something like this?
(typing on the phone, forgive any mistake)

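import java.io.IOException;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

// Returns one tuple holding the first field of every tuple in the input bag.
public class Flat extends EvalFunc<Tuple>
{
    public Tuple exec(Tuple input) throws IOException
    {
        if (input == null || input.size() == 0 || input.get(0) == null)
            return null;
        try
        {
            List<Object> list = new LinkedList<Object>();
            DataBag bag = (DataBag) input.get(0);
            Iterator<Tuple> it = bag.iterator();
            while (it.hasNext())
            {
                Tuple t = it.next();
                // Skip null or empty tuples; keep only field 0.
                if (t != null && t.size() > 0)
                    list.add(t.get(0));
            }
            return TupleFactory.getInstance().newTuple(list);
        }
        // The original catch block was elided; this one is an assumption.
        catch (ClassCastException e)
        {
            throw new IOException("Flat expects a bag as its first argument", e);
        }
    }
}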
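For comparison with a Python UDF, here is the same first-field-only, skip-null-or-empty logic as a plain Python function. `first_fields` is a hypothetical name, and returning None for a missing bag is an extra assumption rather than something the Java code above is known to do.

```python
def first_fields(bag):
    # Like the Java Flat UDF, skip null or empty tuples instead of failing;
    # returning None for a missing bag is an assumption of this sketch.
    if bag is None:
        return None
    return tuple(t[0] for t in bag if t is not None and len(t) > 0)


print(first_fields([(48567859,), None, (), (15996334, "extra"), (15897772,)]))
# (48567859, 15996334, 15897772)
```

Indexing `item[0]` directly would raise an IndexError on an empty tuple, so the guard is what lets this version tolerate ragged bags.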

On 2/10/12, Brendan Gill <bm...@gmail.com> wrote:
> Eli,
>
> I'm trying to do exactly this, but am pretty new to Pig.  Any chance you
> would share what the UDF would look like?  Then I can tailor it to our
> needs.
>
> Much appreciated if possible,
>
> Brendan

Re: Flatten a Bag on One Line?

Posted by Brendan Gill <bm...@gmail.com>.
Eli,

I'm trying to do exactly this, but am pretty new to Pig.  Any chance you
would share what the UDF would look like?  Then I can tailor it to our
needs.

Much appreciated if possible,

Brendan



On Thu, Feb 9, 2012 at 9:20 PM, Eli Finkelshteyn <ie...@gmail.com> wrote:

> Thanks. Was hoping/assuming there was a built-in, but I guess udf it is.
>
> Eli

Re: Flatten a Bag on One Line?

Posted by Eli Finkelshteyn <ie...@gmail.com>.
Thanks. Was hoping/assuming there was a built-in, but I guess udf it is.

Eli

On 2/9/12 2:14 PM, Yulia Tolskaya wrote:
> I actually can't think of an easy way to do this without it becoming a
> cross product. You could just write a really simple udf that takes a bag
> and spits out just the members.
>
> Yulia


Re: Flatten a Bag on One Line?

Posted by Yulia Tolskaya <yu...@magnetic.is>.
I actually can't think of an easy way to do this without it becoming a
cross product. You could just write a really simple UDF that takes a bag
and spits out just the members.

Yulia
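To make the "cross product" point concrete: applying FLATTEN directly to a bag emits one output row per tuple in the bag, repeating the other fields on each row. The sketch below models that in plain Python. `flatten_bag_column` is a hypothetical helper, not a Pig API, and the bag is modeled as a list of tuples.

```python
def flatten_bag_column(row, bag_index):
    # Hypothetical model of FLATTEN on a bag column: one output row per
    # tuple in the bag, with the other columns repeated (a cross product).
    before, bag, after = row[:bag_index], row[bag_index], row[bag_index + 1:]
    return [before + tuple(t) + after for t in bag]


record = (98812, 3, [(48567859,), (15996334,), (15897772,)])
for out in flatten_bag_column(record, 2):
    print(out)
# (98812, 3, 48567859)
# (98812, 3, 15996334)
# (98812, 3, 15897772)
```

Three rows come out instead of the single wide row the question asks for, which is why a UDF that first turns the bag into a tuple is needed.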

On 2/9/12 1:26 PM, "Eli Finkelshteyn" <ie...@gmail.com> wrote:

>This is probably easy, but my PigLatin is rusty, and I don't seem to be
>able to find an answer on Google. If I have a record of the form:
>
>     98812   3       {(48567859),(15996334),(15897772)}
>
>How can I flatten that bag to leave all members on a single row, ie:
>
>     98812    3    48567859    15996334    15897772
>
>Cheers,
>Eli