Posted to user@pig.apache.org by Eyal Allweil <ey...@yahoo.com.INVALID> on 2016/02/26 21:47:01 UTC

Accumulator behavior with two bags

I asked this question on Stack Overflow, but this is a better place to ask.
What happens when a tuple containing more than one bag is passed to a UDF that implements Accumulator (assuming the accumulator is actually used)? Is the first bag sent in batches while the subsequent bags are sent in their entirety? Are all the bags sent in batches? Or is the accumulator not used at all?
Here's a link to the question there:
http://stackoverflow.com/questions/35610426/how-does-pig-handle-tuples-with-more-than-one-bag-when-using-the-accumulator
Thanks,
Eyal


Re: Accumulator behavior with two bags

Posted by Eyal Allweil <ey...@yahoo.com.INVALID>.
OK, let me state what I think happens, based on reading org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage, and I'd be happy if someone could confirm or correct me.
It looks like, no matter how many bags there are, when the accumulator is used the same number of tuples is transferred from each bag per batch: the first pig.accumulative.batchsize tuples of each bag, then the next batch, and so on until all the bags are exhausted. Only then is getValue() called.
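To make the hypothesis concrete, here is a minimal, self-contained Java sketch that simulates the batching rule described above. It is not Pig code and does not confirm how Pig behaves: `SimpleAccumulator`, `TwoBagCount` and `runBatched` are hypothetical names, bags are modelled as `List<Integer>` instead of Pig's `DataBag`/`Tuple`, and the rule "each accumulate() call receives the next batchSize tuples of every bag" is only the reading of POPackage proposed in this post.

```java
import java.util.ArrayList;
import java.util.List;

public class TwoBagBatchingSketch {

    // Hypothetical stand-in for Pig's Accumulator<T> (accumulate / getValue / cleanup).
    // In real Pig, accumulate() receives a Tuple whose bag fields are DataBags;
    // here each "bag" is simply a List<Integer>.
    interface SimpleAccumulator<T> {
        void accumulate(List<List<Integer>> bags);
        T getValue();
        void cleanup();
    }

    // Hypothetical UDF: counts the tuples seen in each of two bags.
    static class TwoBagCount implements SimpleAccumulator<long[]> {
        private long countA = 0;
        private long countB = 0;
        int calls = 0; // instrumentation only: number of accumulate() calls

        public void accumulate(List<List<Integer>> bags) {
            calls++;
            countA += bags.get(0).size();
            countB += bags.get(1).size();
        }

        public long[] getValue() {
            return new long[] {countA, countB};
        }

        public void cleanup() {
            countA = 0;
            countB = 0;
        }
    }

    // Simulates the HYPOTHESISED rule: each call gets the next batchSize tuples
    // from every bag (an exhausted bag contributes an empty batch) until all bags
    // are exhausted; only then is getValue() called, followed by cleanup().
    static <T> T runBatched(SimpleAccumulator<T> acc, List<List<Integer>> bags, int batchSize) {
        int longest = 0;
        for (List<Integer> bag : bags) {
            longest = Math.max(longest, bag.size());
        }
        for (int start = 0; start < longest; start += batchSize) {
            List<List<Integer>> batch = new ArrayList<>();
            for (List<Integer> bag : bags) {
                int from = Math.min(start, bag.size());
                int to = Math.min(start + batchSize, bag.size());
                batch.add(new ArrayList<>(bag.subList(from, to)));
            }
            acc.accumulate(batch);
        }
        T result = acc.getValue();
        acc.cleanup();
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> bags = List.of(List.of(1, 2, 3, 4, 5), List.of(10, 20));
        TwoBagCount udf = new TwoBagCount();
        long[] counts = runBatched(udf, bags, 2);
        System.out.println(counts[0] + " " + counts[1] + " after " + udf.calls + " calls");
    }
}
```

With a five-tuple bag, a two-tuple bag and a batch size of 2, this model prints "5 2 after 3 calls"; in the third call the second bag is already empty. A different batching rule in Pig would change the call pattern, which is exactly what the question asks about.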

Is this right? 
