Posted to user@pig.apache.org by Jonathan Coveney <jc...@gmail.com> on 2012/02/14 03:19:28 UTC

Is it safe to modify and return a bag that was given as input?

I suspect the answer is that it is not safe, but I'd like to make sure.
That is, is the following OK, and if it is not, why not?

public DataBag exec(Tuple input) throws IOException {
  DataBag bag = (DataBag)input.get(0);
  long index=0;
  for (Tuple tuple : bag) {
    tuple.append(index++);
  }
  return bag;
}

Appreciate the guidance.

Re: Is it safe to modify and return a bag that was given as input?

Posted by Jonathan Coveney <jc...@gmail.com>.
I appreciate that. We should probably feature this prominently somewhere in
the documentation.

2012/2/14 Alan Gates <ga...@hortonworks.com>

> Originally, Tuples were written to allow in-place modification.  Lately
> we've started doing things to tuples that violate that assumption: for
> example, the work Dmitriy has done to use Tuples of specific types as an
> optimization in certain situations, and the work we're doing in HCat to
> make a tuple coming from HCat a thin wrapper over HCatRecord, which is in
> turn a thin wrapper over a Hive SerDe.
>
> So, to actually answer your question, it's generally better to create a
> new tuple.
>
> Alan.
>
> On Feb 14, 2012, at 11:14 AM, Jonathan Coveney wrote:
>
> > Thanks, Alan. Is this the same case for Tuples? For example, if I were to
> > take Tuples from an input, append the value, then add that tuple to a new
> > bag, is that safe? Or can Tuples be modified after the fact as well?
> >
> > 2012/2/14 Alan Gates <ga...@hortonworks.com>
> >
> >> No.  Bags are written with the explicit assumption that once reading
> >> begins, there will never be another write to the bag.  This simplifies a
> >> lot of the code in the bags as far as spilling.
> >>
> >> Alan.
> >>
> >> On Feb 13, 2012, at 6:19 PM, Jonathan Coveney wrote:
> >>
> >>> I feel like the answer is that it is not safe, but I'd like to make sure.
> >>> IE is the following ok, and if it is not, why not?
> >>>
> >>> public DataBag exec(Tuple input) throws IOException {
> >>> DataBag bag = (DataBag)input.get(0);
> >>> long index=0;
> >>> for (Tuple tuple : bag) {
> >>>   tuple.append(index++);
> >>> }
> >>> return bag;
> >>> }
> >>>
> >>> Appreciate the guidance.
> >>
> >>
>
>

Re: Is it safe to modify and return a bag that was given as input?

Posted by Alan Gates <ga...@hortonworks.com>.
Originally, Tuples were written to allow in-place modification.  Lately we've started doing things to tuples that violate that assumption: for example, the work Dmitriy has done to use Tuples of specific types as an optimization in certain situations, and the work we're doing in HCat to make a tuple coming from HCat a thin wrapper over HCatRecord, which is in turn a thin wrapper over a Hive SerDe.

So, to actually answer your question, it's generally better to create a new tuple.

Alan.
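
[Editor's sketch, not part of the original message.] The advice above, to build a new tuple and a new bag rather than write to the ones passed in, can be illustrated roughly as follows. To keep the example runnable without Pig on the classpath, plain Java lists stand in for Pig's types: List<Object> plays the role of a Tuple and List<List<Object>> the role of a DataBag. The class and method names (AppendIndexSketch, appendIndex) are hypothetical. In a real UDF the copies would presumably come from TupleFactory.getInstance().newTuple(List) and BagFactory.getInstance().newDefaultBag(); treat that mapping as an assumption to check against the Pig version in use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in sketch: List<Object> plays the role of a Pig Tuple and
// List<List<Object>> the role of a DataBag. These are NOT Pig types.
final class AppendIndexSketch {

    // Returns a new "bag" whose "tuples" are copies of the input tuples
    // with a running index appended. The input is only read, never written.
    static List<List<Object>> appendIndex(List<List<Object>> inputBag) {
        List<List<Object>> outputBag = new ArrayList<>(inputBag.size());
        long index = 0;
        for (List<Object> tuple : inputBag) {
            // Copy the fields first, then append to the copy, so the
            // caller's tuple is left exactly as it was handed in.
            List<Object> copy = new ArrayList<>(tuple);
            copy.add(index++);
            outputBag.add(copy);
        }
        return outputBag;
    }

    public static void main(String[] args) {
        List<List<Object>> input = Arrays.asList(
                Arrays.<Object>asList("a", 1),
                Arrays.<Object>asList("b", 2));

        System.out.println(appendIndex(input)); // prints [[a, 1, 0], [b, 2, 1]]
        System.out.println(input);              // prints [[a, 1], [b, 2]]
    }
}
```

The cost is one extra copy per tuple. In exchange, the function makes no assumption about whether the incoming tuples or bag tolerate writes.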

Re: Is it safe to modify and return a bag that was given as input?

Posted by Jonathan Coveney <jc...@gmail.com>.
Thanks, Alan. Is the same true for Tuples? For example, if I were to
take Tuples from an input, append a value to each, then add each tuple to a
new bag, is that safe? Or are Tuples also not to be modified after the fact?

Re: Is it safe to modify and return a bag that was given as input?

Posted by Alan Gates <ga...@hortonworks.com>.
No.  Bags are written with the explicit assumption that once reading begins, there will never be another write to the bag.  This simplifies a lot of the bag code that deals with spilling.

Alan.
