Posted to user@crunch.apache.org by Sandy Ryza <sa...@cloudera.com> on 2013/06/10 02:06:56 UTC

emitting the same object with different internals

Will the following code work in Crunch?

---
private SomeMutableObject smo;

public void process(Integer input, Emitter<SomeMutableObject> emitter) {
  smo.mutate(input);
  emitter.emit(smo);
}
---

i.e. will the object be written/copied when emit is called, so
that changes to it in a later call of the process function won't change
what was emitted in an earlier one?


thanks for any help!
Sandy

Re: emitting the same object with different internals

Posted by Josh Wills <jw...@cloudera.com>.
A single input record will flow through all of the DoFns that are contained
within the map/reduce stage of the computation before another record is
processed, so mutating an object and then passing it along is usually a
safe operation in Crunch. It will be serialized to the output from that
stage before the next output is processed.
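
To make the reasoning above concrete, here is a minimal sketch that does not
use Crunch at all. The Emitter interface, SerializingEmitter, and ReusingFn
below are hypothetical stand-ins written for this illustration; the only
assumption carried over from the explanation is that the emitted object is
written out (here, turned into a string) before process() is called again.

```java
import java.util.ArrayList;
import java.util.List;

final class EmitReuseDemo {
  /** Hypothetical stand-in for the SomeMutableObject in the question. */
  static final class SomeMutableObject {
    private int value;

    void mutate(int input) {
      value = input;
    }

    int value() {
      return value;
    }
  }

  /** Minimal local emitter interface for this sketch; not Crunch's own Emitter. */
  interface Emitter<T> {
    void emit(T item);
  }

  /** Models a stage output: each emitted object is written out immediately. */
  static final class SerializingEmitter implements Emitter<SomeMutableObject> {
    final List<String> written = new ArrayList<>();

    @Override
    public void emit(SomeMutableObject item) {
      // The state is captured here, before the caller can mutate the object again.
      written.add(Integer.toString(item.value()));
    }
  }

  /** Mirrors the process() method from the question, reusing one instance. */
  static final class ReusingFn {
    private final SomeMutableObject smo = new SomeMutableObject();

    void process(Integer input, Emitter<SomeMutableObject> emitter) {
      smo.mutate(input);
      emitter.emit(smo);
    }
  }

  /** Feeds three inputs through the reusing function and returns what was written. */
  static List<String> run() {
    SerializingEmitter emitter = new SerializingEmitter();
    ReusingFn fn = new ReusingFn();
    for (int input : new int[] {1, 2, 3}) {
      fn.process(input, emitter);
    }
    return emitter.written;
  }

  public static void main(String[] args) {
    System.out.println(run()); // prints [1, 2, 3]
  }
}
```

Each record is written with the state it had at emit time, even though the
same instance is reused for every call.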

That said, I generally prefer immutable objects, or some sort of builder
pattern that allows you to easily convert an immutable object into a
mutable form and then create another immutable object after you make
changes to the mutable builder. I always find myself doing something like
caching a collection of objects at some point in my pipeline, and when I
do, the use of mutable objects ends up biting me. The PType class has a
method, getDetachedValue, which returns a copy of an object that is safe to
hold on to, and we make liberal use of it in the internal libraries
when we need to do some caching and can't be sure of whether or not the
input object is immutable.
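
The caching problem described above can be sketched without Crunch as
follows. Everything here is hypothetical illustration code: the detach
function only plays the role that a call such as PType.getDetachedValue
would play in a real pipeline, and copy() is an assumed helper on the
example class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

final class DetachedCopyDemo {
  /** Hypothetical mutable value type, as in the question. */
  static final class SomeMutableObject {
    private int value;

    SomeMutableObject(int value) {
      this.value = value;
    }

    void mutate(int input) {
      value = input;
    }

    int value() {
      return value;
    }

    /** Assumed helper: returns an independent copy of this object. */
    SomeMutableObject copy() {
      return new SomeMutableObject(value);
    }
  }

  /**
   * Reuses one mutable instance for three inputs and caches whatever
   * detach returns for it, then reports the values seen in the cache.
   */
  static List<Integer> cacheValues(UnaryOperator<SomeMutableObject> detach) {
    SomeMutableObject reused = new SomeMutableObject(0);
    List<SomeMutableObject> cache = new ArrayList<>();
    for (int input : new int[] {1, 2, 3}) {
      reused.mutate(input);
      cache.add(detach.apply(reused));
    }
    List<Integer> values = new ArrayList<>();
    for (SomeMutableObject cached : cache) {
      values.add(cached.value());
    }
    return values;
  }

  public static void main(String[] args) {
    // Caching the reference itself: every entry aliases the same object.
    System.out.println(cacheValues(UnaryOperator.identity())); // prints [3, 3, 3]
    // Caching a detached copy: each entry keeps the state it had when cached.
    System.out.println(cacheValues(SomeMutableObject::copy)); // prints [1, 2, 3]
  }
}
```

The first call shows the bug that mutable objects cause once they are held
in a collection; the second shows how copying before caching avoids it.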

Josh





-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>