Posted to dev@gora.apache.org by Ed Kohlwey <ek...@gmail.com> on 2012/05/18 17:45:35 UTC

FakeResolvingDecoder

Hi,
I'm working on updating Gora to Avro 1.7. I've mostly figured out what
I need to do, except what's happening in FakeResolvingDecoder.java.

Avro now uses a nice factory system which essentially prevents you
from extending some of these core classes, so a different workaround
will have to do.

It looks like this is basically a way to work around having dirty bits
added to the Avro protocol. Is that right? Has there been any
historical discussion of doing things differently like augmenting
record schemas to include dirty bits, or making the dirty bits a
transient member of a parent class? Or am I off base here?

Is there any augmenting done other than dirty bits?
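Ed's alternative of making the dirty bits a transient member of a parent
class can be sketched in plain Java. This is a minimal illustration under
stated assumptions: `DirtyTrackingRecord`, `Employee` and
`TransientDirtyBits` are hypothetical names (not Gora's actual classes),
and it uses standard Java serialization rather than Avro to show the
trade-off. A transient field is simply skipped, so the dirty state does not
survive serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.BitSet;

// Hypothetical base class standing in for a generated record's parent.
abstract class DirtyTrackingRecord implements Serializable {
    // transient: Java serialization skips this field, so it is never written.
    private transient BitSet dirty = new BitSet();

    protected void markDirty(int fieldIndex) { bits().set(fieldIndex); }

    public boolean isDirty(int fieldIndex) { return bits().get(fieldIndex); }

    // Field initializers do not run on deserialization, so the transient
    // field comes back null; recreate it (empty) on first use.
    private BitSet bits() {
        if (dirty == null) dirty = new BitSet();
        return dirty;
    }
}

// Hypothetical record with one tracked field (index 0).
class Employee extends DirtyTrackingRecord {
    private String name;

    void setName(String name) { this.name = name; markDirty(0); }

    String getName() { return name; }
}

public class TransientDirtyBits {
    // Serialize and deserialize an object with standard Java serialization.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T obj)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Employee e = new Employee();
        e.setName("Ada");
        Employee copy = roundTrip(e);
        // The field value survives the round trip; the dirty bit does not.
        System.out.println(e.isDirty(0) + " " + copy.isDirty(0) + " " + copy.getName());
        // prints "true false Ada"
    }
}
```

That loss is harmless when a record is written straight to a store, but it
matters wherever the mutation state itself has to travel between processes.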

Re: FakeResolvingDecoder

Posted by Enis Söztutar <en...@hortonworks.com>.
I am all for not hijacking Avro APIs :)

Dirty-bit serialization first came up in Hadoop MapReduce, since we have
to serialize both the data and its mutation state between tasks. I can
think of other cases where you might want to serialize object-mutation
state, such as passing objects over the wire, but that is not a big use
case compared to Hadoop.

Currently, we extend Avro's DatumReader/DatumWriter classes and define a
custom Hadoop serialization that uses them, which cleanly augments the
on-wire data format. We could accomplish the same thing by wrapping the
DatumReaders/Writers instead of extending them. I believe the
DatumReader/Writer APIs should be publicly visible, but I am not sure.
Otherwise, we can use higher-level public APIs to do the serialization.
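Wrapping rather than extending might look roughly like the following. It
is a sketch under stated assumptions: `RecordCodec` is a hypothetical
stand-in for Avro's DatumWriter/DatumReader pair (it is not Avro's API),
`Tracked` and `DirtyBitsCodec` are likewise hypothetical, and the wire
layout (a length-prefixed `BitSet` written ahead of the datum) is just one
possible choice.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.BitSet;

// Hypothetical stand-in for a DatumWriter/DatumReader pair; not Avro's API.
interface RecordCodec<T> {
    void write(T record, DataOutputStream out) throws IOException;
    T read(DataInputStream in) throws IOException;
}

// Hypothetical holder: a record value plus the dirty bits tracked for it.
final class Tracked<T> {
    final T value;
    final BitSet dirty;
    Tracked(T value, BitSet dirty) { this.value = value; this.dirty = dirty; }
}

// Wraps (rather than extends) an inner codec: the dirty bits are written as a
// length-prefixed byte array, then the inner codec writes the datum itself.
final class DirtyBitsCodec<T> implements RecordCodec<Tracked<T>> {
    private final RecordCodec<T> inner;
    DirtyBitsCodec(RecordCodec<T> inner) { this.inner = inner; }

    @Override
    public void write(Tracked<T> record, DataOutputStream out) throws IOException {
        byte[] bits = record.dirty.toByteArray();
        out.writeInt(bits.length);
        out.write(bits);
        inner.write(record.value, out);
    }

    @Override
    public Tracked<T> read(DataInputStream in) throws IOException {
        byte[] bits = new byte[in.readInt()];
        in.readFully(bits);
        return new Tracked<T>(inner.read(in), BitSet.valueOf(bits));
    }
}

public class WrappedCodecDemo {
    // Trivial inner codec for strings, standing in for a generated-record writer.
    static final RecordCodec<String> STRINGS = new RecordCodec<String>() {
        @Override
        public void write(String s, DataOutputStream out) throws IOException { out.writeUTF(s); }
        @Override
        public String read(DataInputStream in) throws IOException { return in.readUTF(); }
    };

    // Encode with the wrapper, then decode the same bytes back.
    static Tracked<String> roundTrip(Tracked<String> rec) throws IOException {
        DirtyBitsCodec<String> codec = new DirtyBitsCodec<String>(STRINGS);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        codec.write(rec, new DataOutputStream(buf));
        return codec.read(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
    }

    public static void main(String[] args) throws IOException {
        BitSet dirty = new BitSet();
        dirty.set(2);
        Tracked<String> back = roundTrip(new Tracked<String>("row-1", dirty));
        System.out.println(back.value + " " + back.dirty); // prints "row-1 {2}"
    }
}
```

Because `DirtyBitsCodec` only delegates to the inner codec's public
methods, it does not depend on how the inner reader or writer is built
internally.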

Cheers,
Enis

On Fri, May 18, 2012 at 2:17 PM, Ed Kohlwey <ek...@gmail.com> wrote:

Re: FakeResolvingDecoder

Posted by Ed Kohlwey <ek...@gmail.com>.
Enis,
Thanks for the pointers. Are the dirty bits used only by MapReduce, or
also for general persistence in application logic? In the latter case I
guess it's OK for them to be transient, and if the only other use case is
MapReduce, maybe something could be done in the input and output formats
to avoid fiddling with the pseudo-official Avro APIs.

On Fri, May 18, 2012 at 2:05 PM, Enis Söztutar <en...@hortonworks.com> wrote:

Re: FakeResolvingDecoder

Posted by Enis Söztutar <en...@hortonworks.com>.
Hi Ed,

Good to see some interest in pushing things forward.

As the javadoc says, FakeResolvingDecoder is pretty much a big dirty hack
to work around Avro's internals, but as you pointed out, much has changed
in Avro, so we may have to rethink those parts.

We need the dirty bits in the serialization for MapReduce, but not for the
final serialization to the store (HBase, Cassandra, etc.). The reasoning is
that during the MapReduce phases we may mutate objects in the map, which
are then serialized, deserialized in the reduce, and used there.

I have not spent any time on the changes in Avro for a while, so I cannot
comment on what the cleanest approach would be. We could either augment
the schema or hijack the DatumReaders/Writers. If you are willing to work
on this, I think it is best to find out what is public and stable in Avro,
and extend those parts. When we first wrote this code, Avro was very young,
and it was not clear what the public API was. Consulting the Avro folks,
and pushing for changes or hooks in Avro so that things don't break, may
also be a good option.

I don't believe we need to augment anything other than the dirty bits. If
you are planning to work on this, feel free to reach out.

Cheers,
Enis

On Fri, May 18, 2012 at 8:45 AM, Ed Kohlwey <ek...@gmail.com> wrote: