Posted to dev@avro.apache.org by Justin Santa Barbara <ju...@fathomdb.com> on 2009/11/11 05:47:20 UTC

Why Paranamer?

Given some of the complexities involved in paranamer builds, I'm wondering
why Avro doesn't use annotations instead.  It seems that parameter names are
only needed for Avro interfaces, and these are mostly generated from .avpr
files (rather than being hand-written).

I've done a quick implementation, and it seems to work well:
http://github.com/justinsb/avro/commit/b5532e3bf8967a6e97c795a480f74731945849e5

The actual @Named annotation is here:
http://github.com/justinsb/avro/blob/259047e9b51bdb690a8b2c0ce5693ba65ed94536/src/java/org/apache/avro/Named.java
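
For reference, a minimal annotation along these lines might look like the
following (a sketch inferred from the usage below, not copied from the
linked commit):

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Retained at runtime so the parameter name can be read reflectively.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.PARAMETER)
public @interface Named {
  String value();
}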

So an example generated interface looks like this, and there's no need to
run Paranamer:

@SuppressWarnings("all")
public interface BulkData {
  ByteBuffer read() throws AvroRemoteException;
  Void write(@Named("data") ByteBuffer data) throws AvroRemoteException;
}

I'm guessing there's some historical reason here - can anyone fill me in on
the reasoning?

Justin

Re: Why Paranamer?

Posted by Justin Santa Barbara <ju...@fathomdb.com>.
I agree that it makes sense.  It also seems that the goal of seamlessly
moving Hadoop interfaces to use Avro doesn't _require_ that the parameter
names be correct, only that they be consistent.  So, if we don't have
Paranamer or Java annotations, we can simply fall back to auto-generated
names like string1, string2, int1, as in AVRO-164.  That way we can get a
simple build (I really hate complicated builds, and it'll probably
encourage Avro adoption if we don't need build-system changes).

If in the future someone changes the method signature of a Hadoop interface
in a way that would break serialization compatibility, then that might be
the time to require annotations (my preference) or Paranamer, or a move to
a .avpr!

It would suck a little that this would effectively bake names like
'string2' into the contract for Hadoop interfaces, but if we can make the
barrier low enough, maybe someone will do a little work on the Hadoop
interfaces.  For me, a Java annotation or manually adding the Paranamer
metadata meets that threshold; adding a step to the build probably doesn't.


On the performance issue, I don't think reflection is actually that bad on
Java 6 if you cache the Method/Field objects.  But I think we should allow
external helper objects for serialization.  The helper would implement the
get/set field contract, but would act on a 'target object'.  This would
allow us to use existing classes (which could have extra methods/logic).  It
would also let us use the existing Hadoop objects unchanged.  We would have
an implementation that works using runtime reflection, and we could also
code-generate these just like we do in the Specific case.  If people think
this is a good idea I'll open a JIRA ticket for it (and maybe even work on
it at the Hackathon@Digg!)
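
To make the helper idea concrete, here is a sketch of the contract (all
names are hypothetical, just to show the shape):

// Hypothetical sketch of the "external helper" contract: the helper
// implements field get/set but operates on a separate target object,
// so existing classes (e.g. Hadoop's) can be serialized unchanged.
public interface FieldAccessor<T> {
  Object getField(T target, int fieldIndex);
  void setField(T target, int fieldIndex, Object value);
}

One implementation could keep cached java.lang.reflect.Field objects;
another could be code-generated, just as in the specific case.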

Justin





On Thu, Nov 12, 2009 at 10:43 PM, Philip Zeyliger <ph...@cloudera.com> wrote:

> Makes sense to me.  I think it may be useful to check in the .avpr files
> that are induced on the way, to let folks start trying to use different
> clients for certain operations.
>
> -- Philip
>
> On Wed, Nov 11, 2009 at 11:21 AM, Doug Cutting <cu...@apache.org> wrote:
>
> > Justin Santa Barbara wrote:
> >
> >> What about Philip's point on existing Hadoop interfaces?  Any plans for
> >> how we'll generate the Protocol object for these?
> >>
> >
> > I'm hoping to use reflection initially for this.  That's the motivation
> > for my renewed interest in AVRO-80.
> >
> > https://issues.apache.org/jira/browse/AVRO-80
> >
> > My rationale is that I don't want to assume we'll move Hadoop onto Avro
> > overnight.  So I'd like to move things in a way that's easy to maintain
> > in a branch/patch.  If we can get reflection to work, then we only need
> > to update two places per Hadoop protocol: where it calls RPC.getProxy()
> > and RPC.getServer().  Then we can start looking at performance.  We
> > cannot commit Avro-based Hadoop RPC until performance is adequate, and
> > we don't want to have to maintain a patch that changes many central data
> > structures in Hadoop while we're testing and improving performance,
> > since that might take time.
> >
> > Once we've committed Hadoop to using Avro, then we can consider,
> > protocol-by-protocol, replacing Hadoop's Writable objects with Avro
> > generated objects.  Until then, the protocol will be defined implicitly
> > by Java through reflection.
> >
> > Note that if performance is inadequate due to reflection itself, rather
> > than the client/server implementations, we might resort to byte-code
> > modification to accelerate it.
> >
> > https://issues.apache.org/jira/browse/AVRO-143
> >
> > This would also be a temporary approach.  Longer-term we should move
> > Hadoop to use protocols declared in .avpr files and generated classes.
> > But I don't think that's practical in the short-term.
> >
> > My current short-term goal is to try to get Avro's reflection to the
> > point where it can implement NamenodeProtocol.
> >
> > Does this make sense?
> >
> > Doug
> >
>

Re: Why Paranamer?

Posted by Philip Zeyliger <ph...@cloudera.com>.
Makes sense to me.  I think it may be useful to check in the .avpr files
that are induced on the way, to let folks start trying to use different
clients for certain operations.

-- Philip

On Wed, Nov 11, 2009 at 11:21 AM, Doug Cutting <cu...@apache.org> wrote:

> Justin Santa Barbara wrote:
>
>> What about Philip's point on existing Hadoop interfaces?  Any plans for
>> how we'll generate the Protocol object for these?
>>
>
> I'm hoping to use reflection initially for this.  That's the motivation for
> my renewed interest in AVRO-80.
>
> https://issues.apache.org/jira/browse/AVRO-80
>
> My rationale is that I don't want to assume we'll move Hadoop onto Avro
> overnight.  So I'd like to move things in a way that's easy to maintain in a
> branch/patch.  If we can get reflection to work, then we only need to update
> two places per Hadoop protocol: where it calls RPC.getProxy() and
> RPC.getServer().  Then we can start looking at performance.  We cannot
> commit Avro-based Hadoop RPC until performance is adequate, and we don't
> want to have to maintain a patch that changes many central data structures
> in Hadoop while we're testing and improving performance, since that might
> take time.
>
> Once we've committed Hadoop to using Avro, then we can consider,
> protocol-by-protocol, replacing Hadoop's Writable objects with Avro
> generated objects.  Until then, the protocol will be defined implicitly by
> Java through reflection.
>
> Note that if performance is inadequate due to reflection itself, rather
> than the client/server implementations, we might resort to byte-code
> modification to accelerate it.
>
> https://issues.apache.org/jira/browse/AVRO-143
>
> This would also be a temporary approach.  Longer-term we should move Hadoop
> to use protocols declared in .avpr files and generated classes. But I don't
> think that's practical in the short-term.
>
> My current short-term goal is to try to get Avro's reflection to the point
> where it can implement NamenodeProtocol.
>
> Does this make sense?
>
> Doug
>

Re: Why Paranamer?

Posted by Doug Cutting <cu...@apache.org>.
Justin Santa Barbara wrote:
> What about Philip's point on existing Hadoop interfaces?  Any plans for how
> we'll generate the Protocol object for these?

I'm hoping to use reflection initially for this.  That's the motivation 
for my renewed interest in AVRO-80.

https://issues.apache.org/jira/browse/AVRO-80

My rationale is that I don't want to assume we'll move Hadoop onto Avro 
overnight.  So I'd like to move things in a way that's easy to maintain 
in a branch/patch.  If we can get reflection to work, then we only need 
to update two places per Hadoop protocol: where it calls RPC.getProxy() 
and RPC.getServer().  Then we can start looking at performance.  We 
cannot commit Avro-based Hadoop RPC until performance is adequate, and 
we don't want to have to maintain a patch that changes many central data 
structures in Hadoop while we're testing and improving performance, 
since that might take time.
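
Roughly, the swap at those two call sites might look like this (a sketch
using Avro's reflect requestor/responder; the class names and packages
follow later Avro releases, so treat it as illustrative rather than exact):

import java.io.IOException;
import java.net.URL;
import org.apache.avro.ipc.HttpServer;
import org.apache.avro.ipc.HttpTransceiver;
import org.apache.avro.ipc.reflect.ReflectRequestor;
import org.apache.avro.ipc.reflect.ReflectResponder;
import org.apache.hadoop.hdfs.server.protocol.NamenodeProtocol;

class AvroRpcSwap {
  // Server side: stands in for the RPC.getServer() call site.
  static HttpServer serve(NamenodeProtocol impl, int port) throws IOException {
    HttpServer server = new HttpServer(
        new ReflectResponder(NamenodeProtocol.class, impl), port);
    server.start();
    return server;
  }

  // Client side: stands in for the RPC.getProxy() call site.  The proxy's
  // protocol is induced from the interface by reflection.
  static NamenodeProtocol connect(URL serverUrl) throws IOException {
    return ReflectRequestor.getClient(
        NamenodeProtocol.class, new HttpTransceiver(serverUrl));
  }
}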

Once we've committed Hadoop to using Avro, then we can consider, 
protocol-by-protocol, replacing Hadoop's Writable objects with Avro 
generated objects.  Until then, the protocol will be defined implicitly 
by Java through reflection.

Note that if performance is inadequate due to reflection itself, rather 
than the client/server implementations, we might resort to byte-code 
modification to accelerate it.

https://issues.apache.org/jira/browse/AVRO-143

This would also be a temporary approach.  Longer-term we should move 
Hadoop to use protocols declared in .avpr files and generated classes. 
But I don't think that's practical in the short-term.

My current short-term goal is to try to get Avro's reflection to the 
point where it can implement NamenodeProtocol.

Does this make sense?

Doug

Re: Why Paranamer?

Posted by Scott Carey <sc...@richrelevance.com>.


On 11/11/09 8:49 AM, "Justin Santa Barbara" <ju...@fathomdb.com> wrote:

> Caching the protocol seems a better approach than mine.  I did notice that
> the Protocol was being reconstructed using reflection (and there seem to be
> a few bugs around it, e.g. AVRO-171), so I was thinking that maybe this
> should be done.  Happy that you did it for me!
> 
> What about Philip's point on existing Hadoop interfaces?  Any plans for how
> we'll generate the Protocol object for these?
> 
> Justin

Maybe annotation mix-ins are useful here?  You could define all the
necessary bits to connect an existing set of classes/interfaces without
modifying any of the original classes:
http://wiki.fasterxml.com/JacksonMixInAnnotations

I'm not familiar enough with the Avro implementation yet to know if I'm way
off on this, however...
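
In Jackson terms the pattern looks roughly like this (Jackson's API, not
Avro's; the target class here is a stand-in for any existing class we
can't modify):

import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.databind.ObjectMapper;

// A stand-in for an existing class we don't want to touch.
class DatanodeInfo {
  private final String name;
  DatanodeInfo(String name) { this.name = name; }
  public String getName() { return name; }
}

// The mix-in: annotations live on a parallel abstract type...
abstract class DatanodeInfoMixIn {
  @JsonProperty("hostName") abstract String getName();
}

class MixInExample {
  static ObjectMapper configure() {
    ObjectMapper mapper = new ObjectMapper();
    // ...and are applied to the untouched target at configuration time.
    // (addMixIn is the Jackson 2 spelling; the 1.x-era API used
    // addMixInAnnotations on the mapper's configs.)
    mapper.addMixIn(DatanodeInfo.class, DatanodeInfoMixIn.class);
    return mapper;
  }
}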

> 
> 
> 
> 
> On Wed, Nov 11, 2009 at 8:42 AM, Doug Cutting <cu...@apache.org> wrote:
> 
>> Justin Santa Barbara wrote:
>> 
>>> @SuppressWarnings("all")
>>> public interface BulkData {
>>>  ByteBuffer read() throws AvroRemoteException;
>>>  Void write(@Named("data") ByteBuffer data) throws AvroRemoteException;
>>> }
>>> 
>> 
>> I've taken a different approach in AVRO-185.
>> 
>>  https://issues.apache.org/jira/browse/AVRO-185
>> 
>> There I simply added the protocol to the generated interface, so reflection
>> need no longer be used to determine the method list, nor method parameter
>> names.
>> 
>> 
>>> I'm guessing there's some historical reason here - can anyone fill me
>>> in on the reasoning?
>>> 
>> 
>> The historical reason is that reflect was implemented before specific, and
>> specific was built to depend on reflect.  Reflect already had a means to get
>> the method list (including parameter names) from an interface, so none was
>> added for specific.
>> 
>> Doug
>> 
> 


Re: Why Paranamer?

Posted by Justin Santa Barbara <ju...@fathomdb.com>.
Caching the protocol seems a better approach than mine.  I did notice that
the Protocol was being reconstructed using reflection (and there seem to be
a few bugs around it, e.g. AVRO-171), so I was thinking that maybe this
should be done.  Happy that you did it for me!

What about Philip's point on existing Hadoop interfaces?  Any plans for how
we'll generate the Protocol object for these?

Justin




On Wed, Nov 11, 2009 at 8:42 AM, Doug Cutting <cu...@apache.org> wrote:

> Justin Santa Barbara wrote:
>
>> @SuppressWarnings("all")
>> public interface BulkData {
>>  ByteBuffer read() throws AvroRemoteException;
>>  Void write(@Named("data") ByteBuffer data) throws AvroRemoteException;
>> }
>>
>
> I've taken a different approach in AVRO-185.
>
>  https://issues.apache.org/jira/browse/AVRO-185
>
> There I simply added the protocol to the generated interface, so reflection
> need no longer be used to determine the method list, nor method parameter
> names.
>
>
>> I'm guessing there's some historical reason here - can anyone fill me
>> in on the reasoning?
>>
>
> The historical reason is that reflect was implemented before specific, and
> specific was built to depend on reflect.  Reflect already had a means to get
> the method list (including parameter names) from an interface, so none was
> added for specific.
>
> Doug
>

Re: Why Paranamer?

Posted by Doug Cutting <cu...@apache.org>.
Justin Santa Barbara wrote:
> @SuppressWarnings("all")
> public interface BulkData {
>   ByteBuffer read() throws AvroRemoteException;
>   Void write(@Named("data") ByteBuffer data) throws AvroRemoteException;
> }

I've taken a different approach in AVRO-185.

   https://issues.apache.org/jira/browse/AVRO-185

There I simply added the protocol to the generated interface, so 
reflection need no longer be used to determine the method list, nor 
method parameter names.
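
For the BulkData example above, the generated interface would then carry
something like this (protocol JSON abbreviated):

@SuppressWarnings("all")
public interface BulkData {
  // The protocol is baked into the generated code, so neither
  // reflection nor Paranamer is needed for method or parameter names.
  Protocol PROTOCOL = Protocol.parse(
      "{\"protocol\":\"BulkData\",\"messages\":{...}}");

  ByteBuffer read() throws AvroRemoteException;
  Void write(ByteBuffer data) throws AvroRemoteException;
}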

> I'm guessing there's some historical reason here - can anyone fill me in on
> the reasoning?

The historical reason is that reflect was implemented before specific, and
specific was built to depend on reflect.  Reflect already had a means to 
get the method list (including parameter names) from an interface, so 
none was added for specific.

Doug

Re: Why Paranamer?

Posted by Philip Zeyliger <ph...@cloudera.com>.
Hi Justin,

I very much agree that it would be great to get rid of the paranamer
build step for the generated code.

> Given some of the complexities involved in paranamer builds, I'm wondering
> why Avro doesn't use annotations instead.  It seems that parameter names are
> only needed for Avro interfaces, and these are mostly generated from .avpr
> files (rather than being hand-written).

For the generated code, there's a ticket
(http://issues.apache.org/jira/browse/AVRO-164) to generate the
paranamer data as part of the code-generation, instead of needing a
paranamer build step.

> I'm guessing there's some historical reason here - can anyone fill me in on
> the reasoning?

I believe that the goal is to seamlessly replace how Hadoop handles
RPCs; that's done using reflection on existing interfaces (instead of
generating code from a .avpr file).  It's not practical to go back
and annotate each parameter in the existing code.  "Void
write(@Named("data") ByteBuffer data)" may be fine in generated code,
but it's a bit messy in normal code.

I like your implementation, and using annotations has a certain
elegance.  It'd be nice not to have two different code paths for
figuring out the parameters (though it wouldn't be the worst thing in
the world).  What do you think of the approach in AVRO-164, which
would be to simply generate the public static final String (or is it
String[]--don't remember) that Paranamer uses?
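
(For reference, that would look roughly like the following -- the field
name follows Paranamer's DefaultParanamer convention, but the data layout
shown here is only illustrative:)

@SuppressWarnings("all")
public interface BulkData {
  // DefaultParanamer reads this constant instead of re-parsing
  // bytecode; illustrative layout: "method paramTypes paramNames".
  String __PARANAMER_DATA =
      "read \n" +
      "write java.nio.ByteBuffer data \n";

  ByteBuffer read() throws AvroRemoteException;
  Void write(ByteBuffer data) throws AvroRemoteException;
}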

Cheers,

-- Philip