Posted to dev@avro.apache.org by "Scott Carey (JIRA)" <ji...@apache.org> on 2011/01/06 06:45:45 UTC

[jira] Commented: (AVRO-726) Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes

    [ https://issues.apache.org/jira/browse/AVRO-726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978151#action_12978151 ] 

Scott Carey commented on AVRO-726:
----------------------------------

We should definitely consider this so that such extensions are possible.

As for the use case above, the latest version of Avro uses Velocity templates for SpecificRecord generation -- so you could generate classes with all the getters/setters you want.  One of the items I wanted to get to was experimenting with ways to remove the boxing/unboxing overhead for these objects.  IndexedRecord neatly simplifies access to fields, but it incurs that boxing overhead.
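To make the boxing overhead concrete, here is a minimal sketch. `SlotRecord`, `Point` and `BoxingDemo` are hypothetical names used only for illustration; the only thing taken from Avro is the shape of IndexedRecord's `get(int)` and `put(int, Object)` accessors.

```java
// Hypothetical stand-in with the same get/put shape as Avro's IndexedRecord.
interface SlotRecord {
    void put(int i, Object v);
    Object get(int i);
}

// Hypothetical record whose two fields are stored as primitive ints.
final class Point implements SlotRecord {
    int x;
    int y;

    @Override
    public void put(int i, Object v) {
        switch (i) {
            case 0: x = (Integer) v; break; // cast, then unbox Integer -> int
            case 1: y = (Integer) v; break;
            default: throw new IndexOutOfBoundsException("bad field index: " + i);
        }
    }

    @Override
    public Object get(int i) {
        switch (i) {
            case 0: return x; // autoboxes int -> Integer on every call
            case 1: return y;
            default: throw new IndexOutOfBoundsException("bad field index: " + i);
        }
    }
}

class BoxingDemo {
    public static void main(String[] args) {
        Point p = new Point();
        p.put(0, 3);             // the int argument is boxed at the call site
        Object boxed = p.get(0); // the field is boxed again on the way out
        System.out.println(boxed instanceof Integer);
        System.out.println(p.x);
    }
}
```

Every trip through the Object-typed accessors goes through an Integer, even though the field itself is a plain int.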

Are you willing to share or contribute what you have done so far?

There are some approaches I have considered.   I was going to take a stab at 'merging' the Reflect and Specific APIs by way of annotations to get rid of this overhead -- essentially have the code gen create annotated classes that specify how the schema maps to the fields -- and have Reflect consume those.  Later down the road, Reflect could use cglib and/or asm to create serialization / deserialization classes on the fly that would be compiled to very efficient code and be faster than the current Specific API.
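A rough sketch of what the annotation idea could look like, using only the JDK. The `@SchemaField` annotation, the `User` class and `mapFields` are assumptions made up for this illustration; none of them is an Avro API.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical annotation (not part of Avro): ties a Java field to a schema position.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface SchemaField {
    int position();
}

// What a generated class might look like under this scheme.
class User {
    @SchemaField(position = 0) long id;
    @SchemaField(position = 1) int age;
}

class AnnotationMappingSketch {
    // Builds a position -> Field map by reading the annotations reflectively.
    static Map<Integer, Field> mapFields(Class<?> type) {
        Map<Integer, Field> byPosition = new TreeMap<>();
        for (Field f : type.getDeclaredFields()) {
            SchemaField sf = f.getAnnotation(SchemaField.class);
            if (sf != null) {
                f.setAccessible(true);
                byPosition.put(sf.position(), f);
            }
        }
        return byPosition;
    }

    public static void main(String[] args) throws IllegalAccessException {
        User u = new User();
        Map<Integer, Field> fields = mapFields(User.class);
        // Field.setInt takes a primitive int, so the caller allocates no Integer.
        fields.get(1).setInt(u, 42);
        System.out.println(fields.get(0).getName() + ", " + u.age);
    }
}
```

Plain reflection like this is still slow; the point of the sketch is only the mapping. Generating bytecode from the same annotations would be the later step.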

Lastly, note that autoboxing/unboxing has in some cases become 'free' on the latest Sun JVM with '-XX:+DoEscapeAnalysis' on.  This will default to on in a later JVM release.   If the object is boxed manually (using new Integer(), not Integer.valueOf()) and does not escape a small enough code block, the JVM will avoid object creation entirely.   This can help some of the IndexedRecord API usage.

> Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-726
>                 URL: https://issues.apache.org/jira/browse/AVRO-726
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.5.0
>            Reporter: Xiaolu Ye
>             Fix For: 1.5.0
>
>
> Currently, GenericDatumReader/GenericDatumWriter data members are private. Is it possible to make them protected so that we could extend those classes and create our own special DatumReader/Writer? The reason we want to do that is because we've created our own base SpecificRecordEx that implements SpecificRecord and added put/get for primitive types. We now want to extend the GenericDatumReader/GenericDatumWriter to use those primitive put/get functions to reduce box/unbox for better performance. 
> Thanks,
> Xiaolu

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


RE: [jira] Commented: (AVRO-726) Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes

Posted by "Ye, Xiaolu - GMRT-EST" <xi...@baml.com>.
+Rich who manages our project.

Hi Scott,

Glad to hear that you are willing to make the change. Yes, we are aware of the Velocity template change. We've already applied that to our project.

The way we did it is that we created SpecificRecordEx, which extends SpecificRecord, and added setters/getters for primitive types to it. All our generated classes are derived from SpecificRecordEx. We then added a customized compiler and templates to generate code with setters/getters for primitive types based on the schema. In addition, we added a dispatch function to SpecificRecordEx to allow registering callbacks, created a Builder class for each impl class to ease object creation, and added a template to generate interfaces.
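For readers following along, the design described above might look roughly like the sketch below. The patch itself is not part of this thread, so everything here is an assumption: `RecordEx` stands in for SpecificRecordEx, and `getInt`/`putInt`, `Trade` and its `Builder` are illustrative names only.

```java
// Hypothetical stand-in for the SpecificRecordEx idea described above.
interface RecordEx {
    Object get(int i);         // boxed access, same shape as IndexedRecord
    void put(int i, Object v);
    int getInt(int i);         // primitive access (assumed method names)
    void putInt(int i, int v);
}

// What a generated class with one int field might look like.
final class Trade implements RecordEx {
    private int quantity;

    // The boxed accessors delegate to the primitive ones.
    @Override public Object get(int i) { return getInt(i); }
    @Override public void put(int i, Object v) { putInt(i, (Integer) v); }

    @Override public int getInt(int i) {
        if (i != 0) throw new IndexOutOfBoundsException("bad field index: " + i);
        return quantity;
    }

    @Override public void putInt(int i, int v) {
        if (i != 0) throw new IndexOutOfBoundsException("bad field index: " + i);
        quantity = v;
    }

    // One Builder per generated class, to ease object creation.
    static final class Builder {
        private int quantity;

        Builder quantity(int q) { this.quantity = q; return this; }

        Trade build() {
            Trade t = new Trade();
            t.putInt(0, quantity);
            return t;
        }
    }
}

class RecordExDemo {
    public static void main(String[] args) {
        Trade t = new Trade.Builder().quantity(100).build();
        System.out.println(t.getInt(0)); // read back without creating an Integer
    }
}
```

A reader/writer that knows about the primitive accessors can call `putInt`/`getInt` directly, which is where extending GenericDatumReader/GenericDatumWriter would come in.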

We are more than happy to share what we have done. We would love to see this become part of Avro if you find it useful. After all, it's good for us as well as we don't need to maintain that ourselves when the project evolves. 

I will supply the patch shortly. But my current code is based on the cyclical reference patch (https://issues.apache.org/jira/browse/AVRO-695). I noticed that this hasn't been integrated into trunk. Is there a plan to do it? Otherwise, I would need to strip that out first.

By the way, our project also might need .NET support in Avro. Would you be able to let me know the priority for https://issues.apache.org/jira/browse/AVRO-533? 

Thanks,

Xiaolu
