You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@avro.apache.org by "Xiaolu Ye (JIRA)" <ji...@apache.org> on 2011/01/05 22:50:47 UTC

[jira] Created: (AVRO-726) Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes

Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes
--------------------------------------------------------------------------------------------------------------

                 Key: AVRO-726
                 URL: https://issues.apache.org/jira/browse/AVRO-726
             Project: Avro
          Issue Type: Improvement
          Components: java
    Affects Versions: 1.5.0
            Reporter: Xiaolu Ye
             Fix For: 1.5.0


Currently, GenericDatumReader/GenericDatumWriter data members are private. Is it possible to make them protected so that we could extend those classes and create our own special DatumReader/Writer? The reason we want to do that is because we've created our own base SpecificRecordEx that implements SpecificRecord and added put/get for primitive types. We now want to extend the GenericDatumReader/GenericDatumWriter to use those primitive put/get functions to reduce box/unbox for better performance. 
Thanks,

Xiaolu

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


RE: [jira] Commented: (AVRO-726) Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes

Posted by "Ye, Xiaolu - GMRT-EST" <xi...@baml.com>.
+Rich who manages our project.

Hi Scott,

Glad to hear that you are willing to make the change. Yes, we are aware of the velocity template change. We've already applied that to our project. 

The way we did it is that we've created SpecificRecordEx that extends SpecificRecord and added setter/getter of primitive types to it. All our generated classes are derived from SpecificRecordEx. We then added customized compiler and templates to generate code with setter/getter for primitive types based on schema. In addition, we've added dispatch function to SpecificRecordEx to allow register callbacks, created Builder class for each impl class to ease object creation and added template to generate interface. 

We are more than happy to share what we have done. We would love to see this become part of Avro if you find it useful. After all, it's good for us as well as we don't need to maintain that ourselves when the project evolves. 

I will supply the patch shortly. But my current code is based on cyclical reference patch (https://issues.apache.org/jira/browse/AVRO-695). I noticed that this hasn't been integrated to trunk. Is there a plan to do it? Otherwise, I would need to strip that out first.

By the way, our project also might need .NET support in Avro. Would you be able to let me know the priority for https://issues.apache.org/jira/browse/AVRO-533? 

Thanks,

Xiaolu

-----Original Message-----
From: Scott Carey (JIRA) [mailto:jira@apache.org] 
Sent: Thursday, January 06, 2011 12:46 AM
To: dev@avro.apache.org
Subject: [jira] Commented: (AVRO-726) Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes


    [ https://issues.apache.org/jira/browse/AVRO-726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978151#action_12978151 ] 

Scott Carey commented on AVRO-726:
----------------------------------

We should definitely consider this so that such extensions are possible.

As for the use case above, the latest version of Avro uses Velocity templates for SpecificRecord generation -- so you could generate classes with all the getter/setters you want.  One of the items I wanted to get to was to experiment with ways to remove the boxing/unboxing overhead for these objects.  IndexedRecord neatly simplifies access to fields but has the boxing overhead.

Are you willing to share or contribute what you have done so far?

There are some approaches I have considered.   I was going to take a stab at 'merging' the Reflect and Specific API by way of annotations to get rid of this overhead -- essentially have the code gen create annotated classes that specified how the schema maps to the fields -- and have reflect consume those.  Later down the road, Reflect could use cgilib and/or asm to create serialization / deserialization classes on the fly that would be compiled to very very efficient code and be faster than the current Specific API.

Lastly, note that autoboxing/unboxing in some cases has become 'free' on the latest Sun JVM with '-XX:+UseEscapeAnalysis' on.  This will default to on in a later JVM release.   If the object is boxed manually (using new Integer(), not Integer.valueOf()) and the object does not escape a small enough code block, the JVM will avoid object creation entirely.   This can help some of the IndexedRecord API usage.

> Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-726
>                 URL: https://issues.apache.org/jira/browse/AVRO-726
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.5.0
>            Reporter: Xiaolu Ye
>             Fix For: 1.5.0
>
>
> Currently, GenericDatumReader/GenericDatumWriter data members are private. Is it possible to make them protected so that we could extend those classes and create our own special DatumReader/Writer? The reason we want to do that is because we've created our own base SpecificRecordEx that implements SpecificRecord and added put/get for primitive types. We now want to extend the GenericDatumReader/GenericDatumWriter to use those primitive put/get functions to reduce box/unbox for better performance. 
> Thanks,
> Xiaolu

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

----------------------------------------------------------------------
This message w/attachments (message) is intended solely for the use of the intended recipient(s) and may contain information that is privileged, confidential or proprietary. If you are not an intended recipient, please notify the sender, and then please delete and destroy all copies and attachments, and be advised that any review or dissemination of, or the taking of any action in reliance on, the information contained in or attached to this message is prohibited. 
Unless specifically indicated, this message is not an offer to sell or a solicitation of any investment products or other financial product or service, an official confirmation of any transaction, or an official statement of Sender. Subject to applicable law, Sender may intercept, monitor, review and retain e-communications (EC) traveling through its networks/systems and may produce any such EC to regulators, law enforcement, in litigation and as required by law. 
The laws of the country of each sender/recipient may impact the handling of EC, and EC may be archived, supervised and produced in countries other than the country in which you are located. This message cannot be guaranteed to be secure or free of errors or viruses. 

References to "Sender" are references to any subsidiary of Bank of America Corporation. Securities and Insurance Products: * Are Not FDIC Insured * Are Not Bank Guaranteed * May Lose Value * Are Not a Bank Deposit * Are Not a Condition to Any Banking Service or Activity * Are Not Insured by Any Federal Government Agency. Attachments that are part of this EC may have additional important disclosures and disclaimers, which you should read. This message is subject to terms available at the following link: 
http://www.bankofamerica.com/emaildisclaimer. By messaging with Sender you consent to the foregoing.

[jira] Commented: (AVRO-726) Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes

Posted by "Xiaolu Ye (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12996085#comment-12996085 ] 

Xiaolu Ye commented on AVRO-726:
--------------------------------

Hi Scott,

Sorry, just saw your comment. I'll supply a patch shortly.

Meanwhile, is it possible to make those data members protected in 1.5 anyway? Noticed as of now, it's not in the blocker list. 

Thanks,

Xiaolu

> Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-726
>                 URL: https://issues.apache.org/jira/browse/AVRO-726
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.5.0
>            Reporter: Xiaolu Ye
>             Fix For: 1.5.0
>
>
> Currently, GenericDatumReader/GenericDatumWriter data members are private. Is it possible to make them protected so that we could extend those classes and create our own special DatumReader/Writer? The reason we want to do that is because we've created our own base SpecificRecordEx that implements SpecificRecord and added put/get for primitive types. We now want to extend the GenericDatumReader/GenericDatumWriter to use those primitive put/get functions to reduce box/unbox for better performance. 
> Thanks,
> Xiaolu

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (AVRO-726) Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-726:
------------------------------

       Resolution: Fixed
    Fix Version/s:     (was: 1.5.1)
                   1.5.0
         Assignee: Doug Cutting  (was: Scott Carey)
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

I committed this.

> Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-726
>                 URL: https://issues.apache.org/jira/browse/AVRO-726
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.5.0
>            Reporter: Xiaolu Ye
>            Assignee: Doug Cutting
>             Fix For: 1.5.0
>
>         Attachments: AVRO-726.1.patch, AVRO-726.patch
>
>
> Currently, GenericDatumReader/GenericDatumWriter data members are private. Is it possible to make them protected so that we could extend those classes and create our own special DatumReader/Writer? The reason we want to do that is because we've created our own base SpecificRecordEx that implements SpecificRecord and added put/get for primitive types. We now want to extend the GenericDatumReader/GenericDatumWriter to use those primitive put/get functions to reduce box/unbox for better performance. 
> Thanks,
> Xiaolu

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (AVRO-726) Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes

Posted by "Xiaolu Ye (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12996589#comment-12996589 ] 

Xiaolu Ye commented on AVRO-726:
--------------------------------

Hi Scott,

I've created jira AVRO-777 for the enhancement. Just want to mention that in addition to change the data member to protected, could you also look at whether some of the member functions could be made protected like npe() in GenericDatumWriter? I needed to use npe in my derived class and had to change GenericDatumWriter in my local copy.

Thanks,

Xiaolu

> Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-726
>                 URL: https://issues.apache.org/jira/browse/AVRO-726
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.5.0
>            Reporter: Xiaolu Ye
>             Fix For: 1.5.0
>
>
> Currently, GenericDatumReader/GenericDatumWriter data members are private. Is it possible to make them protected so that we could extend those classes and create our own special DatumReader/Writer? The reason we want to do that is because we've created our own base SpecificRecordEx that implements SpecificRecord and added put/get for primitive types. We now want to extend the GenericDatumReader/GenericDatumWriter to use those primitive put/get functions to reduce box/unbox for better performance. 
> Thanks,
> Xiaolu

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (AVRO-726) Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12998011#comment-12998011 ] 

Scott Carey commented on AVRO-726:
----------------------------------

Its because we have some more discussion/documenation before there is consensus and this does not break API.   We could get 1.5.1 out relatively fast if there is demand for it.   Also, if there are delays in 1.5.0 this could still slip back in, but it is not blocking the release.

Questions:
* Would a protected setter/getter work for you?  That would expose less of the internals than a protected member variable.
* Is it OK if the field is final?  One of the two was final before, and if they both can be, then only a getter is necessary.

> Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-726
>                 URL: https://issues.apache.org/jira/browse/AVRO-726
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.5.0
>            Reporter: Xiaolu Ye
>            Assignee: Scott Carey
>             Fix For: 1.5.1
>
>         Attachments: AVRO-726.1.patch
>
>
> Currently, GenericDatumReader/GenericDatumWriter data members are private. Is it possible to make them protected so that we could extend those classes and create our own special DatumReader/Writer? The reason we want to do that is because we've created our own base SpecificRecordEx that implements SpecificRecord and added put/get for primitive types. We now want to extend the GenericDatumReader/GenericDatumWriter to use those primitive put/get functions to reduce box/unbox for better performance. 
> Thanks,
> Xiaolu

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (AVRO-726) Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-726:
------------------------------

    Fix Version/s:     (was: 1.5.0)
                   1.5.1

> Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-726
>                 URL: https://issues.apache.org/jira/browse/AVRO-726
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.5.0
>            Reporter: Xiaolu Ye
>            Assignee: Scott Carey
>             Fix For: 1.5.1
>
>         Attachments: AVRO-726.1.patch
>
>
> Currently, GenericDatumReader/GenericDatumWriter data members are private. Is it possible to make them protected so that we could extend those classes and create our own special DatumReader/Writer? The reason we want to do that is because we've created our own base SpecificRecordEx that implements SpecificRecord and added put/get for primitive types. We now want to extend the GenericDatumReader/GenericDatumWriter to use those primitive put/get functions to reduce box/unbox for better performance. 
> Thanks,
> Xiaolu

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (AVRO-726) Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12998095#comment-12998095 ] 

Scott Carey commented on AVRO-726:
----------------------------------

{quote}
I've created jira AVRO-777 for the enhancement.
{quote}
In case anyone is looking for it, its AVRO-770.

> Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-726
>                 URL: https://issues.apache.org/jira/browse/AVRO-726
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.5.0
>            Reporter: Xiaolu Ye
>            Assignee: Scott Carey
>             Fix For: 1.5.1
>
>         Attachments: AVRO-726.1.patch
>
>
> Currently, GenericDatumReader/GenericDatumWriter data members are private. Is it possible to make them protected so that we could extend those classes and create our own special DatumReader/Writer? The reason we want to do that is because we've created our own base SpecificRecordEx that implements SpecificRecord and added put/get for primitive types. We now want to extend the GenericDatumReader/GenericDatumWriter to use those primitive put/get functions to reduce box/unbox for better performance. 
> Thanks,
> Xiaolu

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (AVRO-726) Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Carey updated AVRO-726:
-----------------------------

    Status: Patch Available  (was: Open)

> Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-726
>                 URL: https://issues.apache.org/jira/browse/AVRO-726
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.5.0
>            Reporter: Xiaolu Ye
>            Assignee: Scott Carey
>             Fix For: 1.5.0
>
>         Attachments: AVRO-726.1.patch
>
>
> Currently, GenericDatumReader/GenericDatumWriter data members are private. Is it possible to make them protected so that we could extend those classes and create our own special DatumReader/Writer? The reason we want to do that is because we've created our own base SpecificRecordEx that implements SpecificRecord and added put/get for primitive types. We now want to extend the GenericDatumReader/GenericDatumWriter to use those primitive put/get functions to reduce box/unbox for better performance. 
> Thanks,
> Xiaolu

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (AVRO-726) Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-726:
------------------------------

    Attachment: AVRO-726.patch

Here's a version that, instead of changing the data field to be protected, adds the method:

public GenericData getData();

> Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-726
>                 URL: https://issues.apache.org/jira/browse/AVRO-726
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.5.0
>            Reporter: Xiaolu Ye
>            Assignee: Scott Carey
>             Fix For: 1.5.1
>
>         Attachments: AVRO-726.1.patch, AVRO-726.patch
>
>
> Currently, GenericDatumReader/GenericDatumWriter data members are private. Is it possible to make them protected so that we could extend those classes and create our own special DatumReader/Writer? The reason we want to do that is because we've created our own base SpecificRecordEx that implements SpecificRecord and added put/get for primitive types. We now want to extend the GenericDatumReader/GenericDatumWriter to use those primitive put/get functions to reduce box/unbox for better performance. 
> Thanks,
> Xiaolu

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (AVRO-726) Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12996090#comment-12996090 ] 

Scott Carey commented on AVRO-726:
----------------------------------

Yes, I'll look at getting just the basics: set those to protected, done for 1.5.0.  I can't guarantee it but it is the first thing I'll look at after the API changing blockers.

As for the patch, even if it only applies against 1.4.1, as long as it is posted here and has the granted Apache license we can review and start talking about incorporating it or using it as a base to build similar features that solve your use cases.  You might want to start a new JIRA for that, and we'll leave this one for making the members protected only.

> Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-726
>                 URL: https://issues.apache.org/jira/browse/AVRO-726
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.5.0
>            Reporter: Xiaolu Ye
>             Fix For: 1.5.0
>
>
> Currently, GenericDatumReader/GenericDatumWriter data members are private. Is it possible to make them protected so that we could extend those classes and create our own special DatumReader/Writer? The reason we want to do that is because we've created our own base SpecificRecordEx that implements SpecificRecord and added put/get for primitive types. We now want to extend the GenericDatumReader/GenericDatumWriter to use those primitive put/get functions to reduce box/unbox for better performance. 
> Thanks,
> Xiaolu

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (AVRO-726) Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes

Posted by "Xiaolu Ye (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979796#action_12979796 ] 

Xiaolu Ye commented on AVRO-726:
--------------------------------

Hi Scott,

As said in the email, I'm more than happy to share what we've done. Here's the summary:

1) Introduced SpecificRecordEx interface that extends SpecificRecord and added setter/getter of primitive types in it. 
2) All generated classes are derived from SpecificRecordEx with impl class has setter/getter of primitive types auto-generated based on the schema
3) Auto generate interface class for record type with user friendly setter/getter of proper types and maintain inheritance structure based on supertypes definition in the schema
4) Support callback dispatch by introducing Cbk interface in SpecificRecordEx with dispatch(Cbk) function. All derived interface has its own Cbk interface derived from SpecificRecordEx.Cbk, and its impl class implements dispatch(Cbk) to invoke the proper callback. 
5) Auto generate Builder class for each impl class 

If you think this is in the right direction of what you want, I'm more than happy to supply a patch. Also interested to get your input for any enhancement. Please let me know. I have taked out cyclical reference patch from my implemenation based on comments from AVRO-695 that Moustapha will change its impl.

Thanks,

Xiaolu

> Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-726
>                 URL: https://issues.apache.org/jira/browse/AVRO-726
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.5.0
>            Reporter: Xiaolu Ye
>             Fix For: 1.5.0
>
>
> Currently, GenericDatumReader/GenericDatumWriter data members are private. Is it possible to make them protected so that we could extend those classes and create our own special DatumReader/Writer? The reason we want to do that is because we've created our own base SpecificRecordEx that implements SpecificRecord and added put/get for primitive types. We now want to extend the GenericDatumReader/GenericDatumWriter to use those primitive put/get functions to reduce box/unbox for better performance. 
> Thanks,
> Xiaolu

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (AVRO-726) Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes

Posted by "Xiaolu Ye (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12997984#comment-12997984 ] 

Xiaolu Ye commented on AVRO-726:
--------------------------------

For us, making the fields protected is good enough.

Btw, I noticed the fix version is now 1.5.1 for this issue. Is it because of the npe function I requested for GenericDatumWriter?

> Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-726
>                 URL: https://issues.apache.org/jira/browse/AVRO-726
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.5.0
>            Reporter: Xiaolu Ye
>            Assignee: Scott Carey
>             Fix For: 1.5.1
>
>         Attachments: AVRO-726.1.patch
>
>
> Currently, GenericDatumReader/GenericDatumWriter data members are private. Is it possible to make them protected so that we could extend those classes and create our own special DatumReader/Writer? The reason we want to do that is because we've created our own base SpecificRecordEx that implements SpecificRecord and added put/get for primitive types. We now want to extend the GenericDatumReader/GenericDatumWriter to use those primitive put/get functions to reduce box/unbox for better performance. 
> Thanks,
> Xiaolu

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Assigned: (AVRO-726) Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Carey reassigned AVRO-726:
--------------------------------

    Assignee: Scott Carey

> Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-726
>                 URL: https://issues.apache.org/jira/browse/AVRO-726
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.5.0
>            Reporter: Xiaolu Ye
>            Assignee: Scott Carey
>             Fix For: 1.5.0
>
>
> Currently, GenericDatumReader/GenericDatumWriter data members are private. Is it possible to make them protected so that we could extend those classes and create our own special DatumReader/Writer? The reason we want to do that is because we've created our own base SpecificRecordEx that implements SpecificRecord and added put/get for primitive types. We now want to extend the GenericDatumReader/GenericDatumWriter to use those primitive put/get functions to reduce box/unbox for better performance. 
> Thanks,
> Xiaolu

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (AVRO-726) Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985210#action_12985210 ] 

Scott Carey commented on AVRO-726:
----------------------------------

Hi Xiaolu,


These features sound very interesting and useful.  I would love to see a patch and figure out how to incorporate similar features.   Even if we go in a slightly different direction, I think all your needs are things I would like to address:

1/2)  reduce cost of autoboxing and memory footprint of objects. 
3) Solutions for mapping inheritance to avro.
4) Callbacks -- I would need to see what you did here and some example use cases -- are these for users or the internal framework?
5) Generating code that instantiates objects with the builder pattern and avoids publicly exposed variables has been discussed many times here, I definitely want to have that built-in to Avro.

> Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-726
>                 URL: https://issues.apache.org/jira/browse/AVRO-726
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.5.0
>            Reporter: Xiaolu Ye
>             Fix For: 1.5.0
>
>
> Currently, GenericDatumReader/GenericDatumWriter data members are private. Is it possible to make them protected so that we could extend those classes and create our own special DatumReader/Writer? The reason we want to do that is because we've created our own base SpecificRecordEx that implements SpecificRecord and added put/get for primitive types. We now want to extend the GenericDatumReader/GenericDatumWriter to use those primitive put/get functions to reduce box/unbox for better performance. 
> Thanks,
> Xiaolu

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (AVRO-726) Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978151#action_12978151 ] 

Scott Carey commented on AVRO-726:
----------------------------------

We should definitely consider this so that such extensions are possible.

As for the use case above, the latest version of Avro uses Velocity templates for SpecificRecord generation -- so you could generate classes with all the getter/setters you want.  One of the items I wanted to get to was to experiment with ways to remove the boxing/unboxing overhead for these objects.  IndexedRecord neatly simplifies access to fields but has the boxing overhead.

Are you willing to share or contribute what you have done so far?

There are some approaches I have considered.   I was going to take a stab at 'merging' the Reflect and Specific API by way of annotations to get rid of this overhead -- essentially have the code gen create annotated classes that specified how the schema maps to the fields -- and have reflect consume those.  Later down the road, Reflect could use cgilib and/or asm to create serialization / deserialization classes on the fly that would be compiled to very very efficient code and be faster than the current Specific API.

Lastly, note that autoboxing/unboxing in some cases has become 'free' on the latest Sun JVM with '-XX:+UseEscapeAnalysis' on.  This will default to on in a later JVM release.   If the object is boxed manually (using new Integer(), not Integer.valueOf()) and the object does not escape a small enough code block, the JVM will avoid object creation entirely.   This can help some of the IndexedRecord API usage.

> Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-726
>                 URL: https://issues.apache.org/jira/browse/AVRO-726
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.5.0
>            Reporter: Xiaolu Ye
>             Fix For: 1.5.0
>
>
> Currently, GenericDatumReader/GenericDatumWriter data members are private. Is it possible to make them protected so that we could extend those classes and create our own special DatumReader/Writer? The reason we want to do that is because we've created our own base SpecificRecordEx that implements SpecificRecord and added put/get for primitive types. We now want to extend the GenericDatumReader/GenericDatumWriter to use those primitive put/get functions to reduce box/unbox for better performance. 
> Thanks,
> Xiaolu

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (AVRO-726) Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13000439#comment-13000439 ] 

Scott Carey commented on AVRO-726:
----------------------------------

+1
Looks good, this version is simpler and making the data field final in both cases is an improvement.


> Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-726
>                 URL: https://issues.apache.org/jira/browse/AVRO-726
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.5.0
>            Reporter: Xiaolu Ye
>            Assignee: Scott Carey
>             Fix For: 1.5.1
>
>         Attachments: AVRO-726.1.patch, AVRO-726.patch
>
>
> Currently, GenericDatumReader/GenericDatumWriter data members are private. Is it possible to make them protected so that we could extend those classes and create our own special DatumReader/Writer? The reason we want to do that is because we've created our own base SpecificRecordEx that implements SpecificRecord and added put/get for primitive types. We now want to extend the GenericDatumReader/GenericDatumWriter to use those primitive put/get functions to reduce box/unbox for better performance. 
> Thanks,
> Xiaolu

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (AVRO-726) Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes

Posted by "Xiaolu Ye (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12996379#comment-12996379 ] 

Xiaolu Ye commented on AVRO-726:
--------------------------------

sounds good. i'll create a jira and summit the part as part of that. 

> Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-726
>                 URL: https://issues.apache.org/jira/browse/AVRO-726
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.5.0
>            Reporter: Xiaolu Ye
>             Fix For: 1.5.0
>
>
> Currently, GenericDatumReader/GenericDatumWriter data members are private. Is it possible to make them protected so that we could extend those classes and create our own special DatumReader/Writer? The reason we want to do that is because we've created our own base SpecificRecordEx that implements SpecificRecord and added put/get for primitive types. We now want to extend the GenericDatumReader/GenericDatumWriter to use those primitive put/get functions to reduce box/unbox for better performance. 
> Thanks,
> Xiaolu

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (AVRO-726) Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes

Posted by "Xiaolu Ye (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12998018#comment-12998018 ] 

Xiaolu Ye commented on AVRO-726:
--------------------------------

Both should be fine. Thanks.

Just want to confirm, in 1.5.1, you would make GenericDatumWriter.npe function protected as well? In my derived class, I override writeRecord which throws npe as in the super class. Thanks!

> Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-726
>                 URL: https://issues.apache.org/jira/browse/AVRO-726
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.5.0
>            Reporter: Xiaolu Ye
>            Assignee: Scott Carey
>             Fix For: 1.5.1
>
>         Attachments: AVRO-726.1.patch
>
>
> Currently, GenericDatumReader/GenericDatumWriter data members are private. Is it possible to make them protected so that we could extend those classes and create our own special DatumReader/Writer? The reason we want to do that is because we've created our own base SpecificRecordEx that implements SpecificRecord and added put/get for primitive types. We now want to extend the GenericDatumReader/GenericDatumWriter to use those primitive put/get functions to reduce box/unbox for better performance. 
> Thanks,
> Xiaolu

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (AVRO-726) Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Carey updated AVRO-726:
-----------------------------

    Attachment: AVRO-726.1.patch

Simple patch that exposes the data member variables to child classes.  Also exposes a couple helper methods to child classes.

> Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-726
>                 URL: https://issues.apache.org/jira/browse/AVRO-726
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.5.0
>            Reporter: Xiaolu Ye
>            Assignee: Scott Carey
>             Fix For: 1.5.0
>
>         Attachments: AVRO-726.1.patch
>
>
> Currently, GenericDatumReader/GenericDatumWriter data members are private. Is it possible to make them protected so that we could extend those classes and create our own special DatumReader/Writer? The reason we want to do that is because we've created our own base SpecificRecordEx that implements SpecificRecord and added put/get for primitive types. We now want to extend the GenericDatumReader/GenericDatumWriter to use those primitive put/get functions to reduce box/unbox for better performance. 
> Thanks,
> Xiaolu

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (AVRO-726) Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12997630#comment-12997630 ] 

Doug Cutting commented on AVRO-726:
-----------------------------------

Protected ites should have javadoc.  Also, might we provide getters and setters instead of just making the fields protected?

> Make GenericDatumReader/GenericDatumWriter data member protected so that it can be used by the derived classes
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-726
>                 URL: https://issues.apache.org/jira/browse/AVRO-726
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.5.0
>            Reporter: Xiaolu Ye
>            Assignee: Scott Carey
>             Fix For: 1.5.1
>
>         Attachments: AVRO-726.1.patch
>
>
> Currently, GenericDatumReader/GenericDatumWriter data members are private. Is it possible to make them protected so that we could extend those classes and create our own special DatumReader/Writer? The reason we want to do that is because we've created our own base SpecificRecordEx that implements SpecificRecord and added put/get for primitive types. We now want to extend the GenericDatumReader/GenericDatumWriter to use those primitive put/get functions to reduce box/unbox for better performance. 
> Thanks,
> Xiaolu

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira