Posted to dev@avro.apache.org by "Doug Cutting (Created) (JIRA)" <ji...@apache.org> on 2011/10/06 01:16:29 UTC

[jira] [Created] (AVRO-911) remove object reuse from Java APIs

remove object reuse from Java APIs
----------------------------------

                 Key: AVRO-911
                 URL: https://issues.apache.org/jira/browse/AVRO-911
             Project: Avro
          Issue Type: Improvement
          Components: java
            Reporter: Doug Cutting
            Assignee: Doug Cutting
             Fix For: 1.6.0


Avro's Java APIs were designed to permit object reuse when reading, on the assumption that this would provide performance advantages.  In particular, the old parameter in DatumReader<T>.read(T old, Decoder), the Utf8 class, and the GenericArray.peek() method were all designed for this purpose.  But I am unable to see significant performance improvements when objects are reused.  I tried modifying Perf.java's GenericTest to reuse records, and its StringTest to not reuse Utf8 instances and, in both cases, performance is not substantially altered.

If we were to remove these then issues such as AVRO-803 would disappear.  Always using java.lang.String instead of Utf8 would remove a lot of user confusion. 
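
A minimal sketch of the reuse pattern the description refers to. It does not use Avro itself: Point, PointReader and allocationsFor are hypothetical stand-ins, and read(old, encodedX) only mirrors the shape of DatumReader<T>.read(T old, Decoder), where a non-null "old" object is refilled instead of a new one being allocated.

```java
// Hypothetical stand-ins for illustration only; these are NOT Avro classes.
final class ReuseParameterSketch {

    /** Mutable record with a single field, so it can be refilled in place. */
    static final class Point {
        int x;
    }

    /** Reader whose read() refills 'old' when given one, else allocates. */
    static final class PointReader {
        int allocations = 0;

        Point read(Point old, int encodedX) {
            Point p = old;
            if (p == null) {          // caller did not offer an object to reuse
                p = new Point();
                allocations++;
            }
            p.x = encodedX;           // overwrite whatever the object held before
            return p;
        }
    }

    /** Reads every input value and reports how many Points were allocated. */
    static int allocationsFor(int[] input, boolean reuse) {
        PointReader reader = new PointReader();
        Point current = null;
        for (int value : input) {
            current = reader.read(reuse ? current : null, value);
        }
        return reader.allocations;
    }

    public static void main(String[] args) {
        int[] input = {1, 2, 3, 4};
        System.out.println("allocations with reuse:    " + allocationsFor(input, true));
        System.out.println("allocations without reuse: " + allocationsFor(input, false));
    }
}
```

The sketch only counts allocations; whether the saved allocations matter for throughput is exactly what this issue questions.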

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (AVRO-911) remove object reuse from Java APIs

Posted by "Doug Cutting (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121585#comment-13121585 ] 

Doug Cutting commented on AVRO-911:
-----------------------------------

Todd: I'll try some longer runs with verbose GC.
                

[jira] [Commented] (AVRO-911) remove object reuse from Java APIs

Posted by "Scott Carey (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121674#comment-13121674 ] 

Scott Carey commented on AVRO-911:
----------------------------------

Object reuse is also hard in some of the work that I am doing (or... have not had time to do in months) in AVRO-859.  Trying to apply object re-use to complicated object graphs is not very beneficial.  Additionally, making such object graphs as immutable as possible has performance gains of its own and simplifies code.

In simple cases, re-use can have big gains.  These mostly boil down to avoiding boxing of small primitives.  Here, you go from allocating something to allocating nothing.
For Utf8, we have to copy out a byte[] from the stream, so the Utf8 object allocation is only a small portion of the total allocated, unless it is an empty string and we were to re-use an empty byte[].

Delaying or avoiding Utf8 <-> String conversion is very beneficial, however.  I use Utf8 in many places now for this purpose.
I support Avro removing object re-use for the general case.  Specializations for mutable boxed primitives or even simply returning / accepting primitives are something we can add later.
The low level read and write should have options for dealing with String as well as Utf8.  Higher level APIs can choose either (for example, one might have two different SpecificCompiler templates, or switch the type based on an annotation in AvroIDL).
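
A rough sketch of what delaying the bytes-to-String conversion can look like. LazyText is a hypothetical class written for this illustration, not Avro's Utf8; it keeps the raw UTF-8 bytes, compares on bytes, and decodes to java.lang.String only when toString() is first called.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class LazyTextSketch {

    /** Hypothetical UTF-8 holder (not Avro's Utf8): decodes only on demand. */
    static final class LazyText {
        private final byte[] bytes;
        private String decoded;   // cached result of the first toString()
        int decodeCount = 0;      // exposed only so the sketch can show the effect

        LazyText(byte[] utf8) {
            this.bytes = utf8.clone();
        }

        /** Byte-wise comparison; no String is created. */
        boolean sameBytes(LazyText other) {
            return Arrays.equals(bytes, other.bytes);
        }

        @Override
        public String toString() {
            if (decoded == null) {
                decoded = new String(bytes, StandardCharsets.UTF_8);
                decodeCount++;
            }
            return decoded;
        }
    }

    public static void main(String[] args) {
        LazyText a = new LazyText("avro".getBytes(StandardCharsets.UTF_8));
        LazyText b = new LazyText("avro".getBytes(StandardCharsets.UTF_8));
        System.out.println("equal bytes: " + a.sameBytes(b) + ", decodes so far: " + a.decodeCount);
        System.out.println("text: " + a + ", decodes now: " + a.decodeCount);
    }
}
```

Code that only compares or forwards such values never pays for the conversion; code that needs a String pays once per value.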


As far as EscapeAnalysis introducing object allocation elision, this won't affect most use cases here.  It would if you create a new object, call a method on it, then throw it away within the scope of a method or loop, and in a few slightly larger scopes.
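
To make the scope distinction concrete, here is a small sketch with hypothetical names (Pair, sumWithTemporaries, readAll). The first method has the create/use/discard shape that escape analysis can target; the second resembles a deserializer, where each object is handed back to the caller. Whether HotSpot actually elides the allocation in the first case depends on the JVM version and flags, so this shows only the code shapes, not a measurement.

```java
final class EscapeScopeSketch {

    static final class Pair {
        final int a;
        final int b;

        Pair(int a, int b) {
            this.a = a;
            this.b = b;
        }

        int sum() {
            return a + b;
        }
    }

    /** Each Pair is created, used, and dropped within one loop iteration. */
    static long sumWithTemporaries(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            Pair p = new Pair(i, 1);   // never stored anywhere, never returned
            total += p.sum();
        }
        return total;
    }

    /** Each Pair is stored in the returned array, so it escapes the method. */
    static Pair[] readAll(int n) {
        Pair[] out = new Pair[n];
        for (int i = 0; i < n; i++) {
            out[i] = new Pair(i, 1);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("sum over temporaries: " + sumWithTemporaries(4));
        System.out.println("objects handed to caller: " + readAll(4).length);
    }
}
```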

                

[jira] [Commented] (AVRO-911) remove object reuse from Java APIs

Posted by "Scott Carey (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122209#comment-13122209 ] 

Scott Carey commented on AVRO-911:
----------------------------------

For a record that is filled with primitives, re-use is also valuable.  There are a few cases where it does not work so well:

* Unions
* string/bytes
* Maps

In the above cases, either it is difficult or unlikely for reuse to be effective (Unions, Maps), or re-use has a small benefit and causes unexpected behavior (String, Bytes).

What about the following simplification:

* Utf8 is not re-used.  (we may be able to re-use the empty string with itself, or make Utf8 immutable and re-use equivalent copies, but that is not in scope here)
* byte arrays are not re-used
* maps are not re-used

Re-use remains for Arrays and Records, where the biggest gains are possible.


                

[jira] [Commented] (AVRO-911) remove object reuse from Java APIs

Posted by "Douglas Kaminsky (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121695#comment-13121695 ] 

Douglas Kaminsky commented on AVRO-911:
---------------------------------------

Object reuse is not about a direct throughput/latency improvement; it's about a consistent processing velocity. Unpredictable GC causes disruptions in processing speed, which is unacceptable for real-time systems (at least, from the perspective of my current project), where we are measuring latency on a sub-millisecond scale.

I would support refactoring the reuse behavior into a subclass of the existing reader, but not eliminating it.
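
One way the suggested refactoring could look, sketched with hypothetical classes (Rec, FreshReader, ReusingReader) rather than Avro's actual reader hierarchy: the base reader always returns a fresh object, and callers who want reuse opt in by choosing the subclass.

```java
// Hypothetical classes for illustration; not Avro's reader hierarchy.
final class ReuseSubclassSketch {

    static final class Rec {
        long id;
    }

    /** Default behavior: every read() returns a newly allocated record. */
    static class FreshReader {
        Rec read(long encodedId) {
            Rec r = newRecord();
            r.id = encodedId;
            return r;
        }

        /** Allocation hook that subclasses may override. */
        protected Rec newRecord() {
            return new Rec();
        }
    }

    /** Opt-in variant: refills and returns the same record on every read(). */
    static final class ReusingReader extends FreshReader {
        private final Rec shared = new Rec();

        @Override
        protected Rec newRecord() {
            return shared;
        }
    }

    public static void main(String[] args) {
        FreshReader fresh = new FreshReader();
        System.out.println("fresh reader, same object: " + (fresh.read(1) == fresh.read(2)));

        FreshReader reusing = new ReusingReader();
        System.out.println("reusing reader, same object: " + (reusing.read(1) == reusing.read(2)));
    }
}
```

With this split, the surprising aliasing behavior is confined to code that explicitly asked for it.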
                

[jira] [Commented] (AVRO-911) remove object reuse from Java APIs

Posted by "Milind Bhandarkar (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121643#comment-13121643 ] 

Milind Bhandarkar commented on AVRO-911:
----------------------------------------

Todd, indeed GC kicking in for a lot of objects will slow tasks somewhat. Where I came from, the basic data type that was being fed as a value was not a simple int or text. It was a complex structure that had a map, a list of maps, and a map of maps. The Value object reuse in the reducer meant that only the top-level object was reused. Everything underneath had to be reallocated anyway.

Plus, a number of people cloning their objects were doing it incorrectly.

Considering the complexity, and given that, as you correctly point out in your blog, one can make sure that GC does not trigger during the lifetime of the task by allocating a bigger heap, reusing the objects was not a worthwhile optimization.
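
A small sketch of the kind of incorrect cloning mentioned above, using a hypothetical Value class with one nested map (the real values described here were more deeply nested). A shallow copy shares the nested map with the reused object, so it changes when the framework refills that object; only a copy that also duplicates the nested structure keeps the old contents.

```java
import java.util.HashMap;
import java.util.Map;

final class CloneSketch {

    /** Hypothetical value type with nested mutable state. */
    static final class Value {
        Map<String, Integer> counts = new HashMap<>();

        /** Incorrect for reused objects: the copy shares the same map. */
        Value shallowCopy() {
            Value c = new Value();
            c.counts = this.counts;
            return c;
        }

        /** Copies the nested map too, so later refills do not leak in. */
        Value deepCopy() {
            Value c = new Value();
            c.counts = new HashMap<>(this.counts);
            return c;
        }
    }

    /** Returns {value seen via shallow copy, value seen via deep copy}. */
    static int[] demo() {
        Value reused = new Value();
        reused.counts.put("a", 1);

        Value shallow = reused.shallowCopy();
        Value deep = reused.deepCopy();

        // Simulate the framework refilling the reused object for the next record.
        reused.counts.clear();
        reused.counts.put("a", 2);

        return new int[] {shallow.counts.get("a"), deep.counts.get("a")};
    }

    public static void main(String[] args) {
        int[] seen = demo();
        System.out.println("shallow copy sees " + seen[0] + ", deep copy sees " + seen[1]);
    }
}
```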
                

[jira] [Commented] (AVRO-911) remove object reuse from Java APIs

Posted by "Doug Cutting (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121582#comment-13121582 ] 

Doug Cutting commented on AVRO-911:
-----------------------------------

Does anyone have benchmarks where Avro's object reuse of Utf8's demonstrates a significant performance advantage?
                

[jira] [Commented] (AVRO-911) remove object reuse from Java APIs

Posted by "Todd Lipcon (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121604#comment-13121604 ] 

Todd Lipcon commented on AVRO-911:
----------------------------------

Which JDK6? :)
                

[jira] [Updated] (AVRO-911) remove object reuse from Java APIs

Posted by "Doug Cutting (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-911:
------------------------------

    Attachment: perf-reuse.patch

Enabling verbose GC output shows that there are >~20 GCs during a benchmark, so I think it's running long enough.  Dunno about the escape analysis.  I'm using JDK 6 and Scott's Perf.java with -G and -s, with and without the attached patch.
                

[jira] [Commented] (AVRO-911) remove object reuse from Java APIs

Posted by "Todd Lipcon (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121580#comment-13121580 ] 

Todd Lipcon commented on AVRO-911:
----------------------------------

Which JVM are you using? Is it one that enables escape analysis by default? Do the perf tests run long enough to trigger several rounds of young generation GC? Performance should be equal, except that object reuse will have fewer young gen GCs (and thus have better throughput in the long run).
                

[jira] [Commented] (AVRO-911) remove object reuse from Java APIs

Posted by "Doug Cutting (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121618#comment-13121618 ] 

Doug Cutting commented on AVRO-911:
-----------------------------------

java version "1.6.0_22"
Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
Java HotSpot(TM) Server VM (build 17.1-b03, mixed mode)

                

[jira] [Commented] (AVRO-911) remove object reuse from Java APIs

Posted by "Todd Lipcon (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121635#comment-13121635 ] 

Todd Lipcon commented on AVRO-911:
----------------------------------

At least as of about a year ago, reusing objects was still worth doing. See "Tip 6" on this blog post I did: http://www.cloudera.com/blog/2009/12/7-tips-for-improving-mapreduce-performance/
                

[jira] [Commented] (AVRO-911) remove object reuse from Java APIs

Posted by "Todd Lipcon (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121625#comment-13121625 ] 

Todd Lipcon commented on AVRO-911:
----------------------------------

hm, ok, EscapeAnalysis is on by default starting with u23 afaik. So I'm surprised you don't see any real difference.
                

[jira] [Resolved] (AVRO-911) remove object reuse from Java APIs

Posted by "Doug Cutting (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting resolved AVRO-911.
-------------------------------

       Resolution: Won't Fix
    Fix Version/s:     (was: 1.6.0)
     Hadoop Flags:   (was: Incompatible change)

We have another way to resolve AVRO-813 without removing reuse.  Several folks seem to want to keep reuse in Avro's Java APIs, so I will resolve this as "Won't Fix".
                

[jira] [Commented] (AVRO-911) remove object reuse from Java APIs

Posted by "Milind Bhandarkar (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121630#comment-13121630 ] 

Milind Bhandarkar commented on AVRO-911:
----------------------------------------

I always believed that object reuse (as was done in mapreduce a few years ago) was premature optimization. Indeed, in mapreduce too, even now people get surprised when iter.getNext() returns the same object reference every time, which causes a lot of headaches. So, I think removing object reuse in Avro is a great step forward.
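
The surprise described above can be reproduced in a few lines. This is a sketch with hypothetical names (Holder, reusingIterator), not the MapReduce API: the iterator refills and returns one shared object, so a caller that keeps the references ends up with many copies of the last value unless it copies each element first.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class ReusedIteratorSketch {

    static final class Holder {
        int value;
    }

    /** Hypothetical reuse-style iterator: always returns the same Holder. */
    static Iterator<Holder> reusingIterator(int[] data) {
        return new Iterator<Holder>() {
            private final Holder shared = new Holder();
            private int next = 0;

            @Override
            public boolean hasNext() {
                return next < data.length;
            }

            @Override
            public Holder next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                shared.value = data[next++];   // overwrite the shared object
                return shared;
            }
        };
    }

    /** Keeps the returned references; every entry aliases the same object. */
    static List<Integer> collectNaively(int[] data) {
        List<Holder> kept = new ArrayList<>();
        Iterator<Holder> it = reusingIterator(data);
        while (it.hasNext()) {
            kept.add(it.next());
        }
        List<Integer> out = new ArrayList<>();
        for (Holder h : kept) {
            out.add(h.value);
        }
        return out;
    }

    /** Copies each element before keeping it, which preserves the values. */
    static List<Integer> collectWithCopy(int[] data) {
        List<Holder> kept = new ArrayList<>();
        Iterator<Holder> it = reusingIterator(data);
        while (it.hasNext()) {
            Holder copy = new Holder();
            copy.value = it.next().value;
            kept.add(copy);
        }
        List<Integer> out = new ArrayList<>();
        for (Holder h : kept) {
            out.add(h.value);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        System.out.println("naive:  " + collectNaively(data));   // [3, 3, 3]
        System.out.println("copied: " + collectWithCopy(data));  // [1, 2, 3]
    }
}
```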
                