You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@avro.apache.org by "Scott Carey (JIRA)" <ji...@apache.org> on 2011/01/24 08:26:43 UTC

[jira] Created: (AVRO-743) Java: Performance Regression and memory pressure with GenericDatumReader

Java: Performance Regression and memory pressure with GenericDatumReader
------------------------------------------------------------------------

                 Key: AVRO-743
                 URL: https://issues.apache.org/jira/browse/AVRO-743
             Project: Avro
          Issue Type: Bug
          Components: java
    Affects Versions: 1.5.0
            Reporter: Scott Carey
            Priority: Critical
             Fix For: 1.5.0


AVRO-650 introduced a large performance regression and memory bloat issue with GenericDatumReader.

Performance plummets for some Perf.java tests (One test took 1 hour to finish on my laptop).
Some minor changes I tried result in it passing in shorter time, but with still an 80% performance degredation.

This is associated with memory bloat related to ThreadLocals.

More details provided in comments.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (AVRO-743) Java: Performance Regression and memory pressure with GenericDatumReader

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991201#comment-12991201 ] 

Scott Carey commented on AVRO-743:
----------------------------------

With a larger heap, it completes but slowly.  There is still a large regression.  Using the new Perf.java, here are full read results with and without the AVRO-650 changes.  

args: -nowrite -server -Xmx256m -Xms256m -XX:+UseParallelGC -XX:+UseCompressedOops -XX:+DoEscapeAnalysis

Only the generic results are below -- only the "one time use" reader tests are affected and other generic tests are a good reference.
{code}
                  GenericRead:  1811 ms,      3.680 million entries/sec.    142.805 million bytes/sec
           GenericNested_Read:  3015 ms,      2.211 million entries/sec.     85.801 million bytes/sec
      GenericWithDefault_Read:  3253 ms,      2.049 million entries/sec.     79.532 million bytes/sec
   GenericWithOutOfOrder_Read:  1855 ms,      3.594 million entries/sec.    139.472 million bytes/sec
    GenericWithPromotion_Read:  1962 ms,      3.397 million entries/sec.    131.853 million bytes/sec
GenericOneTimeDecoderUse_Read:  1791 ms,      3.721 million entries/sec.    144.426 million bytes/sec
 GenericOneTimeReaderUse_Read:  6989 ms,      0.954 million entries/sec.     37.014 million bytes/sec
       GenericOneTimeUse_Read:  7373 ms,      0.904 million entries/sec.     35.088 million bytes/sec
{code}

If I revert AVRO-650, I get:
{code}
                  GenericRead:  1808 ms,      3.687 million entries/sec.    143.076 million bytes/sec
           GenericNested_Read:  2872 ms,      2.321 million entries/sec.     90.062 million bytes/sec
      GenericWithDefault_Read:  3389 ms,      1.967 million entries/sec.     76.340 million bytes/sec
   GenericWithOutOfOrder_Read:  1805 ms,      3.693 million entries/sec.    143.319 million bytes/sec
    GenericWithPromotion_Read:  1978 ms,      3.369 million entries/sec.    130.759 million bytes/sec
GenericOneTimeDecoderUse_Read:  1803 ms,      3.696 million entries/sec.    143.443 million bytes/sec
 GenericOneTimeReaderUse_Read:  2289 ms,      2.912 million entries/sec.    113.024 million bytes/sec
       GenericOneTimeUse_Read:  2299 ms,      2.899 million entries/sec.    112.501 million bytes/sec
{code}


To prevent the cases where GenericDatumReaders are created and disposed rapidly from causing this issue, I tried several things.  One was to remove the resolver cached in GenericDatumReader entirely and only use the global cache.  This was surprisingly fast, but slowed all Generic tests by 10% to 15%.
Any variation that creates a new threadLocal per instance of GenericDatumReader was bad.  An alternate attempt tried to instead keep one global ThreadLocal WeakReferenceCache with GenericDatumReader's as keys to track the relationship was faster, but still a large memory hog and performance problem.



This is still not 100% thread-safe, but it is no worse than before.   Since we allow mutating state in setSchema() and setExpected() the only way to be completely thread-safe is to synchronize those as well as their access .  Performance dropped quite a bit when I did that.  Longer term we need to make these objects immutable, and use a builder pattern when we don't know all the fields prior to construction.



> Java: Performance Regression and memory pressure with GenericDatumReader
> ------------------------------------------------------------------------
>
>                 Key: AVRO-743
>                 URL: https://issues.apache.org/jira/browse/AVRO-743
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.5.0
>            Reporter: Scott Carey
>            Priority: Critical
>             Fix For: 1.5.0
>
>
> AVRO-650 introduced a large performance regression and memory bloat issue with GenericDatumReader.
> Performance plummets for some Perf.java tests (One test took 1 hour to finish on my laptop).
> Some minor changes I tried result in it passing in shorter time, but with still an 80% performance degredation.
> This is associated with memory bloat related to ThreadLocals.
> More details provided in comments.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (AVRO-743) Java: Performance Regression and memory pressure with GenericDatumReader

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992257#comment-12992257 ] 

Stu Hood commented on AVRO-743:
-------------------------------

If someone needed to use a GenericDatumReader in multiple threads after this patch, would the recommended approach be a ThreadLocal GDR for each consumer?

> Java: Performance Regression and memory pressure with GenericDatumReader
> ------------------------------------------------------------------------
>
>                 Key: AVRO-743
>                 URL: https://issues.apache.org/jira/browse/AVRO-743
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.5.0
>            Reporter: Scott Carey
>            Assignee: Scott Carey
>            Priority: Critical
>             Fix For: 1.5.0
>
>         Attachments: AVRO-743.patch
>
>
> AVRO-650 introduced a large performance regression and memory bloat issue with GenericDatumReader.
> Performance plummets for some Perf.java tests (One test took 1 hour to finish on my laptop).
> Some minor changes I tried result in it passing in shorter time, but with still an 80% performance degredation.
> This is associated with memory bloat related to ThreadLocals.
> More details provided in comments.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (AVRO-743) Java: Performance Regression and memory pressure with GenericDatumReader

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Carey updated AVRO-743:
-----------------------------

    Attachment: AVRO-743.patch

> Java: Performance Regression and memory pressure with GenericDatumReader
> ------------------------------------------------------------------------
>
>                 Key: AVRO-743
>                 URL: https://issues.apache.org/jira/browse/AVRO-743
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.5.0
>            Reporter: Scott Carey
>            Assignee: Scott Carey
>            Priority: Critical
>             Fix For: 1.5.0
>
>         Attachments: AVRO-743.patch
>
>
> AVRO-650 introduced a large performance regression and memory bloat issue with GenericDatumReader.
> Performance plummets for some Perf.java tests (One test took 1 hour to finish on my laptop).
> Some minor changes I tried result in it passing in shorter time, but with still an 80% performance degredation.
> This is associated with memory bloat related to ThreadLocals.
> More details provided in comments.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (AVRO-743) Java: Performance Regression and memory pressure with GenericDatumReader

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Carey updated AVRO-743:
-----------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Committed in revision 1068045.

> Java: Performance Regression and memory pressure with GenericDatumReader
> ------------------------------------------------------------------------
>
>                 Key: AVRO-743
>                 URL: https://issues.apache.org/jira/browse/AVRO-743
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.5.0
>            Reporter: Scott Carey
>            Assignee: Scott Carey
>            Priority: Critical
>             Fix For: 1.5.0
>
>         Attachments: AVRO-743.patch
>
>
> AVRO-650 introduced a large performance regression and memory bloat issue with GenericDatumReader.
> Performance plummets for some Perf.java tests (One test took 1 hour to finish on my laptop).
> Some minor changes I tried result in it passing in shorter time, but with still an 80% performance degredation.
> This is associated with memory bloat related to ThreadLocals.
> More details provided in comments.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (AVRO-743) Java: Performance Regression and memory pressure with GenericDatumReader

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Carey updated AVRO-743:
-----------------------------

    Status: Patch Available  (was: Open)

Patch implements a simple solution that worked best:

Keep track of the thread that created the GenericDatumReader, and allow only this thread to access the local resolver reference.  Other threads use the global resolver cache.

Performance results:
{noformat}
                  GenericRead:  1824 ms,      3.654 million entries/sec.    141.829 million bytes/sec
           GenericNested_Read:  2940 ms,      2.267 million entries/sec.     87.989 million bytes/sec
      GenericWithDefault_Read:  3364 ms,      1.981 million entries/sec.     76.898 million bytes/sec
   GenericWithOutOfOrder_Read:  1799 ms,      3.705 million entries/sec.    143.792 million bytes/sec
    GenericWithPromotion_Read:  2020 ms,      3.300 million entries/sec.    128.059 million bytes/sec
GenericOneTimeDecoderUse_Read:  1813 ms,      3.676 million entries/sec.    142.682 million bytes/sec
 GenericOneTimeReaderUse_Read:  2290 ms,      2.910 million entries/sec.    112.944 million bytes/sec
       GenericOneTimeUse_Read:  2253 ms,      2.959 million entries/sec.    114.831 million bytes/sec
{noformat}

It is just as thread-safe as the prior version.  If multiple threads are concurrently doing setSchema() or setExcpeted() other threads may not see the changes at the same time.  It should be stable, and eventually consistent.  If AVRO-650 was good enough, this should be too.  

> Java: Performance Regression and memory pressure with GenericDatumReader
> ------------------------------------------------------------------------
>
>                 Key: AVRO-743
>                 URL: https://issues.apache.org/jira/browse/AVRO-743
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.5.0
>            Reporter: Scott Carey
>            Assignee: Scott Carey
>            Priority: Critical
>             Fix For: 1.5.0
>
>         Attachments: AVRO-743.patch
>
>
> AVRO-650 introduced a large performance regression and memory bloat issue with GenericDatumReader.
> Performance plummets for some Perf.java tests (One test took 1 hour to finish on my laptop).
> Some minor changes I tried result in it passing in shorter time, but with still an 80% performance degredation.
> This is associated with memory bloat related to ThreadLocals.
> More details provided in comments.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (AVRO-743) Java: Performance Regression and memory pressure with GenericDatumReader

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992290#comment-12992290 ] 

Scott Carey commented on AVRO-743:
----------------------------------

It should be as thread-safe as it was before the patch, so what worked before should continue to work.   

In general, per thread is probably the way to go.  A static ThreadLocal<GenericDatumReader> will work, or tie it to the consumer in the app in a more natural way (part of a thread pool?) if there is one.

The fact is, the API is mutable, and if one thread calls setSchema() while another is reading a datum, that contract is never going to be great.  Even fully synchronized it doesn't make a lot of sense to do that.  

> Java: Performance Regression and memory pressure with GenericDatumReader
> ------------------------------------------------------------------------
>
>                 Key: AVRO-743
>                 URL: https://issues.apache.org/jira/browse/AVRO-743
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.5.0
>            Reporter: Scott Carey
>            Assignee: Scott Carey
>            Priority: Critical
>             Fix For: 1.5.0
>
>         Attachments: AVRO-743.patch
>
>
> AVRO-650 introduced a large performance regression and memory bloat issue with GenericDatumReader.
> Performance plummets for some Perf.java tests (One test took 1 hour to finish on my laptop).
> Some minor changes I tried result in it passing in shorter time, but with still an 80% performance degredation.
> This is associated with memory bloat related to ThreadLocals.
> More details provided in comments.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (AVRO-743) Java: Performance Regression and memory pressure with GenericDatumReader

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985550#action_12985550 ] 

Scott Carey commented on AVRO-743:
----------------------------------

On trunk, Perf.java does not complete in reasonable time for me.  I can make some changes that help, but these get the following performance:
{noformat}
NestedRecordTest: 1525 ms, 32.77809426848799 million entries/sec.  82.48148529345245 million bytes/sec
ResolverTest: 746 ms, 11.162490388711324 million entries/sec.  433.1971901548255 million bytes/sec
MigrationWithDefaultTest: 2986 ms, 2.7902796608268243 million entries/sec.  108.28598876454534 million bytes/sec
GenericReaderTest: 1510 ms, 2.758188340605312 million entries/sec.  107.0486888261762 million bytes/sec
GenericReaderOneTimeUseReaderTest: 7258 ms, 0.5739884392267267 million entries/sec.  22.27995399275671 million bytes/sec
GenericReaderOneTimeUseDecoderTest: 1569 ms, 2.6550866272593567 million entries/sec.  103.03571383053837 million bytes/sec
GenericReaderOneTimeUseTest: 7662 ms, 0.5437721933184865 million entries/sec.  21.10381644161428 million bytes/sec
GenericReaderTestWithDefaultTest: 2615 ms, 1.593089132314149 million entries/sec.  61.826693773206664 million bytes/sec
GenericReaderTestWithOutOfOrderTest: 1542 ms, 2.7016336892067576 million entries/sec.  104.87577931741035 million bytes/sec
GenericReaderTestWithPromotionTest: 1558 ms, 2.6727042262917378 million entries/sec.  103.74088785226581 million bytes/sec
{noformat}

Performance when I revert the AVRO-650 changes is much better for GenericReader tests:
{noformat}
NestedRecordTest: 1324 ms, 37.760699871915705 million entries/sec.  95.01951472969381 million bytes/sec
ResolverTest: 715 ms, 11.648566400183402 million entries/sec.  452.0609701034574 million bytes/sec
MigrationWithDefaultTest: 2919 ms, 2.8541473504518238 million entries/sec.  110.76458473406217 million bytes/sec
GenericReaderTest: 1109 ms, 3.754252987904214 million entries/sec.  145.70943802892032 million bytes/sec
GenericReaderOneTimeUseReaderTest: 1454 ms, 2.864026615891172 million entries/sec.  111.15055988231816 million bytes/sec
GenericReaderOneTimeUseDecoderTest: 1132 ms, 3.6795452959841604 million entries/sec.  142.80018121749544 million bytes/sec
GenericReaderOneTimeUseTest: 1471 ms, 2.830739253347936 million entries/sec.  109.8712732110088 million bytes/sec
GenericReaderTestWithDefaultTest: 2020 ms, 2.06191332095476 million entries/sec.  80.03430501173852 million bytes/sec
GenericReaderTestWithOutOfOrderTest: 1124 ms, 3.7038683266586956 million entries/sec.  143.73524875433927 million bytes/sec
GenericReaderTestWithPromotionTest: 1440 ms, 2.8925109202440358 million entries/sec.  112.26017811174887 million bytes/sec
{noformat}


My conclusion after several variations and experiments is that:

A new ThreadLocal per GenericDatumReader simply won't work, its very expensive.
You can do better by keeping a static ThreadLocal<WeakIdentityHashMap<GenericDatumReader,ResolvingDecoder>  but this is still too slow and creates a lot of mess on the heap.  In the "GenericReaderOneTimeUseReaderTest it bloats the map size tremendously because the map keeps an entry for every object not yet GC'd.  'jmap' output looks like this during the test:
{noformat}
 num     #instances         #bytes  class name
----------------------------------------------
   1:        371287       23762368  org.apache.avro.util.WeakIdentityHashMap$IdentityWeakReference
   2:        359761       17268528  java.util.HashMap$Entry
   3:        171171        6846840  org.apache.avro.generic.GenericDatumReader
   4:           258        4241968  [Ljava.util.HashMap$Entry;
{noformat}

When I revert AVRO-650 it is much faster and the most common objects on the heap are much more sane:
{noformat}
 num     #instances         #bytes  class name
----------------------------------------------
   1:        299380        7185120  java.lang.Long
   2:        299125        7179000  java.lang.Double
   3:          1470        6658728  [B
   4:         12485        1707464  <constMethodKlass>
{noformat}

What it comes down to, is that either our map, or the one on the thread, has to keep track of every instance created between garbage collections if we want to track per thread state for an object.   The global ThreadLocal for resolvers works because its key is (Schema actual, Schema expected)  we aren't creating and destroying a large number of Schema objects and attempting to resolve them and that is a use case that makes little sense.  


The best long term options I have thought of so far are then:
* Leave GenericDatumReader in the state before AVRO-650 -- not thread safe -- and fix the places elsewhere that assume it is thread safe.
* Make GenericDatumReader threadsafe via immutability.   This would require some other changes (builder pattern?) to allow for the cases where we now call setSchema and setExpected after construction.

I think the second option is the better one, but it would be an API break in a significant way.   The more I think about it, the more that I feel that all of the complicated behavioral objects should be immutable -- Schema/Protocol, DatumReader/DatumWriter, ResolvingDecoders, etc.     That or not thread safe -- mutability + threadsafe is either hard, bug prone, or performs poorly.

The first option is easier and might be OK for 1.5.0 -- fix the stuff in ipc that is re-using decoders across threads calling setSchema or setExpected.




> Java: Performance Regression and memory pressure with GenericDatumReader
> ------------------------------------------------------------------------
>
>                 Key: AVRO-743
>                 URL: https://issues.apache.org/jira/browse/AVRO-743
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.5.0
>            Reporter: Scott Carey
>            Priority: Critical
>             Fix For: 1.5.0
>
>
> AVRO-650 introduced a large performance regression and memory bloat issue with GenericDatumReader.
> Performance plummets for some Perf.java tests (One test took 1 hour to finish on my laptop).
> Some minor changes I tried result in it passing in shorter time, but with still an 80% performance degredation.
> This is associated with memory bloat related to ThreadLocals.
> More details provided in comments.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (AVRO-743) Java: Performance Regression and memory pressure with GenericDatumReader

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991496#comment-12991496 ] 

Doug Cutting commented on AVRO-743:
-----------------------------------

+1 This looks great.

> Java: Performance Regression and memory pressure with GenericDatumReader
> ------------------------------------------------------------------------
>
>                 Key: AVRO-743
>                 URL: https://issues.apache.org/jira/browse/AVRO-743
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.5.0
>            Reporter: Scott Carey
>            Assignee: Scott Carey
>            Priority: Critical
>             Fix For: 1.5.0
>
>         Attachments: AVRO-743.patch
>
>
> AVRO-650 introduced a large performance regression and memory bloat issue with GenericDatumReader.
> Performance plummets for some Perf.java tests (One test took 1 hour to finish on my laptop).
> Some minor changes I tried result in it passing in shorter time, but with still an 80% performance degredation.
> This is associated with memory bloat related to ThreadLocals.
> More details provided in comments.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Assigned: (AVRO-743) Java: Performance Regression and memory pressure with GenericDatumReader

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Carey reassigned AVRO-743:
--------------------------------

    Assignee: Scott Carey

> Java: Performance Regression and memory pressure with GenericDatumReader
> ------------------------------------------------------------------------
>
>                 Key: AVRO-743
>                 URL: https://issues.apache.org/jira/browse/AVRO-743
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.5.0
>            Reporter: Scott Carey
>            Assignee: Scott Carey
>            Priority: Critical
>             Fix For: 1.5.0
>
>
> AVRO-650 introduced a large performance regression and memory bloat issue with GenericDatumReader.
> Performance plummets for some Perf.java tests (One test took 1 hour to finish on my laptop).
> Some minor changes I tried result in it passing in shorter time, but with still an 80% performance degredation.
> This is associated with memory bloat related to ThreadLocals.
> More details provided in comments.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira