Posted to dev@avro.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/11/07 13:00:00 UTC

[jira] [Commented] (AVRO-1881) Avro (Java) Memory Leak when reusing JsonDecoder instance

    [ https://issues.apache.org/jira/browse/AVRO-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16678198#comment-16678198 ] 

ASF GitHub Bot commented on AVRO-1881:
--------------------------------------

Fokko commented on issue #183: AVRO-1881 - Avro (Java) Memory Leak when reusing JsonDecoder instance
URL: https://github.com/apache/avro/pull/183#issuecomment-436614946
 
 
   @nandorKollar Interesting patch, can you rebase onto master?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Avro (Java) Memory Leak when reusing JsonDecoder instance
> ---------------------------------------------------------
>
>                 Key: AVRO-1881
>                 URL: https://issues.apache.org/jira/browse/AVRO-1881
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.8.1
>         Environment: Ubuntu 15.04
> Oracle 1.8.0_91 and OpenJDK 1.8.0_45
>            Reporter: Matt Allen
>            Assignee: Nandor Kollar
>            Priority: Major
>             Fix For: 1.9.0
>
>
> {{JsonDecoder}} maintains state for each record it decodes, leading to a memory leak if the same instance is reused across multiple inputs. Using {{JsonDecoder.configure}} to change the input does not correctly clean up the state stored in {{JsonDecoder.reorderBuffers}}, so an unbounded number of {{ReorderBuffer}} instances accumulate. Creating a new {{JsonDecoder}} for each input avoids the leak, but is significantly more expensive than reusing a single instance.
> This problem seems to occur only when the input schema contains a record, which is consistent with {{reorderBuffers}} being the source of the leak. My first look at the {{JsonDecoder}} code suggests that the {{reorderBuffers}} stack should be empty after a record is fully processed, so there may be other behavior at play here.
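> The growth can be watched directly. The sketch below is a hypothetical diagnostic, not part of the reproduction: it assumes {{reorderBuffers}} is a private {{java.util.Stack}} field on {{JsonDecoder}} (consistent with the {{Stack.push}} frame in the stack trace further down) and reads it via reflection after each reuse. The full reproduction follows after it.
> {code:title=ReorderBufferProbe.java (hypothetical diagnostic)|borderStyle=solid}
> import org.apache.avro.Schema;
> import org.apache.avro.generic.GenericDatumReader;
> import org.apache.avro.generic.GenericRecord;
> import org.apache.avro.io.DecoderFactory;
> import org.apache.avro.io.JsonDecoder;
> import java.lang.reflect.Field;
> import java.util.Stack;
> public class ReorderBufferProbe {
>     public static void main(String[] args) throws Exception {
>         Schema schema = new Schema.Parser().parse(
>             "{\"name\": \"TestRecord\", \"type\": \"record\", \"fields\": [{\"name\": \"field1\", \"type\": \"long\"}]}");
>         JsonDecoder decoder = DecoderFactory.get().jsonDecoder(schema, "{\"field1\": 0}");
>         GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);
>         // Field name taken from the issue description; this throws
>         // NoSuchFieldException if the internals differ from that assumption.
>         Field f = JsonDecoder.class.getDeclaredField("reorderBuffers");
>         f.setAccessible(true);
>         for (long i = 0; i < 5; i++) {
>             decoder.configure("{\"field1\": " + i + "}");
>             reader.read(null, decoder);
>             Stack<?> buffers = (Stack<?>) f.get(decoder);
>             // On affected versions this depth keeps growing across reuses
>             // instead of returning to zero after each record.
>             System.out.println("reorderBuffers depth after record " + i + ": " + buffers.size());
>         }
>     }
> }
> {code}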
> The following is a minimal example which will exhaust a 50MB heap (-Xmx50m) after about 5.25 million iterations. The first section demonstrates that no memory leak is encountered when creating a fresh {{JsonDecoder}} instance for each input.
> {code:title=JsonDecoderMemoryLeak.java|borderStyle=solid}
> import org.apache.avro.Schema;
> import org.apache.avro.io.*;
> import org.apache.avro.generic.*;
> import java.io.IOException;
> public class JsonDecoderMemoryLeak {
>     public static DecoderFactory decoderFactory = DecoderFactory.get();
>     public static JsonDecoder createDecoder(String input, Schema schema) throws IOException {
>         return decoderFactory.jsonDecoder(schema, input);
>     }
>     public static Object decodeAvro(String input, Schema schema, JsonDecoder decoder) throws IOException {
>         if (decoder == null) {
>             decoder = createDecoder(input, schema);
>         } else {
>             decoder.configure(input);
>         }
>         GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);
>         return reader.read(null, decoder);
>     }
>     public static Schema.Parser parser = new Schema.Parser();
>     public static Schema schema = parser.parse("{\"name\": \"TestRecord\", \"type\": \"record\", \"fields\": [{\"name\": \"field1\", \"type\": \"long\"}]}");
>     public static String record(long i) {
>         StringBuilder builder = new StringBuilder("{\"field1\": ");
>         builder.append(i);
>         builder.append("}");
>         return builder.toString();
>     }
>     public static void main(String[] args) throws IOException {
>         // No memory issues when creating a new decoder for each record
>         System.out.println("Running with fresh JsonDecoder instances for 6000000 iterations");
>         for(long i = 0; i < 6000000; i++) {
>             decodeAvro(record(i), schema, null);
>         }
>         
>         // Runs out of memory after ~5250000 records
>         System.out.println("Running with a single reused JsonDecoder instance");
>         long count = 0;
>         try {
>             JsonDecoder decoder = createDecoder(record(0), schema);
>             while(true) {
>                 decodeAvro(record(count), schema, decoder);
>                 count++;
>             }
>         } catch (OutOfMemoryError e) {
>             System.out.println("Out of memory after " + count + " records");
>             e.printStackTrace();
>         }
>     }
> }
> {code}
> {code:title=Output|borderStyle=solid}
> $ java -Xmx50m -jar json-decoder-memory-leak.jar 
> Running with fresh JsonDecoder instances for 6000000 iterations
> Running with a single reused JsonDecoder instance
> Out of memory after 5242880 records
> java.lang.OutOfMemoryError: Java heap space
>         at java.util.Arrays.copyOf(Arrays.java:3210)
>         at java.util.Arrays.copyOf(Arrays.java:3181)
>         at java.util.Vector.grow(Vector.java:266)
>         at java.util.Vector.ensureCapacityHelper(Vector.java:246)
>         at java.util.Vector.addElement(Vector.java:620)
>         at java.util.Stack.push(Stack.java:67)
>         at org.apache.avro.io.JsonDecoder.doAction(JsonDecoder.java:487)
>         at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>         at org.apache.avro.io.JsonDecoder.advance(JsonDecoder.java:139)
>         at org.apache.avro.io.JsonDecoder.readLong(JsonDecoder.java:178)
>         at org.apache.avro.io.ResolvingDecoder.readLong(ResolvingDecoder.java:162)
>         at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:183)
>         at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
>         at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:240)
>         at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:230)
>         at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:174)
>         at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
>         at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:144)
>         at com.spiceworks.App.decodeAvro(App.java:25)
>         at com.spiceworks.App.main(App.java:52)
> {code}
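> Until a fixed release is available, reuse can be bounded rather than avoided entirely. The sketch below is a hypothetical middle ground, not taken from the patch under review: it recreates the decoder every N records, so each instance's retained {{ReorderBuffer}} stack stays capped while most records still take the cheap {{configure}} path.
> {code:title=BoundedReuse.java (hypothetical workaround)|borderStyle=solid}
> import org.apache.avro.Schema;
> import org.apache.avro.generic.GenericDatumReader;
> import org.apache.avro.generic.GenericRecord;
> import org.apache.avro.io.DecoderFactory;
> import org.apache.avro.io.JsonDecoder;
> import java.io.IOException;
> public class BoundedReuse {
>     // Recreate the decoder after this many reuses; any finite bound
>     // keeps the retained ReorderBuffer stack from growing without limit.
>     private static final int RECYCLE_EVERY = 100000;
>     public static void decodeAll(Iterable<String> inputs, Schema schema) throws IOException {
>         GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);
>         JsonDecoder decoder = null;
>         long n = 0;
>         for (String input : inputs) {
>             if (decoder == null || n % RECYCLE_EVERY == 0) {
>                 // Dropping the old instance releases its accumulated buffers.
>                 decoder = DecoderFactory.get().jsonDecoder(schema, input);
>             } else {
>                 decoder.configure(input);
>             }
>             reader.read(null, decoder);
>             n++;
>         }
>     }
> }
> {code}
> This trades a small share of the per-instance construction cost noted above for a hard memory bound; once the fix lands, plain {{configure}} reuse should be safe again.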



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)