Posted to dev@avro.apache.org by "Nandor Kollar (JIRA)" <ji...@apache.org> on 2017/01/05 12:15:58 UTC
[jira] [Assigned] (AVRO-1881) Avro (Java) Memory Leak when reusing JsonDecoder instance
[ https://issues.apache.org/jira/browse/AVRO-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nandor Kollar reassigned AVRO-1881:
-----------------------------------
Assignee: Nandor Kollar
> Avro (Java) Memory Leak when reusing JsonDecoder instance
> ---------------------------------------------------------
>
> Key: AVRO-1881
> URL: https://issues.apache.org/jira/browse/AVRO-1881
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.8.1
> Environment: Ubuntu 15.04
> Oracle 1.8.0_91 and OpenJDK 1.8.0_45
> Reporter: Matt Allen
> Assignee: Nandor Kollar
>
> {{JsonDecoder}} maintains state for each record decoded, leading to a memory leak if the same instance is used for multiple inputs. Using {{JsonDecoder.configure}} to change the input does not correctly clean up the state stored in {{JsonDecoder.reorderBuffers}}, which leads to an unbounded number of {{ReorderBuffer}} instances being accumulated. If a new {{JsonDecoder}} is created for each input there is no memory leak, but it is significantly more expensive than reusing the same instance.
> This problem seems to only occur when the input schema contains a record, which is consistent with the {{reorderBuffers}} being the source of the leak. My first look at the {{JsonDecoder}} code leads me to believe that the {{reorderBuffers}} stack should be empty after a record is fully processed, so there may be other behavior at play here.
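> If that reading is right, one possible direction for a fix would be for {{configure}} to reset the reorder-buffer state whenever it switches to a new input. The following is only a sketch of that idea, not the actual {{JsonDecoder}} internals; the names ({{reorderBuffers}} as a {{java.util.Stack}}, plus a {{currentReorderBuffer}} field) are assumptions based on the stack trace below.
> {code:title=Sketch only - reset per-record state on reuse (assumed field names)|borderStyle=solid}
> // Illustration, not the real method body: clear the per-record state when
> // the decoder is pointed at a new input, so ReorderBuffers cannot pile up
> // across inputs. reorderBuffers is assumed to be the java.util.Stack seen
> // in the OOM stack trace; currentReorderBuffer is a guessed field name.
> public JsonDecoder configure(String in) throws IOException {
>     currentReorderBuffer = null; // guessed field: drop any in-flight record state
>     reorderBuffers.clear();      // drop buffers left over from the previous input
>     // ... existing logic that points the parser at the new input string ...
>     return this;
> }
> {code}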
> The following is a minimal example which will exhaust a 50MB heap (-Xmx50m) after about 5.25 million iterations. The first section demonstrates that no memory leak is encountered when creating a fresh {{JsonDecoder}} instance for each input.
> {code:title=JsonDecoderMemoryLeak.java|borderStyle=solid}
> import org.apache.avro.Schema;
> import org.apache.avro.io.*;
> import org.apache.avro.generic.*;
> import java.io.IOException;
>
> public class JsonDecoderMemoryLeak {
>     public static DecoderFactory decoderFactory = DecoderFactory.get();
>
>     public static JsonDecoder createDecoder(String input, Schema schema) throws IOException {
>         return decoderFactory.jsonDecoder(schema, input);
>     }
>
>     public static Object decodeAvro(String input, Schema schema, JsonDecoder decoder) throws IOException {
>         if (decoder == null) {
>             decoder = createDecoder(input, schema);
>         } else {
>             decoder.configure(input);
>         }
>         GenericDatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema);
>         return reader.read(null, decoder);
>     }
>
>     public static Schema.Parser parser = new Schema.Parser();
>     public static Schema schema = parser.parse("{\"name\": \"TestRecord\", \"type\": \"record\", \"fields\": [{\"name\": \"field1\", \"type\": \"long\"}]}");
>
>     public static String record(long i) {
>         StringBuilder builder = new StringBuilder("{\"field1\": ");
>         builder.append(i);
>         builder.append("}");
>         return builder.toString();
>     }
>
>     public static void main(String[] args) throws IOException {
>         // No memory issues when creating a new decoder for each record
>         System.out.println("Running with fresh JsonDecoder instances for 6000000 iterations");
>         for (long i = 0; i < 6000000; i++) {
>             decodeAvro(record(i), schema, null);
>         }
>
>         // Runs out of memory after ~5250000 records
>         System.out.println("Running with a single reused JsonDecoder instance");
>         long count = 0;
>         try {
>             JsonDecoder decoder = createDecoder(record(0), schema);
>             while (true) {
>                 decodeAvro(record(count), schema, decoder);
>                 count++;
>             }
>         } catch (OutOfMemoryError e) {
>             System.out.println("Out of memory after " + count + " records");
>             e.printStackTrace();
>         }
>     }
> }
> {code}
> {code:title=Output|borderStyle=solid}
> $ java -Xmx50m -jar json-decoder-memory-leak.jar
> Running with fresh JsonDecoder instances for 6000000 iterations
> Running with a single reused JsonDecoder instance
> Out of memory after 5242880 records
> java.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:3210)
> at java.util.Arrays.copyOf(Arrays.java:3181)
> at java.util.Vector.grow(Vector.java:266)
> at java.util.Vector.ensureCapacityHelper(Vector.java:246)
> at java.util.Vector.addElement(Vector.java:620)
> at java.util.Stack.push(Stack.java:67)
> at org.apache.avro.io.JsonDecoder.doAction(JsonDecoder.java:487)
> at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
> at org.apache.avro.io.JsonDecoder.advance(JsonDecoder.java:139)
> at org.apache.avro.io.JsonDecoder.readLong(JsonDecoder.java:178)
> at org.apache.avro.io.ResolvingDecoder.readLong(ResolvingDecoder.java:162)
> at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:183)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
> at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:240)
> at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:230)
> at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:174)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:144)
> at com.spiceworks.App.decodeAvro(App.java:25)
> at com.spiceworks.App.main(App.java:52)
> {code}
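> In the meantime, a workaround that bounds the cost on my side is to reuse a decoder only for a fixed number of inputs and then build a fresh one, so the leaked state stays bounded. Sketch below; the {{BatchedJsonDecoding}} class and the batch size are mine, not part of Avro, but it only uses the public {{DecoderFactory}}/{{JsonDecoder}} API shown above.
> {code:title=Workaround sketch - recreate the decoder every N inputs|borderStyle=solid}
> import org.apache.avro.Schema;
> import org.apache.avro.generic.GenericDatumReader;
> import org.apache.avro.generic.GenericRecord;
> import org.apache.avro.io.DecoderFactory;
> import org.apache.avro.io.JsonDecoder;
> import java.io.IOException;
>
> public class BatchedJsonDecoding {
>     private static final int BATCH_SIZE = 10000; // arbitrary; trades leaked state against decoder construction cost
>     private static final DecoderFactory FACTORY = DecoderFactory.get();
>
>     public static void decodeAll(Iterable<String> inputs, Schema schema) throws IOException {
>         GenericDatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema);
>         JsonDecoder decoder = null;
>         int sinceRebuild = 0;
>         for (String input : inputs) {
>             if (decoder == null || sinceRebuild >= BATCH_SIZE) {
>                 // Fresh decoder: releases whatever state accumulated in the old one
>                 decoder = FACTORY.jsonDecoder(schema, input);
>                 sinceRebuild = 0;
>             } else {
>                 // Cheap reuse within the batch
>                 decoder.configure(input);
>             }
>             reader.read(null, decoder);
>             sinceRebuild++;
>         }
>     }
> }
> {code}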
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)