Posted to dev@avro.apache.org by "Nandor Kollar (JIRA)" <ji...@apache.org> on 2017/01/05 12:15:58 UTC
[jira] [Assigned] (AVRO-1881) Avro (Java) Memory Leak when reusing JsonDecoder instance
[ https://issues.apache.org/jira/browse/AVRO-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nandor Kollar reassigned AVRO-1881:
-----------------------------------
Assignee: Nandor Kollar
> Avro (Java) Memory Leak when reusing JsonDecoder instance
> ---------------------------------------------------------
>
> Key: AVRO-1881
> URL: https://issues.apache.org/jira/browse/AVRO-1881
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.8.1
> Environment: Ubuntu 15.04
> Oracle 1.8.0_91 and OpenJDK 1.8.0_45
> Reporter: Matt Allen
> Assignee: Nandor Kollar
>
> {{JsonDecoder}} maintains state for each record decoded, leading to a memory leak if the same instance is used for multiple inputs. Using {{JsonDecoder.configure}} to change the input does not correctly clean up the state stored in {{JsonDecoder.reorderBuffers}}, which leads to an unbounded number of {{ReorderBuffer}} instances being accumulated. If a new {{JsonDecoder}} is created for each input there is no memory leak, but it is significantly more expensive than reusing the same instance.
> This problem seems to only occur when the input schema contains a record, which is consistent with the {{reorderBuffers}} being the source of the leak. My first look at the {{JsonDecoder}} code leads me to believe that the {{reorderBuffers}} stack should be empty after a record is fully processed, so there may be other behavior at play here.
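> If that reading is right, one possible direction for a fix would be for {{configure}} to reset the reorder-buffer state whenever it switches to a new input. The following is only a sketch of that idea, not the actual {{JsonDecoder}} internals; the names ({{reorderBuffers}} as a {{java.util.Stack}}, plus a {{currentReorderBuffer}} field) are assumptions based on the stack trace below.
> {code:title=Sketch only - reset per-record state on reuse (assumed field names)|borderStyle=solid}
> // Illustration, not the real method body: clear the per-record state when
> // the decoder is pointed at a new input, so ReorderBuffers cannot pile up
> // across inputs. reorderBuffers is assumed to be the java.util.Stack seen
> // in the OOM stack trace; currentReorderBuffer is a guessed field name.
> public JsonDecoder configure(String in) throws IOException {
>     currentReorderBuffer = null; // guessed field: drop any in-flight record state
>     reorderBuffers.clear();      // drop buffers left over from the previous input
>     // ... existing logic that points the parser at the new input string ...
>     return this;
> }
> {code}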
> The following is a minimal example which will exhaust a 50MB heap (-Xmx50m) after about 5.25 million iterations. The first section demonstrates that no memory leak is encountered when creating a fresh {{JsonDecoder}} instance for each input.
> {code:title=JsonDecoderMemoryLeak.java|borderStyle=solid}
> import org.apache.avro.Schema;
> import org.apache.avro.io.*;
> import org.apache.avro.generic.*;
> import java.io.IOException;
>
> public class JsonDecoderMemoryLeak {
>     public static DecoderFactory decoderFactory = DecoderFactory.get();
>
>     public static JsonDecoder createDecoder(String input, Schema schema) throws IOException {
>         return decoderFactory.jsonDecoder(schema, input);
>     }
>
>     public static Object decodeAvro(String input, Schema schema, JsonDecoder decoder) throws IOException {
>         if (decoder == null) {
>             decoder = createDecoder(input, schema);
>         } else {
>             decoder.configure(input);
>         }
>         GenericDatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema);
>         return reader.read(null, decoder);
>     }
>
>     public static Schema.Parser parser = new Schema.Parser();
>     public static Schema schema = parser.parse("{\"name\": \"TestRecord\", \"type\": \"record\", \"fields\": [{\"name\": \"field1\", \"type\": \"long\"}]}");
>
>     public static String record(long i) {
>         StringBuilder builder = new StringBuilder("{\"field1\": ");
>         builder.append(i);
>         builder.append("}");
>         return builder.toString();
>     }
>
>     public static void main(String[] args) throws IOException {
>         // No memory issues when creating a new decoder for each record
>         System.out.println("Running with fresh JsonDecoder instances for 6000000 iterations");
>         for (long i = 0; i < 6000000; i++) {
>             decodeAvro(record(i), schema, null);
>         }
>
>         // Runs out of memory after ~5250000 records
>         System.out.println("Running with a single reused JsonDecoder instance");
>         long count = 0;
>         try {
>             JsonDecoder decoder = createDecoder(record(0), schema);
>             while (true) {
>                 decodeAvro(record(count), schema, decoder);
>                 count++;
>             }
>         } catch (OutOfMemoryError e) {
>             System.out.println("Out of memory after " + count + " records");
>             e.printStackTrace();
>         }
>     }
> }
> {code}
> {code:title=Output|borderStyle=solid}
> $ java -Xmx50m -jar json-decoder-memory-leak.jar
> Running with fresh JsonDecoder instances for 6000000 iterations
> Running with a single reused JsonDecoder instance
> Out of memory after 5242880 records
> java.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:3210)
> at java.util.Arrays.copyOf(Arrays.java:3181)
> at java.util.Vector.grow(Vector.java:266)
> at java.util.Vector.ensureCapacityHelper(Vector.java:246)
> at java.util.Vector.addElement(Vector.java:620)
> at java.util.Stack.push(Stack.java:67)
> at org.apache.avro.io.JsonDecoder.doAction(JsonDecoder.java:487)
> at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
> at org.apache.avro.io.JsonDecoder.advance(JsonDecoder.java:139)
> at org.apache.avro.io.JsonDecoder.readLong(JsonDecoder.java:178)
> at org.apache.avro.io.ResolvingDecoder.readLong(ResolvingDecoder.java:162)
> at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:183)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
> at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:240)
> at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:230)
> at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:174)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:144)
> at com.spiceworks.App.decodeAvro(App.java:25)
> at com.spiceworks.App.main(App.java:52)
> {code}
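> In the meantime, a workaround that bounds the cost on my side is to reuse a decoder only for a fixed number of inputs and then build a fresh one, so the leaked state stays bounded. Sketch below; the {{BatchedJsonDecoding}} class and the batch size are mine, not part of Avro, but it only uses the public {{DecoderFactory}}/{{JsonDecoder}} API shown above.
> {code:title=Workaround sketch - recreate the decoder every N inputs|borderStyle=solid}
> import org.apache.avro.Schema;
> import org.apache.avro.generic.GenericDatumReader;
> import org.apache.avro.generic.GenericRecord;
> import org.apache.avro.io.DecoderFactory;
> import org.apache.avro.io.JsonDecoder;
> import java.io.IOException;
>
> public class BatchedJsonDecoding {
>     private static final int BATCH_SIZE = 10000; // arbitrary; trades leaked state against decoder construction cost
>     private static final DecoderFactory FACTORY = DecoderFactory.get();
>
>     public static void decodeAll(Iterable<String> inputs, Schema schema) throws IOException {
>         GenericDatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema);
>         JsonDecoder decoder = null;
>         int sinceRebuild = 0;
>         for (String input : inputs) {
>             if (decoder == null || sinceRebuild >= BATCH_SIZE) {
>                 // Fresh decoder: releases whatever state accumulated in the old one
>                 decoder = FACTORY.jsonDecoder(schema, input);
>                 sinceRebuild = 0;
>             } else {
>                 // Cheap reuse within the batch
>                 decoder.configure(input);
>             }
>             reader.read(null, decoder);
>             sinceRebuild++;
>         }
>     }
> }
> {code}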
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)