Posted to dev@avro.apache.org by "Yu-Wu Chu (Jira)" <ji...@apache.org> on 2022/05/24 23:06:00 UTC
[jira] [Created] (AVRO-3524) Memory leak when not reusing avro schema instance

Yu-Wu Chu created AVRO-3524:
-------------------------------

             Summary: Memory leak when not reusing avro schema instance
                 Key: AVRO-3524
                 URL: https://issues.apache.org/jira/browse/AVRO-3524
             Project: Apache Avro
          Issue Type: Bug
          Components: java
    Affects Versions: 1.10.2, 1.9.2
         Environment: * OpenJDK 8
 * tested in Avro 1.9.2 and 1.10.2
            Reporter: Yu-Wu Chu


When deserializing Avro records without a shared schema instance (that is, parsing a new Schema for every read), memory usage keeps growing with the number of records deserialized.

Code with shared schema:
{code:java}
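import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.specific.SpecificDatumReader;
import org.apache.avro.specific.SpecificDatumWriter;

// schemaString, AvroEntity (a generated specific record) and buildAvroEntity()
// come from the reporter's test setup and are not shown here.
public void myTest() throws Exception {
    // Parse the schema once and reuse the same instance for writing and reading.
    final Schema schema = new Schema.Parser().parse(schemaString);
    final AvroEntity avroEntity = buildAvroEntity();

    // Serialize a single record to bytes.
    final ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
    final BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(outputStream, null);
    final DatumWriter<AvroEntity> writer = new SpecificDatumWriter<>(schema);
    writer.write(avroEntity, encoder);
    encoder.flush();
    final byte[] data = outputStream.toByteArray();
    final DatumReader<AvroEntity> reader = new SpecificDatumReader<>(schema);

    // Deserialize the same bytes repeatedly.
    int count = 0;
    while (count < 100000) {
        final Decoder decoder = DecoderFactory.get().binaryDecoder(data, null);
        // Shared: the same Schema instance is set on every iteration.
        reader.setSchema(schema);
        reader.read(null, decoder);
        count++;
        if (count % 1000 == 0) {
            System.gc();
            System.out.println("test" + count);
        }
    }
    System.out.println("test" + count);
}{code}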
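The loop prints only an iteration counter, so the memory growth itself is not visible in its output. As a minimal sketch, a hypothetical helper such as HeapProbe (not part of Avro or of the original report) could print the approximate used heap at each checkpoint. Runtime.totalMemory() minus Runtime.freeMemory() is only an estimate, and System.gc() is a hint the JVM may ignore, so a heap histogram remains the more reliable measurement.

{code:java}
// Hypothetical helper, not part of Avro: approximate used heap after a GC hint.
final class HeapProbe {
    private HeapProbe() {}

    /** Requests a GC (only a hint) and returns the approximate used heap in bytes. */
    static long usedHeapAfterGc() {
        System.gc(); // the JVM is free to ignore this request
        Runtime rt = Runtime.getRuntime();
        // Memory currently allocated to the heap minus the part of it that is free.
        return rt.totalMemory() - rt.freeMemory();
    }
}
{code}

In the reproducer, the checkpoint line could become System.out.println("test" + count + " usedHeap=" + HeapProbe.usedHeapAfterGc()); in which case the separate System.gc() call becomes redundant.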

Code without shared schema:
{code:java}
public void myTest() throws Exception {
    // The writer and the initial reader use one parsed schema...
    final Schema schema = new Schema.Parser().parse(schemaString);
    final AvroEntity avroEntity = buildAvroEntity();

    // Serialize a single record to bytes.
    final ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
    final BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(outputStream, null);
    final DatumWriter<AvroEntity> writer = new SpecificDatumWriter<>(schema);
    writer.write(avroEntity, encoder);
    encoder.flush();
    final byte[] data = outputStream.toByteArray();
    final DatumReader<AvroEntity> reader = new SpecificDatumReader<>(schema);

    int count = 0;
    while (count < 100000) {
        final Decoder decoder = DecoderFactory.get().binaryDecoder(data, null);
        // ...but here a new, content-identical Schema instance is parsed on every iteration.
        final Schema mySchema = new Schema.Parser().parse(schemaString);
        reader.setSchema(mySchema);
        reader.read(null, decoder);
        count++;
        if (count % 1000 == 0) {
            System.gc();
            System.out.println("test" + count);
        }
    }
    System.out.println("test" + count);
}{code}
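Since reusing one Schema instance avoids the growth, a workaround consistent with the report, for code that receives schemas as text, is to parse each distinct schema string once and hand out the same instance afterwards. The sketch below is an assumption, not an Avro API: SchemaTextCache is a hypothetical class, kept generic so it does not depend on Avro, and with Avro it might be constructed as new SchemaTextCache<Schema>(s -> new Schema.Parser().parse(s)). A fresh Parser per call is used because a Parser rejects redefinition of a named type it has already seen.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical helper (not an Avro API): parses each distinct schema text once
 * and returns the same parsed instance on every later lookup.
 * Possible Avro usage: new SchemaTextCache<Schema>(s -> new Schema.Parser().parse(s))
 */
final class SchemaTextCache<S> {
    private final Map<String, S> cache = new ConcurrentHashMap<>();
    private final Function<String, S> parser;

    SchemaTextCache(Function<String, S> parser) {
        this.parser = parser;
    }

    /** Returns the cached instance for this text, invoking the parser only on first use. */
    S get(String schemaText) {
        return cache.computeIfAbsent(schemaText, parser);
    }

    /** Number of distinct schema texts parsed so far. */
    int size() {
        return cache.size();
    }
}
{code}

In the reproducer the loop body would then call reader.setSchema(schemaCache.get(schemaString)). The cache itself is unbounded, holding one entry per distinct schema text, which is usually acceptable when the set of schemas is small and fixed.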

The number of ConcurrentHashMap$Node instances is 5,000 with the shared schema vs. 1,500,000 without it.
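The report does not identify which map holds these nodes. One possible mechanism, offered here purely as an assumption: if some cache is keyed on objects that compare by identity, every freshly parsed instance adds new entries even when its content is identical, while a shared instance keeps hitting the same entry. The toy sketch below illustrates only that general pattern with a plain ConcurrentHashMap; ParsedThing and IdentityCacheDemo are hypothetical stand-ins, not Avro classes. Note that Avro's Schema itself overrides equals and hashCode, so this is not a claim about maps keyed directly on Schema.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a parsed object: no equals/hashCode override, so equality is identity. */
final class ParsedThing {
    final String text;

    ParsedThing(String text) {
        this.text = text;
    }
}

/** Counts cache entries after n lookups, either reusing one key instance or creating one per lookup. */
final class IdentityCacheDemo {
    private IdentityCacheDemo() {}

    static int entriesAfter(int n, boolean reuseInstance) {
        Map<ParsedThing, String> cache = new ConcurrentHashMap<>();
        ParsedThing shared = new ParsedThing("schema");
        for (int i = 0; i < n; i++) {
            // Identical content either way; only the instance identity differs.
            ParsedThing key = reuseInstance ? shared : new ParsedThing("schema");
            cache.computeIfAbsent(key, k -> "derived data for " + k.text);
        }
        return cache.size();
    }
}
{code}

Calling entriesAfter(1000, true) returns 1, while entriesAfter(1000, false) returns 1000: the map grows linearly with the number of distinct instances, mirroring the shape (though not the cause, which is unconfirmed) of the counts above.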



--
This message was sent by Atlassian Jira
(v8.20.7#820007)