Posted to dev@avro.apache.org by "Mengran Wang (Jira)" <ji...@apache.org> on 2020/07/05 23:15:00 UTC

[jira] [Created] (AVRO-2882) Validate input data format before decoding it

Mengran Wang created AVRO-2882:
----------------------------------

             Summary: Validate input data format before decoding it
                 Key: AVRO-2882
                 URL: https://issues.apache.org/jira/browse/AVRO-2882
             Project: Apache Avro
          Issue Type: Improvement
          Components: java
    Affects Versions: 1.9.2, 1.8.2
            Reporter: Mengran Wang
         Attachments: Screen Shot 2020-06-18 at 5.48.39 PM.png

When decoding a byte array with the Avro BinaryDecoder and SpecificDatumReader, is it possible to use the schema to check whether the input matches the definition before allocating a memory buffer to process the data?

One bug we hit in production involved a payload type that consists of two parts: the first part is a fixed-size byte array and the second part is a record of variable-length strings. During deserialization we extract the byte array first (using schema A) and then read out the strings (using schema B). However, we accidentally created a malformed payload that left out the byte array part. We assumed Avro would throw some kind of RuntimeException when decoding this malformed payload, but instead it allocated a huge memory buffer *scratchUtf8* to read the string, eventually causing a JVM OOM error on our end.
{code:java}
fixed MD5(16); // fixed length 
record A {
  MD5 hash;
}

record B {
  string name1;
  string name2;
  union {null, string} name3 = null;
}
{code}
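To illustrate the failure mode, here is a small self-contained sketch (not the actual Avro internals, though the decoder logic mirrors Avro's documented varint/zigzag binary encoding for long values, which is also the string-length prefix format). The byte values and the misread scenario are hypothetical: when the fixed MD5 bytes are missing, the reader expecting record B interprets whatever bytes come first as the length prefix of *name1*, which can decode to an enormous value. It also sketches the validation this issue is asking for: rejecting a claimed length larger than the bytes actually remaining in the input before any buffer is allocated.
{code:java}
public class AvroLengthSketch {
    // Simplified varint/zigzag decoder mirroring Avro's binary encoding
    // for long values (string payloads are prefixed with such a length).
    static long readZigZagLong(byte[] buf, int pos) {
        long raw = 0;
        int shift = 0;
        byte b;
        do {
            b = buf[pos++];
            raw |= (long) (b & 0x7F) << shift; // 7 payload bits per byte
            shift += 7;
        } while ((b & 0x80) != 0);             // high bit = continuation
        return (raw >>> 1) ^ -(raw & 1);       // zigzag decode
    }

    public static void main(String[] args) {
        // Hypothetical malformed input: five arbitrary bytes misread as
        // the varint length prefix of string name1 in record B.
        byte[] malformed = { (byte) 0xFE, (byte) 0xFF, (byte) 0xFF,
                             (byte) 0xFF, 0x0F };
        long claimed = readZigZagLong(malformed, 0);
        System.out.println(claimed); // 2147483647, i.e. a ~2 GB "string"

        // The requested validation: a length prefix can never legitimately
        // exceed the bytes remaining in the input, so check before allocating.
        long remaining = malformed.length - 5; // bytes left after the prefix
        if (claimed < 0 || claimed > remaining) {
            System.out.println("malformed payload rejected");
        }
    }
}
{code}
A bounds check like this is cheap because the total input size is already known when decoding from a byte array, whereas recovering from an OOM after the fact is not.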
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)