Posted to dev@avro.apache.org by "Lucas Heimberg (Jira)" <ji...@apache.org> on 2020/12/18 16:47:00 UTC

[jira] [Created] (AVRO-3005) Deserialization of string with > 256 characters fails

Lucas Heimberg created AVRO-3005:
------------------------------------

             Summary: Deserialization of string with > 256 characters fails
                 Key: AVRO-3005
                 URL: https://issues.apache.org/jira/browse/AVRO-3005
             Project: Apache Avro
          Issue Type: Bug
          Components: csharp
    Affects Versions: 1.10.1
            Reporter: Lucas Heimberg


Avro.IO.BinaryDecoder.ReadString() fails for strings with length > 256, i.e. when the StackallocThreshold is exceeded. 

This can be seen when serializing and subsequently deserializing a GenericRecord of schema 
{code:java}
{
  "type": "record",
  "name": "Foo",
  "fields": [
    { "name": "x", "type": "string" }
  ]
}{code}
with a field x containing a string of length > 256, as done in the test case:
{code:c#}
public void Test()
{
    var schema = (RecordSchema) Schema.Parse("{ \"type\":\"record\", \"name\":\"Foo\",\"fields\":[{\"name\":\"x\",\"type\":\"string\"}]}");

    var datum = new GenericRecord(schema);
    datum.Add("x", new String('x', 257));
    byte[] serialized;
    using (var ms = new MemoryStream())
    {
        var enc = new BinaryEncoder(ms);
        var writer = new GenericDatumWriter<GenericRecord>(schema);
        writer.Write(datum, enc);                
        serialized = ms.ToArray();
    }

    using (var ms = new MemoryStream(serialized))
    {
        var dec = new BinaryDecoder(ms);
        var deserialized = new GenericRecord(schema);
        var reader = new GenericDatumReader<GenericRecord>(schema, schema);
        reader.Read(deserialized, dec);
        Assert.Equal(datum, deserialized);
    }
}{code}
which yields the following exception:
{code:java}
Avro.AvroException
End of stream reached
   at Avro.IO.BinaryDecoder.Read(Span`1 buffer)
   at Avro.IO.BinaryDecoder.ReadString()
   at Avro.Generic.PreresolvingDatumReader`1.<>c.<ResolveReader>b__21_1(Decoder d)
   at Avro.Generic.PreresolvingDatumReader`1.<>c__DisplayClass37_0.<Read>b__0(Object r, Decoder d)
   at Avro.Generic.PreresolvingDatumReader`1.<>c__DisplayClass23_1.<ResolveRecord>b__2(Object rec, Decoder d)
   at Avro.Generic.PreresolvingDatumReader`1.ReadRecord(Object reuse, Decoder decoder, RecordAccess recordAccess, IEnumerable`1 readSteps)
   at Avro.Generic.PreresolvingDatumReader`1.<>c__DisplayClass23_0.<ResolveRecord>b__0(Object r, Decoder d)
   at Avro.Generic.PreresolvingDatumReader`1.Read(T reuse, Decoder decoder)
   at AvroTests.AvroTests.Test(Int32 n) in C:\Users\l.heimberg\Source\Repos\AvroTests\AvroTests\AvroTests.cs:line 41
{code}
It seems that Avro.IO.BinaryDecoder.Read(Span<byte> buffer) reads past the end of the input stream when passed a span over the array returned by ArrayPool<byte>.Shared.Rent(length) (where length is the byte length of the string): Rent only guarantees an array of *at least* the requested size, so the returned array, and hence a span over all of it, can be longer than length.
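For illustration, here is a minimal standalone sketch of the suspected pattern (this is not Avro's actual code; the stream length and fix are assumptions based on the observed behaviour):

{code:c#}
using System;
using System.Buffers;
using System.IO;

class RentOverReadDemo
{
    static void Main()
    {
        const int length = 257; // requested byte count, as in the failing test
        var stream = new MemoryStream(new byte[length]); // stream holds exactly `length` bytes

        byte[] buffer = ArrayPool<byte>.Shared.Rent(length);
        try
        {
            // Rent guarantees only a *minimum* size, so buffer.Length is
            // usually larger than 257 (the pool rounds up to a bucket size).
            Console.WriteLine(buffer.Length >= length); // True

            // Buggy pattern: trying to fill the whole rented array demands
            // more bytes than the stream contains, so a reader that loops
            // until the span is full hits end-of-stream.
            //   FillSpan(stream, buffer.AsSpan());   // would over-read

            // Correct pattern: restrict the read to the requested length.
            int read = stream.Read(buffer, 0, length);
            Console.WriteLine(read); // 257
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
{code}
Slicing the rented array to the requested length (e.g. buffer.AsSpan(0, length)) before passing it to Read would avoid the over-read.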

Possibly related: [https://github.com/confluentinc/confluent-kafka-dotnet/issues/1398#issuecomment-748171083]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)