Posted to dev@avro.apache.org by "Lucas Heimberg (Jira)" <ji...@apache.org> on 2020/12/18 16:48:00 UTC

[jira] [Updated] (AVRO-3005) Deserialization of string with > 256 characters fails

     [ https://issues.apache.org/jira/browse/AVRO-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lucas Heimberg updated AVRO-3005:
---------------------------------
    Description: 
Avro.IO.BinaryDecoder.ReadString() fails for strings with length > 256, i.e. when the StackallocThreshold is exceeded and a pooled buffer is used instead of stackalloc.

This can be seen when serializing and subsequently deserializing a GenericRecord of schema 
{code:java}
{
  "type": "record",
  "name": "Foo",
  "fields": [
    { "name": "x", "type": "string" }
  ]
}{code}
with a field x containing a string of length > 256, as done in the test case:
{code:java}
public void Test()
{
    var schema = (RecordSchema) Schema.Parse("{ \"type\":\"record\", \"name\":\"Foo\",\"fields\":[{\"name\":\"x\",\"type\":\"string\"}]}");
            
    var datum = new GenericRecord(schema);            
    datum.Add("x", new String('x', 257));
    byte[] serialized;
    using (var ms = new MemoryStream())
    {
        var enc = new BinaryEncoder(ms);
        var writer = new GenericDatumWriter<GenericRecord>(schema);
        writer.Write(datum, enc);                
        serialized = ms.ToArray();
    }

    using (var ms = new MemoryStream(serialized))
    {
        var dec = new BinaryDecoder(ms);
        var deserialized = new GenericRecord(schema);
        var reader = new GenericDatumReader<GenericRecord>(schema, schema);
        reader.Read(deserialized, dec);
        Assert.Equal(datum, deserialized);
    }
}{code}
which yields the following exception
{code:java}
Avro.AvroException
End of stream reached
   at Avro.IO.BinaryDecoder.Read(Span`1 buffer)
   at Avro.IO.BinaryDecoder.ReadString()
   at Avro.Generic.PreresolvingDatumReader`1.<>c.<ResolveReader>b__21_1(Decoder d)
   at Avro.Generic.PreresolvingDatumReader`1.<>c__DisplayClass37_0.<Read>b__0(Object r, Decoder d)
   at Avro.Generic.PreresolvingDatumReader`1.<>c__DisplayClass23_1.<ResolveRecord>b__2(Object rec, Decoder d)
   at Avro.Generic.PreresolvingDatumReader`1.ReadRecord(Object reuse, Decoder decoder, RecordAccess recordAccess, IEnumerable`1 readSteps)
   at Avro.Generic.PreresolvingDatumReader`1.<>c__DisplayClass23_0.<ResolveRecord>b__0(Object r, Decoder d)
   at Avro.Generic.PreresolvingDatumReader`1.Read(T reuse, Decoder decoder)
   at AvroTests.AvroTests.Test(Int32 n) in C:\Users\l.heimberg\Source\Repos\AvroTests\AvroTests\AvroTests.cs:line 41
{code}
It appears that Avro.IO.BinaryDecoder.Read(Span<byte> buffer) reads past the end of the input stream when it is passed the span of the array returned by ArrayPool<byte>.Shared.Rent(length) (where length is the length of the string): Rent treats its argument as a minimum length and may return a larger array, so reading into the full rented span requests more bytes than the string actually occupies.
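The following is a hypothetical sketch of the fix, not the actual Avro source: since Rent(length) may hand back a larger array, the decoder would need to slice the rented buffer to exactly length before filling it from the stream. The ReadString helper below is an illustration under that assumption.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Text;

// Hypothetical sketch (not the Avro implementation): slice the rented
// array to the requested length before reading from the stream.
public class RentSliceSketch
{
    public static string ReadString(Stream stream, int length)
    {
        byte[] buffer = ArrayPool<byte>.Shared.Rent(length);
        try
        {
            // Slice to exactly 'length'. Reading into the full rented
            // array (which may be longer) would request more bytes than
            // the string occupies and can hit end-of-stream on the last
            // value in the input.
            Span<byte> span = buffer.AsSpan(0, length);
            int total = 0;
            while (total < length)
            {
                int n = stream.Read(span.Slice(total));
                if (n == 0) throw new EndOfStreamException();
                total += n;
            }
            return Encoding.UTF8.GetString(buffer, 0, length);
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }

    public static void Main()
    {
        // 257 'x' bytes, mirroring the failing test case above.
        byte[] payload = Encoding.UTF8.GetBytes(new string('x', 257));
        using var ms = new MemoryStream(payload);
        Console.WriteLine(ReadString(ms, payload.Length).Length);
    }
}
```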

Possibly related: [https://github.com/confluentinc/confluent-kafka-dotnet/issues/1398#issuecomment-748171083]

 


> Deserialization of string with > 256 characters fails
> -----------------------------------------------------
>
>                 Key: AVRO-3005
>                 URL: https://issues.apache.org/jira/browse/AVRO-3005
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: csharp
>    Affects Versions: 1.10.1
>            Reporter: Lucas Heimberg
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)