Posted to dev@avro.apache.org by "Lucas Heimberg (Jira)" <ji...@apache.org> on 2021/02/22 08:00:00 UTC

[jira] [Commented] (AVRO-3005) Deserialization of string with > 256 characters fails

    [ https://issues.apache.org/jira/browse/AVRO-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288231#comment-17288231 ] 

Lucas Heimberg commented on AVRO-3005:
--------------------------------------

Hello! Thank you, and sorry that I did not read your comment earlier. I will create a PR for the unit test.

> Deserialization of string with > 256 characters fails
> -----------------------------------------------------
>
>                 Key: AVRO-3005
>                 URL: https://issues.apache.org/jira/browse/AVRO-3005
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: csharp
>    Affects Versions: 1.10.1
>            Reporter: Lucas Heimberg
>            Priority: Major
>         Attachments: AVRO-3005.patch
>
>
> Avro.IO.BinaryDecoder.ReadString() fails for strings with length > 256, i.e. when the StackallocThreshold is exceeded. 
> This can be seen when serializing and subsequently deserializing a GenericRecord of schema 
> {code:java}
> {
>   "type": "record",
>   "name": "Foo",
>   "fields": [
>     { "name": "x", "type": "string" }
>   ]
> }{code}
> with a field x containing a string of length > 256, as done in the test case Test(257):
> {code:java}
> using System;
> using System.IO;
> using Avro;
> using Avro.Generic;
> using Avro.IO;
> using Xunit;
>
> [Theory]
> [InlineData(257)]
> public void Test(int n)
> {
>     var schema = (RecordSchema) Schema.Parse("{ \"type\":\"record\", \"name\":\"Foo\",\"fields\":[{\"name\":\"x\",\"type\":\"string\"}]}");
>
>     var datum = new GenericRecord(schema);
>     datum.Add("x", new String('x', n));
>
>     // Serialize the record to a byte array.
>     byte[] serialized;
>     using (var ms = new MemoryStream())
>     {
>         var enc = new BinaryEncoder(ms);
>         var writer = new GenericDatumWriter<GenericRecord>(schema);
>         writer.Write(datum, enc);
>         serialized = ms.ToArray();
>     }
>
>     // Deserialize it again; this fails for n > 256.
>     using (var ms = new MemoryStream(serialized))
>     {
>         var dec = new BinaryDecoder(ms);
>         var deserialized = new GenericRecord(schema);
>         var reader = new GenericDatumReader<GenericRecord>(schema, schema);
>         reader.Read(deserialized, dec);
>         Assert.Equal(datum, deserialized);
>     }
> }{code}
> which yields the following exception
> {code:java}
> Avro.AvroException
> End of stream reached
>    at Avro.IO.BinaryDecoder.Read(Span`1 buffer)
>    at Avro.IO.BinaryDecoder.ReadString()
>    at Avro.Generic.PreresolvingDatumReader`1.<>c.<ResolveReader>b__21_1(Decoder d)
>    at Avro.Generic.PreresolvingDatumReader`1.<>c__DisplayClass37_0.<Read>b__0(Object r, Decoder d)
>    at Avro.Generic.PreresolvingDatumReader`1.<>c__DisplayClass23_1.<ResolveRecord>b__2(Object rec, Decoder d)
>    at Avro.Generic.PreresolvingDatumReader`1.ReadRecord(Object reuse, Decoder decoder, RecordAccess recordAccess, IEnumerable`1 readSteps)
>    at Avro.Generic.PreresolvingDatumReader`1.<>c__DisplayClass23_0.<ResolveRecord>b__0(Object r, Decoder d)
>    at Avro.Generic.PreresolvingDatumReader`1.Read(T reuse, Decoder decoder)
>    at AvroTests.AvroTests.Test(Int32 n) in C:\Users\l.heimberg\Source\Repos\AvroTests\AvroTests\AvroTests.cs:line 41
> {code}
> The reason seems to be the following: when a string of length <= StackallocThreshold (= 256) is read, a buffer of exactly the string's length is allocated on the stack, and the content of the string is read from the stream into it. If the length is > StackallocThreshold, the buffer is instead obtained from ArrayPool<byte>.Shared.Rent(length), which returns a buffer of *at least* 'length' bytes, possibly more.
> The Read(Span<byte> buffer) method reads the content of the string from the input stream. It always tries to read as many bytes as the buffer is long, and in particular fails with the exception shown above when the stream runs out of data. Thus, if the expected string length is > StackallocThreshold and the buffer obtained from ArrayPool<byte>.Shared.Rent(length) is larger than 'length', Read will either throw the above AvroException (when the string is the last item in the stream) or consume bytes that belong to subsequent data items, corrupting the decoded data in either case.
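> For illustration, a minimal standalone sketch of this mismatch (the length 300, the class name RentDemo, and the stream contents are invented for this example; ArrayPool<byte>.Shared.Rent and Stream.Read are the real APIs involved):
> {code:java}
> using System;
> using System.Buffers;
> using System.IO;
>
> class RentDemo
> {
>     static void Main()
>     {
>         // Rent guarantees *at least* the requested size; the shared pool
>         // usually hands out the next power-of-two bucket.
>         byte[] rented = ArrayPool<byte>.Shared.Rent(300);
>         Console.WriteLine(rented.Length);   // e.g. 512, not 300
>
>         // A stream holding exactly the 300 bytes of string content:
>         using (var ms = new MemoryStream(new byte[300]))
>         {
>             // Reading into the full rented array requests rented.Length
>             // bytes, so the stream is exhausted after 300 bytes; a decoder
>             // that insists on filling the whole buffer must then fail.
>             int total = ms.Read(rented, 0, rented.Length);
>             Console.WriteLine(total);       // 300 < rented.Length
>         }
>
>         ArrayPool<byte>.Shared.Return(rented);
>     }
> }
> {code}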
> The provided patch turns the byte array returned by the ArrayPool into a Span<byte> of the correct length using the Slice method, instead of implicitly converting the whole array to Span<byte>.
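> A minimal sketch of that pattern (illustrative only, not the literal contents of AVRO-3005.patch; 'length' stands for the decoded string length, Read is the decoder's own Read(Span<byte>) method, and AsSpan(0, length) is one way to take such a slice):
> {code:java}
> byte[] pooled = ArrayPool<byte>.Shared.Rent(length);
> try
> {
>     // Slice the rented array to exactly 'length' bytes so that
>     // Read(Span<byte>) requests no more than the string actually has.
>     Span<byte> buffer = pooled.AsSpan(0, length);
>     Read(buffer);
>     return System.Text.Encoding.UTF8.GetString(buffer);
> }
> finally
> {
>     ArrayPool<byte>.Shared.Return(pooled);
> }
> {code}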
>
> Possibly related: [https://github.com/confluentinc/confluent-kafka-dotnet/issues/1398#issuecomment-748171083]
>


