Posted to dev@avro.apache.org by "Lucas Heimberg (Jira)" <ji...@apache.org> on 2020/12/18 16:48:00 UTC
[jira] [Updated] (AVRO-3005) Deserialization of string with > 256 characters fails
[ https://issues.apache.org/jira/browse/AVRO-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lucas Heimberg updated AVRO-3005:
---------------------------------
Description:
Avro.IO.BinaryDecoder.ReadString() fails for strings with length > 256, i.e. when the StackallocThreshold is exceeded.
This can be seen when serializing and subsequently deserializing a GenericRecord of schema
{code:java}
{
  "type": "record",
  "name": "Foo",
  "fields": [
    { "name": "x", "type": "string" }
  ]
}{code}
with a field x containing a string of length > 256, as done in the test case:
{code:java}
public void Test()
{
    var schema = (RecordSchema) Schema.Parse("{ \"type\":\"record\", \"name\":\"Foo\",\"fields\":[{\"name\":\"x\",\"type\":\"string\"}]}");
    var datum = new GenericRecord(schema);
    datum.Add("x", new String('x', 257));

    byte[] serialized;
    using (var ms = new MemoryStream())
    {
        var enc = new BinaryEncoder(ms);
        var writer = new GenericDatumWriter<GenericRecord>(schema);
        writer.Write(datum, enc);
        serialized = ms.ToArray();
    }

    using (var ms = new MemoryStream(serialized))
    {
        var dec = new BinaryDecoder(ms);
        var deserialized = new GenericRecord(schema);
        var reader = new GenericDatumReader<GenericRecord>(schema, schema);
        reader.Read(deserialized, dec);
        Assert.Equal(datum, deserialized);
    }
}{code}
which yields the following exception:
{code:java}
Avro.AvroException
End of stream reached
at Avro.IO.BinaryDecoder.Read(Span`1 buffer)
at Avro.IO.BinaryDecoder.ReadString()
at Avro.Generic.PreresolvingDatumReader`1.<>c.<ResolveReader>b__21_1(Decoder d)
at Avro.Generic.PreresolvingDatumReader`1.<>c__DisplayClass37_0.<Read>b__0(Object r, Decoder d)
at Avro.Generic.PreresolvingDatumReader`1.<>c__DisplayClass23_1.<ResolveRecord>b__2(Object rec, Decoder d)
at Avro.Generic.PreresolvingDatumReader`1.ReadRecord(Object reuse, Decoder decoder, RecordAccess recordAccess, IEnumerable`1 readSteps)
at Avro.Generic.PreresolvingDatumReader`1.<>c__DisplayClass23_0.<ResolveRecord>b__0(Object r, Decoder d)
at Avro.Generic.PreresolvingDatumReader`1.Read(T reuse, Decoder decoder)
at AvroTests.AvroTests.Test(Int32 n) in C:\Users\l.heimberg\Source\Repos\AvroTests\AvroTests\AvroTests.cs:line 41
{code}
It seems that Avro.IO.BinaryDecoder.Read(Span<byte> buffer) reads past the end of the input stream when it is handed a span over the buffer returned by ArrayPool<byte>.Shared.Rent(length) (where length is the byte length of the string): Rent may return an array larger than the requested length, so filling the whole rented span demands more bytes than remain in the stream.
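For illustration, here is a minimal, self-contained sketch of that suspected failure mode. This is not the Avro library code: the ReadFully helper below is a hypothetical stand-in for BinaryDecoder.Read(Span<byte>), and the example assumes .NET Core 2.1+ / .NET 5+ for Stream.Read(Span<byte>).
{code:java}
using System;
using System.Buffers;
using System.IO;

public static class RentOverReadDemo
{
    // Hypothetical stand-in for BinaryDecoder.Read(Span<byte>): keep reading
    // until the span is full, and fail if the stream ends first.
    private static void ReadFully(Stream stream, Span<byte> buffer)
    {
        while (!buffer.IsEmpty)
        {
            int n = stream.Read(buffer);
            if (n == 0)
                throw new EndOfStreamException("End of stream reached");
            buffer = buffer.Slice(n);
        }
    }

    public static void Main()
    {
        int length = 257;                                    // byte length of the string payload
        byte[] rented = ArrayPool<byte>.Shared.Rent(length);
        Console.WriteLine(rented.Length);                    // typically 512: Rent rounds up to a bucket size

        // Suspected buggy pattern: fill the whole rented array. That demands
        // rented.Length (e.g. 512) bytes, but the stream only holds 257.
        try
        {
            ReadFully(new MemoryStream(new byte[length]), rented.AsSpan());
        }
        catch (EndOfStreamException e)
        {
            Console.WriteLine(e.Message);                    // "End of stream reached"
        }

        // Safe pattern: only fill the first `length` bytes of the rented array.
        ReadFully(new MemoryStream(new byte[length]), rented.AsSpan(0, length));
        Console.WriteLine("sliced read succeeded");

        ArrayPool<byte>.Shared.Return(rented);
    }
}{code}
If that is indeed the cause, restricting the read to the first length bytes of the rented buffer (the "safe pattern" above) should avoid the over-read; the remainder of the rented array is only scratch space and must not be filled from the stream.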
Possibly related: [https://github.com/confluentinc/confluent-kafka-dotnet/issues/1398#issuecomment-748171083]
> Deserialization of string with > 256 characters fails
> -----------------------------------------------------
>
> Key: AVRO-3005
> URL: https://issues.apache.org/jira/browse/AVRO-3005
> Project: Apache Avro
> Issue Type: Bug
> Components: csharp
> Affects Versions: 1.10.1
> Reporter: Lucas Heimberg
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)