Posted to dev@avro.apache.org by Spencer Nelson <s...@spencerwnelson.com> on 2021/03/11 06:37:03 UTC

type promotion of bytes into strings

During schema resolution, the spec says that primitives written as
bytes can be "promoted" into strings, if the reader asks for a string.

How should the bytes be decoded into a string?

For example, suppose the writer has written a value under the schema
{"type": "bytes", "logicalType": "decimal", "precision": 5}. The
reader's schema is "string". What should happen here?

Re: type promotion of bytes into strings

Posted by Spencer Nelson <s...@spencerwnelson.com>.
Okay, that makes sense. I think that should be added to the spec: it
would clarify the expected behavior and help implementations across SDKs
stay consistent.

On Wed, Mar 17, 2021 at 11:18 AM Ryan Skraba <ry...@skraba.com> wrote:

> Hello!  My expectation would be that a bytes <-> string promotion
> would be a UTF-8 conversion, and the logical type would be ignored.
>
> This is almost certainly going to have weird results when the bytes
> data isn't UTF-8, but I wouldn't expect anything else to happen!
> Parsing something like a logicalType "decimal" or "duration" into a
> nicer string representation would probably be infeasible.
>
> I'd have to take a look to see what actually happens, but I hope that
> it's the same across all SDKs at the moment!
>
> Best regards, Ryan

Re: type promotion of bytes into strings

Posted by Ryan Skraba <ry...@skraba.com>.
Hello!  My expectation would be that a bytes <-> string promotion
would be a UTF-8 conversion, and the logical type would be ignored.

This is almost certainly going to have weird results when the bytes
data isn't UTF-8, but I wouldn't expect anything else to happen!
Parsing something like a logicalType "decimal" or "duration" into a
nicer string representation would probably be infeasible.
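
To make the "weird results" concrete, here is a minimal Python sketch of
what a plain UTF-8 promotion would do to a decimal value. It is not taken
from any Avro SDK: decimal_to_avro_bytes and promote_bytes_to_string are
hypothetical helper names, and the sketch assumes the decimal is stored as
the big-endian two's-complement bytes of its unscaled value, with the
default scale of 0 for the schema in the question.

```python
def decimal_to_avro_bytes(unscaled: int) -> bytes:
    """Encode an unscaled decimal as big-endian two's-complement bytes.

    Hypothetical helper for illustration; uses enough bytes to hold the
    value plus a sign bit (not necessarily the shortest encoding).
    """
    length = (unscaled.bit_length() + 8) // 8
    return unscaled.to_bytes(length, byteorder="big", signed=True)


def promote_bytes_to_string(raw: bytes) -> str:
    """Assumed promotion rule: decode as UTF-8 and ignore the logical type."""
    return raw.decode("utf-8")


# The decimal 12345 is stored as the two bytes 0x30 0x39, which happen to
# be the ASCII characters '0' and '9'.
raw = decimal_to_avro_bytes(12345)
print(promote_bytes_to_string(raw))  # prints 09

# The decimal -1 is stored as the single byte 0xff, which is not valid
# UTF-8, so a strict decode fails instead of returning a string.
try:
    promote_bytes_to_string(decimal_to_avro_bytes(-1))
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)
```

Under these assumptions the reader gets either an unrelated string or a
decoding error, never the decimal's textual form. Whether an SDK raises,
substitutes replacement characters, or does something else for invalid
UTF-8 is exactly the kind of detail that may differ between
implementations.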

I'd have to take a look to see what actually happens, but I hope that
it's the same across all SDKs at the moment!

Best regards, Ryan
