Posted to dev@avro.apache.org by "Driesprong, Fokko" <fo...@driesprong.frl> on 2024/01/04 08:40:33 UTC

Re: [DISCUSS] Extending the UUID logical type with Fixed[16]

Hey everyone,

Happy New Year! Best wishes for 2024 for you and your family.

I went ahead and created a PR for the spec change:
https://github.com/apache/avro/pull/2672
Let me know if there are any questions or concerns.

Kind regards,
Fokko

On Fri, 22 Dec 2023 at 14:52, Fokko Driesprong <fo...@apache.org> wrote:

> Hi Martin and Scott,
>
> Thanks for the question, and that's a good one. I would suggest:
>
> {
>   "type": "fixed",
>   "size": 16,
>   "logicalType": "uuid"
> }
>
> This is in line with the other logicalTypes. For example with date:
>
> {
>   "type": "int",
>   "logicalType": "date"
> }
>
> If you don't support the date, you can still read the int itself (days
> since Epoch).
>
> I've added a schema example to the Google doc and created a PR
> <https://github.com/apache/avro/pull/2646/> to clarify the current
> situation.
>
> I am curious about what you guys think of the proposed JSON-type
> representation.
>
> Kind regards,
> Fokko
>
>
> On Fri, 22 Dec 2023 at 14:25, Scott Belden <sc...@gmail.com> wrote:
>
>> I think you'd have to go with something like one of the first two options
>> (something in the schema) rather than a flag in a library. The problem
>> with a flag in a library is that someone who has an Avro file to
>> deserialize might not know whether it was encoded with UUIDs as bytes or
>> as strings. They'd be left guessing one and trying again with the other
>> if the first failed, which would not be a pleasant experience.
>>
>> -Scott
>>
>> On Fri, Dec 22, 2023 at 5:00 AM Martin Grigorov <mg...@apache.org>
>> wrote:
>>
>> > Hi,
>> >
>> > How would the application tell Avro what storage type to use - String or
>> > bytes ?
>> > - new logical type ? e.g. "logicalType": "uuid-bytes"
>> > - extra attribute ? e.g. { ..., "logicalType": "uuid", "storage-type":
>> > "bytes" }
>> > - global switch that tells the library to always use "string" or "bytes"
>> > for all UUIDs ?
>> > - ...
>> >
>> > Martin
>> >
>> > On Fri, Dec 22, 2023 at 10:49 AM Fokko Driesprong <fo...@apache.org>
>> > wrote:
>> >
>> > > Hey everyone,
>> > >
>> > > For Iceberg we're using UUIDs in Avro and we're storing them as binary,
>> > > rather than a string. This has several advantages such as more compact
>> > > storage, more efficient reading, and more efficient skipping. For more
>> > > details, please check out the doc that I've created
>> > > <https://docs.google.com/document/d/16_oSWrEM7AFUCTe0uuraAEHxywezLfoEz5ahzwvhGUk/edit#heading=h.43xuauwfk7ow>
>> > > (and feel free to comment). Also created AVRO-3918
>> > > <https://issues.apache.org/jira/browse/AVRO-3918> on Jira to track this.
>> > >
>> > > Looking forward to hearing from y'all!
>> > >
>> > > Kind regards and happy holidays,
>> > >
>> > > Fokko Driesprong
>> > >
>> >
>>
>
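
For readers following the thread, here is a rough sketch of the size
difference between the two encodings discussed above. It is only an
illustration and not code from either PR. It uses the Python standard library
alone and hand-rolls the Avro binary rules (a string is a zigzag varint length
followed by UTF-8 bytes; a fixed is exactly "size" raw bytes). The helper
names, the "uuid_fixed" schema name (Avro fixed types are named types, while
the schema in the thread leaves the name out), and the big-endian RFC 4122
byte order are assumptions made for this example.

```python
import json
import uuid

# Schema shape proposed in the thread. The "name" attribute is an addition
# for this sketch; "uuid_fixed" is a hypothetical name.
UUID_FIXED_SCHEMA = json.loads("""
{
  "type": "fixed",
  "name": "uuid_fixed",
  "size": 16,
  "logicalType": "uuid"
}
""")


def encode_long(n: int) -> bytes:
    """Avro long encoding: zigzag, then variable-length base-128."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_uuid_string(value: uuid.UUID) -> bytes:
    """UUID as an Avro string: length prefix plus 36 characters of text."""
    text = str(value).encode("utf-8")
    return encode_long(len(text)) + text


def encode_uuid_fixed(value: uuid.UUID) -> bytes:
    """UUID as fixed[16]: the 16 raw bytes, no length prefix.

    Assumption: big-endian (RFC 4122) byte order.
    """
    return value.bytes


def decode_uuid_fixed(raw: bytes) -> uuid.UUID:
    """Rebuild the UUID from the 16 raw bytes."""
    if len(raw) != UUID_FIXED_SCHEMA["size"]:
        raise ValueError("expected exactly 16 bytes")
    return uuid.UUID(bytes=raw)


if __name__ == "__main__":
    u = uuid.uuid4()
    print("string encoding:", len(encode_uuid_string(u)), "bytes")
    print("fixed encoding: ", len(encode_uuid_fixed(u)), "bytes")
```

A reader that does not know the uuid logical type would still get the 16 raw
bytes back from the fixed, in the same way that a reader without date support
still gets the underlying int.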