Posted to dev@avro.apache.org by Jack Klamer <ja...@starburstdata.com> on 2023/04/21 15:28:43 UTC

Map Key Uniqueness (Spec clarification)

Hello Avro Devs,

I am looking for clarification in the spec as I am working on a
particular implementation for Avro data reading. Most Avro implementations
serialize/deserialize maps using language-specific maps, which guarantee that
one value per key is written and that the last value read for a key is
returned. However, there is no guidance in the spec on whether that is
necessary/implied/expected.

For my use case, I am deserializing keys and values into lists of each,
without checking for uniqueness, and want to know if this breaks the spec,
or if those who serialize maps without unique keys (presumably outside the
bounds of most language implementations) can expect undefined behavior when
reading?

-- 
Jack Klamer

Software Engineer

*he/him*
jack.klamer@starburstdata.com
<https://www.starburst.io/>

Re: Map Key Uniqueness (Spec clarification)

Posted by Jack Klamer <jf...@gmail.com>.
Yeah to be clear, I’m hoping the keys are unique, I just don’t check or
guarantee it like other implementations do implicitly.

I never plan to serialize multiple values per key.

On Fri, Apr 21, 2023 at 11:39 AM Ryan Skraba <ry...@skraba.com> wrote:

> Hello!  You're correct that there is no guidance in the spec, and that
> conceptually an Avro map could be represented as an ordered list of
> (key, value) tuples.
>
> It's only an opinion, but I think the *best practice* for
> interoperability would be to avoid serializing duplicate keys.
> Whoever is deserializing a binary with a map and multiple, non-unique
> keys must make a choice about which to keep (or keep them all, or
> throw an error), and there's really no way to predict what it will end
> up with given the spec.
>
> I'd be tempted to say "keep last" should be the rule and added to the
> spec for this case... But I don't really have a very good
> justification!
>
> It's an interesting question, because other than the non-unique key
> problem, there's no reason that you couldn't represent the map as you
> suggest.
>
> Ryan

Re: Map Key Uniqueness (Spec clarification)

Posted by Ryan Skraba <ry...@skraba.com>.
Hello!  You're correct that there is no guidance in the spec, and that
conceptually an Avro map could be represented as an ordered list of
(key, value) tuples.
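
To make the "ordered list of (key, value) tuples" view concrete, here is a minimal sketch of a writer that serializes such a list as a single Avro map block. It assumes a map<string, long> schema and uses hypothetical helper names rather than any real Avro library API. Nothing in the wire format itself prevents the same key from appearing twice.

```python
def write_long(n):
    """Encode a 64-bit integer as an Avro long (zigzag, then varint)."""
    z = (n << 1) ^ (n >> 63)  # zigzag maps signed values to non-negative ones
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def write_string(s):
    """Encode an Avro string: byte length as a long, then UTF-8 bytes."""
    raw = s.encode("utf-8")
    return write_long(len(raw)) + raw


def write_map(pairs):
    """Encode (key, value) pairs as one map block plus the zero end marker."""
    out = bytearray()
    if pairs:
        out += write_long(len(pairs))
        for key, value in pairs:
            out += write_string(key)
            out += write_long(value)
    out += write_long(0)  # a zero block count terminates the map
    return bytes(out)


# The key "a" appears twice; the encoder has no reason to reject it.
encoded = write_map([("a", 1), ("b", 2), ("a", 3)])
print(encoded.hex())  # 0602610202620402610600
```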

It's only an opinion, but I think the *best practice* for
interoperability would be to avoid serializing duplicate keys.
Whoever is deserializing a binary with a map and multiple, non-unique
keys must make a choice about which to keep (or keep them all, or
throw an error), and there's really no way to predict what it will end
up with given the spec.
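
The choices a reader faces can be sketched as follows. This is only an illustration: the policy names and the resolve_duplicates function are hypothetical, and none of these behaviors is mandated by the spec.

```python
def resolve_duplicates(pairs, policy="keep_last"):
    """Apply one duplicate-key policy to decoded (key, value) pairs."""
    if policy not in ("keep_last", "keep_first", "keep_all", "error"):
        raise ValueError(f"unknown policy: {policy!r}")
    if policy == "keep_all":
        return list(pairs)  # preserve every entry in wire order
    result = {}
    for key, value in pairs:
        if key in result:
            if policy == "error":
                raise ValueError(f"duplicate map key: {key!r}")
            if policy == "keep_first":
                continue  # ignore later values for a key already seen
        result[key] = value  # keep_last overwrites the earlier value
    return result


pairs = [("a", 1), ("b", 2), ("a", 3)]
print(resolve_duplicates(pairs, "keep_last"))   # {'a': 3, 'b': 2}
print(resolve_duplicates(pairs, "keep_first"))  # {'a': 1, 'b': 2}
print(resolve_duplicates(pairs, "keep_all"))    # [('a', 1), ('b', 2), ('a', 3)]
```

With "keep_last", the result matches what a reader gets by inserting each entry into a language-level map as it is decoded.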

I'd be tempted to say "keep last" should be the rule and added to the
spec for this case... But I don't really have a very good
justification!

It's an interesting question, because other than the non-unique key
problem, there's no reason that you couldn't represent the map as you
suggest.

Ryan



