Posted to dev@avro.apache.org by Erik Erlandson <ee...@redhat.com> on 2019/07/08 17:42:16 UTC

Re: supporting a "unit" field for avro schema

What should I do to move this forward? Does Avro have a PIP process?


On Sat, Jun 29, 2019 at 3:26 PM Erik Erlandson <ee...@redhat.com> wrote:

>
> Regarding schema, my proposal for fingerprints would be that units are
> fingerprinted based on their canonical form, as defined here
> <http://erikerlandson.github.io/blog/2019/05/03/algorithmic-unit-analysis/>.
> Any two unit expressions having the same canonical form (including the
> corresponding coefficients) are exactly equivalent, and so their
> fingerprints can be the same. The unit could possibly be stored on the
> schema in canonical form by convention, although canonical forms are
> often less intuitive, so the unit's documentation value for people
> examining the schema might be reduced.
>
> For schema evolution, a unit change in which the previous and new units
> are convertible (as also defined at the above link) would be well defined,
> and the automatic transformation would simply be the correct unit conversion
> (e.g. seconds to milliseconds). If the unit changes to a non-convertible
> unit (e.g. seconds to bytes) then no automatic transformation exists, and
> attempting to resolve the old and new schemas would be an error. Note that
> establishing the conversion assumes that both the original and new schemas
> are available at read time.
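
A minimal sketch of the rule described above, under stated assumptions: the
unit table, `canonical` and `conversion_factor` are hypothetical names made up
for illustration, not part of Avro or coulomb. Each unit reduces to a
coefficient plus a signature of base units. Two units are convertible when
their signatures match, and the conversion factor is the ratio of their
coefficients.

```python
from fractions import Fraction

# Hypothetical unit table: name -> (coefficient, {base unit: exponent}).
# A real implementation would parse unit expressions instead of looking up names.
UNITS = {
    "second": (Fraction(1), {"second": 1}),
    "millisecond": (Fraction(1, 1000), {"second": 1}),
    "minute": (Fraction(60), {"second": 1}),
    "byte": (Fraction(1), {"byte": 1}),
    "kilobyte": (Fraction(1000), {"byte": 1}),
}


def canonical(unit):
    """Return (coefficient, signature); the signature is a sorted tuple of
    (base unit, exponent) pairs, so it can be compared and hashed."""
    coefficient, dimensions = UNITS[unit]
    return coefficient, tuple(sorted(dimensions.items()))


def conversion_factor(writer_unit, reader_unit):
    """Factor that turns a value in writer_unit into a value in reader_unit.

    Raises ValueError when the two units have different base-unit signatures,
    which is the "no automatic transformation exists" case.
    """
    writer_coef, writer_sig = canonical(writer_unit)
    reader_coef, reader_sig = canonical(reader_unit)
    if writer_sig != reader_sig:
        raise ValueError(f"cannot convert {writer_unit} to {reader_unit}")
    return writer_coef / reader_coef
```

With this table, `conversion_factor("second", "millisecond")` is 1000, while
`conversion_factor("second", "byte")` raises `ValueError`. Under the
fingerprint proposal, the tuple returned by `canonical` is what would feed the
fingerprint, so any two unit expressions that reduce to the same tuple would
fingerprint identically.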
>
>
> On Sat, Jun 29, 2019 at 11:55 AM Niels Basjes <ni...@basj.es> wrote:
>
>> I think we should approach this idea in two parts:
>>
>> 1) The schema. For example, does a different unit mean a different schema
>> fingerprint even though the bytes remain the same? What does a different
>> unit mean for schema evolution?
>>
>> 2) Language specifics. Scala has different possibilities than Java.
>>
>> On Sat, Jun 29, 2019, 18:59 Erik Erlandson <ee...@redhat.com> wrote:
>>
>> > I've been puzzling over what can be done to support this in more
>> > widely-used languages. The dilemma relative to the current language
>> > ecosystem is that languages with "modern" type systems (Haskell, Rust,
>> > Scala, etc) capable of supporting compile-time unit checking, in the
>> > particular style I've been exploring, are not yet widely used.
>> >
>> > With respect to Java, a couple of approaches are plausible. One is to
>> > enhance the language, for example with Java 8 compiler plugins. Another
>> > might be to implement a unit type system similar to squants
>> > <https://github.com/typelevel/squants>. This style of unit type system is
>> > not as flexible or intuitive as what can be done with Scala's latest type
>> > system sorcery, but it would allow the community to build out a
>> > Java-native type system that supports compile-time unit analysis. And its
>> > coverage of standard units could be made very good, as squants itself
>> > demonstrates.
>> >
>> > Python would also be a high-coverage target. I'm even less sure what to do
>> > for Python, as it has no compile-time type checking, but perhaps a
>> > squants-like Python class system would add value. Maybe Python's new
>> > type-hints feature could be leveraged?
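
As a rough illustration of the type-hints idea, the sketch below attaches a
unit to a field's type hint with `typing.Annotated` (available from Python
3.9, so later than this thread) and reads it back at runtime. `Unit`,
`Download` and `field_units` are hypothetical names for illustration only.
This only carries the unit as metadata; standard type checkers would not
reject mixing seconds with megabytes.

```python
from dataclasses import dataclass
from typing import Annotated, get_type_hints


@dataclass(frozen=True)
class Unit:
    """Hypothetical marker that carries a unit name as type-hint metadata."""
    name: str


# Type aliases pairing a plain numeric type with a unit marker.
Seconds = Annotated[float, Unit("second")]
Megabytes = Annotated[float, Unit("megabyte")]


@dataclass
class Download:
    duration: Seconds
    size: Megabytes
    retries: int  # unitless field, no annotation metadata


def field_units(record_type):
    """Map field name -> unit name for every field annotated with a Unit."""
    hints = get_type_hints(record_type, include_extras=True)
    units = {}
    for field_name, hint in hints.items():
        # Annotated hints expose their extra arguments as __metadata__.
        for meta in getattr(hint, "__metadata__", ()):
            if isinstance(meta, Unit):
                units[field_name] = meta.name
    return units
```

A binding could use something like `field_units(Download)` to fill in the
"unit" metadata on a generated schema, or to compare a record class against
the units declared in an existing schema.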
>> >
>> > Regarding unit expression representation, I'm not unhappy with what I've
>> > prototyped in `coulomb-avro`, in broad strokes. It has deficiencies that
>> > would need addressing. It doesn't yet support standard unit abbreviations,
>> > nor does it understand plurals (e.g. it can parse "second" but not
>> > "seconds"). Since its "unit" field is just a custom metadata key, there is
>> > no enforcement. Parsers are currently instantiated via explicit lists of
>> > types, which is a property I like, but that may not work well in a world
>> > where multiple language bindings must be supported in a portable manner.
>> >
>> >
>> >
>> > On Sat, Jun 29, 2019 at 1:46 AM Niels Basjes <ni...@basj.es> wrote:
>> >
>> > > Hi,
>> > >
>> > > I attended your talk in Berlin and at the end I thought "too bad this is
>> > > only Scala".
>> > >
>> > > I think it's a good idea to have this in Avro.
>> > >
>> > > The details will be tricky: how to encode the units in the schema, for
>> > > example, especially because of the automatic conversion you spoke about.
>> > >
>> > > Niels
>> > >
>> > > On Fri, Jun 28, 2019, 23:58 Erik Erlandson <ee...@redhat.com> wrote:
>> > >
>> > > > Hi Avro community,
>> > > >
>> > > > Recently I have been experimenting with Avro schemas that are extended
>> > > > with a "unit" field. By "unit" I mean expressions like "second" or
>> > > > "megabyte", that is, "units of measure".
>> > > >
>> > > > I delivered a short talk on my experiments at Berlin Buzzwords, which
>> > > > can be viewed here:
>> > > > https://www.youtube.com/watch?v=qrQmB2KFKE8
>> > > > I also wrote a short blog post that may be faster to ingest:
>> > > > http://erikerlandson.github.io/blog/2019/05/23/unit-types-for-avro-schema-integrating-avro-with-coulomb/
>> > > >
>> > > > I received some audience interest in making this concept "first class"
>> > > > for Avro, and so I'm writing to see what the Avro dev community thinks
>> > > > of the idea. One issue is that this kind of unit checking is currently
>> > > > only available for Scala (and specifically Scala 2.13+).
>> > > >
>> > > > The Scala project itself is here:
>> > > > https://github.com/erikerlandson/coulomb
>> > > >
>> > > > Cheers,
>> > > > Erik
>> > > >
>> > >
>> >
>>
>

Re: supporting a "unit" field for avro schema

Posted by Erik Erlandson <ee...@redhat.com>.
Ismaël Mejía pointed out that the AEP for this feature should be voted on,
which I'm in complete agreement with. How should this proceed?
Cheers, Erik



On Wed, Sep 18, 2019 at 3:30 PM Erik Erlandson <ee...@redhat.com> wrote:

>
> I drafted an AEP for unit metadata on schema:
>
> https://docs.google.com/document/d/1IeVAtf6YcAAn35D4jmFQJjPpEMgEu79wWVMW37KvNps/
>
>
> On Tue, Jul 16, 2019 at 1:35 PM Erik Erlandson <ee...@redhat.com>
> wrote:
>
>> Hi Ryan,
>> Those are all great questions. I have ideas about all of them, but I'd
>> want Avro community input as well. For that reason I answered them
>> all on AVRO-2474 <https://issues.apache.org/jira/browse/AVRO-2474>.
>> Cheers!
>> E
>>
>> On Tue, Jul 16, 2019 at 3:13 AM Ryan Skraba <ry...@skraba.com> wrote:
>>
>>> Hello!  I've been thinking about this and I generally like the idea of
>>> stronger types with units :D
>>>
>>> I have some questions about what you are thinking of when you say "first
>>> class concept" in Avro:
>>> - Would you expect a writer schema that wrote a Fahrenheit field and a
>>> reader schema that reads Celsius to interact transparently with generic
>>> data?
>>> - What about conversions that lose precision (e.g. if the above
>>> conversion was on an INT field)?
>>> - How much of "unit" support should be mandatory in the spec for
>>> cross-language operation (e.g. a unit-aware Scala writer with a
>>> Fahrenheit field and a non-unit-aware reader with a Celsius field)?
>>> - To what degree would a generic reader of Avro data be required to
>>> support quantity wrappers (i.e. how can we opt in or out cleanly from
>>> being unit-aware)?
>>>
>>> At scale, I'd be particularly keen to see the conversion detection
>>> (between two schemas / fields / quantities) take place once, and then the
>>> calculation reused for every subsequent datum passing through, but I'm
>>> not sure how that would work.
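
One possible reading of "detect once, reuse" is to resolve the writer and
reader units into a single function when the two schemas are matched, and then
apply that function to each datum. The sketch below assumes a hypothetical
table in which every unit maps linearly onto a base unit of its dimension;
`UNIT_TABLE` and `resolve` are illustrative names, not Avro or coulomb API.
Rounding for integer fields is deliberately left out.

```python
# Hypothetical table: unit -> (dimension, scale, offset), chosen so that
# base_value = value * scale + offset (kelvin for temperature, seconds for time).
UNIT_TABLE = {
    "celsius": ("temperature", 1.0, 273.15),
    "fahrenheit": ("temperature", 5.0 / 9.0, 459.67 * 5.0 / 9.0),
    "second": ("time", 1.0, 0.0),
    "millisecond": ("time", 0.001, 0.0),
}


def resolve(writer_unit, reader_unit):
    """Do the unit lookup and compatibility check once; return a converter.

    The returned function only performs one multiply and one add per value,
    so it can be reused for every datum read with this schema pair.
    """
    w_dim, w_scale, w_offset = UNIT_TABLE[writer_unit]
    r_dim, r_scale, r_offset = UNIT_TABLE[reader_unit]
    if w_dim != r_dim:
        raise ValueError(f"cannot convert {writer_unit} to {reader_unit}")
    # reader_value = (value * w_scale + w_offset - r_offset) / r_scale
    slope = w_scale / r_scale
    intercept = (w_offset - r_offset) / r_scale

    def convert(value):
        return value * slope + intercept

    return convert


# Resolved once per (writer, reader) field pair, then applied per datum.
to_celsius = resolve("fahrenheit", "celsius")
readings_c = [to_celsius(v) for v in (32.0, 98.6, 212.0)]
```

The error for incompatible units surfaces at resolution time, before any data
is read, which matches the idea that resolving such schemas is itself an error.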
>>>
>>> We have some experience with passing a lot of client data through Avro,
>>> and we use generic data quite a bit -- I'd be tempted to think of "float
>>> (metres)" as a distinct type from "float (minutes)", but it would be a
>>> huge (but potentially interesting) change for the way we look at data.
>>> That being said, as far as units go, we see a lot more unitless values
>>> (quantity of items, percentages and other ratios, ratings). The most
>>> frequent numeric values with units that we see are probably money or
>>> geolocation (in practice, already normalized to lat/long -- although I
>>> just learned about UTM!). Surprisingly, there's not as much SI-type unit
>>> data as you might expect.
>>>
>>> I can definitely see the value of using a "unit" annotation in a
>>> generated specific record for a supported language -- as proven by your
>>> Scala work! That might be an easy first target while working out what a
>>> first-class concept in the spec would entail. I missed Berlin Buzzwords
>>> by a day, but enjoyed the video, thanks!
>>>
>>> Ryan
>>>
>>>
>>>
>>> On Tue, Jul 16, 2019 at 1:24 AM Erik Erlandson <ee...@redhat.com>
>>> wrote:
>>>
>>> > If I'm interpreting the situation correctly, there is an "Avro
>>> > Enhancement Proposal" process, but no proposal has been filed in nearly
>>> > a decade:
>>> > https://cwiki.apache.org/confluence/display/AVRO/Avro+Enhancement+Proposals
>>> >
>>> > As a start, I submitted a Jira issue to track this idea:
>>> > https://issues.apache.org/jira/browse/AVRO-2474
>>> >
>>> >

Re: supporting a "unit" field for avro schema

Posted by Erik Erlandson <ee...@redhat.com>.
I drafted an AEP for unit metadata on schema:
https://docs.google.com/document/d/1IeVAtf6YcAAn35D4jmFQJjPpEMgEu79wWVMW37KvNps/



Re: supporting a "unit" field for avro schema

Posted by Erik Erlandson <ee...@redhat.com>.
Hi Ryan,
Those are all great questions. I have ideas about all of them, but I'd
want Avro community input as well. For that reason I answered them
all on AVRO-2474 <https://issues.apache.org/jira/browse/AVRO-2474>.
Cheers!
E

> > >>> > > On Fri, Jun 28, 2019, 23:58 Erik Erlandson <ee...@redhat.com>
> > >>> wrote:
> > >>> > >
> > >>> > > > Hi Avro community,
> > >>> > > >
> > >>> > > > Recently I have been experimenting with avro schema that are
> > >>> extended
> > >>> > > with
> > >>> > > > a "unit" field. By "unit" I mean expressions like "second", or
> > >>> > > "megabyte" -
> > >>> > > > that is "units of measure".
> > >>> > > >
> > >>> > > > I delivered a short talk on my experiments at Berlin Buzzwords,
> > >>> which
> > >>> > can
> > >>> > > > be viewed here:
> > >>> > > > https://www.youtube.com/watch?v=qrQmB2KFKE8
> > >>> > > > I also wrote a short blog post that may be faster to ingest:
> > >>> > > >
> > >>> > > >
> > >>> > >
> > >>> >
> > >>>
> >
> http://erikerlandson.github.io/blog/2019/05/23/unit-types-for-avro-schema-integrating-avro-with-coulomb/
> > >>> > > >
> > >>> > > > I received some audience interest in making this concept "first
> > >>> class"
> > >>> > > for
> > >>> > > > avro, and so I'm writing to see what the avro dev community
> > thinks
> > >>> of
> > >>> > the
> > >>> > > > idea. One issue is that this kind of unit checking is currently
> > >>> only
> > >>> > > > available for Scala (and specifically scala 2.13 +).
> > >>> > > >
> > >>> > > > The Scala project itself is here:
> > >>> > > > https://github.com/erikerlandson/coulomb
> > >>> > > >
> > >>> > > > Cheers,
> > >>> > > > Erik
> > >>> > > >
> > >>> > >
> > >>> >
> > >>>
> > >>
> >
>

Re: supporting a "unit" field for avro schema

Posted by Ryan Skraba <ry...@skraba.com>.
Hello!  I've been thinking about this and I generally like the idea of
stronger types with units :D

I have some questions about what you are thinking of when you say "first
class concept" in Avro:
- Would you expect a writer schema that wrote a Fahrenheit field and a
reader schema that reads Celsius to interact transparently with generic
data?
- What about conversions that lose precision (e.g., if the above conversion
was on an INT field)?
- How much of "unit" support should be mandatory in the spec for cross
language operation?  (a unit-aware Scala writer with a Fahrenheit field and
a non-unit-aware reader with a Celsius field).
- To what degree would a generic reader of Avro data be required to support
quantity wrappers (i.e. how can we opt-in/opt-out cleanly from being
unit-aware)?
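
To make the precision question concrete, here is a minimal Python sketch (not part of any Avro API; the function names are invented for illustration). It assumes the reader simply rounds the converted value back into an integer field, which is only one of several possible policies:

```python
from fractions import Fraction


def fahrenheit_to_celsius(f):
    """Exact affine conversion: C = (F - 32) * 5/9."""
    return (Fraction(f) - 32) * Fraction(5, 9)


def to_int_celsius(f):
    """Hypothetical policy: round the converted value into an int field."""
    return round(fahrenheit_to_celsius(f))


# Two distinct values written as Fahrenheit ints...
exact_100 = fahrenheit_to_celsius(100)  # 340/9, roughly 37.78
exact_101 = fahrenheit_to_celsius(101)  # 115/3, roughly 38.33

# ...collapse to the same Celsius int, so the original cannot be recovered.
same_bucket = to_int_celsius(100) == to_int_celsius(101)
```

Under this assumed rounding policy, 100 F and 101 F both read back as 38 C, so an int-to-int conversion across these units is lossy even though the unit conversion itself is exact.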

At scale, I'd be particularly keen to see the conversion detection (between
two schemas / fields / quantities) take place once, and then the
calculation reused for every subsequent datum passing through, but I'm
not sure how that would work.
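
One possible shape for "detect once, reuse for every datum" is sketched below in Python. The unit table, its factors, and the name `resolve_conversion` are all assumptions made for illustration; they are not taken from coulomb or from Avro. The convertibility check and the scale factor are computed a single time when the writer and reader schemas are resolved, and the returned function is then applied per value:

```python
# Hypothetical unit table: name -> (dimension, factor to the base unit).
UNITS = {
    "second": ("time", 1.0),
    "millisecond": ("time", 0.001),
    "minute": ("time", 60.0),
    "byte": ("information", 1.0),
    "kilobyte": ("information", 1000.0),
}


def resolve_conversion(writer_unit, reader_unit):
    """Check convertibility once and return a reusable converter."""
    w_dim, w_factor = UNITS[writer_unit]
    r_dim, r_factor = UNITS[reader_unit]
    if w_dim != r_dim:
        # Non-convertible units: resolving the two schemas is an error.
        raise ValueError(f"cannot convert {writer_unit} to {reader_unit}")
    scale = w_factor / r_factor  # computed once, at schema resolution time
    return lambda value: value * scale


# Resolve once per (writer field, reader field) pair...
convert = resolve_conversion("minute", "second")
# ...then reuse the converter for every datum.
values = [convert(v) for v in [1, 2.5, 10]]
```

This sketch only covers purely multiplicative units; an affine pair such as Fahrenheit and Celsius would also need an offset carried alongside the scale.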

We have some experience with passing a lot of client data through Avro, and
we use generic data quite a bit -- I'd be tempted to think of "float
(metres)" as a distinct type from "float (minutes)", but it would be a huge
(but potentially interesting) change for the way we look at data.  That
being said, as far as units go, we see a lot more unitless values (quantity
of items, percents and other ratios, ratings).  The most frequent numeric
values with units that we see are probably money or geolocation (in
practice, already normalized to lat/long -- although I just learned about
UTM!).  Surprisingly, there's not as much SI-type unit data as you might
expect.

I can definitely see the value of using a "unit" annotation in a generated
specific record for a supported language -- as proven by your Scala work!
That might be an easy first target while working out what a first-class
concept in the spec would entail.  I missed Berlin Buzzwords by a day, but
enjoyed the video, thanks!
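
As a rough illustration of what a "unit" annotation on a schema could look like, the Python sketch below stores the unit as an extra attribute on each field and reads it back with the standard `json` module. The record name, field names, and the helper `field_units` are invented for this example; the only thing taken from the thread is that "unit" is a custom metadata key with no enforcement:

```python
import json

# Hypothetical schema: "unit" is an extra attribute on each field.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "Download",
  "fields": [
    {"name": "latency", "type": "double", "unit": "second"},
    {"name": "size", "type": "long", "unit": "megabyte"},
    {"name": "label", "type": "string"}
  ]
}
"""


def field_units(schema_json):
    """Collect the unit annotation of every field that declares one."""
    schema = json.loads(schema_json)
    return {f["name"]: f["unit"] for f in schema["fields"] if "unit" in f}


units = field_units(SCHEMA_JSON)
```

Fields without a "unit" attribute are simply skipped, which is one way a reader could stay unit-unaware by default and opt in per field.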

Ryan




Re: supporting a "unit" field for avro schema

Posted by Erik Erlandson <ee...@redhat.com>.
If I'm interpreting the situation correctly, there is an "Avro Enhancement
Proposal" process, but no proposals have been filed in nearly a decade:
https://cwiki.apache.org/confluence/display/AVRO/Avro+Enhancement+Proposals

As a start, I submitted a jira to track this idea:
https://issues.apache.org/jira/browse/AVRO-2474


