Posted to dev@avro.apache.org by Christophe Le Saëc <ch...@gmail.com> on 2022/11/10 07:55:27 UTC

[DISCUSS][VOTE] Naming rules

Hello community,
JIRA ticket AVRO-3532 <https://issues.apache.org/jira/browse/AVRO-3532>
describes the issue.
The formal naming rules
<https://avro.apache.org/docs/1.11.1/specification/#names> described in the
documentation are more restrictive than the checks actually performed in Java
(and C#). Indeed, the Java code
<https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/Schema.java#L1602-L1607>,
by using the Character.isLetter and Character.isLetterOrDigit methods, accepts
accented letters like "éàç", Chinese/Japanese characters (我), and so on (*and
these characters are also valid in standard Java code in field/method/class
names*).

This has been the situation since at least version 1.8.2 (*I didn't look at
older versions*).
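
A minimal sketch of the gap described above. It is not Avro's code: the class
and method names are hypothetical, the regular expression transcribes the rule
stated in the specification ([A-Za-z_] first, then [A-Za-z0-9_]), and the
lenient check only imitates a validation built on Character.isLetter and
Character.isLetterOrDigit.

```java
import java.util.regex.Pattern;

/** Illustrative only, not Avro's implementation. */
final class NameRules {
    // Rule as written in the Avro specification: ASCII letters, digits, underscore.
    private static final Pattern SPEC_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    /** True if the name satisfies the ASCII-only rule of the specification. */
    static boolean matchesSpec(String name) {
        return name != null && SPEC_NAME.matcher(name).matches();
    }

    /** Hypothetical Unicode-aware check in the style described in this message. */
    static boolean matchesLenient(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        char first = name.charAt(0);
        if (!(Character.isLetter(first) || first == '_')) {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            char c = name.charAt(i);
            if (!(Character.isLetterOrDigit(c) || c == '_')) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "pr\u00e9nom" contains an accented e; "\u6211" is a CJK ideograph.
        String[] samples = {"user_id", "pr\u00e9nom", "\u6211", "1abc"};
        for (String name : samples) {
            System.out.println(name + " spec=" + matchesSpec(name)
                    + " lenient=" + matchesLenient(name));
        }
    }
}
```

In this sketch the accented and CJK names pass the lenient check and fail the
specification rule, while "1abc" fails both.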

So, the discussion is to choose between:

   1. "change the documentation" (and adapt the other modules, as proposed in
   this PR for Rust <https://github.com/apache/avro/pull/1787> and this other
   one for C <https://github.com/apache/avro/pull/1798>)
   2. change the code (in Java and C# at least) to conform to the
   documentation.


Personally, I'm in favor of the first option. If some Java programmers are
like me and rely more on the source code than on the official documentation,
and already use accents in names, they will face a breaking change if we choose
the second option. The first option would not create a breaking change; for
Rust and C, the change would preserve backward compatibility.

WDYT?

Re: [DISCUSS][VOTE] Naming rules

Posted by Christophe Le Saëc <ch...@gmail.com>.

Another point: could we mimic these Java rules
<https://docs.oracle.com/javase/tutorial/java/nutsandbolts/variables.html>?
(excluding 'dot' and 'space' from names)
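
A minimal sketch of what mimicking the Java rules could look like, assuming the
check is built on the JDK's own Character.isJavaIdentifierStart and
Character.isJavaIdentifierPart methods. The class and method names are
hypothetical. Note that these JDK methods also accept currency symbols such as
'$', and that this sketch does not exclude Java keywords.

```java
/** Hypothetical helper, not part of Avro. */
final class JavaStyleNames {
    /** True if every code point is allowed in a Java identifier at its position. */
    static boolean isJavaStyleName(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        int first = name.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        int i = Character.charCount(first);
        while (i < name.length()) {
            int cp = name.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {"pr\u00e9nom", "first name", "a.b", "1abc", "price$"};
        for (String name : samples) {
            System.out.println("[" + name + "] -> " + isJavaStyleName(name));
        }
    }
}
```

With this check, names containing a dot or a space are rejected, as the message
suggests, while accented letters are accepted.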

On Fri, Nov 25, 2022 at 17:28, Christophe Le Saëc <ch...@gmail.com> wrote:

> Thanks for sharing the URL https://langsec.org/occupy/, a very interesting
> site.
>
> *internal aspects like Unicode normalisation* => Yes, this may be the main
> argument for keeping the specification as it is and fixing the validation
> code. Does anyone know how the Java compiler validates names (*for methods,
> variables ...*), given that it accepts Unicode names? Could it be a solution
> for Avro?
>
> Wouldn't these two arguments also apply to properties (names, and values
> when they are Strings) contained in the JsonProperties class, which is the
> parent of the Schema and Field classes?
>
> So why leave one without any control (*property names and values*) and the
> other with restrictive control (*field names*)?
>
> Best regards,
> Christophe.

Re: [DISCUSS][VOTE] Naming rules

Posted by Christophe Le Saëc <ch...@gmail.com>.

Thanks for sharing the URL https://langsec.org/occupy/, a very interesting
site.

*internal aspects like Unicode normalisation* => Yes, this may be the main
argument for keeping the specification as it is and fixing the validation
code. Does anyone know how the Java compiler validates names (*for methods,
variables ...*), given that it accepts Unicode names? Could it be a solution
for Avro?

Wouldn't these two arguments also apply to properties (names, and values
when they are Strings) contained in the JsonProperties class, which is the
parent of the Schema and Field classes?

So why leave one without any control (*property names and values*) and the
other with restrictive control (*field names*)?

Best regards,
Christophe.

On Wed, Nov 23, 2022 at 20:20, Ryan Skraba <ry...@skraba.com> wrote:

> Hello!  I have a specific opinion about the "Robustness Principle",
> especially in this case!
>
> "Accepting liberally, generating strictly" (the paraphrasing of
> Postel's idea) has its place, and might be a good principle for
> binary encoding and decoding.  It's not so great for "accepting and
> generating schemas".  In this case, it's led directly to this debate:
> accepting "invalid" names has become a de facto standard for a
> _certain_ category of users, who are now blocked from interoperating
> with other language SDKs (and potentially future versions of their own
> SDK, if "fixed").
>
> > If we can use a "non rigorous validation" and
> > it can run without bugs, why switch to a rigorous validation mode that
> > would follow current specification and not change the specification to
> > "accept schemas as liberally as possible" (meaning, while it doesn't
> > generate bugs).
>
> Here's where I think the logic is faulty: even if we don't count
> interoperability failures as a bug, I'm not convinced that using names
> outside the specification runs without bugs!  There are several things to
> think about: internal aspects like Unicode normalisation, internal
> features like schema evolution (which might actually be OK), but
> especially external ones like downstream projects and tools.
>
> As it is, if you follow the specification, Java and Python are
> interoperable and there's a certain guarantee that existing libraries
> and projects can count on.
>
> The configuration approach is one that would allow upstream projects
> to continue working with out-of-spec names, while alerting them that
> these could cause interoperability problems outside of their current
> cases!  One thing for certain, the specification should allow invalid
> names for "aliases" so that users can migrate away from these issues.
>
> A slightly related resource: https://langsec.org/occupy/
>
> All my best, Ryan

Re: [DISCUSS][VOTE] Naming rules

Posted by Ryan Skraba <ry...@skraba.com>.

Hello!  I have a specific opinion about the "Robustness Principle",
especially in this case!

"Accepting liberally, generating strictly" (the paraphrasing of
Postel's idea) has its place, and might be a good principle for
binary encoding and decoding.  It's not so great for "accepting and
generating schemas".  In this case, it's led directly to this debate:
accepting "invalid" names has become a de facto standard for a
_certain_ category of users, who are now blocked from interoperating
with other language SDKs (and potentially future versions of their own
SDK, if "fixed").

> If we can use a "non rigorous validation" and
> it can run without bugs, why switch to a rigorous validation mode that would
> follow current specification and not change the specification to "accept
> schemas as liberally as possible" (meaning, while it doesn't generate bugs).

Here's where I think the logic is faulty: even if we don't count
interoperability failures as a bug, I'm not convinced that using names
outside the specification runs without bugs!  There are several things to
think about: internal aspects like Unicode normalisation, internal
features like schema evolution (which might actually be OK), but
especially external ones like downstream projects and tools.
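
To make the Unicode normalisation concern concrete, here is a small sketch
using the JDK's java.text.Normalizer. The class name and the sample name are
made up for illustration. It shows two spellings of the same visible name that
are different strings, and that a check based on Character.isLetterOrDigit
treats them differently because the combining accent is not a letter.

```java
import java.text.Normalizer;

/** Illustrative only: the same visible name in two Unicode forms. */
final class NormalizationDemo {
    // Accented e stored as a single precomposed code point.
    static final String COMPOSED = "pr\u00e9nom";
    // Plain e followed by a combining acute accent.
    static final String DECOMPOSED = "pre\u0301nom";

    /** Normalises a string to the composed form (NFC). */
    static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // The two spellings render alike but are different strings.
        System.out.println(COMPOSED.equals(DECOMPOSED));            // false
        // After normalisation they compare equal.
        System.out.println(nfc(COMPOSED).equals(nfc(DECOMPOSED)));  // true
        // The combining accent is a mark, not a letter or digit.
        System.out.println(Character.isLetterOrDigit('\u0301'));    // false
    }
}
```

A schema written with one form and looked up with the other would therefore
not match by plain string comparison unless names are normalised first.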

As it is, if you follow the specification, Java and Python are
interoperable and there's a certain guarantee that existing libraries
and projects can count on.

The configuration approach is one that would allow upstream projects
to continue working with out-of-spec names, while alerting them that
these could cause interoperability problems outside of their current
cases!  One thing for certain, the specification should allow invalid
names for "aliases" so that users can migrate away from these issues.
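
One possible shape for the configuration approach, as a sketch only. The
property name "example.avro.nameValidation", the Mode enum and the class are
all invented here and do not exist in any Avro SDK. Strict checking is the
default, and a "warn" mode accepts out-of-spec names while alerting the user.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical configurable name check, not an Avro API. */
final class ConfigurableNameCheck {
    enum Mode { STRICT, WARN, OFF }

    // Invented property name, used only by this sketch.
    static final String PROPERTY = "example.avro.nameValidation";

    // Rule as written in the Avro specification.
    private static final Pattern SPEC_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    /** Maps a property value to a mode; anything unknown falls back to STRICT. */
    static Mode modeFrom(String value) {
        if (value == null) {
            return Mode.STRICT;
        }
        switch (value.trim().toLowerCase(Locale.ROOT)) {
            case "warn":
                return Mode.WARN;
            case "off":
                return Mode.OFF;
            default:
                return Mode.STRICT;
        }
    }

    /** Returns true if the name is accepted under the given mode. */
    static boolean accept(String name, Mode mode) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        if (SPEC_NAME.matcher(name).matches() || mode == Mode.OFF) {
            return true;
        }
        if (mode == Mode.WARN) {
            System.err.println("WARNING: name '" + name
                    + "' is outside the Avro specification; other SDKs may reject it");
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Mode mode = modeFrom(System.getProperty(PROPERTY));
        System.out.println("mode=" + mode + " accepted=" + accept("pr\u00e9nom", mode));
    }
}
```

Running with -Dexample.avro.nameValidation=warn would let existing
out-of-spec schemas keep working while surfacing the interoperability risk.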

A slightly related resource: https://langsec.org/occupy/

All my best, Ryan


On Tue, Nov 15, 2022 at 4:41 PM Christophe Le Saëc <ch...@gmail.com> wrote:
>
> AVRO-2659 <https://issues.apache.org/jira/browse/AVRO-2659> indeed shows
> that a name should contain neither a space (a problem when generating Java
> code) nor a dot (a problem when separating names in a path).
>
> AVRO-1022 <https://issues.apache.org/jira/browse/AVRO-1022>: the last comment
> reinforces the rule about dots and contains a nice principle: "accept
> schemas as liberally as possible".
>
>
> *allowing two language SDKs to implement the spec differently will make
> users unhappy about cross-platform, cross-language compatibility.*
> -> Indeed, that's the case with the current version, where Java and C#
> accept accents while C and Rust strictly follow the spec.
>
> Other possibilities:
> - *putting human-readable or internationalised names in other metadata
> properties*: Yes, this can already be done, on record fields for example, as
> Field is a JsonProperties class (we already use it in some cases, and it
> helps).
> - *using configuration / environment / system properties to turn rigorous
> spec validation on and off*: If we can use a "non-rigorous validation" and
> it runs without bugs, why switch to a rigorous validation mode that follows
> the current specification, rather than change the specification to "accept
> schemas as liberally as possible" (meaning, as long as it doesn't generate
> bugs)?
>
>
>
> *My preference would be to *tighten* the SDKs to match the existing Avro
> spec, and provide language-specific ways to easily disable validating names
> if desired*
> Personally, I like the idea of a mandatory name check that can't be
> deactivated, to ensure names won't generate bugs (mainly for Java code
> generation and to be able to separate name and namespace), but the
> specification should be limited to banning names that would generate a bug
> (and should not include a rule that seems to have no real reason, unless
> that reason is explained in the doc).
>
> Best regards,
> Christophe.

Re: [DISCUSS][VOTE] Naming rules

Posted by Christophe Le Saëc <ch...@gmail.com>.

AVRO-2659 <https://issues.apache.org/jira/browse/AVRO-2659> indeed shows
that a name should contain neither a space (a problem when generating Java
code) nor a dot (a problem when separating names in a path).
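
A short sketch of why a dot inside a name is a problem, assuming a full name
is split into namespace and name at its last dot. The helper below is
hypothetical and is not Avro's code.

```java
/** Hypothetical helper showing the ambiguity caused by dots in names. */
final class FullNameSplit {
    /** Splits at the last dot; returns {namespace, name}, with "" when there is no dot. */
    static String[] split(String fullName) {
        int lastDot = fullName.lastIndexOf('.');
        if (lastDot < 0) {
            return new String[] {"", fullName};
        }
        return new String[] {fullName.substring(0, lastDot), fullName.substring(lastDot + 1)};
    }

    public static void main(String[] args) {
        // Intended: namespace "com.example", name "User".
        String[] ok = split("com.example.User");
        System.out.println(ok[0] + " | " + ok[1]);

        // Intended: namespace "com.example", name "my.Record".
        // The split cannot recover that intent from the joined string.
        String[] ambiguous = split("com.example" + "." + "my.Record");
        System.out.println(ambiguous[0] + " | " + ambiguous[1]);
    }
}
```

The second case comes back as namespace "com.example.my" and name "Record", so
the original name is lost once it has been joined with its namespace.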

AVRO-1022 <https://issues.apache.org/jira/browse/AVRO-1022>: the last comment
reinforces the rule about dots and contains a nice principle: "accept
schemas as liberally as possible".


*allowing two language SDKs to implement the spec differently will make
users unhappy about cross-platform, cross-language compatibility.*
-> Indeed, that's the case with the current version, where Java and C#
accept accents while C and Rust strictly follow the spec.

Other possibilities:
- *putting human-readable or internationalised names in other metadata
properties*: Yes, this can already be done, on record fields for example, as
Field is a JsonProperties class (we already use it in some cases, and it
helps).
- *using configuration / environment / system properties to turn rigorous
spec validation on and off*: If we can use a "non-rigorous validation" and
it runs without bugs, why switch to a rigorous validation mode that follows
the current specification, rather than change the specification to "accept
schemas as liberally as possible" (meaning, as long as it doesn't generate
bugs)?



*My preference would be to *tighten* the SDKs to match the existing Avro
spec, and provide language-specific ways to easily disable validating names
if desired*
Personally, I like the idea of a mandatory name check that can't be
deactivated, to ensure names won't generate bugs (mainly for Java code
generation and to be able to separate name and namespace), but the
specification should be limited to banning names that would generate a bug
(and should not include a rule that seems to have no real reason, unless
that reason is explained in the doc).

Best regards,
Christophe.



On Thu, Nov 10, 2022 at 19:15, Ryan Skraba <ry...@skraba.com> wrote:

> Hello!  Here are a couple of related JIRA issues from the past that we can
> use to inform our discussion:
>
> * AVRO-2659 demonstrates a pretty disastrous schema name that the Java
> SDK accepts.
> * AVRO-1022 (10 years ago!) has a somewhat tepid discussion about
> accepting UTF-8 that didn't quite get enough follow-up to make it into
> the spec!
>
> We've been in the current (unsatisfactory) state for so long because:
>
> * changing an SDK's behaviour (even to "fix it")
> will make users unhappy about backwards/forwards version
> compatibility, and
> * allowing two language SDKs to implement the spec differently will
> make users unhappy about cross-platform, cross-language compatibility.
>
> In my opinion, with modern streaming and event processing, we have to
> take the latter into account!
>
> There were a couple of other options than the two you propose in the
> original discussion thread (such as putting human-readable or
> internationalised names in other metadata properties, or using
> configuration / environment / system properties to turn rigorous spec
> validation on and off).  Have you given them any consideration for
> your use case?
>
> My preference would be to *tighten* the SDKs to match the existing
> Avro spec, and provide language-specific ways to easily disable
> validating names if desired.  There's some precedent for this in the
> Schema.Parser#validate method.
>
> There's a bit more going on here that's worth doing right for the future!
>
> All my best, Ryan

Re: [DISCUSS][VOTE] Naming rules

Posted by Ryan Skraba <ry...@skraba.com>.

Hello!  Here are a couple of related JIRA issues from the past that we can
use to inform our discussion:

* AVRO-2659 demonstrates a pretty disastrous schema name that the Java
SDK accepts.
* AVRO-1022 (10 years ago!) has a somewhat tepid discussion about
accepting UTF-8 that didn't quite get enough follow-up to make it into
the spec!

We've been in the current (unsatisfactory) state for so long because:

* changing an SDK's behaviour (even to "fix it")
will make users unhappy about backwards/forwards version
compatibility, and
* allowing two language SDKs to implement the spec differently will
make users unhappy about cross-platform, cross-language compatibility.

In my opinion, with modern streaming and event processing, we have to
take the latter into account!

There were a couple of other options than the two you propose in the
original discussion thread (such as putting human-readable or
internationalised names in other metadata properties, or using
configuration / environment / system properties to turn rigorous spec
validation on and off).  Have you given them any consideration for
your use case?

My preference would be to *tighten* the SDKs to match the existing
Avro spec, and provide language-specific ways to easily disable
validating names if desired.  There's some precedent for this in the
Schema.Parser#validate method.

There's a bit more going on here that's worth doing right for the future!

All my best, Ryan

On Thu, Nov 10, 2022 at 4:53 PM Oscar Westra van Holthe - Kind
<os...@westravanholthe.nl> wrote:
>
> On Thu, 10 Nov 2022 at 08:55, Christophe Le Saëc <ch...@gmail.com> wrote:
>
> > So, the discussion is to choose between:
> >
> >    1. "change the documentation" (and adapt the other modules, as proposed
> >    in this PR for Rust <https://github.com/apache/avro/pull/1787> and this
> >    other one for C <https://github.com/apache/avro/pull/1798>)
> >    2. change the code (in Java and C# at least) to conform to the
> >    documentation.
> >
>
> For compatibility, I like option 1. If we're to change naming rules, I'd
> vote for logging warnings before tightening the rules.
>
> Kind regards,
> Oscar
>
> --
>
> ✉️ Oscar Westra van Holthe - Kind <os...@westravanholthe.nl>

Re: [DISCUSS][VOTE] Naming rules

Posted by Oscar Westra van Holthe - Kind <os...@westravanholthe.nl>.

On Thu, 10 Nov 2022 at 08:55, Christophe Le Saëc <ch...@gmail.com> wrote:

> So, the discussion is to choose between:
>
>    1. "change the documentation" (and adapt the other modules, as proposed
>    in this PR for Rust <https://github.com/apache/avro/pull/1787> and this
>    other one for C <https://github.com/apache/avro/pull/1798>)
>    2. change the code (in Java and C# at least) to conform to the
>    documentation.
>

For compatibility, I like option 1. If we're to change naming rules, I'd
vote for logging warnings before tightening the rules.

Kind regards,
Oscar

-- 

✉️ Oscar Westra van Holthe - Kind <os...@westravanholthe.nl>