Posted to dev@avro.apache.org by Scott Belden <sc...@gmail.com> on 2021/06/01 02:58:12 UTC

Java implementation regarding references seems to be not in accordance with the specification

In the names section of the specification (
https://avro.apache.org/docs/current/spec.html#names) it currently states
the following:

References to previously defined names are as in the latter two cases
above: if they contain a dot they are a fullname, if they do not contain a
dot, the namespace is the namespace of the enclosing definition.
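For reference, the quoted rule can be sketched roughly as follows. This is a hypothetical helper, not code from any Avro implementation. It assumes names are plain strings and that the null namespace is represented as None (or an empty string).

```python
from typing import Optional


def resolve_per_spec(ref: str, enclosing_ns: Optional[str]) -> str:
    """Hypothetical helper: resolve a type reference as the quoted spec text reads."""
    if "." in ref:
        # A dotted reference is already a fullname.
        return ref
    if enclosing_ns:
        # An undotted reference takes the namespace of the enclosing definition.
        return f"{enclosing_ns}.{ref}"
    # The enclosing definition is in the null namespace, so the name stays bare.
    return ref


print(resolve_per_spec("row", "my_ns"))           # my_ns.row
print(resolve_per_spec("other_ns.row", "my_ns"))  # other_ns.row
print(resolve_per_spec("row", None))              # row
```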

However, the Java implementation currently does not seem to consider the
enclosing namespace. For example, see the following ticket:
https://issues.apache.org/jira/browse/AVRO-3118.

The subrecord "row" is declared with a null namespace, so its fullname
is just "row". However, "field_b" in the original schema has type "row",
which contains no dot, so according to the spec it should take the enclosing
namespace "my_ns" and have the fullname "my_ns.row". These two fullnames
do not match, so the schema should fail to parse because "my_ns.row" is
not defined.
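Applying both readings to a hypothetical schema shaped like the one described makes the mismatch visible. The record name "outer" and the field "x" are assumptions, and the exact schema in AVRO-3118 may differ. The empty-string namespace on "row" follows the spec's convention for selecting the null namespace.

```python
from typing import Optional

# Hypothetical schema shaped like the one described above; "outer" and "x"
# are illustrative assumptions, not taken from AVRO-3118.
schema = {
    "type": "record",
    "name": "outer",
    "namespace": "my_ns",
    "fields": [
        {"name": "field_a",
         "type": {"type": "record", "name": "row", "namespace": "",  # "" = null namespace
                  "fields": [{"name": "x", "type": "int"}]}},
        {"name": "field_b", "type": "row"},  # reference to a previously defined name
    ],
}


def fullname(name: str, namespace: Optional[str]) -> str:
    # Dotted names are already fullnames; an empty or None namespace adds no prefix.
    if "." in name or not namespace:
        return name
    return f"{namespace}.{name}"


outer_ns = schema["namespace"]
row_def = schema["fields"][0]["type"]
defined = {
    fullname(schema["name"], outer_ns),               # "my_ns.outer"
    fullname(row_def["name"], row_def["namespace"]),  # "row"
}

ref = schema["fields"][1]["type"]
per_spec = fullname(ref, outer_ns)  # current spec: undotted -> enclosing namespace
proposed = ref                      # proposed wording: the reference is the fullname

print(per_spec, per_spec in defined)  # my_ns.row False
print(proposed, proposed in defined)  # row True
```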

Changing the Java code to conform to the specification would be a breaking
change. Alternatively, the specification could be changed to say the
following:

References to previously defined names are always treated as fullnames. If
the reference does not have a dot then it is considered to have a null
namespace.

What do others think should be done?

-Scott

Re: Java implementation regarding references seems to be not in accordance with the specification

Posted by Spencer Nelson <s...@spencerwnelson.com>.
I think this proposed change would be an improvement.

The current spec reads ambiguously to me, because "the namespace of the
enclosing definition" isn't particularly clear. A few months ago, I asked
for clarification on a related question (does a fully-qualified name imply
the creation of a namespace? [0]) and got no answer. I also asked whether
"enclosing" is a transitive property that flows up through null-namespaced
types, and that wasn't clear either.

I think Scott's proposed change here is a lot clearer, and it's also much
easier to implement. The "namespace of the enclosing definition"
requirement makes name resolution context-dependent, so a schema parser
has to keep a stack of enclosing namespaces to resolve references. The
proposal replaces that with a flat lookup table, which is much simpler and
faster.
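To make the stack-versus-table contrast concrete, here is a minimal sketch of reference resolution under both rules. The function resolve_references, its rule parameter, and the path strings are hypothetical and not part of any Avro library. It handles only records, enums, fixed, arrays, maps, and unions, and it ignores aliases and most validation. In "spec" mode the enclosing namespace is passed down the recursion, so the call stack plays the role of the namespace stack. In "proposed" mode each reference string is looked up directly in the set of defined fullnames.

```python
from typing import Any, Dict, Optional, Set

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}
NAMED = {"record", "enum", "fixed"}


def fullname(name: str, namespace: Optional[str]) -> str:
    # Dotted names are already fullnames; a missing or null namespace adds no prefix.
    if "." in name or not namespace:
        return name
    return f"{namespace}.{name}"


def resolve_references(schema: Any, rule: str) -> Dict[str, str]:
    """Hypothetical: map each named-type reference (keyed by field path) to its fullname.

    rule == "spec": an undotted reference takes the enclosing namespace, so the
    namespace is threaded down the recursion (the call stack acts as the stack).
    rule == "proposed": the reference string itself is the key into a flat set.
    Raises KeyError if a reference names a type that has not been defined yet.
    """
    defined: Set[str] = set()
    resolved: Dict[str, str] = {}

    def walk(node: Any, enclosing_ns: Optional[str], path: str) -> None:
        if isinstance(node, str):
            if node in PRIMITIVES:
                return
            key = fullname(node, enclosing_ns) if rule == "spec" else node
            if key not in defined:
                raise KeyError(f"{path}: undefined type {key!r}")
            resolved[path] = key
        elif isinstance(node, list):  # a union: resolve each branch
            for i, branch in enumerate(node):
                walk(branch, enclosing_ns, f"{path}[{i}]")
        elif isinstance(node, dict):
            kind = node["type"]
            if isinstance(kind, str) and kind in NAMED:
                name = node["name"]
                if "." in name:
                    ns: Optional[str] = name.rsplit(".", 1)[0]
                else:
                    # "" selects the null namespace; a missing key inherits the enclosing one.
                    ns = node.get("namespace", enclosing_ns) or None
                # Register before walking fields so recursive references can resolve.
                defined.add(fullname(name, ns))
                for field in node.get("fields", []):
                    walk(field["type"], ns, f"{path}.{field['name']}")
            elif kind == "array":
                walk(node["items"], enclosing_ns, f"{path}[]")
            elif kind == "map":
                walk(node["values"], enclosing_ns, f"{path}{{}}")
            else:  # e.g. {"type": "int"}, {"type": "row"}, or a nested union/record
                walk(kind, enclosing_ns, path)

    walk(schema, None, "$")
    return resolved


# Hypothetical schema modeled on the one described earlier in the thread.
schema = {
    "type": "record", "name": "outer", "namespace": "my_ns",
    "fields": [
        {"name": "field_a",
         "type": {"type": "record", "name": "row", "namespace": "",
                  "fields": [{"name": "x", "type": "int"}]}},
        {"name": "field_b", "type": "row"},
    ],
}

print(resolve_references(schema, "proposed"))  # {'$.field_b': 'row'}
try:
    resolve_references(schema, "spec")
except KeyError as err:
    print("spec rule:", err)
```

With this schema the proposed rule resolves field_b to the null-namespaced "row", while the spec rule looks for "my_ns.row" and raises KeyError. The reverse also holds under the literal proposed wording: a schema that defines "row" inside "my_ns" and refers to it as "row" resolves under the spec rule but not under the proposed one, which is one way the change could affect existing schemas.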

The proposed change is also appealing because without it, it's impossible
to reference a null-namespaced type from within a namespaced type, since
named type references are plain strings and have no room for specifying a
null namespace.

I think backwards compatibility will be the hard part, though. Changing the
specification will be a breaking change for all non-Java implementations. I
don't know how rough that will be in practice.

[0]:
http://mail-archives.apache.org/mod_mbox/avro-dev/202103.mbox/%3cCAB6dobWX1=_FcTGvgM-d5r17PV_69u27TDZvLJMWc+aizowDiw@mail.gmail.com%3e

