Posted to dev@avro.apache.org by Raymie Stata <ra...@gmail.com> on 2018/11/30 06:23:23 UTC

Radical idea for 1.9.0

I understand we've been willing to introduce backward-incompatible API
changes (not file-format changes) into minor release versions.  If so,
here's an idea for consideration:

Let's eliminate recursive records from Avro 1.9.x.  Recursion
introduces a _lot_ of complexity into many parts of the Avro code
base.  We could vastly simplify the code base, and probably speed
things up, by getting rid of this feature.

The specific proposal would be that Avro 1.9.x would refuse to accept
recursive records, and thus would not be able to read binary files
written by older versions of Avro.  I haven't heard of anyone actually
using them, so I don't think this would be a problem.
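
For readers unsure what "refuse to accept recursive records" would involve, here is a minimal Python sketch of the check a parser would need. It is not Avro's actual implementation: `is_recursive` is a hypothetical helper, it works on the schema's JSON form only, and it ignores namespaces and aliases. `LongList` is the usual linked-list example of a recursive record; `Point` is an assumed non-recursive one.

```python
import json

def is_recursive(schema, open_names=frozenset()):
    """Return True if any record refers to itself, directly or through nesting."""
    if isinstance(schema, str):
        # A bare string is a primitive name or a reference to a named type;
        # it closes a cycle only if it names a record we are still inside.
        return schema in open_names
    if isinstance(schema, list):
        # A JSON array denotes a union: check every branch.
        return any(is_recursive(branch, open_names) for branch in schema)
    kind = schema["type"]
    if kind == "record":
        inside = open_names | {schema["name"]}
        return any(is_recursive(field["type"], inside) for field in schema["fields"])
    if kind == "array":
        return is_recursive(schema["items"], open_names)
    if kind == "map":
        return is_recursive(schema["values"], open_names)
    return False  # enum, fixed, or a primitive written as {"type": "..."}

LONG_LIST = json.loads("""
{"type": "record", "name": "LongList", "fields": [
  {"name": "value", "type": "long"},
  {"name": "next", "type": ["null", "LongList"]}]}
""")
POINT = json.loads("""
{"type": "record", "name": "Point", "fields": [
  {"name": "x", "type": "double"},
  {"name": "y", "type": "double"}]}
""")

print(is_recursive(LONG_LIST), is_recursive(POINT))
```

Under the proposal, a schema for which this kind of check succeeds would be rejected at parse time, which is why files written with such a schema could no longer be read.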

Re: Radical idea for 1.9.0

Posted by Doug Cutting <cu...@gmail.com>.
First, this creates a data incompatibility, not just an API
incompatibility, so it should not be permitted in 1.9.0.  Apps that worked,
even when updated for API changes, will not be able to read data they could
before they upgraded.

Second, folks might actually use this feature in reasonable ways.  For
example, Avro provides a recursive schema for arbitrary JSON data in the
class org.apache.avro.data.Json.  This class is used by at least some code
on GitHub.

https://github.com/search?q=%22org.apache.avro.data.Json%22&type=Code

Many of these are probably dead code, or forks of Avro itself, but a few
appear to be valid uses.
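
To make the JSON case concrete, here is a simplified Python sketch of a recursive schema in that style. It is written from memory as an illustration and is not a copy of the schema shipped with Avro; `JSON_SCHEMA`, `wrap` and `depth` are hypothetical names. The point is that arrays and maps refer back to the enclosing record, so values of any nesting depth fit one schema.

```python
import json

# Illustrative only: one record whose array and map branches refer to itself.
JSON_SCHEMA = {
    "type": "record",
    "name": "Json",
    "fields": [{
        "name": "value",
        "type": [
            "null", "boolean", "long", "double", "string",
            {"type": "array", "items": "Json"},
            {"type": "map", "values": "Json"},
        ],
    }],
}

def wrap(value):
    """Wrap a parsed JSON value as nested Json records (plain dicts here)."""
    if isinstance(value, list):
        return {"value": [wrap(item) for item in value]}
    if isinstance(value, dict):
        return {"value": {key: wrap(item) for key, item in value.items()}}
    return {"value": value}

def depth(record):
    """Count how many Json records are nested along the deepest path."""
    inner = record["value"]
    if isinstance(inner, list):
        children = inner
    elif isinstance(inner, dict):
        children = list(inner.values())
    else:
        return 1
    return 1 + max((depth(child) for child in children), default=0)

document = json.loads('{"a": [1, {"b": null}]}')
wrapped = wrap(document)
print(depth(wrapped))
```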

Does anyone reading this list use this feature?

Thanks,

Doug

Re: Radical idea for 1.9.0

Posted by Raymie Stata <rs...@yahoo.com.INVALID>.
(Thanks for the [Discuss] tip.)

Recursion is primarily a code-maintenance problem.  I'm not sure how
to quantify the complexity, but certainly Schema.java itself has a lot
of logic in it to deal with recursion, as do all the "Grammar
Generator" classes plus Generic Data -- the classes that are
performance sensitive.

I don't think that recursion is inherently expensive performance-wise,
but by making the encoding/resolution/decoding logic unnecessarily
complicated, it makes it difficult to implement more aggressive
strategies for higher performance (e.g., dynamic code generation).  On
a similar note, the complexity of recursion could make it hard to add
new features to Avro.

I've yet to see compelling uses of recursion surface in this thread.
Perhaps we deprecate recursion in 1.9, with the goal of eliminating it
in 1.10?  (Specifically, we write error messages to stderr when we
parse recursive types -- with a flag to silence those messages in case
they get in someone's way.)  If this deprecation creates howls of
complaint because the feature is more useful than this thread seems to
suggest, then we can keep it in.
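
A rough Python sketch of that deprecation behaviour, assuming the parser already knows whether a schema is recursive. The function name, the message wording and the `AVRO_SILENCE_RECURSION_WARNING` environment variable are all made up for illustration; nothing like this exists in Avro today.

```python
import io
import os
import sys

SILENCE_FLAG = "AVRO_SILENCE_RECURSION_WARNING"  # hypothetical flag name

def warn_if_recursive(schema_name, recursive, stream=sys.stderr, env=os.environ):
    """Write a deprecation message for a recursive schema unless silenced.

    Returns True if a message was written.
    """
    if not recursive:
        return False
    if env.get(SILENCE_FLAG) == "1":
        return False
    print(
        f"warning: schema {schema_name!r} is recursive; "
        "recursive types are deprecated",
        file=stream,
    )
    return True

# Usage: capture the message in a buffer instead of the real stderr.
buffer = io.StringIO()
warn_if_recursive("LongList", True, stream=buffer, env={})
print(buffer.getvalue(), end="")
```

The schema still parses in this sketch; only the message changes, so existing data stays readable during the deprecation period.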


On Tue, Dec 4, 2018 at 9:25 AM Sean Busbey <bu...@cloudera.com.invalid> wrote:
>
> In the future please use "[DISCUSS]" at the start of your subject line
> for these kinds of proposals. that'll get more folks to see the
> discussion, e.g. when they filter this list.
>
> as a point of clarification, 1.9.0 is a major version change for the
> Avro project. the "1" is a file format version. that's why API
> incompatibilities are allowed in a new 1.y version. As Doug mentioned,
> "would not be able to read binary files" means you're talking about a
> file format incompatibility, which I don't think we should do.
>
> Are you interested in removing this feature to improve code
> maintenance or to improve performance? both?
>
> Can you quantify the amount of complexity you're referring to?
>
> --
> busbey

Re: Radical idea for 1.9.0

Posted by Sean Busbey <bu...@cloudera.com.INVALID>.
In the future please use "[DISCUSS]" at the start of your subject line
for these kinds of proposals. that'll get more folks to see the
discussion, e.g. when they filter this list.

as a point of clarification, 1.9.0 is a major version change for the
Avro project. the "1" is a file format version. that's why API
incompatibilities are allowed in a new 1.y version. As Doug mentioned,
"would not be able to read binary files" means you're talking about a
file format incompatibility, which I don't think we should do.

Are you interested in removing this feature to improve code
maintenance or to improve performance? both?

Can you quantify the amount of complexity you're referring to?


-- 
busbey

Re: Radical idea for 1.9.0

Posted by Rob Turner <ro...@gmail.com>.
I think that recursive schemas are a powerful feature of Avro that enable
the modelling of hierarchies, a pretty common data structure. I myself have
used this feature in a large-scale system before. The recursion is handled
elegantly and naturally in the code with recursive functions, so I don't
think it adds much complexity.
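
As a small illustration of a hierarchy handled by a recursive function, here is a Python sketch. The `Category`-style node shape (a `name` plus a list of `children` of the same shape) and the `walk` helper are assumptions for the example, not code from any real system.

```python
# A hierarchy as it would look under a recursive record: every node has the
# same shape, and "children" holds more nodes of that shape.
TREE = {
    "name": "root",
    "children": [
        {"name": "a", "children": [{"name": "a1", "children": []}]},
        {"name": "b", "children": []},
    ],
}

def walk(node, path=()):
    """Yield the slash-separated path of every node, parents before children."""
    here = path + (node["name"],)
    yield "/".join(here)
    for child in node["children"]:
        yield from walk(child, here)

for line in walk(TREE):
    print(line)
```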

On Fri, 30 Nov 2018 at 22:12, Michael A. Smith <mi...@smith-li.com> wrote:

> I’m against this proposal. Sure, recursion adds complexity, but recursive
> types are also extremely powerful and one of the most interesting features
> of a tool like this. I have been experimenting in the other direction;
> considering a way to compose avro schema descriptions in avro. Recursion is
> crucial for that, so that we can lay out possible types as a union of named
> types that include themselves.
>
> Granted, it’s an experiment that would also imply a compatibility break
> with previous avros, but it opens what I think is an interesting set of
> doors instead of closing them.
>
> My 2 cents.

Re: Radical idea for 1.9.0

Posted by Dhasharath Shrivathsa <dh...@radix.bio>.
> Perhaps I don’t understand you. What is “a recursive record”? AFAIU real
> data can not be recursive. (I would love to be shown to be wrong about
> this.)

Real data is very recursive: consider the canonical definitions of
List/Tree, and anything built with cons/cdr:
https://en.wikipedia.org/wiki/Cons

JSON is a recursive record. At any point, a JSON object is a map from
strings to JSON values, which may themselves be objects. The same recursion
is there for most AST-like descriptions.

Denormalizing this data means that you'd insert parent/child pointers and
emit multiple messages, but this kinda sucks, since you'd end up writing
something like a WITH RECURSIVE SQL query or similar to be able to unmelt
the data back into its recursive form.
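
A Python sketch of what that denormalization costs, using a made-up tree and hypothetical `flatten`/`rebuild` helpers: one nested value becomes several flat rows with parent pointers, and the reader has to stitch them back together.

```python
TREE = {
    "name": "root",
    "children": [
        {"name": "a", "children": [{"name": "a1", "children": []}]},
        {"name": "b", "children": []},
    ],
}

def flatten(node, parent_id=None, rows=None):
    """Turn one nested node into flat rows that point at their parent row."""
    rows = [] if rows is None else rows
    node_id = len(rows)
    rows.append({"id": node_id, "parent_id": parent_id, "name": node["name"]})
    for child in node["children"]:
        flatten(child, node_id, rows)
    return rows

def rebuild(rows):
    """Reassemble the nested form from the flat rows (the 'unmelt' step)."""
    nodes = {row["id"]: {"name": row["name"], "children": []} for row in rows}
    root = None
    for row in rows:
        if row["parent_id"] is None:
            root = nodes[row["id"]]
        else:
            nodes[row["parent_id"]]["children"].append(nodes[row["id"]])
    return root

rows = flatten(TREE)
print(len(rows), rebuild(rows) == TREE)
```

With a recursive schema the nested value is written and read as a single record, so neither helper is needed.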

Incidentally, with the stuff in AVRO-530, you can write a transform to take
a true recursive type and turn it into a sorta-recursive Avro type. See my
issue here: https://github.com/sksamuel/avro4s/issues/307, which has since
been generalized to arbitrary fixpoint types.

Instead of straight indexing, to index lower than the toplevel the thing to
use would be an F-Algebra/visitor pattern, as well as binary serialization.
Most data that's not recursive would simply apply the F-Algebra once, but in
the case of recursion, you'd apply it multiple times to generate the binary
serialization, and annotate with something like a coelgot algebra to give
you a toplevel schema.
Between AVRO-530 + AVRO-248, the sketch of what to do is already there.
I'm confident I could do this in Haskell/Scala, but I don't know Java so
can't contribute to Avro.
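
For anyone unfamiliar with the term, a very rough Python sketch of the fold idea only. It does not attempt the serialization design or the coelgot part, and `cata`, `size` and `encode` are hypothetical names. An "algebra" describes one layer of the structure, and the fold applies it once per layer, so flat data sees it once and recursive data sees it once per node.

```python
def cata(algebra, node):
    """Fold bottom-up: children are folded first, then `algebra` combines them."""
    folded = [cata(algebra, child) for child in node["children"]]
    return algebra(node["name"], folded)

def size(name, child_sizes):
    """Algebra that counts nodes, i.e. how many times the algebra was applied."""
    return 1 + sum(child_sizes)

def encode(name, child_encodings):
    """Algebra that renders a node as text from its already-rendered children."""
    return name + "(" + ",".join(child_encodings) + ")"

FLAT = {"name": "flat", "children": []}
TREE = {
    "name": "root",
    "children": [
        {"name": "a", "children": [{"name": "a1", "children": []}]},
        {"name": "b", "children": []},
    ],
}

print(cata(size, FLAT), cata(size, TREE))
print(cata(encode, TREE))
```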







Re: Radical idea for 1.9.0

Posted by "Michael A. Smith" <mi...@smith-li.com>.
> using Avro to store millions to billions of non-recursive records

Perhaps I don’t understand you. What is “a recursive record”? AFAIU real
data can not be recursive. (I would love to be shown to be wrong about
this.) Given any real data, proportional coffee, sandwiches and time, I can
write an avro schema for that data without using recursion.

However, for some real data, writing such a schema would be onerous,
repetitive and error-prone. Updating that schema would likewise be a chore.
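
To show what "without recursion" looks like in practice, here is a Python sketch that unrolls a linked list of longs into a non-recursive schema with a fixed maximum length. `unroll_list` and the `LongList1`, `LongList2`, ... record names are assumptions for illustration. Each level needs its own named record, and any change to the element type has to be repeated at every level.

```python
import json

def unroll_list(depth):
    """Build a non-recursive schema for a linked list of at most `depth` longs."""
    schema = "null"  # the innermost "next" field can only ever be null
    for level in range(depth, 0, -1):
        next_type = "null" if schema == "null" else ["null", schema]
        schema = {
            "type": "record",
            "name": f"LongList{level}",
            "fields": [
                {"name": "value", "type": "long"},
                {"name": "next", "type": next_type},
            ],
        }
    return schema

three = unroll_list(3)
ten = unroll_list(10)
print(three["name"], len(json.dumps(three)), len(json.dumps(ten)))
```

The recursive version of the same schema is a single record and has no length limit.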

So, do we want to make the avro codebase easier to support, or do we want
to make (some not-insignificant fraction of) avro schemas easier to write?

I’m in the latter camp.

-Michael Smith

On Sat, Dec 1, 2018 at 06:29 Raymie Stata <rs...@yahoo.com.invalid> wrote:

> (Keep the following in mind: perhaps 95%+ of Avro users do not depend
> on recursion but don't understand the opportunity costs of maintaining
> it (and thus won't speak up on this thread); the remaining 5% who
> depend on recursion are highly motivated to speak out against its
> removal.  In such a world, I'm speaking for 95% of the Avro community
> :-)
>
> So far, the main justification of retaining recursion in Avro seems to
> be that it allows Avro to be a binary representation of JSON using the
> schema "share/schemas/org/apache/avro/data/Json.avsc".  This
> justification is a bit odd.  The founding philosophy of Avro is that
> data should have schemas (and those schemas should be able to evolve).
> The Avro-as-binary-rep-for-JSON argument is really the following:
> "recursion in Avro is good because it allows us to model JSON which
> allows us to model data with no schema."  Grumble.
>
> But let's leave aside philosophical arguments regarding static vs
> dynamic typing.  Let's consider communities. Json.avsc was committed
> in 2011 and hasn't changed since.  Clients who are using Avro as a
> binary representation of JSON can continue to depend on the 1.8.x line
> of Avro.  If that community of users is big-enough/mission-critical
> enough to maintain 1.8.x going forward, then all power to them. 1.8.x
> can live forever.
>
> In the meantime, for the 95% of the Avro community that is using Avro
> to store millions to billions of non-recursive records, and who are
> tired of putting up with the (opportunity) costs of supporting
> recursion that they never use, let's move on.

Re: Radical idea for 1.9.0

Posted by Raymie Stata <rs...@yahoo.com.INVALID>.
(Keep the following in mind: perhaps 95%+ of Avro users do not depend
on recursion but don't understand the opportunity costs of maintaining
it (and thus won't speak up on this thread); the remaining 5% who
depend on recursion are highly motivated to speak out against its
removal.  In such a world, I'm speaking for 95% of the Avro community
:-)

So far, the main justification of retaining recursion in Avro seems to
be that it allows Avro to be a binary representation of JSON using the
schema "share/schemas/org/apache/avro/data/Json.avsc".  This
justification is a bit odd.  The founding philosophy of Avro is that
data should have schemas (and those schemas should be able to evolve).
The Avro-as-binary-rep-for-JSON argument is really the following:
"recursion in Avro is good because it allows us to model JSON which
allows us to model data with no schema."  Grumble.

But let's leave aside philosophical arguments regarding static vs
dynamic typing.  Let's consider communities. Json.avsc was committed
in 2011 and hasn't changed since.  Clients who are using Avro as a
binary representation of JSON can continue to depend on the 1.8.x line
of Avro.  If that community of users is big-enough/mission-critical
enough to maintain 1.8.x going forward, then all power to them. 1.8.x
can live forever.

In the meantime, for the 95% of the Avro community that is using Avro
to store millions to billions of non-recursive records, and who are
tired of putting up with the (opportunity) costs of supporting
recursion that they never use, let's move on.

On Fri, Nov 30, 2018 at 2:33 PM Ryan Blue <rb...@netflix.com.invalid> wrote:
>
> I've used recursion in the past to use Avro to get a binary representation
> of JSON. Given the popularity of JSON and the fact that Avro includes
> support for converting it, I think it makes sense to continue allowing
> recursive schemas.
>
> --
> Ryan Blue
> Software Engineer
> Netflix

Re: Radical idea for 1.9.0

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
I've used recursion in the past to use Avro to get a binary representation
of JSON. Given the popularity of JSON and the fact that Avro includes
support for converting it, I think it makes sense to continue allowing
recursive schemas.

On Fri, Nov 30, 2018 at 2:12 PM Michael A. Smith <mi...@smith-li.com>
wrote:

> I’m against this proposal. Sure, recursion adds complexity, but recursive
> types are also extremely powerful and one of the most interesting features
> of a tool like this. I have been experimenting in the other direction;
> considering a way to compose avro schema descriptions in avro. Recursion is
> crucial for that, so that we can lay out possible types as a union of named
> types that include themselves.
>
> Granted, it’s an experiment that would also imply a compatibility break
> with previous avros, but it opens what I think is an interesting set of
> doors instead of closing them.
>
> My 2 cents.


-- 
Ryan Blue
Software Engineer
Netflix

Re: Radical idea for 1.9.0

Posted by "Michael A. Smith" <mi...@smith-li.com>.
I’m against this proposal. Sure, recursion adds complexity, but recursive
types are also extremely powerful and one of the most interesting features
of a tool like this. I have been experimenting in the other direction;
considering a way to compose avro schema descriptions in avro. Recursion is
crucial for that, so that we can lay out possible types as a union of named
types that include themselves.

Granted, it’s an experiment that would also imply a compatibility break
with previous avros, but it opens what I think is an interesting set of
doors instead of closing them.

My 2 cents.
