Posted to dev@avro.apache.org by "Jeff Hammerbacher (JIRA)" <ji...@apache.org> on 2010/02/18 21:23:27 UTC

[jira] Commented: (AVRO-419) Consistent laziness when resolving partially-compatible changes

    [ https://issues.apache.org/jira/browse/AVRO-419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835383#action_12835383 ] 

Jeff Hammerbacher commented on AVRO-419:
----------------------------------------

Hey Raymie, your description appears to have been truncated ("I'm not sure if there are other a" is the end). I'd love to see the rest of the description, if you care to post it here.

> Consistent laziness when resolving partially-compatible changes
> ---------------------------------------------------------------
>
>                 Key: AVRO-419
>                 URL: https://issues.apache.org/jira/browse/AVRO-419
>             Project: Avro
>          Issue Type: Bug
>          Components: spec
>            Reporter: Raymie Stata
>
> Avro schema resolution is generally "lazy" when it comes to dealing with incompatible changes.  If the writer writes a union of "int" and "null", and the reader expects just an "int", Avro doesn't raise an exception unless the writer _actually_ writes a "null" (and the reader attempts to read it).
> This laziness is a powerful feature for supporting "forward compatibility" (old readers reading data written by new writers).  In the example just given, we might decide at some point that a column needs to be "nullable" even though a lot of old code assumes that it's not.  When using old code, we can often arrange to avoid sending it any new records that have null values in that column.  It's powerful to allow new writers to write against the nullable schema and still allow old readers to read those records.  (For this to be safe, it's also important that this be _checked,_ i.e., that a run-time error is thrown if a bad value is passed to the reader.)
> Avro is lazy in many places (e.g., in the union example just given, and for enumerations).  But it's not _consistently_ lazy.  I propose we comb through the spec and make it lazy in all places we can, unless there's a compelling reason not to.
> Numeric types are one area where Avro is not consistently lazy.  I propose that we fairly liberally allow any change from one numeric type to another, and raise errors at runtime if bad values are found.  An "int" can be changed to a "long", for example, and an error is raised when a reader gets an out-of-bounds value.  A "double" can be changed to an "int", and an error is raised if the reader gets a non-integer value or an out-of-bounds value.  (I'm not sure if there are types beyond numerics where we could be more consistently lazy, but I decided to write this issue generically just in case.)
> (One might object that these checks are expensive, but note that they are only needed when the reader and writer specs don't agree.  Thus, if these checks are induced, then the system designer _wanted_ these checks; we're only adding value here, not inducing costs.)
> I'm not sure if there are other a
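
For illustration, the value-level ("lazy") checks proposed in the description above could be sketched roughly as follows. This is only a sketch under stated assumptions, not Avro's actual resolution code: the names ResolutionError, resolve_numeric and resolve_int_or_null are hypothetical, and only the "int", "long" and "double" reader types are handled.

```python
# Hypothetical sketch of lazy, checked schema resolution.
# None of these names come from the Avro library.

INT_MIN, INT_MAX = -2**31, 2**31 - 1      # bounds of Avro "int" (32-bit signed)
LONG_MIN, LONG_MAX = -2**63, 2**63 - 1    # bounds of Avro "long" (64-bit signed)


class ResolutionError(Exception):
    """Raised when a written value does not fit the reader's type."""


def resolve_numeric(value, reader_type):
    """Check one decoded numeric value against the reader's numeric type.

    The schemas are never rejected up front; only a value that the
    reader's type cannot represent raises an error.
    """
    if reader_type in ("int", "long"):
        if reader_type == "int":
            lo, hi = INT_MIN, INT_MAX
        else:
            lo, hi = LONG_MIN, LONG_MAX
        if isinstance(value, float):
            # is_integer() is False for fractional values, NaN and infinities.
            if not value.is_integer():
                raise ResolutionError(f"{value!r} is not an integer value")
            value = int(value)
        if not lo <= value <= hi:
            raise ResolutionError(f"{value!r} is out of bounds for {reader_type}")
        return value
    if reader_type == "double":
        return float(value)
    raise ResolutionError(f"unsupported reader type {reader_type!r}")


def resolve_int_or_null(value, reader_type="int"):
    """Writer schema is a union of "int" and "null"; reader expects a plain number.

    An error is raised only if the writer actually wrote a null.
    """
    if value is None:
        raise ResolutionError("writer wrote null, but reader does not accept null")
    return resolve_numeric(value, reader_type)
```

With these helpers, resolve_numeric(3.0, "int") returns 3, while resolve_numeric(3.5, "int"), resolve_numeric(2**31, "int") and resolve_int_or_null(None) each raise ResolutionError. The checks run per value, so they cost something only when the reader and writer types actually differ.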
