Posted to user@avro.apache.org by Bart Verwilst <li...@verwilst.be> on 2012/12/19 11:01:17 UTC

Changing Avro schemas for daily imports

Hello!

Every night, we fetch MySQL rows with a timestamp of the day before,
and store them in Avro, creating a Y-M-d.avro file on HDFS daily.

This is the schema:

{
   "namespace": "asp",
   "type": "record",
   "name": "trace",
   "fields": [
     {
       "type": "long",
       "name": "id"
     },
     {
       "type": "long",
       "name": "timestamp"
     },
     {
       "type": [
         "int",
         "null"
       ],
       "name": "latitude"
     },
     {
       "type": [
         "int",
         "null"
       ],
       "name": "longitude"
     }
   ]
}

Now I would like to change timestamp so it can be null as well. The
plan is to just change the timestamp type in the schema starting from
the next day. I'm pretty sure it won't affect lookups in any way (like
when using Pig), but I thought I would ask to be sure (since the
structure itself doesn't change, only the type). I wouldn't want to run
into a gotcha after months of importing with my adjusted schema. :)
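
To make the planned change concrete, here is a minimal Python sketch that
derives the adjusted schema from the one above. The helper name
make_nullable is hypothetical and not part of any Avro library; it only
manipulates the schema as a plain dict, and it places "null" second to
mirror the latitude and longitude fields.

```python
import copy
import json

# The schema from this message, written as a Python dict.
OLD_SCHEMA = {
    "namespace": "asp",
    "type": "record",
    "name": "trace",
    "fields": [
        {"type": "long", "name": "id"},
        {"type": "long", "name": "timestamp"},
        {"type": ["int", "null"], "name": "latitude"},
        {"type": ["int", "null"], "name": "longitude"},
    ],
}


def make_nullable(schema, field_name):
    """Return a copy of `schema` where `field_name` is a union with "null".

    Hypothetical helper for illustration only.
    """
    new_schema = copy.deepcopy(schema)
    for field in new_schema["fields"]:
        if field["name"] == field_name:
            if isinstance(field["type"], list):
                # Already a union: add "null" only if it is missing.
                if "null" not in field["type"]:
                    field["type"].append("null")
            else:
                # Plain type: wrap it in a two-branch union.
                field["type"] = [field["type"], "null"]
            return new_schema
    raise KeyError(field_name)


NEW_SCHEMA = make_nullable(OLD_SCHEMA, "timestamp")
print(json.dumps(NEW_SCHEMA["fields"][1]))
```

Only the timestamp field differs between the two schemas; the original
dict is left untouched because the helper works on a deep copy.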

Thanks in advance!

Kind regards,

Bart

Re: Changing Avro schemas for daily imports

Posted by Martin Kleppmann <ma...@rapportive.com>.
It depends on how you are reading the data. It is possible to change
"type": "long" to "type": ["long", "null"], but you need to make sure
that any code reading the data can handle the nulls that may now
appear in the data.
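
As an illustration of what "handling those nulls" can mean, here is a
small Python sketch. It assumes the records have already been decoded
into plain dicts, with a null timestamp showing up as None; the function
name and the sample values are made up for the example.

```python
def newest_timestamp(records):
    """Return the largest non-null timestamp, or None if there is none."""
    # Calling max() directly on values that include None raises TypeError
    # in Python 3, so the nulls are filtered out first.
    present = [r["timestamp"] for r in records if r["timestamp"] is not None]
    return max(present) if present else None


# Made-up sample records, shaped like the "trace" schema.
records = [
    {"id": 1, "timestamp": 1355875200, "latitude": None, "longitude": None},
    {"id": 2, "timestamp": None, "latitude": 51, "longitude": 4},
]

print(newest_timestamp(records))
```

Code that was written against the old schema and does arithmetic or
comparisons on timestamp without such a check is where a null is most
likely to cause trouble.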

For example, if you've generated Java code from the Avro schema for
reading the data, you should first update that reading code to the new
schema, before you start writing data in the new schema. A reader for
["long", "null"] can handle data written with schema "long", but not
vice versa.
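
The asymmetry can be sketched as a simplified model of Avro's schema
resolution for primitive types and unions of them. This is not the
library's implementation: can_read_all and matches are hypothetical
names, and only the numeric promotions are modelled.

```python
# Numeric promotions allowed from a writer type to a reader type.
PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
}


def matches(reader, writer):
    """True if a primitive reader type can read a primitive writer type."""
    return reader == writer or reader in PROMOTIONS.get(writer, set())


def can_read_all(reader, writer):
    """True if every value written with `writer` resolves under `reader`.

    Each argument is a primitive type name or a list of names (a union).
    """
    writer_branches = writer if isinstance(writer, list) else [writer]
    reader_branches = reader if isinstance(reader, list) else [reader]
    # Every branch the writer may have used needs a matching reader branch.
    return all(
        any(matches(r, w) for r in reader_branches)
        for w in writer_branches
    )


print(can_read_all(["long", "null"], "long"))  # new reader, old data
print(can_read_all("long", ["long", "null"]))  # old reader, new data
```

The check is deliberately conservative: a reader expecting a plain
"long" only fails on new data when it actually meets a null value, so
files that happen to contain no nulls would still be readable.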

With Pig, since it's dynamically typed, I think it ought to just work.
Someone please correct me if I'm wrong.

Martin
