Posted to user@avro.apache.org by Bart Verwilst <li...@verwilst.be> on 2012/12/19 11:01:17 UTC
Changing Avro schemas for daily imports
Hello!
Every night, we fetch MySQL rows with a timestamp of the day before,
and store them in Avro, creating a Y-M-d.avro file on HDFS daily.
This is the schema:
{
  "namespace": "asp",
  "type": "record",
  "name": "trace",
  "fields": [
    {
      "type": "long",
      "name": "id"
    },
    {
      "type": "long",
      "name": "timestamp"
    },
    {
      "type": [
        "int",
        "null"
      ],
      "name": "latitude"
    },
    {
      "type": [
        "int",
        "null"
      ],
      "name": "longitude"
    }
  ]
}
Now I would like to change timestamp so it can be null as well. The
plan is to just change the timestamp type in the schema starting from
the next day. I'm pretty sure it won't affect lookups in any way (like
when using Pig), but I thought I would ask to be sure (since the
structure itself doesn't change, only the type). I wouldn't want to run
into a gotcha after months of importing with my adjusted schema. :)
Thanks in advance!
Kind regards,
Bart
Re: Changing Avro schemas for daily imports
Posted by Martin Kleppmann <ma...@rapportive.com>.
It depends how you are reading the data. It is possible to change
"type": "long" to "type": ["long", "null"], but you need to make sure
that any code reading the data can handle those nulls that may now
appear in the data.
For example, if you've generated Java code from the Avro schema for
reading the data, you should first update that reading code to the new
schema, before you start writing data in the new schema. A reader for
["long", "null"] can handle data written with schema "long", but not
vice versa.
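To make the direction of that rule concrete, here is a rough sketch in
plain Python. It is not the Avro library's API: branches and can_read
are hypothetical helpers, and the check is a deliberately conservative
simplification that only covers primitives and unions (no records, no
int-to-long promotion). It asks whether a reader schema can decode every
value a writer schema might produce.

```python
import json

# Simplified, conservative model of Avro union resolution.
# Assumption: only primitive type names and unions of them are handled;
# records, named types, and numeric promotion are ignored.

def branches(schema):
    """Return the schema's branches: a union is a JSON array, anything else is one branch."""
    return schema if isinstance(schema, list) else [schema]

def can_read(reader, writer):
    """True if every branch the writer may emit is also a branch of the reader."""
    return all(w in branches(reader) for w in branches(writer))

# The two versions of the "timestamp" field type from this thread.
old_timestamp = json.loads('"long"')
new_timestamp = json.loads('["long", "null"]')

print(can_read(new_timestamp, old_timestamp))  # True: new reader, old files
print(can_read(old_timestamp, new_timestamp))  # False: old reader, new files may hold nulls
```

In other words, the files already on HDFS stay readable with the new
schema, while a reader still compiled against the old schema has no way
to represent a null timestamp once one shows up.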
With Pig, since it's dynamically typed, I think it ought to just work.
Someone please correct me if I'm wrong.
Martin