Posted to dev@gora.apache.org by Lewis John Mcgibbney <le...@gmail.com> on 2014/04/01 16:47:14 UTC

Schema evolution in Gora

Hi Folks,
I've ended up in a conversation [0] over on user@avro regarding Schema
evolution.
Right now our workflow is as follows

 * write an .avsc schema and use the GoraCompiler to generate Persistent data
beans.
 * use the Persistent class whenever we wish to read from or write to the
data.

AFAICT, as explained in [0], this presents us with a problem. Namely that
we have only very sketchy support for Schema evolution over time.

We narrowly avoided a tricky situation over in Nutch when we added a
'batchId' field to our WebPage schema: some tools broke when attempting to
read fields which were simply not present for older records.
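
The failure mode can be sketched in a few lines (hypothetical Python, not Gora/Avro code; the WebPage and batchId names are borrowed from above): Avro-style resolution can only fill a field that is missing from an old record if the reader schema gives it a default.

```python
# Hypothetical sketch of Avro-style schema resolution. An old record can only
# be read with an evolved reader schema if the added field carries a default.

NEW_SCHEMA = {"type": "record", "name": "WebPage", "fields": [
    {"name": "url", "type": "string"},
    # field added later -- the union + default is what keeps old data readable
    {"name": "batchId", "type": ["null", "string"], "default": None},
]}

def read_with_schema(record, reader_schema):
    """Resolve a stored record against a reader schema, Avro-style:
    a missing field takes the reader's default; no default -> error."""
    out = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            out[name] = record[name]
        elif "default" in field:
            out[name] = field["default"]
        else:
            raise ValueError("field %r absent and has no default" % name)
    return out

old_record = {"url": "http://example.com"}  # written before batchId existed
print(read_with_schema(old_record, NEW_SCHEMA))
# -> {'url': 'http://example.com', 'batchId': None}
```

Had 'batchId' been added without a default, the same read would raise instead of succeeding.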

So I am opening this thread for discussion of what we can/must do to
improve this.
Should we store the Schema along with the data?
Should we store a Hash of the Schema along with the data?
Should we support Schema versioning?
Should we support Schema fingerprinting?
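
To make the hash/fingerprint options concrete, a rough Python sketch (all names invented for illustration; note that Avro's real fingerprints are defined over a schema's Parsing Canonical Form, using CRC-64-AVRO, MD5, or SHA-256 -- a SHA-256 over normalized JSON stands in here):

```python
import hashlib
import json

def fingerprint(schema):
    # Approximation of a schema fingerprint: hash a normalized JSON dump.
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

# The registry maps fingerprint -> full schema, so each record only carries
# the short fingerprint rather than the whole schema text.
registry = {}

def store(record, schema):
    fp = fingerprint(schema)
    registry.setdefault(fp, schema)
    return {"fp": fp, "data": record}   # what would be written to the datastore

def load(stored):
    # Recover the exact writer schema via the fingerprint.
    return stored["data"], registry[stored["fp"]]

schema_v1 = {"type": "record", "name": "WebPage",
             "fields": [{"name": "url", "type": "string"}]}
data, writer_schema = load(store({"url": "http://example.com"}, schema_v1))
```

The trade-off is per-record overhead (a few bytes for the fingerprint) against needing the registry to be durable and shared.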

Of course this is something for the 0.5-SNAPSHOT development drive, but it
is an issue we need to sort out as time goes on.

Ta
Lewis

[0] http://www.mail-archive.com/user%40avro.apache.org/msg02748.html

-- 
*Lewis*

Re: Schema evolution in Gora

Posted by Henry Saputra <he...@gmail.com>.
+1 to start. At this point there is no concrete solution yet, so it looks
like it is open for proposals.


- Henry

On Tue, Jul 22, 2014 at 9:43 AM, Talat Uyarer <ta...@uyarer.com> wrote:
> Hi Folks,
>
> Wdyt ? We should solve this problem for stable deserialization and
> serialization. If we decide any solution, I can work on it. I have
> time.
>
> Talat

Re: Schema evolution in Gora

Posted by Talat Uyarer <ta...@uyarer.com>.
Hi Folks,

WDYT? We should solve this problem to get stable serialization and
deserialization. If we decide on a solution, I can work on it. I have
time.

Talat

2014-04-10 14:48 GMT+03:00 Alparslan Avcı <al...@agmlab.com>:
> Hi folks,
>
> I also think that "schema evolution over time" is an important problem that
> we should handle. Because of this, it is really hard to extend the data
> schema on any application which uses Gora. We've experienced this in Nutch.
>
> About proposedsolutions;
>
> - "Should we store the Schema along with the data?"-> IMHO, we should store
> the schema but we should also discuss about the way that we store. Talat's
> 'recipe' can be a good option for this, and moreover; I think of storing all
> field schemas separately instead of storing persistent schema in one piece.
> Although storing every field schema is more complex than storing only one
> big persistent schema, it will give us more extensibility and ease at
> back-compatibility. And again for field schemas, we should discuss the way
> of storing (serialized/not serialized?, store to where?, etc.).
>
> - "Should we store a Hash of the Schema along with the data? Should we
> support Schema versioning? Should we support Schema fingerprinting?" -> We
> can need to support schema versioning, since it may help to compare
> evaluated schemas. But if we store the schema, we won't need to store the
> hash, or support fingerprinting, I think.
>
>
> Alparslan



-- 
Talat UYARER
Websitesi: http://talat.uyarer.com
Twitter: http://twitter.com/talatuyarer
Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304

Re: Schema evolution in Gora

Posted by Alparslan Avcı <al...@agmlab.com>.
Hi folks,

I also think that "schema evolution over time" is an important problem 
that we should handle. At the moment it is really hard to extend the data 
schema in any application which uses Gora; we have experienced this 
in Nutch.

About the proposed solutions:

- "Should we store the Schema along with the data?" -> IMHO we should 
store the schema, but we should also discuss how we store it. Talat's 
'recipe' can be a good option for this. Moreover, I am thinking of 
storing each field's schema separately instead of storing the persistent 
schema in one piece. Although storing every field schema is more complex 
than storing one big persistent schema, it would give us more 
extensibility and easier backward compatibility. And again for field 
schemas, we should discuss how to store them (serialized or not 
serialized? stored where? etc.).
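
The per-field idea could be sketched roughly like this (hypothetical Python, all names invented for illustration; Gora itself is Java): each field schema is stored under its own key, so one field can be added or changed without rewriting the whole record schema.

```python
import json

def split_schema(record_schema):
    """Map field name -> that field's schema, each serialized on its own
    (e.g. each could live in its own column/qualifier in the datastore)."""
    return {f["name"]: json.dumps(f) for f in record_schema["fields"]}

def merge_schema(name, field_schemas):
    """Reassemble a record schema from independently stored field schemas."""
    return {"type": "record", "name": name,
            "fields": [json.loads(s) for s in field_schemas.values()]}

v1 = {"type": "record", "name": "WebPage",
      "fields": [{"name": "url", "type": "string"}]}
stored_fields = split_schema(v1)
# Later, one field is added without rewriting the other stored schemas:
stored_fields["batchId"] = json.dumps(
    {"name": "batchId", "type": ["null", "string"], "default": None})
evolved = merge_schema("WebPage", stored_fields)
```

The extra complexity Alparslan mentions shows up in keeping the per-field entries consistent with each other; the gain is that old field schemas survive unchanged when one field evolves.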

- "Should we store a Hash of the Schema along with the data? Should we 
support Schema versioning? Should we support Schema fingerprinting?" -> We 
may need to support schema versioning, since it could help to compare 
evolved schemas. But if we store the schema itself, I think we won't need 
to store a hash or to support fingerprinting.


Alparslan


On 08-04-2014 14:57, Talat Uyarer wrote:
> Hi all,
>
> IMHO we can store a NEW field called "recipe of persistent" about
> written record. The Recipe field store information of which field has
> been serialized with which serializer. It is stored as a serialized
> with string serializer. Every getting datas from store It is
> deserialized. And that object of data is generated from this recipe's
> schema. The recipe field store similar with persistent's schema but it
> has some different definition and extra information about fields. For
> example in schema of persistent has a union field similar to below:
>
> {"name": "name", "type": ["null","string"],"default":null}
>
> If it is serialized by string serializer. it is written in the recipe field
>
> {"name": "name", "type": "string","default":null}
>
> Thus name field can be deserialized without persistent's schema.
> Another benefit: If persistent's schema is changed, we can still
> deserialize without any information.
>
> I hope I can be understandable. :)
>
> Talat


Re: Schema evolution in Gora

Posted by Talat Uyarer <ta...@uyarer.com>.
Hi all,

IMHO we can store a NEW field, a "recipe" for the persistent object,
alongside each written record. The recipe field records which field has
been serialized with which serializer, and is itself stored serialized
with the string serializer. Every time data is fetched from the store, the
recipe is deserialized first, and the data object is then built from the
recipe's schema. The recipe is similar to the persistent class's schema,
but it has some different definitions and extra information about the
fields. For example, suppose the persistent schema has a union field like
this:

{"name": "name", "type": ["null","string"], "default": null}

If it is serialized by the string serializer, it is written in the recipe
field as:

{"name": "name", "type": "string", "default": null}

Thus the name field can be deserialized without the persistent class's
schema. Another benefit: if the persistent schema changes, we can still
deserialize old records without any extra information.
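
A hypothetical Python sketch of the recipe idea (no such Gora API exists yet, and all names here are invented; unions are simplified to the ["null","string"] case from the example above):

```python
import json

def make_recipe(schema, record):
    """Build the recipe: one entry per field, with unions resolved to the
    concrete branch that was actually written."""
    recipe = []
    for f in schema["fields"]:
        t = f["type"]
        if isinstance(t, list):  # union: record the branch actually used
            t = "null" if record.get(f["name"]) is None else "string"
        recipe.append({"name": f["name"], "type": t,
                       "default": f.get("default")})
    return json.dumps(recipe)  # stored string-serialized, next to the record

def read_with_recipe(raw, recipe_json):
    """Deserialize using only the recipe -- the (possibly changed)
    persistent schema is not needed."""
    return {f["name"]: raw.get(f["name"], f["default"])
            for f in json.loads(recipe_json)}

schema = {"type": "record", "name": "WebPage", "fields": [
    {"name": "url", "type": "string"},
    {"name": "name", "type": ["null", "string"], "default": None},
]}
recipe = make_recipe(schema, {"url": "http://example.com", "name": "home"})
# json.loads(recipe)[1] == {"name": "name", "type": "string", "default": None}
```

The cost of this approach is one recipe per record (or per schema version) in the store; the benefit, as described above, is that reads no longer depend on the current persistent schema.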

I hope this is understandable. :)

Talat

2014-04-08 12:11 GMT+03:00 Henry Saputra <he...@gmail.com>:
> Technically it was named after a dog, hence the logo, which just happen to
> match that abbreviation :)



-- 
Talat UYARER
Websitesi: http://talat.uyarer.com
Twitter: http://twitter.com/talatuyarer
Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304

Re: Schema evolution in Gora

Posted by Henry Saputra <he...@gmail.com>.
Technically it was named after a dog, hence the logo, which just happens to
match that abbreviation :)

On Tuesday, April 1, 2014, Renato Marroquín Mogrovejo <
renatoj.marroquin@gmail.com> wrote:


Re: Schema evolution in Gora

Posted by Renato Marroquín Mogrovejo <re...@gmail.com>.
Hi Lewis,

This is for sure a very interesting topic and something that GORA should
deal with.
It is funny that only now I found out that GORA actually means "Generic
Object Representation using Avro". Does this mean that we will always have
to use Avro for everything? Never mind, we can all discuss that when the
time comes.
From the little reading I did about data evolution:
- Schema along with data -> This could be done in a similar way to how we
are approaching union fields, i.e. append an extra field to the data
carrying its schema, deserialize that schema, and then check whether the
data can actually satisfy the query or not. Of course this would be part
of 0.5 :)
- Hash of the Schema along with the data, Schema versioning, Schema
fingerprinting -> These all need some way of looking up saved schemas (by
version, hash, or schema fingerprint).
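
The first option could be sketched as follows (hypothetical Python; write_with_schema and query are invented names): embed the writer schema next to each record, and at query time parse it to check whether the record can satisfy the requested fields before deserializing the data.

```python
import json

def write_with_schema(record, schema):
    """Store the writer schema right next to the record (one JSON blob here;
    a real store would likely use a separate field/column)."""
    return json.dumps({"schema": schema, "data": record})

def query(stored, wanted_fields):
    """Parse the embedded schema first and check whether it can satisfy
    the query before touching the data."""
    blob = json.loads(stored)
    available = {f["name"] for f in blob["schema"]["fields"]}
    not_found = [f for f in wanted_fields if f not in available]
    if not_found:
        return None, not_found            # record cannot satisfy the query
    return {f: blob["data"].get(f) for f in wanted_fields}, []

v1 = {"type": "record", "name": "WebPage",
      "fields": [{"name": "url", "type": "string"}]}
stored = write_with_schema({"url": "http://example.com"}, v1)
result, missing = query(stored, ["url", "batchId"])
# missing == ["batchId"]: this record predates the batchId field
```

Storing the full schema per record is the simplest to reason about but the most expensive in space, which is why the hash/fingerprint variants above trade a lookup step for smaller records.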


Renato M.


2014-04-01 16:47 GMT+02:00 Lewis John Mcgibbney <le...@gmail.com>:
