Posted to dev@uima.apache.org by Marshall Schor <ms...@schor.com> on 2016/09/07 20:45:19 UTC

type system evolution and should deserialization accommodate missing sofa refs?

With version 2.8.1 and 2.9.0, some tightening and clarification of error
messages occurred around attempts to add a FS (Feature Structure) to indexes.

In particular, an FS whose type is a subtype of AnnotationBase can only be added
to the indexes of the view specified by the sofaRef feature that it inherits
from AnnotationBase.

This has tripped up some users who are gradually evolving their type system
while still holding previously serialized data.

They were using XMI serialization.  Normal evolution of a type system that
merely adds or removes features works fine (in lenient mode for the removal
case).  Added features are given default values, usually null or 0, if the XMI
serialization does not specify them.

This fails, though, when a type's supertype changes from, e.g., TOP to
Annotation, which adds the new sofaRef feature.  Deserialization succeeds and
sets the new feature to null, but if that instance is indexed, the tightened
checks at add-to-index time throw an exception because the sofaRef (which is
null) doesn't match that of the CAS view.
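
To make the failure concrete, here is a minimal sketch of the check as described
above.  It is a toy model, not UIMA code: StrictCheckSketch, View, Fs and
addFails are hypothetical names, and a null sofaRef stands in for an instance
deserialized from data written before the supertype change.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these names are UIMA APIs.
final class StrictCheckSketch {
    static final class Fs {
        View sofaRef; // null models an instance read from pre-evolution data
    }

    static final class View {
        final String name;
        final List<Fs> index = new ArrayList<>();

        View(String name) { this.name = name; }

        // Models the tightened check: the FS must point at this very view.
        void addToIndexes(Fs fs) {
            if (fs.sofaRef != this) {
                throw new IllegalStateException(
                    "FS refers to view "
                    + (fs.sofaRef == null ? "null" : fs.sofaRef.name)
                    + " but was added to view " + name);
            }
            index.add(fs);
        }
    }

    // Returns true if adding the FS to the view is rejected.
    static boolean addFails(View view, Fs fs) {
        try {
            view.addToIndexes(fs);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        View view = new View("_InitialView");
        Fs legacy = new Fs(); // sofaRef was never set
        System.out.println(addFails(view, legacy)); // prints "true"
    }
}
```

In this model an FS whose sofaRef already points at the view is indexed
normally; only the unset (or mismatching) case is rejected.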

We could treat this as a special case: if some context flag is set (say, because
XMI deserialization is in progress) and the sofaRef is null, we could set it to
the presumably correct value, namely the view the FS is being indexed in.

Does this sound like a good thing to do?

-Marshall

Re: type system evolution and should deserialization accommodate missing sofa refs?

Posted by Richard Eckart de Castilho <re...@apache.org>.

On 08.09.2016, at 14:58, Marshall Schor <ms...@schor.com> wrote:
> 
> The common way to avoid passing context parameters down into the bowels... is to
> use thread local variables, which in this case would not be accessed unless the
> "error" condition occurred, so no performance hit in the normal case.

I'm not so much worried about the performance or about how the parameter is
passed down. I'm more concerned about what seems to be a non-obvious, long-range
architectural dependency. Hence my preference for handling bad legacy data close
to the ingestion point rather than close to the heart of the framework.
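
A rough sketch of what handling this at the ingestion point could look like, as
a toy model rather than UIMA code.  IngestionRepairSketch, View, Fs and
indexDeserialized are hypothetical names; the assumption is that the
deserializer knows which view each FS is destined for.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these names are UIMA APIs.
final class IngestionRepairSketch {
    static final class Fs {
        View sofaRef; // null models legacy data without the feature
    }

    static final class View {
        final String name;
        final List<Fs> index = new ArrayList<>();

        View(String name) { this.name = name; }

        // The framework-side check stays strict and knows nothing about legacy data.
        void addToIndexes(Fs fs) {
            if (fs.sofaRef != this) {
                throw new IllegalStateException("sofaRef does not match view " + name);
            }
            index.add(fs);
        }
    }

    // Hypothetical deserializer step: patch unset sofaRefs, then index as usual.
    // Returns how many FSs had to be repaired.
    static int indexDeserialized(List<Fs> pending, View target) {
        int repaired = 0;
        for (Fs fs : pending) {
            if (fs.sofaRef == null) {
                fs.sofaRef = target;
                repaired++;
            }
            target.addToIndexes(fs);
        }
        return repaired;
    }

    public static void main(String[] args) {
        View view = new View("_InitialView");
        List<Fs> pending = new ArrayList<>();
        pending.add(new Fs());
        System.out.println(indexDeserialized(pending, view)); // prints "1"
    }
}
```

The repair lives entirely in the deserializer-side method, so the indexing code
needs no flag and no knowledge of where the FS came from.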

Cheers,

-- Richard

Re: type system evolution and should deserialization accommodate missing sofa refs?

Posted by Marshall Schor <ms...@schor.com>.

The common way to avoid passing context parameters down into the bowels... is to
use thread local variables, which in this case would not be accessed unless the
"error" condition occurred, so no performance hit in the normal case.
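
As a sketch of the thread-local idea (a toy model, not UIMA code): the flag is
read only after the sofaRef comparison has already failed, so the normal path
never touches it.  ThreadLocalFlagSketch, REPAIR_NULL_SOFA_REF,
withRepairEnabled and the flagReads counter are all hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these names are UIMA APIs.
final class ThreadLocalFlagSketch {
    // Hypothetical context flag; a deserializer would set it for its own thread.
    static final ThreadLocal<Boolean> REPAIR_NULL_SOFA_REF =
        ThreadLocal.withInitial(() -> Boolean.FALSE);

    static int flagReads = 0; // demo instrumentation only, not thread-safe

    static final class Fs {
        View sofaRef;
    }

    static final class View {
        final String name;
        final List<Fs> index = new ArrayList<>();

        View(String name) { this.name = name; }

        void addToIndexes(Fs fs) {
            if (fs.sofaRef != this) { // the normal case skips this branch entirely
                flagReads++;
                if (fs.sofaRef == null && REPAIR_NULL_SOFA_REF.get()) {
                    fs.sofaRef = this; // repair instead of failing
                } else {
                    throw new IllegalStateException("sofaRef does not match view " + name);
                }
            }
            index.add(fs);
        }
    }

    // Runs an action (e.g. a deserialization) with the flag set, then restores it.
    static void withRepairEnabled(Runnable action) {
        boolean previous = REPAIR_NULL_SOFA_REF.get();
        REPAIR_NULL_SOFA_REF.set(Boolean.TRUE);
        try {
            action.run();
        } finally {
            REPAIR_NULL_SOFA_REF.set(previous);
        }
    }

    public static void main(String[] args) {
        View view = new View("_InitialView");
        Fs legacy = new Fs();
        withRepairEnabled(() -> view.addToIndexes(legacy));
        System.out.println(legacy.sofaRef == view); // prints "true"
    }
}
```

Outside withRepairEnabled the same call still throws, so code that creates FSs
normally keeps the strict behavior.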

But, in any case, I'm thinking that this might be a more general issue than just
XMI deserialization (XCAS, for instance, could have a similar issue, I think).

Also, FSs created with low level APIs could end up missing this feature.

The only downside to automatically setting it in this case, it seems, is lack of
error reporting if 2 errors occur simultaneously:

1) failing to set the feature
2) adding the FS to the wrong index

Weighing this against what seems to me the much more likely scenario of users
evolving their type system, I'm in favor of fixing this by always setting the
sofa feature to the right value when the existing one is not set at all.  This
still leaves the originally intended check in place: it stops an FS that is a
subtype of AnnotationBase from accidentally being added to the indexes of a view
other than the one the annotation is over.
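
A toy sketch of that behavior, including the downside listed above (not UIMA
code; DefaultSofaRefSketch, View and Fs are hypothetical names): an unset
sofaRef is filled in from the indexing view, while a set-but-different one still
fails.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these names are UIMA APIs.
final class DefaultSofaRefSketch {
    static final class Fs {
        View sofaRef;
    }

    static final class View {
        final String name;
        final List<Fs> index = new ArrayList<>();

        View(String name) { this.name = name; }

        void addToIndexes(Fs fs) {
            if (fs.sofaRef == null) {
                fs.sofaRef = this; // unset: assume the indexing view is the intended one
            } else if (fs.sofaRef != this) {
                // the original check still guards against a set-but-wrong sofaRef
                throw new IllegalStateException(
                    "FS is over view " + fs.sofaRef.name + ", not " + name);
            }
            index.add(fs);
        }
    }

    public static void main(String[] args) {
        View text = new View("text");
        View audio = new View("audio");
        Fs legacy = new Fs();
        // Downside: accepted silently even if "text" was the intended view.
        audio.addToIndexes(legacy);
        System.out.println(legacy.sofaRef.name); // prints "audio"
    }
}
```

Once the sofaRef has been filled in, a later attempt to add the same FS to a
different view is rejected as before.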

I've put in a Jira (https://issues.apache.org/jira/browse/UIMA-5102) for this to
track other opinions :-) and possible action.

-Marshall

On 9/8/2016 5:35 AM, Richard Eckart de Castilho wrote:
> Couldn't the (XMI) deserializer handle that locally as part of the deserialization
> instead of having to pass some context parameter down into the bowels of the 
> framework?
>
> Cheers,
>
> -- Richard
>
>> On 08.09.2016, at 10:23, Peter Klügl <pe...@averbis.com> wrote:
>>
>> sounds good to me
>>
>> Peter

Re: type system evolution and should deserialization accommodate missing sofa refs?

Posted by Richard Eckart de Castilho <re...@apache.org>.

Couldn't the (XMI) deserializer handle that locally as part of the deserialization
instead of having to pass some context parameter down into the bowels of the 
framework?

Cheers,

-- Richard

> On 08.09.2016, at 10:23, Peter Klügl <pe...@averbis.com> wrote:
> 
> sounds good to me
> 
> Peter

Re: type system evolution and should deserialization accommodate missing sofa refs?

Posted by Peter Klügl <pe...@averbis.com>.

sounds good to me


Peter

