Posted to dev@uima.apache.org by Richard Eckart de Castilho <re...@apache.org> on 2017/11/13 19:38:59 UTC

CAS consistency checks

Hi all,

I am wondering if it would be feasible to add some "addToIndex" hooks to UIMA that would be called whenever a feature structure (FS) is added to the indexes.

Why?

There may be conventions on the type system and feature values that cannot be expressed simply via the type system definition. For example, in DKPro Core, there is a convention that the begin/end of a Dependency annotation must match the begin/end of the Token that is referred to by the "dependent" feature. There are a number of additional conventions like these.
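
To make the Dependency convention concrete, here is a minimal sketch of a check for it. Token, Dependency and DependencyConvention are hypothetical, simplified stand-ins (plain Java classes with begin/end and feature fields). They are not the actual DKPro Core JCas classes or the UIMA CAS API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for the annotation types;
// these are NOT the real DKPro Core or UIMA JCas classes.
class Token {
    final int begin;
    final int end;

    Token(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }
}

class Dependency {
    final int begin;
    final int end;
    final Token governor;
    final Token dependent;

    Dependency(int begin, int end, Token governor, Token dependent) {
        this.begin = begin;
        this.end = end;
        this.governor = governor;
        this.dependent = dependent;
    }
}

final class DependencyConvention {
    private DependencyConvention() {}

    // Reports every Dependency whose begin/end differ from those of the
    // Token referenced by its "dependent" feature (or that has no dependent).
    static List<String> check(List<Dependency> deps) {
        List<String> problems = new ArrayList<>();
        for (Dependency d : deps) {
            if (d.dependent == null) {
                problems.add("Dependency [" + d.begin + "," + d.end + "] has no dependent");
            } else if (d.begin != d.dependent.begin || d.end != d.dependent.end) {
                problems.add("Dependency [" + d.begin + "," + d.end
                        + "] does not match its dependent Token ["
                        + d.dependent.begin + "," + d.dependent.end + "]");
            }
        }
        return problems;
    }
}
```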

It would probably be possible to use customized JCas classes to check for such conventions, but that of course would fail if the feature structures are created/manipulated via the CAS API. Also, not everybody is fond of customized (i.e. not auto-generated) JCas classes.

Has anybody already thought of supporting such "consistency checks" at the CAS level?

How are you ensuring consistent information in your CASes?

Cheers,

-- Richard

Re: CAS consistency checks

Posted by Marshall Schor <ms...@schor.com>.
Haven't thought about this enough, yet.

I'll just note here for the record that there are actions a user can take
which cause "hidden" add-to-index operations. For example, a user modifies a
feature, and it turns out that an index defined for the pipeline uses this
feature as an index key. If the FS is in the indexes, UIMA automatically removes
the FS from the indexes, performs the modification, and then adds the FS back.
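
A toy model of the behavior described above may help show why an add-to-index hook would also fire on these hidden re-adds. This is a sketch under stated assumptions, not UIMA's implementation: ToyIndex is a hypothetical sorted index keyed on begin that calls registered hooks on every add, and Annotation.setBegin mimics the remove/modify/re-add sequence.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;
import java.util.function.Consumer;

// Toy model only, NOT UIMA's implementation: a sorted index that uses
// "begin" as a key and calls registered hooks on every add.
class ToyIndex {
    private final TreeSet<Annotation> sorted = new TreeSet<>(
            Comparator.comparingInt(Annotation::getBegin).thenComparingInt(a -> a.id));
    private final List<Consumer<Annotation>> addHooks = new ArrayList<>();

    void onAdd(Consumer<Annotation> hook) {
        addHooks.add(hook);
    }

    boolean contains(Annotation a) {
        return sorted.contains(a);
    }

    void remove(Annotation a) {
        sorted.remove(a);
    }

    void add(Annotation a) {
        sorted.add(a);
        for (Consumer<Annotation> hook : addHooks) {
            hook.accept(a);
        }
    }
}

class Annotation {
    final int id;
    private final ToyIndex index;
    private int begin;

    Annotation(ToyIndex index, int id, int begin) {
        this.index = index;
        this.id = id;
        this.begin = begin;
    }

    int getBegin() {
        return begin;
    }

    // Changing a key feature of an indexed FS: remove it first (while the
    // old key still locates it), modify it, then add it back. The re-add
    // is a "hidden" add-to-index that fires every add hook again.
    void setBegin(int newBegin) {
        boolean indexed = index.contains(this);
        if (indexed) {
            index.remove(this);
        }
        begin = newBegin;
        if (indexed) {
            index.add(this);
        }
    }
}
```

In this model a single setBegin on an indexed FS triggers the hook a second time, so a real hook would probably need to be idempotent.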

-M


Re: CAS consistency checks

Posted by Richard Eckart de Castilho <re...@apache.org>.
> On 14.11.2017, at 00:04, Marshall Schor <ms...@schor.com> wrote:
> 
> One thing to consider is where this functionality ought best to go.

I'm not even sure if the idea is a good one, but at least good enough to be discussed ;)

Note that the scenario I have in mind is not an application-level scenario, but rather
a type-system-level scenario. Consider a component collection such as DKPro Core, which
comes with a type system and a large number of components. It requires quite a bit of
effort to ensure that all components adhere to the same consistency rules, and it
requires a lot of redundant code to always set the features in the same consistent way.

So the goal is not only to "check" whether consistency is maintained, but also to
"instill" consistency.

To take the Dependency example I gave before, the "hook" could
automatically set the offsets of the Dependency feature structure to the offsets of
the Token annotation referred to by the "dependent" feature. That could help remove
redundant code. Another option would be to factor that code out into some kind of factory
class and have components call that factory instead of creating annotations directly. But
DKPro Core would end up with lots of such factories. Hence I thought that maybe this kind of
redundant consistency-keeping code is something that others might have as well, and that
might be something to be addressed by the framework.
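
The factory alternative mentioned above could look roughly like the following. DependencyFactory and the simplified Token/Dependency classes are hypothetical. The point is only that the offset convention lives in one place, while every component still has to remember to call the factory.

```java
import java.util.Objects;

// Hypothetical, simplified stand-ins; NOT the DKPro Core JCas classes.
class Token {
    final int begin;
    final int end;

    Token(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }
}

class Dependency {
    final int begin;
    final int end;
    final Token governor;
    final Token dependent;
    final String dependencyType;

    Dependency(int begin, int end, Token governor, Token dependent, String dependencyType) {
        this.begin = begin;
        this.end = end;
        this.governor = governor;
        this.dependent = dependent;
        this.dependencyType = dependencyType;
    }
}

// Hypothetical factory: the one place that knows a Dependency must span
// exactly its dependent Token. Components call create(...) instead of
// constructing Dependency objects themselves.
final class DependencyFactory {
    private DependencyFactory() {}

    static Dependency create(Token governor, Token dependent, String dependencyType) {
        Objects.requireNonNull(dependent, "dependent must not be null");
        return new Dependency(dependent.begin, dependent.end, governor, dependent, dependencyType);
    }
}
```

The drawback raised above still applies: one such factory per convention-bearing type, and nothing stops a component from bypassing it.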

In order to "check" consistency, DKPro Core includes a modular consistency checking
subsystem which is used during unit testing.

So I was imagining some kind of registration mechanism where one could tell UIMA
"if an annotation of type X is added to the indexes, please call this code" - and
"this code" could then e.g. fill in some feature values automatically and/or check
for consistency in the already filled values.
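
A registration mechanism of the kind described above might be sketched as follows. Nothing here is an existing UIMA API: IndexHooks, register and addToIndexes are hypothetical names, plain Java classes stand in for CAS types, and Java subclassing stands in for UIMA type inheritance. A hook can fill in features (instilling consistency) or throw to reject a structure (checking it).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-ins for CAS types; Java subclassing plays the role
// of UIMA type inheritance. NOT the real DKPro Core/UIMA classes.
class Annotation {
    int begin;
    int end;
}

class Token extends Annotation {
    Token(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }
}

class Dependency extends Annotation {
    Token governor;
    Token dependent;
}

// Hypothetical registry, NOT an existing UIMA API: "if an FS of type X is
// added to the indexes, call this code first".
final class IndexHooks {
    private static final class Entry<T> {
        final Class<T> type;
        final Consumer<? super T> hook;

        Entry(Class<T> type, Consumer<? super T> hook) {
            this.type = type;
            this.hook = hook;
        }

        void fireIfApplicable(Object fs) {
            if (type.isInstance(fs)) {
                hook.accept(type.cast(fs));
            }
        }
    }

    private final List<Entry<?>> entries = new ArrayList<>();
    private final List<Object> indexed = new ArrayList<>();

    <T> void register(Class<T> type, Consumer<? super T> hook) {
        entries.add(new Entry<>(type, hook));
    }

    // Runs every hook registered for the FS's type or one of its supertypes,
    // in registration order, then indexes the FS. If any hook throws, the
    // exception propagates and the FS is not indexed.
    void addToIndexes(Object fs) {
        for (Entry<?> entry : entries) {
            entry.fireIfApplicable(fs);
        }
        indexed.add(fs);
    }

    List<Object> indexed() {
        return indexed;
    }
}
```

Registering the Dependency rule would then be a single call such as hooks.register(Dependency.class, d -> { d.begin = d.dependent.begin; d.end = d.dependent.end; }), after which every addToIndexes call on a Dependency applies it.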

Cheers,

-- Richard


Re: CAS consistency checks

Posted by Marshall Schor <ms...@schor.com>.
One thing to consider is where this functionality ought best to go.

If an App wanted consistency checking, it could

  - do what is suggested in the original message: a "hook" that can be set up to call some
consistency checker on every add-to-index.

  - or change the App code, refactoring it so that all add-to-indexes needing
consistency checking went to one (or more) common consistency-checking routines,
which, if the checks passed, would do the add-to-indexes.

The 2nd approach allows more modularity - only those add-to-indexes where this
check is wanted would do the check, and many different kinds of consistency
checking could be implemented, for all kinds of subsets of add-to-indexes calls.

The first approach would funnel all add-to-indexes calls into a single
consistency checker, which would then funnel out (I guess) to the various kinds
of checks needed.
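
The second approach could be sketched as a common routine that runs a caller-chosen set of checks and only adds the FS if all of them pass. CheckedAdd, addIfConsistent and Span are hypothetical names, and a plain List stands in for the CAS indexes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical stand-in for an annotation type.
class Span {
    final int begin;
    final int end;

    Span(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }
}

// Hypothetical common routine for the second approach: callers that want
// checking use it instead of adding to the indexes directly.
final class CheckedAdd {
    private CheckedAdd() {}

    // Runs the given checks (each returns an error message, or empty if the
    // FS passes) and adds the FS to the indexes only if all checks pass.
    // Returns the collected error messages.
    @SafeVarargs
    static <T> List<String> addIfConsistent(List<? super T> indexes, T fs,
            Function<? super T, Optional<String>>... checks) {
        List<String> errors = new ArrayList<>();
        for (Function<? super T, Optional<String>> check : checks) {
            check.apply(fs).ifPresent(errors::add);
        }
        if (errors.isEmpty()) {
            indexes.add(fs);
        }
        return errors;
    }
}
```

Because each call site passes its own checks, only the add-to-indexes calls that opt in pay for checking, and different call sites can use different check sets. The hook-based first approach would instead route every add through one dispatcher.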

-M

