Posted to dev@uima.apache.org by Marshall Schor <ms...@schor.com> on 2017/08/18 13:27:11 UTC

implicit use of type in UIMA iterators

Hi,

These are some not-quite-thought-out thoughts on "type" use in UIMA iterators. 

When I first encountered the detailed design of these in UIMA, I was surprised
to find that, except for type priority ordering, types did not play a major role
in the UIMA iterator APIs. 

    In particular, an FS used as the argument in moveTo(fs) could be of a
    supertype of the type the index was over, as long as the supertype had the
    key fields.  This is typically the case for an AnnotationIndex over some
    type like "Token"; you can use an "Annotation" (a supertype of Token) as
    the argument in moveTo(fs).

The AnnotationIndex defines a typePriority key.  To explore this further, let's
think about cases where the index doesn't use a typePriority key.

Assume we define a type Foo, and some subtypes of Foo:

Foo
  -FooSub_a
     -- FooSub_a_a  (subtype of FooSub_a)
     -- FooSub_a_b
  -FooSub_b

Next, assume you define/create an **index over FooSub_a**, with no typePriority key.

Now you could get an iterator over that index, and do operations like
"moveTo(xxx)";  the type of xxx could be any type defining the sorting key(s)
for the index.  In particular, it could be a subtype, or a supertype.  The type,
itself, plays no role in the moveTo operation.

===========
This was a surprise to me, when I first learned of it. 

I guess I had implicitly assumed that if I said
  -moveTo(aFooSub_b),
    --where there was an FS of type FooSub_b which compared "equal" (using the
      index's compare operation),
a subsequent "get" would return a FooSub_b instance.

Instead, I get the "left-most" FS in the index which compares "equal" with xxx,
which could be a FooSub_a instance
  - whose type is neither a subtype nor a supertype of the type of xxx
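
To make this concrete, here is a minimal sketch in plain Java that does not use
the UIMA APIs at all.  ToyFs, BY_KEY and this moveTo are hypothetical stand-ins:
the "index" is just a list sorted on one integer key, with no typePriority key,
and moveTo is a binary search for the left-most element that is not less than
the probe.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Toy stand-in for a feature structure: a type name plus one sort-key value. */
final class ToyFs {
    final String type;
    final int key;
    ToyFs(String type, int key) { this.type = type; this.key = key; }
}

final class LeftmostMoveToSketch {
    /** Index comparator with no typePriority key: only the key value matters. */
    static final Comparator<ToyFs> BY_KEY = Comparator.comparingInt((ToyFs fs) -> fs.key);

    /** Returns the position of the left-most element that is not less than the probe. */
    static int moveTo(List<ToyFs> sortedIndex, ToyFs probe, Comparator<ToyFs> cmp) {
        int lo = 0;
        int hi = sortedIndex.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cmp.compare(sortedIndex.get(mid), probe) < 0) {
                lo = mid + 1;   // element is left of the probe, search to the right
            } else {
                hi = mid;       // element is equal or greater, keep looking left
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        List<ToyFs> index = Arrays.asList(
            new ToyFs("FooSub_a_a", 1),
            new ToyFs("FooSub_a", 5),
            new ToyFs("FooSub_a_b", 5),
            new ToyFs("FooSub_a_a", 9));
        // The probe's type never enters the comparison.
        ToyFs probe = new ToyFs("FooSub_b", 5);
        int pos = moveTo(index, probe, BY_KEY);
        System.out.println(index.get(pos).type);   // prints FooSub_a
    }
}
```

The probe's type is never consulted, so the "get" after moveTo returns whatever
happens to be left-most among the key-equal elements.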

===========
If the index is defined **with a typePriority key**, then in the above case, I
do get a FS of the type of xxx (assuming it exists, of course).
===========
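
For contrast, a sketch of the typePriority case, again in plain Java rather than
the UIMA APIs.  PrioFs, PRIORITY and BY_KEY_THEN_TYPE are hypothetical names,
and the priority order shown is just an assumption for illustration.  The only
change from a key-only index is that the comparator uses the type's rank as a
tie-breaker, so elements of different types no longer compare "equal".

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Toy stand-in for a feature structure: a type name plus one sort-key value. */
final class PrioFs {
    final String type;
    final int key;
    PrioFs(String type, int key) { this.type = type; this.key = key; }
}

final class TypePriorityMoveToSketch {
    /** Assumed type priority order, highest priority first. */
    static final List<String> PRIORITY =
        Arrays.asList("FooSub_a", "FooSub_a_a", "FooSub_a_b");

    static int rank(PrioFs fs) { return PRIORITY.indexOf(fs.type); }

    /** Key value first, then type priority as the tie-breaker. */
    static final Comparator<PrioFs> BY_KEY_THEN_TYPE =
        Comparator.comparingInt((PrioFs fs) -> fs.key)
                  .thenComparingInt(TypePriorityMoveToSketch::rank);

    /** Returns the position of the left-most element that is not less than the probe. */
    static int moveTo(List<PrioFs> sortedIndex, PrioFs probe) {
        int lo = 0;
        int hi = sortedIndex.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (BY_KEY_THEN_TYPE.compare(sortedIndex.get(mid), probe) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        // Already sorted by key, then by type priority.
        List<PrioFs> index = Arrays.asList(
            new PrioFs("FooSub_a_a", 1),
            new PrioFs("FooSub_a", 5),
            new PrioFs("FooSub_a_a", 5),
            new PrioFs("FooSub_a_b", 5),
            new PrioFs("FooSub_a_a", 9));
        int pos = moveTo(index, new PrioFs("FooSub_a_b", 5));
        System.out.println(index.get(pos).type);   // prints FooSub_a_b
    }
}
```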

This is how UIMA V2 works.  It's mostly a "don't care" thing, I believe, because
of the prevalent use of the AnnotationIndex, which does define a typePriority key.

For UIMA v3, we could modify this behavior. 

One proposal is to change the meaning of "move-to-leftmost" in just the case
illustrated, where there is an "equal" match with xxx; the modification would
be to (temporarily) include the type in the move-to-leftmost comparison, so the
move stops when the type becomes unequal.  This guarantees that the next "get"
returns an FS of the same type as the key, if such an FS exists.

    This proposal is for type equal matching, not for type/subtype matching.  So
    if the moveTo(xxx) was for type FooSub_b, but there was no matching instance
    of that type, but there were matching instances of other types (sub types,
    super types, and other (e.g. FooSub_a) types), the iterator would move to
    the leftmost one of all of these.  (Of course, with more complexity, other
    designs could be done).
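
    One possible reading of this proposal, sketched in plain Java (not the UIMA
    APIs, and not an actual implementation).  ProposalFs and moveTo are
    hypothetical names.  The sketch looks inside the run of key-equal elements
    for one of exactly the probe's type, and falls back to the left-most
    key-equal element when there is none.

```java
import java.util.Arrays;
import java.util.List;

/** Toy stand-in for a feature structure: a type name plus one sort-key value. */
final class ProposalFs {
    final String type;
    final int key;
    ProposalFs(String type, int key) { this.type = type; this.key = key; }
}

final class TypeAwareLeftmostSketch {
    /** moveTo that prefers an exact type match among the key-equal elements. */
    static int moveTo(List<ProposalFs> sortedIndex, ProposalFs probe) {
        // Step 1: left-most element whose key is >= the probe's key (type ignored).
        int lo = 0;
        int hi = sortedIndex.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedIndex.get(mid).key < probe.key) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        // Step 2: within the key-equal run, look for an element of the probe's exact type.
        for (int i = lo; i < sortedIndex.size() && sortedIndex.get(i).key == probe.key; i++) {
            if (sortedIndex.get(i).type.equals(probe.type)) {
                return i;
            }
        }
        // Step 3: no element of that type, so fall back to the left-most key-equal element.
        return lo;
    }

    public static void main(String[] args) {
        List<ProposalFs> index = Arrays.asList(
            new ProposalFs("FooSub_a_a", 1),
            new ProposalFs("FooSub_a", 5),
            new ProposalFs("FooSub_a_b", 5),
            new ProposalFs("FooSub_a_a", 9));
        // Exact type present: lands on the FooSub_a_b element at position 2.
        System.out.println(moveTo(index, new ProposalFs("FooSub_a_b", 5)));
        // Exact type absent: falls back to the left-most key-equal element at position 1.
        System.out.println(moveTo(index, new ProposalFs("FooSub_b", 5)));
    }
}
```

    Note the sketch only matches on the exact type; it does no subtype or
    supertype matching.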

    Issue: imagine there were multiple FSs "equal" to xxx, some of type
    FooSub_b and some of other types.  The proposal says nothing about what
    moveToNext would do.  It could well move to an FS of some other type,
    instead of first going through the FooSub_b instances.
      - the proposal could be augmented to guarantee that all FSs of type
        FooSub_b "equal" to xxx would be returned first, when iterating forwards.

    Although this seems like the "least surprise" result, it starts to produce
    implementation complexity, and perhaps other surprises for other cases.

    So I'm not sure if any of these modifications are the right thing to do... 
    as compared to the simpler (more consistent, less special case, but with
    other surprises) approach that V2 has.

Just a note:

    Left-most is a concept applying only to FSs in the index which compare
    "equal" (using the keys specified for the index), and means the left-most
    one among the set of equal items.

Do others feel some sort of "improvement" in the moveTo(xxx) definition along
any of these lines is needed?  Or is it best to just keep things like v2 does
it, with the same "surprises"?

-Marshall

Re: implicit use of type in UIMA iterators

Posted by Marshall Schor <ms...@schor.com>.
On 8/18/2017 12:28 PM, Richard Eckart de Castilho wrote:
> On 18.08.2017, at 15:27, Marshall Schor <ms...@schor.com> wrote:
>> This was a surprise to me, when I first learned of it. 
>>
>> I guess I had implicitly assumed that if I said
>>  -moveTo(aFooSub_b),
>>    --where there was a type FooSub_b which was "equal" (using the index's
>> compare operation)
>> a subsequent "get" would get a FooSub_b instance. 
>>
>> Instead, I get the "left-most" FS in the index which compares "equal" with xxx,
>> which could be a FooSub_a instance
>>  - which is neither a sub or supertype of xxx
> You obtain your iterator from an index-over-FooSub_a.
>
> Everything that you get back from this index should be a FooSub_a or a subtype.
>
> If you tell an iterator over this index to moveTo(xxx), then it would set the
> iterator pointer to the insertion location of xxx.
>
> I don't find it surprising that the iterator will return a FooSub_a or a subtype
> instead of a xxx.
>
> What I find a bit of surprising is, that the moveTo(xxx) operation accepts the
> xxx in the first place. Let's say you define an index over the feature A and
> xxx does not even have the feature A. It seems a proper reaction in this case
> would be an illegal argument exception.
I agree, seems strange.  Although it does work as long as xxx inherits the same
features that are being used in the index keys.
-Marshall
>  
>
> Cheers,
>
> -- Richard


Re: implicit use of type in UIMA iterators

Posted by Richard Eckart de Castilho <re...@apache.org>.
On 18.08.2017, at 15:27, Marshall Schor <ms...@schor.com> wrote:
> 
> This was a surprise to me, when I first learned of it. 
> 
> I guess I had implicitly assumed that if I said
>  -moveTo(aFooSub_b),
>    --where there was a type FooSub_b which was "equal" (using the index's
> compare operation)
> a subsequent "get" would get a FooSub_b instance. 
> 
> Instead, I get the "left-most" FS in the index which compares "equal" with xxx,
> which could be a FooSub_a instance
>  - which is neither a sub or supertype of xxx

You obtain your iterator from an index-over-FooSub_a.

Everything that you get back from this index should be a FooSub_a or a subtype.

If you tell an iterator over this index to moveTo(xxx), then it would set the
iterator pointer to the insertion location of xxx.

I don't find it surprising that the iterator will return a FooSub_a or a subtype
instead of a xxx.

What I find a bit surprising is that the moveTo(xxx) operation accepts xxx in
the first place. Let's say you define an index over the feature A and xxx does
not even have the feature A. It seems a proper reaction in this case would be
an illegal argument exception.
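
A minimal sketch of such a check, in plain Java.  This is not how UIMA behaves
or how it would implement it; FeatureCheckSketch and requireKeyFeatures are
hypothetical names, and a feature structure is modelled as a simple map from
feature name to value.

```java
import java.util.HashMap;
import java.util.Map;

final class FeatureCheckSketch {
    /** Throws if the probe does not carry every feature used as an index key. */
    static void requireKeyFeatures(Map<String, Object> probeFeatures, String... keyFeatures) {
        for (String key : keyFeatures) {
            if (!probeFeatures.containsKey(key)) {
                throw new IllegalArgumentException(
                    "moveTo argument lacks index key feature: " + key);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> withA = new HashMap<>();
        withA.put("A", 42);
        requireKeyFeatures(withA, "A");       // passes: the probe has feature A

        Map<String, Object> withoutA = new HashMap<>();
        withoutA.put("B", "x");
        try {
            requireKeyFeatures(withoutA, "A"); // rejected: the probe has no feature A
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```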

Cheers,

-- Richard

Re: implicit use of type in UIMA iterators

Posted by Marshall Schor <ms...@schor.com>.
After some more thinking, I'm now leaning toward making no change at all from
how V2 does things, because the unexpected consequences seem complex to describe.

The way this will affect things is:

1) If a typePriority key is specified for a search index but no type priorities
are defined, or if the "select" APIs are used with no typePriority,
move-to-leftmost will compare types for equality when establishing its boundary.
(Currently, in v3, this is not done when no typePriority is defined, so
move-to-leftmost might move farther to the left.)

2) If no typePriority key is defined for a search index, move-to-leftmost will
ignore the type (no change from how it works now in v2).

3) The comment in UIMA-5536 (to reduce the "surprise" in one case) will not be
implemented.

-Marshall
