Posted to dev@uima.apache.org by Marshall Schor <ms...@schor.com> on 2017/07/18 14:01:32 UTC

An undocumented quirk of UIMA Set indexes

While thinking through some updates to UIMA v3 indexes/iterators, I tried the
following experiment.

Configure UIMA with:

- a set index, indexed over the "begin" feature only.

- a type system: the built-in types plus a new subtype of Annotation, called "Token".

- make an instance of Annotation with begin=17.

- make an instance of Token with the same begin=17 value.

Add both to the indexes.  Because the set index defines equality by the begin
feature alone, and both instances have the same begin value, you might expect
the set to have just one entry.

But it has 2 (both of these).

It turns out the "set-ness" is done per type; the effect is as if the equality
comparator for Sets includes the type.
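
To make the effect concrete, here is a minimal sketch in plain Java.  It is a
toy model only and does not use or reproduce UIMA's code: the names Fs,
PerTypeSetIndex and BY_BEGIN are made up for illustration, and the "one sorted
set per type" layout is just an assumption that reproduces the observed
behavior.  It contrasts a single set keyed on begin with a set kept per type.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

public class PerTypeSetIndexSketch {

    /** Hypothetical stand-in for a feature structure: a type name and a begin offset. */
    static final class Fs {
        final String type;
        final int begin;

        Fs(String type, int begin) {
            this.type = type;
            this.begin = begin;
        }
    }

    /** Equality as the set index defines it: the begin feature only. */
    static final Comparator<Fs> BY_BEGIN = Comparator.comparingInt((Fs fs) -> fs.begin);

    /** Toy index that keeps one set per type, so duplicates are detected only within a type. */
    static final class PerTypeSetIndex {
        private final Map<String, TreeSet<Fs>> byType = new HashMap<>();

        void add(Fs fs) {
            byType.computeIfAbsent(fs.type, t -> new TreeSet<>(BY_BEGIN)).add(fs);
        }

        int size() {
            int n = 0;
            for (TreeSet<Fs> perType : byType.values()) {
                n += perType.size();
            }
            return n;
        }
    }

    /** What one might expect: a single set over all types, keyed on begin only. */
    static int singleSetCount() {
        TreeSet<Fs> set = new TreeSet<>(BY_BEGIN);
        set.add(new Fs("Annotation", 17));
        set.add(new Fs("Token", 17));
        return set.size();
    }

    /** The observed behavior: Annotation and Token with the same begin both stay. */
    static int perTypeCount() {
        PerTypeSetIndex index = new PerTypeSetIndex();
        index.add(new Fs("Annotation", 17));
        index.add(new Fs("Token", 17));
        return index.size();
    }

    /** Within one type, the second instance with the same begin is still dropped. */
    static int sameTypeCount() {
        PerTypeSetIndex index = new PerTypeSetIndex();
        index.add(new Fs("Token", 17));
        index.add(new Fs("Token", 17));
        return index.size();
    }

    public static void main(String[] args) {
        System.out.println("single set over all types: " + singleSetCount());
        System.out.println("one set per type:          " + perTypeCount());
        System.out.println("same type, same begin:     " + sameTypeCount());
    }
}
```

In this model the per-type layout is equivalent to a single set whose
comparator compares the type first and the begin feature second, which is the
"as if the comparator includes the type" reading above.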

-----------------------------

I don't propose to change this, unless there's a consensus of requests from our
users.  I would guess that our users have gotten used to this implementation. 
But I do propose to document this :-).

Other opinions?

-Marshall


Re: An undocumented quirk of UIMA Set indexes

Posted by Pablo Duboue <pa...@gmail.com>.
On Tue, Jul 18, 2017 at 10:01 AM, Marshall Schor <ms...@schor.com> wrote:
> While thinking through some updates to UIMA v3 indexes/iterators, I tried the
> following experiment.
>
> Configure UIMA with:
>
> - a set index, indexed over the "begin" feature only.
>
> - a type system - built in + a new subtype of Annotation, called "Token".
>
> - make an instance of Annotation with begin=17.
>
> - make an instance of Token with the same begin=17 value.
>
> Add both to the indexes.  Because the set index defines the equality as the
> begin feature, and the begin feature is the same, you might expect the set to
> have just one entry.
>
> But it has 2 (both of these).
>
> It turns out the "set-ness" is done per type; the effect is as-if the equality
> comparator for Sets includes the type.
>
> -----------------------------
>
> I don't propose to change this, unless there's a consensus of requests from our
> users.  I would guess that our users have gotten used to this implementation.
> But I do propose to document this :-).
>
> Other opinions?

How common are set indices? I wasn't even aware of their existence.

Documenting the behaviour you mention sounds like a great idea, yes.

P