
Posted to dev@uima.apache.org by Richard Eckart de Castilho <re...@apache.org> on 2023/01/06 14:47:13 UTC

Type Priorities (was: Retire UIMA C++ SDK)

> On 6. Jan 2023, at 14:53, Pablo Duboue <pa...@gmail.com> wrote:
> 
>> Note that Cassis does not support indices or type priorities. To be
>> honest, those always seemed to be more in the way than helpful anyway. The
>> UIMAv3 select API also ignores type priorities by default (they can be turned on
>> for a given select call, though).
>> 
> 
> Type priorities were indeed a rare bird. But type indices are mighty
> useful. So UIMAv3 has no indices at all? Getting an iterator over
> annotations that fall inside another annotation is a very common task
> (sentences within paragraphs, tokens within sentences, etc). It is one of
> the few constructs that other NLP frameworks provide.

UIMAv3 still has indices, but as in UIMAv2, one normally does not have to configure them.
UIMA (Java) automatically creates indices for all subtypes of Annotation. Also, there is a general index for all FeatureStructures. The same is true for Cassis.
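To make the default ordering concrete, here is a minimal Python sketch (not UIMA or Cassis code) of the sort key used by UIMA's built-in annotation index: begin ascending, end descending, and then, optionally, type priority. The `Annotation` class, `index_key`, and `build_index` are hypothetical names for illustration. Passing no priority map roughly mimics the select API's default of ignoring type priorities.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple

@dataclass(frozen=True)
class Annotation:
    type_name: str
    begin: int
    end: int

def index_key(a: Annotation, priorities: Optional[Dict[str, int]] = None) -> Tuple[int, int, int]:
    # begin ascending, end descending (longer spans first at the same start).
    # Type priority is only a tie-breaker, and only when a priority map is given;
    # types missing from the map sort after all listed ones.
    prio = priorities.get(a.type_name, len(priorities)) if priorities else 0
    return (a.begin, -a.end, prio)

def build_index(annotations: Iterable[Annotation],
                priorities: Optional[Dict[str, int]] = None) -> List[Annotation]:
    # sorted() is stable, so complete ties keep their insertion order
    return sorted(annotations, key=lambda a: index_key(a, priorities))

anns = [
    Annotation("Token", 0, 5),
    Annotation("Sentence", 0, 20),
    Annotation("Paragraph", 0, 20),
    Annotation("Token", 6, 11),
]
no_prio = [a.type_name for a in build_index(anns)]
with_prio = [a.type_name for a in build_index(anns, {"Paragraph": 0, "Sentence": 1, "Token": 2})]
print(no_prio)    # ['Sentence', 'Paragraph', 'Token', 'Token']
print(with_prio)  # ['Paragraph', 'Sentence', 'Token', 'Token']
```

With identical offsets, `Sentence` and `Paragraph` stay in insertion order when no priorities are given (in this sketch, because Python's sort is stable); supplying a priority map is what puts `Paragraph` first.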

MySentence and MyNER in your code appear to be subtypes of Annotation, and you don't seem to define any keys other than begin/end, so an index definition should not be required.

Defining custom indices would only be required if, e.g., you need to set up different index keys. The select API of UIMAv3 is aware of the automatically created Annotation-subtype indices and uses them to perform fast seeks with respect to annotation begin/end. However, the select API is not aware of custom indices and will not use them to speed up access.
In my experience, though, most access is well-scoped via offsets (fast through the Annotation indices), and a filter() statement can then narrow the results down further with O(n) complexity.
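The seek-then-filter pattern can be sketched in plain Python. This is not the select API's implementation; `AnnotationIndex` and `covered_by` are hypothetical names, assuming an index sorted by begin ascending and end descending. A binary search finds the first annotation starting at or after the covering annotation's begin, the scan stops as soon as annotations start at or past its end, and a list comprehension plays the role of filter() on the (usually small) result.

```python
import bisect
from dataclasses import dataclass
from typing import Iterator, List, Optional

@dataclass(frozen=True)
class Annotation:
    type_name: str
    begin: int
    end: int

class AnnotationIndex:
    """Hypothetical stand-in for an annotation index sorted by (begin asc, end desc)."""

    def __init__(self, annotations: List[Annotation]) -> None:
        self._items = sorted(annotations, key=lambda a: (a.begin, -a.end))
        self._begins = [a.begin for a in self._items]

    def covered_by(self, cover: Annotation, type_name: Optional[str] = None) -> Iterator[Annotation]:
        # Seek: O(log n) binary search to the first annotation with begin >= cover.begin
        i = bisect.bisect_left(self._begins, cover.begin)
        # Scan: stop once annotations start at or beyond the end of the cover
        while i < len(self._items) and self._items[i].begin < cover.end:
            a = self._items[i]
            if a.end <= cover.end and a is not cover and (type_name is None or a.type_name == type_name):
                yield a
            i += 1

text = "UIMA is fast. Cassis is Python."
s1 = Annotation("Sentence", 0, 13)
s2 = Annotation("Sentence", 14, 31)
spans = [(0, 4), (5, 7), (8, 12), (12, 13), (14, 20), (21, 23), (24, 30), (30, 31)]
tokens = [Annotation("Token", b, e) for b, e in spans]
index = AnnotationIndex([s1, s2] + tokens)

# Offset-scoped access via the index, then an O(n) filter on the covered tokens
covered = list(index.covered_by(s2, "Token"))
capitalized = [text[t.begin:t.end] for t in covered if text[t.begin].isupper()]
print(capitalized)  # ['Cassis', 'Python']
```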

-- Richard

Re: Type Priorities (was: Retire UIMA C++ SDK)

Posted by Pablo Duboue <pa...@gmail.com>.

On Fri, Jan 6, 2023 at 6:47 AM Richard Eckart de Castilho <re...@apache.org>
wrote:

> UIMAv3 still has indices, but as in UIMAv2, one normally does not have to
> configure them.
> UIMA (Java) automatically creates indices for all subtypes of Annotation.
> Also, there is a general index for all FeatureStructures. The same is true
> for Cassis.
>
> MySentence and MyNER in your code appear to be subtypes of Annotation, and
> you don't seem to define any keys other than begin/end, so an
> index definition should not be required.
>

Oh yes, you're correct. I got carried away after reading the original paper
linked by Eddie. I never used custom indexes in UIMA (Java), although I was
aware there were some corner cases that needed them for the show. In this
particular example, the custom indices do buy you some good stuff (the top-k
sentences need more slow Python code otherwise; the selected NER bit can be
done easily by defining a new annotation). Interestingly, I wrote that
top-k code in Java many times because... I just didn't know the indices
could be used that way. In a way, this was somewhat contrived code to show
what UIMA-CPP can do for the Python people.
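The top-k point can be illustrated with a small sketch of a custom index keyed on a sentence score. Everything here is an assumption for illustration: the `score` feature, `ScoredSentence`, and `ScoreIndex` are hypothetical and do not reflect the index API of UIMA-CPP or UIMA Java. Keeping entries sorted as they are added turns top-k into a slice; without such an index, the same answer needs an extra pass, here `heapq.nlargest`.

```python
import bisect
import heapq
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class ScoredSentence:
    begin: int
    end: int
    score: float  # hypothetical feature, not part of the original example

class ScoreIndex:
    """Hypothetical custom index: sentences kept sorted by score, highest first."""

    def __init__(self) -> None:
        self._keys: List[Tuple[float, int]] = []
        self._items: List[ScoredSentence] = []

    def add(self, s: ScoredSentence) -> None:
        # Negating the score makes ascending key order mean descending score;
        # ties are broken by begin offset.
        key = (-s.score, s.begin)
        i = bisect.bisect_right(self._keys, key)
        # list.insert is O(n); a larger index would likely use a tree or skip list
        self._keys.insert(i, key)
        self._items.insert(i, s)

    def top_k(self, k: int) -> List[ScoredSentence]:
        return self._items[:k]

sentences = [ScoredSentence(0, 13, 0.2), ScoredSentence(14, 31, 0.9), ScoredSentence(32, 50, 0.5)]
index = ScoreIndex()
for s in sentences:
    index.add(s)

from_index = index.top_k(2)
# Without an index: an extra pass over all sentences
from_scan = heapq.nlargest(2, sentences, key=lambda s: s.score)
print([s.score for s in from_index])  # [0.9, 0.5]
```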

P