Posted to dev@uima.apache.org by Hannes Korte <ha...@iais.fraunhofer.de> on 2010/04/07 13:57:30 UTC

Possibly a bug in subiterator

Hi,

I noticed a strange behavior of the annotation index subiterator in
uimaj 2.2.2 and 2.3.0.

Consider the sentence: 'Testing the UIMA-Framework'
with tokens: 'Testing' 'the' 'UIMA-Framework'
and the named entity: 'UIMA'

The type priorities list NamedEntity on top of the Token type.

If I call the Token subiterator for the NamedEntity 'UIMA' with
strict=false, I get an empty result. According to the docs, with
strict=false the Tokens contained in the NamedEntity are those that
satisfy:

  annot.getBegin() <= b.getBegin() <= annot.getEnd()

for NamedEntity annot and Token b. This holds for the NamedEntity 'UIMA'
and the Token 'UIMA-Framework', which begin at the same offset, but the
subiterator is empty.

If I change the NamedEntity to ' UIMA' (including the preceding space),
then it works correctly, and the Token 'UIMA-Framework' is contained in
the subiterator.
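
To make the expectation concrete, here is a minimal plain-Java sketch of
the condition quoted above, applied to the example sentence. It does not
use UIMA at all: the Span class and the helper names are hypothetical
stand-ins, the offsets are derived from the sentence with indexOf, and
index ordering and type priorities are deliberately not modeled. It only
shows which Tokens the quoted condition alone would select.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NonStrictContainment {
    public static final String TEXT = "Testing the UIMA-Framework";

    /** Hypothetical stand-in for an annotation: covered text plus offsets. */
    public static final class Span {
        public final String covered;
        public final int begin;
        public final int end;

        Span(String covered, int begin, int end) {
            this.covered = covered;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Builds a span over the first occurrence of the given text in TEXT. */
    public static Span span(String covered) {
        int begin = TEXT.indexOf(covered);
        if (begin < 0) {
            throw new IllegalArgumentException("not found: " + covered);
        }
        return new Span(covered, begin, begin + covered.length());
    }

    /** The three tokens from the example. */
    public static List<Span> tokens() {
        return Arrays.asList(span("Testing"), span("the"), span("UIMA-Framework"));
    }

    /**
     * Applies only the quoted non-strict condition:
     * annot.begin <= b.begin <= annot.end.
     * Returns the covered text of every candidate that satisfies it.
     */
    public static List<String> coveredNonStrict(Span annot, List<Span> candidates) {
        List<String> result = new ArrayList<>();
        for (Span b : candidates) {
            if (annot.begin <= b.begin && b.begin <= annot.end) {
                result.add(b.covered);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // NamedEntity 'UIMA' starts at the same offset as the Token.
        System.out.println(coveredNonStrict(span("UIMA"), tokens()));
        // NamedEntity ' UIMA' starts one character earlier.
        System.out.println(coveredNonStrict(span(" UIMA"), tokens()));
    }
}
```

Under this condition alone, both variants of the NamedEntity select the
Token 'UIMA-Framework', which is the result I expected from the
subiterator in both cases.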

I attached a simple Java class with all needed files to demonstrate the
problem. Any ideas?

Best regards,
Hannes




Re: Possibly a bug in subiterator

Posted by Thilo Goetz <tw...@gmx.de>.
Hi,

thanks for reporting this.  Please open a JIRA issue and
attach the files, I'll take a look (just paste the text
from your email as issue description).  Thanks.

--Thilo