Posted to dev@uima.apache.org by "Richard Eckart de Castilho (JIRA)" <de...@uima.apache.org> on 2013/07/13 09:29:49 UTC

[jira] [Commented] (UIMA-3075) Unambiguous non-strict subiterator may return annotations outside the given annotation's range

    [ https://issues.apache.org/jira/browse/UIMA-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13707682#comment-13707682 ] 

Richard Eckart de Castilho commented on UIMA-3075:
--------------------------------------------------

This sounds like a duplicate of UIMA-2808. Would you mind trying uimaj 2.4.1-SNAPSHOT to see if this is fixed for you there?
                
> Unambiguous non-strict subiterator may return annotations outside the given annotation's range
> ----------------------------------------------------------------------------------------------
>
>                 Key: UIMA-3075
>                 URL: https://issues.apache.org/jira/browse/UIMA-3075
>             Project: UIMA
>          Issue Type: Bug
>    Affects Versions: 2.4.0C
>            Reporter: Alexander N Thomas
>            Priority: Minor
>
> REPRO: Running a tokenizer that matches "[^ ]+" over "aaa bbb ccc ddd", I get four token annotations:
> "aaa" 0-3
> "bbb" 4-7
> "ccc" 8-11
> "ddd" 12-15
> I then iterate over the token annotations, printing each one's covered text, begin, and end. For each token I create an unambiguous, non-strict subiterator and iterate over it, printing the covered text, begin, and end of every annotation it returns, indented.
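> 		// Needs java.util.Iterator, org.apache.uima.jcas.tcas.Annotation, and the reporter's Token JCas type
> 		// Walk all Token annotations in index order
> 		Iterator<Annotation> iter = jcas.getAnnotationIndex(Token.type).iterator();
> 		while (iter.hasNext()) {
> 			Annotation a = iter.next();
> 			System.out.println("\"" + a.getCoveredText() + "\"" + " [" + a.getBegin() + ", " + a.getEnd() + ")");
> 			// subiterator(annot, ambiguous = false, strict = false) over the full annotation index
> 			Iterator<Annotation> featIter = jcas.getAnnotationIndex().subiterator(a, false, false);
> 			while (featIter.hasNext()) {
> 				Annotation b = featIter.next();
> 				// Print each annotation returned "under" the token, indented by one tab
> 				System.out.println("\t\"" + b.getCoveredText() + "\"" + " [" + b.getBegin() + ", " + b.getEnd() + ")");
> 			}
> 		}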
> The output is as follows; each token's subiterator returns the next token, which lies outside the token's range:
> "aaa" [0, 3)
> 	"bbb" [4, 7)
> "bbb" [4, 7)
> 	"ccc" [8, 11)
> "ccc" [8, 11)
> 	"ddd" [12, 15)
> "ddd" [12, 15)
> I think this can be fixed by adding an extra check at Subiterator.java, line 127:
> NOW
>     while (it.isValid() && ((start > annot.getBegin()) || (strict && annot.getEnd() > end))) {
>       it.moveToNext();
>     }
> POSSIBLE FIX
>     while (it.isValid() && ((start > annot.getBegin() && annot.getBegin() <= end) || (strict && annot.getEnd() > end))) {
>       it.moveToNext();
>     }
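The behaviour the report expects can be modelled outside UIMA. The following is a minimal sketch in plain Java (16+, for records), not UIMA's Subiterator: `Span`, `INDEX_ORDER`, and `subiterate` are hypothetical names, annotations are reduced to covered text plus a [begin, end) range, and the index order is assumed to be begin ascending, then end descending. It further assumes that a non-strict, unambiguous subiterator should return only annotations that come after the covering annotation in index order, begin before its end, and do not overlap the previously returned annotation; the exact boundary rule UIMA uses (for example, whether an annotation beginning exactly at the covering end is included) is not confirmed here. The `break` on `b.begin() >= covering.end()` is the kind of upper-bound check whose absence would produce output like the one shown in the report.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified model of an unambiguous subiterator over plain spans.
// This is NOT UIMA's Subiterator; Span, INDEX_ORDER and subiterate are hypothetical.
public class SubiteratorSketch {

    // Hypothetical stand-in for an annotation: covered text plus a [begin, end) range.
    record Span(String text, int begin, int end) {}

    // Assumed index order: begin ascending, then end descending (longer spans first).
    static final Comparator<Span> INDEX_ORDER = (x, y) ->
            x.begin() != y.begin()
                    ? Integer.compare(x.begin(), y.begin())
                    : Integer.compare(y.end(), x.end());

    // Returns what an unambiguous subiterator over `covering` is assumed to yield.
    static List<Span> subiterate(List<Span> index, Span covering, boolean strict) {
        List<Span> sorted = new ArrayList<>(index);
        sorted.sort(INDEX_ORDER);
        List<Span> result = new ArrayList<>();
        // Start just after the covering span, mirroring "annot < b" in index order.
        int pos = sorted.indexOf(covering);
        // Unambiguous mode: each returned span must start at or after this offset.
        int lastEnd = covering.begin();
        for (int i = pos + 1; i < sorted.size(); i++) {
            Span b = sorted.get(i);
            // Upper-bound check: spans are sorted by begin, so once one begins
            // at or past the covering end, no later span can start inside it.
            if (b.begin() >= covering.end()) {
                break;
            }
            // Skip spans that begin before the covering span or overlap the last returned one.
            if (b.begin() < lastEnd) {
                continue;
            }
            // Strict mode additionally requires the span to end inside the covering span.
            if (strict && b.end() > covering.end()) {
                continue;
            }
            result.add(b);
            lastEnd = b.end();
        }
        return result;
    }

    public static void main(String[] args) {
        List<Span> tokens = List.of(
                new Span("aaa", 0, 3), new Span("bbb", 4, 7),
                new Span("ccc", 8, 11), new Span("ddd", 12, 15));
        // Same loop shape as the report: print each token, then its sub-annotations indented.
        for (Span a : tokens) {
            System.out.println("\"" + a.text() + "\" [" + a.begin() + ", " + a.end() + ")");
            for (Span b : subiterate(tokens, a, false)) {
                System.out.println("\t\"" + b.text() + "\" [" + b.begin() + ", " + b.end() + ")");
            }
        }
    }
}
```

With the four tokens from the report, each token's modelled subiterator yields nothing, so only the four unindented token lines are printed.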
