Posted to dev@uima.apache.org by "Alexander N Thomas (JIRA)" <de...@uima.apache.org> on 2013/07/12 23:51:49 UTC
[jira] [Created] (UIMA-3075) Unambiguous non-strict subiterator may return annotations outside the given annotation's range
Alexander N Thomas created UIMA-3075:
----------------------------------------
Summary: Unambiguous non-strict subiterator may return annotations outside the given annotation's range
Key: UIMA-3075
URL: https://issues.apache.org/jira/browse/UIMA-3075
Project: UIMA
Issue Type: Bug
Affects Versions: 2.4.0C
Reporter: Alexander N Thomas
Priority: Minor
REPRO: Using a tokenizer that matches runs of non-space characters ("[^ ]+") on "aaa bbb ccc ddd", I get four token annotations:
"aaa" 0-3
"bbb" 4-7
"ccc" 8-11
"ddd" 12-15
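The offsets above correspond to maximal runs of non-space characters. The report does not show the tokenizer itself, so the following is only a minimal sketch using java.util.regex that produces the same [begin, end) spans. The class name TokenSpans is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenSpans {
    // Returns the [begin, end) offsets of every maximal run of non-space characters.
    static List<int[]> tokenize(String text) {
        List<int[]> spans = new ArrayList<>();
        Matcher m = Pattern.compile("[^ ]+").matcher(text);
        while (m.find()) {
            spans.add(new int[] {m.start(), m.end()});
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "aaa bbb ccc ddd";
        for (int[] s : tokenize(text)) {
            // Same layout as the token list: "covered text" begin-end
            System.out.println("\"" + text.substring(s[0], s[1]) + "\" " + s[0] + "-" + s[1]);
        }
    }
}
```

Running main prints the same four lines as the token list.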
I then iterate over the token annotations, printing each one's covered text, begin, and end. For each token I create an unambiguous, non-strict subiterator over the annotation index and print the covered text, begin, and end of every annotation it returns, indented.
import java.util.Iterator;
import org.apache.uima.jcas.tcas.Annotation;
// Token is the tokenizer's own JCas type (its package is not given in this report).

Iterator<Annotation> iter = jcas.getAnnotationIndex(Token.type).iterator();
while (iter.hasNext()) {
    Annotation a = iter.next();
    System.out.println("\"" + a.getCoveredText() + "\"" + " [" + a.getBegin() + ", " + a.getEnd() + ")");
    // subiterator(annot, ambiguous, strict): unambiguous (false) and non-strict (false)
    Iterator<Annotation> featIter = jcas.getAnnotationIndex().subiterator(a, false, false);
    while (featIter.hasNext()) {
        Annotation b = featIter.next();
        System.out.println("\t\"" + b.getCoveredText() + "\"" + " [" + b.getBegin() + ", " + b.getEnd() + ")");
    }
}
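The output is shown below. The subiterator for each of the first three tokens returns the following token, which begins after the given token ends and so lies outside its range: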
"aaa" [0, 3)
	"bbb" [4, 7)
"bbb" [4, 7)
	"ccc" [8, 11)
"ccc" [8, 11)
	"ddd" [12, 15)
"ddd" [12, 15)
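For comparison, here is a plain-Java model of what an unambiguous, non-strict subiterator would be expected to return. It is not UIMA's Subiterator code; the Span record and the SubiteratorSketch class are hypothetical. It assumes the index is ordered by ascending begin and then descending end, that the bounding annotation itself is not returned, and that "inside the range" for a non-strict subiterator means the candidate begins no later than the bound's end (the same `<= end` comparison the proposed fix uses).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SubiteratorSketch {
    // Hypothetical stand-in for an annotation: covered text plus a [begin, end) span.
    record Span(String text, int begin, int end) {}

    // Assumed index order: ascending begin, then descending end (longer spans first).
    static final Comparator<Span> INDEX_ORDER = Comparator.comparingInt(Span::begin)
            .thenComparing(Comparator.comparingInt(Span::end).reversed());

    // Sketch of an unambiguous, non-strict subiterator bounded by `bound`.
    static List<Span> subiterate(List<Span> index, Span bound) {
        List<Span> sorted = new ArrayList<>(index);
        sorted.sort(INDEX_ORDER);
        List<Span> result = new ArrayList<>();
        int lastEnd = bound.begin(); // unambiguous: the next result must start at or after this
        // Start just after the bounding annotation itself (it is not returned).
        for (int i = sorted.indexOf(bound) + 1; i < sorted.size(); i++) {
            Span b = sorted.get(i);
            // Upper-bound check: the index is sorted by begin, so once a candidate
            // starts past the bound's end, no later candidate can start inside it.
            if (b.begin() > bound.end()) {
                break;
            }
            // Non-strict: b may end after bound.end(); only candidates that overlap
            // the previously returned span are skipped.
            if (b.begin() >= lastEnd) {
                result.add(b);
                lastEnd = b.end();
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Span> tokens = List.of(
                new Span("aaa", 0, 3), new Span("bbb", 4, 7),
                new Span("ccc", 8, 11), new Span("ddd", 12, 15));
        for (Span t : tokens) {
            System.out.println("\"" + t.text() + "\" [" + t.begin() + ", " + t.end() + ")");
            for (Span b : subiterate(tokens, t)) {
                System.out.println("\t\"" + b.text() + "\" [" + b.begin() + ", " + b.end() + ")");
            }
        }
    }
}
```

Because the index is sorted by begin, the sketch can stop at the first candidate that starts past the bound's end. Run on the four tokens, it prints each token with nothing indented beneath it.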
I think this can be fixed by adding an extra check at line 127 of Subiterator.java:
NOW
while (it.isValid() && ((start > annot.getBegin()) || (strict && annot.getEnd() > end))) {
    it.moveToNext();
}
POSSIBLE FIX
while (it.isValid() && ((start > annot.getBegin() && annot.getBegin() <= end) || (strict && annot.getEnd() > end))) {
    it.moveToNext();
}