Posted to dev@ctakes.apache.org by Peter Abramowitsch <pa...@gmail.com> on 2020/08/30 21:35:19 UTC

I think I found a bug.

Hi,
I was getting a StringIndexOutOfBoundsException in
DependencyUtil.doesSubsume(annot1, annot2)  with exactly this situation:

*negex annotator*
*the text begins  "negative for <anything>"*

If the chunk *negative for xyz* is preceded by anything else, even a space,
the problem goes away.  It also goes away when you choose another style of
negation, "no headache" for instance.

I've traced the problem back to some illegal entries in the JCas.  You can
see from the image below that the ContextAnnotation's begin offset is
illegal.

Clearly there's an off-by-one error and this triggered the exception
because in my example, the Annotation is created right from the 0th char of
my note text.  But it occurred to me that in every other case, where the
annotation doesn't begin on the first character and it doesn't throw an
exception, it might cause  downstream methods like doesSubsume to give the
wrong result because the begin/end offsets are wrong.
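
A minimal sketch of the failure mode described above, in plain Java with no cTAKES or UIMA dependency. It assumes only that some downstream step takes the substring of the note between an annotation's begin and end offsets; what doesSubsume does internally has not been checked here, and the class and method names are hypothetical. The second call illustrates the worry about the non-initial case: a span that is one character too wide but still inside the text would not throw at all.

```java
final class OffsetFailureDemo {

    /** Mimics a covered-text lookup: the text from begin (inclusive) to end (exclusive). */
    static String coveredText(String documentText, int begin, int end) {
        return documentText.substring(begin, end);
    }

    /** Returns true if the lookup throws StringIndexOutOfBoundsException. */
    static boolean lookupFails(String documentText, int begin, int end) {
        try {
            coveredText(documentText, begin, end);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String note = "Negative for headache";

        // Offsets reported in this thread for the note-initial case: begin is -1.
        System.out.println(lookupFails(note, -1, 13));       // prints "true"

        // Hypothetical non-initial case: one leading space, span still one too wide.
        // No exception is raised, so a wrong span would go unnoticed.
        System.out.println(lookupFails(" " + note, 0, 14));  // prints "false"
    }
}
```

Whether cTAKES really produces widened offsets in the non-initial case is not established by this sketch; it only shows why such an error would be silent if it did occur.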

I'm not sure how to follow this up.  But if anyone wants to tackle it....?

This is from HistoryAttributeClassifier beginning at line 274

[image: image.png]

Re: I think I found a bug.

Posted by Kean Kaufmann <ke...@recordsone.com>.
Hi Peter,

I believe I've encountered this too; I never got around to tracking it down
to the root cause, and didn't have the civic-mindedness to report it as you
have.  Thanks!
To shut it up I implemented a brutal brute-force workaround, enclosed for
your possible amusement.

> But it occurred to me that in every other case, where the annotation
> doesn't begin on the first character and it doesn't throw an exception, it
> might cause  downstream methods like doesSubsume to give the wrong result
> because the begin/end offsets are wrong.


One would think so, but interestingly enough, this does *not* seem to be
the case.  Everywhere I've checked (quite a few, over the past few years),
non-initial ContextAnnotation offsets look correct.

Workaround: a class that extends NegexAnnotator and adjusts the offsets at
the end of the process() method.

public class NegexAnnotator extends
        org.apache.ctakes.ytex.uima.annotators.NegexAnnotator {
    ...

    // Clamp ContextAnnotation offsets that fall outside the document text.
    private void adjustContextOffsets(JCas jCas) {
        String text = jCas.getDocumentText();
        if (text == null) return;

        Collection<ContextAnnotation> contexts =
                JCasUtil.select(jCas, ContextAnnotation.class);
        if (contexts == null || contexts.isEmpty()) return;

        contexts.stream()
                .filter(c -> c.getBegin() < 0)
                .peek(c -> logger.debug("adjusting begin=" + c.getBegin()))
                .forEach(c -> c.setBegin(0));

        // don't know if this happens
        // UIMA end offsets are exclusive, so end == docTextLen is still valid;
        // only clamp values that run past the end of the text.
        int docTextLen = text.length();
        contexts.stream()
                .filter(c -> c.getEnd() > docTextLen)
                .peek(c -> logger.debug("adjusting end=" + c.getEnd()))
                .forEach(c -> c.setEnd(docTextLen));
    }
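
For anyone who wants to try the clamping idea without a cTAKES pipeline, here is a minimal standalone sketch in plain Java. It assumes half-open offsets, begin inclusive and end exclusive, as UIMA annotations use; the class and method names are hypothetical and not part of cTAKES.

```java
final class OffsetClamp {

    /**
     * Clamps a half-open span [begin, end) into [0, textLength].
     * Returns a two-element array {begin, end} with 0 <= begin <= end <= textLength.
     */
    static int[] clamp(int begin, int end, int textLength) {
        int b = Math.max(0, Math.min(begin, textLength));
        int e = Math.max(b, Math.min(end, textLength));
        return new int[] { b, e };
    }

    public static void main(String[] args) {
        int len = "Negative for headache".length();   // 21

        int[] fixed = clamp(-1, 13, len);
        System.out.println(fixed[0] + ", " + fixed[1]);       // prints "0, 13"

        // A span that ends exactly at the end of the text is left alone.
        int[] untouched = clamp(13, 21, len);
        System.out.println(untouched[0] + ", " + untouched[1]); // prints "13, 21"
    }
}
```

Note that clamping only makes the offsets legal. If the true span was narrower than the reported one, the clamped span is still wider than it should be, so this silences the exception rather than fixing the root cause.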





Re: I think I found a bug. [EXTERNAL]

Posted by Peter Abramowitsch <pa...@gmail.com>.
Thanks Jeff.  I don't think the image is needed; here's what it showed.

With the negex annotator in the pipeline

With "Negative for headache"  as the text starting at position 0
In HistoryAttributeClassifier beginning near line 274
the first IdentifiedAnnotation in the

*List<IdentifiedAnnotation> lsmentions*

contains a ContextAnnotation where the offset range is   -1, 13.
Looking at the text, it should probably have been 0, 11.

Add any text ahead of the "Negative for" and it works brilliantly.
Probably one of those  off-by-one errors  that comes from staying up too
late.
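
The offset arithmetic can be checked with a short plain-Java sketch. The widening step below is purely a hypothesis about what an off-by-one of this kind could look like; it is not taken from the cTAKES source, and the class and method names are made up for illustration. Under a half-open convention (end exclusive) the phrase "Negative for" spans 0 to 12, with 11 being the index of its last character.

```java
import java.util.Arrays;

final class TriggerOffsets {

    /** Returns {begin, endExclusive} of the first occurrence of phrase in text, or null if absent. */
    static int[] span(String text, String phrase) {
        int begin = text.indexOf(phrase);
        if (begin < 0) {
            return null;
        }
        return new int[] { begin, begin + phrase.length() };
    }

    /** Hypothetical faulty step: widens a span by one character on each side with no bounds check. */
    static int[] widenByOne(int[] span) {
        return new int[] { span[0] - 1, span[1] + 1 };
    }

    public static void main(String[] args) {
        int[] initial = span("Negative for headache", "Negative for");
        System.out.println(Arrays.toString(initial));              // prints "[0, 12]"
        System.out.println(Arrays.toString(widenByOne(initial)));  // prints "[-1, 13]"

        // With one character of leading text the widened span stays inside the note.
        int[] shifted = span(" Negative for headache", "Negative for");
        System.out.println(Arrays.toString(widenByOne(shifted)));  // prints "[0, 14]"
    }
}
```

If something like this widening is what happens, it would reproduce the observed -1, 13 and would also explain why any leading text makes the exception disappear.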

Peter






Re: I think I found a bug. [EXTERNAL]

Posted by "Miller, Timothy" <Ti...@childrens.harvard.edu>.
Peter,
I think the email server doesn't let images through. Can you post an
imgur link maybe?
Tim
