You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@ctakes.apache.org by Peter Abramowitsch <pa...@gmail.com> on 2020/10/12 22:05:13 UTC

Found code relating to a bug I reported a few weeks ago.

Hi Sean

If you know every inch of the code maybe I can ask you what you think of
this problem I found in the negex annotator.   It causes any sentence to
crash when the very first character begins the negation:

*"Absence of headache"  *
causes a crash later on in another annotator because the ContextAnnotation
it creates has a begin offset of -1.

*" Absence of headache" *
successfully annotates the phrase.

I need to fix this urgently, but I found a mysterious piece of code that is
responsible for this.
I'm working off a trunk snapshot 4.0.1 taken Dec 27  2018

NegexAnnotator.annotateNegation()   at line 846 of its class file:

*846: nec.setBegin(s.getBegin() + t.getStart() - 1);*

In the case where a sentence begins with "Absence of...."   then both s
(Sentence) and t (negex token) begin at offset 0. Then the
ContextAnnotation goes on its destructive way.       So what's with the
-1   ?
Of course It also fails with "No headache..." as the beginning of a sentence

If you know, or have a hunch why the  -1 at that line is there I will track
it down further.  Otherwise I'm just tempted to leave it and
Max(calcOffset, 0)

The crash actually occurs as control passes to the historyOf annotator
which looks at the ContextAnnotation created by Negex.
See below the signature for the stack trace

My ICLA has not been approved  yet so I can't make any alterations to
source, nor would I without any orientation to the process.   Never done it
before nor have the keys to Jira

Anyone else who knows the NegexNegator in depth  please chime in as well

Peter
--------------------------------

Caused by: java.lang.StringIndexOutOfBoundsException: String index out of
range: -1
at java.lang.String.substring(String.java:1960)
at org.apache.uima.jcas.tcas.Annotation.getCoveredText(Annotation.java:128)
at
org.apache.ctakes.dependency.parser.util.DependencyUtility.doesSubsume(DependencyUtility.java:67)
at
org.apache.ctakes.dependency.parser.util.DependencyUtility.getDependencyNodes(DependencyUtility.java:104)
at
org.apache.ctakes.dependency.parser.util.DependencyUtility.getNominalHeadNode(DependencyUtility.java:113)
at
org.apache.ctakes.assertion.attributes.history.HistoryAttributeClassifier.extract(HistoryAttributeClassifier.java:2

Re: Found code relating to a bug I reported a few weeks ago. [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.

Hi Peter,

>this morning I got access
- Welcome to the party!

> are you suggesting we just let the negex annotator begin at offset 0 + token-pos
- Yes, but,
- I think that some good old fashioned user-powered regression testing would be in order.  Ad hoc if necessary.
- Or, maybe I will finally make that TestCasBuilder unit test helper that I've been putting off for so long ...

>Is the official archive in Git
- Nope.  Still svn.
>what is your favorite gui 
- I use Intellij IDEa for almost everything.
The community edition is free but powerful and there are plugins for all kinds of things.  I use it with svn, git, maven, java, bash, python, docker ... even my own ctakes class templates and markup for piper files.

>I prefer a peer-review/pull-request type interaction
- not a bad practice!
>Do you already implement something like that?
- You can put patches in jira tars/bug reports, which is somewhat easy to use.  If nobody is monitoring the tar then the patch just sits unnoticed, so you still need to make requests.  Assigning the tar to somebody should get that job done.
- You can also just email the original author of the code being fixed or improved and work out the review privately.  That is my preference as I like to keep the number of open apps/windows to a minimum and popping up jira involves one too many clicks for me.

>Do [jira] permissions and links come from Apache?
- I think that Apache "owns" the pages but anybody with rights can give write permission to any jira account you have.

>I'm on my way back to Italy 
- I am green with envy.

Sean

________________________________________
From: Peter Abramowitsch <pa...@gmail.com>
Sent: Thursday, October 15, 2020 2:13 PM
To: dev@ctakes.apache.org
Subject: Re: Found code relating to a bug I reported a few weeks ago. [EXTERNAL]

* External Email - Caution *


Thanks Sean

Yes, this morning I got access, but I'm not in a hurry to start tampering
with the archive.  If you want to take this into another mail stream it's
fine - perhaps better.     I have a couple of general questions and
specific questions

1.  About the fix we discussed above, are you suggesting we just let the
negex annotator begin at offset 0 + token-pos, and then see if anything
stops working (or improves!)  downstream, now that at least historyOf
downstream doesn't throw an exception anymore?   There are so many
downstream permutations given all the possible annotators it would be
impossible to test all of them.  And unless we see another one of these -1
strangenesses in other places where context annotations are created, can we
just assume that it is idiosyncratic to Negex for historical reasons?

2.  Is the official archive in Git now or in SVN?  Apache root mentioned
SVN only.   If SVN, what is your favorite gui if you use one?

3.  I prefer a peer-review/pull-request type interaction if possible.  I
would hate to introduce rubbish even if entitled to do so.  Do you already
implement something like that?

4.  What about Jira?  Do permissions and links come from Apache?  I've been
on it briefly in read-only mode.

There's no hurry to respond.  I'm on my way back to Italy shortly and will
set up shop there again next week sometime.

Regards, Peter

On Wed, Oct 14, 2020 at 2:12 PM Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> Hi Peter,
>
> There are untold miles of code that I have never traveled ...
>
> I don't understand the -1.  Maybe the original code used some other code (
> .start() )that wasn't zero-based?
>
> It seems like nec.begin should not be offset and that down the line
> consumers should offset to n-1 or .max(n-1,0).  Which of course means that
> any fixes need to have propagated adjustments.  Yay.
>
> I think that your ICLA has gone through (congratulations).  Just in time,
> right?
>
> Sean
>
>
> ________________________________________
> From: Peter Abramowitsch <pa...@gmail.com>
> Sent: Monday, October 12, 2020 6:05 PM
> To: dev@ctakes.apache.org
> Subject: Found code relating to a bug I reported a few weeks ago.
> [EXTERNAL]
>
> * External Email - Caution *
>
>
> Hi Sean
>
> If you know every inch of the code maybe I can ask you what you think of
> this problem I found in the negex annotator.   It causes any sentence to
> crash when the very first character begins the negation:
>
> *"Absence of headache"  *
> causes a crash later on in another annotator because the ContextAnnotation
> it creates has a begin offset of -1.
>
> *" Absence of headache" *
> successfully annotates the phrase.
>
> I need to fix this urgently, but I found a mysterious piece of code that is
> responsible for this.
> I'm working off a trunk snapshot 4.0.1 taken Dec 27  2018
>
> NegexAnnotator.annotateNegation()   at line 846 of its class file:
>
> *846: nec.setBegin(s.getBegin() + t.getStart() - 1);*
>
> In the case where a sentence begins with "Absence of...."   then both s
> (Sentence) and t (negex token) begin at offset 0. Then the
> ContextAnnotation goes on its destructive way.       So what's with the
> -1   ?
> Of course It also fails with "No headache..." as the beginning of a
> sentence
>
> If you know, or have a hunch why the  -1 at that line is there I will track
> it down further.  Otherwise I'm just tempted to leave it and
> Max(calcOffset, 0)
>
> The crash actually occurs as control passes to the historyOf annotator
> which looks at the ContextAnnotation created by Negex.
> See below the signature for the stack trace
>
> My ICLA has not been approved  yet so I can't make any alterations to
> source, nor would I without any orientation to the process.   Never done it
> before nor have the keys to Jira
>
> Anyone else who knows the NegexNegator in depth  please chime in as well
>
> Peter
> --------------------------------
>
> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of
> range: -1
> at java.lang.String.substring(String.java:1960)
> at org.apache.uima.jcas.tcas.Annotation.getCoveredText(Annotation.java:128)
> at
>
> org.apache.ctakes.dependency.parser.util.DependencyUtility.doesSubsume(DependencyUtility.java:67)
> at
>
> org.apache.ctakes.dependency.parser.util.DependencyUtility.getDependencyNodes(DependencyUtility.java:104)
> at
>
> org.apache.ctakes.dependency.parser.util.DependencyUtility.getNominalHeadNode(DependencyUtility.java:113)
> at
>
> org.apache.ctakes.assertion.attributes.history.HistoryAttributeClassifier.extract(HistoryAttributeClassifier.java:2
>

Re: Found code relating to a bug I reported a few weeks ago. [EXTERNAL]

Posted by Peter Abramowitsch <pa...@gmail.com>.

Thanks Sean

Yes, this morning I got access, but I'm not in a hurry to start tampering
with the archive.  If you want to take this into another mail stream it's
fine - perhaps better.     I have a couple of general questions and
specific questions

1.  About the fix we discussed above, are you suggesting we just let the
negex annotator begin at offset 0 + token-pos, and then see if anything
stops working (or improves!)  downstream, now that at least historyOf
downstream doesn't throw an exception anymore?   There are so many
downstream permutations given all the possible annotators it would be
impossible to test all of them.  And unless we see another one of these -1
strangenesses in other places where context annotations are created, can we
just assume that it is idiosyncratic to Negex for historical reasons?

2.  Is the official archive in Git now or in SVN?  Apache root mentioned
SVN only.   If SVN, what is your favorite gui if you use one?

3.  I prefer a peer-review/pull-request type interaction if possible.  I
would hate to introduce rubbish even if entitled to do so.  Do you already
implement something like that?

4.  What about Jira?  Do permissions and links come from Apache?  I've been
on it briefly in read-only mode.

There's no hurry to respond.  I'm on my way back to Italy shortly and will
set up shop there again next week sometime.

Regards, Peter

On Wed, Oct 14, 2020 at 2:12 PM Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> Hi Peter,
>
> There are untold miles of code that I have never traveled ...
>
> I don't understand the -1.  Maybe the original code used some other code (
> .start() )that wasn't zero-based?
>
> It seems like nec.begin should not be offset and that down the line
> consumers should offset to n-1 or .max(n-1,0).  Which of course means that
> any fixes need to have propagated adjustments.  Yay.
>
> I think that your ICLA has gone through (congratulations).  Just in time,
> right?
>
> Sean
>
>
> ________________________________________
> From: Peter Abramowitsch <pa...@gmail.com>
> Sent: Monday, October 12, 2020 6:05 PM
> To: dev@ctakes.apache.org
> Subject: Found code relating to a bug I reported a few weeks ago.
> [EXTERNAL]
>
> * External Email - Caution *
>
>
> Hi Sean
>
> If you know every inch of the code maybe I can ask you what you think of
> this problem I found in the negex annotator.   It causes any sentence to
> crash when the very first character begins the negation:
>
> *"Absence of headache"  *
> causes a crash later on in another annotator because the ContextAnnotation
> it creates has a begin offset of -1.
>
> *" Absence of headache" *
> successfully annotates the phrase.
>
> I need to fix this urgently, but I found a mysterious piece of code that is
> responsible for this.
> I'm working off a trunk snapshot 4.0.1 taken Dec 27  2018
>
> NegexAnnotator.annotateNegation()   at line 846 of its class file:
>
> *846: nec.setBegin(s.getBegin() + t.getStart() - 1);*
>
> In the case where a sentence begins with "Absence of...."   then both s
> (Sentence) and t (negex token) begin at offset 0. Then the
> ContextAnnotation goes on its destructive way.       So what's with the
> -1   ?
> Of course It also fails with "No headache..." as the beginning of a
> sentence
>
> If you know, or have a hunch why the  -1 at that line is there I will track
> it down further.  Otherwise I'm just tempted to leave it and
> Max(calcOffset, 0)
>
> The crash actually occurs as control passes to the historyOf annotator
> which looks at the ContextAnnotation created by Negex.
> See below the signature for the stack trace
>
> My ICLA has not been approved  yet so I can't make any alterations to
> source, nor would I without any orientation to the process.   Never done it
> before nor have the keys to Jira
>
> Anyone else who knows the NegexNegator in depth  please chime in as well
>
> Peter
> --------------------------------
>
> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of
> range: -1
> at java.lang.String.substring(String.java:1960)
> at org.apache.uima.jcas.tcas.Annotation.getCoveredText(Annotation.java:128)
> at
>
> org.apache.ctakes.dependency.parser.util.DependencyUtility.doesSubsume(DependencyUtility.java:67)
> at
>
> org.apache.ctakes.dependency.parser.util.DependencyUtility.getDependencyNodes(DependencyUtility.java:104)
> at
>
> org.apache.ctakes.dependency.parser.util.DependencyUtility.getNominalHeadNode(DependencyUtility.java:113)
> at
>
> org.apache.ctakes.assertion.attributes.history.HistoryAttributeClassifier.extract(HistoryAttributeClassifier.java:2
>

Re: Found code relating to a bug I reported a few weeks ago. [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.

Hi Peter,

There are untold miles of code that I have never traveled ...

I don't understand the -1.  Maybe the original code used some other code ( .start() )that wasn't zero-based?  

It seems like nec.begin should not be offset and that down the line consumers should offset to n-1 or .max(n-1,0).  Which of course means that any fixes need to have propagated adjustments.  Yay.

I think that your ICLA has gone through (congratulations).  Just in time, right?

Sean


________________________________________
From: Peter Abramowitsch <pa...@gmail.com>
Sent: Monday, October 12, 2020 6:05 PM
To: dev@ctakes.apache.org
Subject: Found code relating to a bug I reported a few weeks ago. [EXTERNAL]

* External Email - Caution *


Hi Sean

If you know every inch of the code maybe I can ask you what you think of
this problem I found in the negex annotator.   It causes any sentence to
crash when the very first character begins the negation:

*"Absence of headache"  *
causes a crash later on in another annotator because the ContextAnnotation
it creates has a begin offset of -1.

*" Absence of headache" *
successfully annotates the phrase.

I need to fix this urgently, but I found a mysterious piece of code that is
responsible for this.
I'm working off a trunk snapshot 4.0.1 taken Dec 27  2018

NegexAnnotator.annotateNegation()   at line 846 of its class file:

*846: nec.setBegin(s.getBegin() + t.getStart() - 1);*

In the case where a sentence begins with "Absence of...."   then both s
(Sentence) and t (negex token) begin at offset 0. Then the
ContextAnnotation goes on its destructive way.       So what's with the
-1   ?
Of course It also fails with "No headache..." as the beginning of a sentence

If you know, or have a hunch why the  -1 at that line is there I will track
it down further.  Otherwise I'm just tempted to leave it and
Max(calcOffset, 0)

The crash actually occurs as control passes to the historyOf annotator
which looks at the ContextAnnotation created by Negex.
See below the signature for the stack trace

My ICLA has not been approved  yet so I can't make any alterations to
source, nor would I without any orientation to the process.   Never done it
before nor have the keys to Jira

Anyone else who knows the NegexNegator in depth  please chime in as well

Peter
--------------------------------

Caused by: java.lang.StringIndexOutOfBoundsException: String index out of
range: -1
at java.lang.String.substring(String.java:1960)
at org.apache.uima.jcas.tcas.Annotation.getCoveredText(Annotation.java:128)
at
org.apache.ctakes.dependency.parser.util.DependencyUtility.doesSubsume(DependencyUtility.java:67)
at
org.apache.ctakes.dependency.parser.util.DependencyUtility.getDependencyNodes(DependencyUtility.java:104)
at
org.apache.ctakes.dependency.parser.util.DependencyUtility.getNominalHeadNode(DependencyUtility.java:113)
at
org.apache.ctakes.assertion.attributes.history.HistoryAttributeClassifier.extract(HistoryAttributeClassifier.java:2