Posted to dev@ctakes.apache.org by "Finan, Sean" <Se...@childrens.harvard.edu> on 2020/07/29 17:31:33 UTC

Re: Clarification regarding NegationFSM [EXTERNAL] [EXTERNAL] [EXTERNAL]

Hi Tomasz,

As far as I know there aren't any upcoming releases planned.

Sean
________________________________________
From: Tomasz Oliwa <ol...@uchicago.edu>
Sent: Wednesday, July 29, 2020 1:17 PM
To: dev@ctakes.apache.org
Subject: Re: Clarification regarding NegationFSM [EXTERNAL] [EXTERNAL] [EXTERNAL]

Sean,

Since you mention a new release, is there any expected time for a new stable cTAKES release? An up-to-date stable release for the user installation would be appreciated I think.

Regards,
Tomasz

________________________________________
From: Finan, Sean <Se...@childrens.harvard.edu>
Sent: Friday, July 24, 2020 10:45 AM
To: dev@ctakes.apache.org
Subject: Re: Clarification regarding NegationFSM [EXTERNAL] [EXTERNAL]

I don't think that anybody does.  It is not in the release, not documented, not necessarily ready for widespread use, etc.  Everything associated with types List and ListEntry is new.

Hopefully when cTAKES 4.0.1 (which should be 5.0 at this point) is released, these types will be much more usable.

Sean
________________________________________
From: Peter Abramowitsch <pa...@gmail.com>
Sent: Friday, July 24, 2020 10:50 AM
To: dev@ctakes.apache.org
Subject: Re: Clarification regarding NegationFSM [EXTERNAL] [EXTERNAL]

Thanks Sean.  I didn't know about that annotator.

On Fri, Jul 24, 2020, 3:51 AM Finan, Sean <Se...@childrens.harvard.edu>
wrote:

> Hi Sreejith,
>
> Without seeing an example of text I can't say whether my next words will
> help you or not.
>
> If you are using trunk then you should have access to two 'new' annotation
> engines in ctakes-core.
> ListAnnotator        - Annotates formatted List Sections by detecting them
> using Regular Expressions provided in an input File.
> ListEntryNegator  - Checks List Entries for negation, which may be
> exhibited differently from unstructured negation.
>
> ListAnnotator can use any list of regular expressions in a file.  The
> default file is in ctakes-core-res, called DefaultListRegex.bsv
> The format for each line in the regex list is
> NAME||LIST_REGEX||ENTRY_SEPARATOR_REGEX   where
> NAME     - name of list type.  Can be anything.
> LIST_REGEX   - some regular expression for which a block of text will
> match a list in its entirety.
> ENTRY_SEPARATOR_REGEX   - some regular expression for which text within
> the entire list will match a single list entry.
> For instance, the List
> Smoker Status: N
> Drinking Status: Y
> Pregnant: N/A
> A -simple- line in the regex file could be
> Colonized List||(?:^(?:[^\r\n:]+:[^\r\n:]+)+\r?\n){2,}||(?:^(?:[^\r\n:]+:[^\r\n:]+)+\r?\n)
> Notice that each item is separated by two bar characters "||".
>
> The file of regular expressions can be changed using the LIST_TYPES_PATH
> parameter.
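A minimal sketch of how one line in that NAME||LIST_REGEX||ENTRY_SEPARATOR_REGEX format could be split and applied to the example list. This is plain java.util.regex, not the ListAnnotator code itself: the class and method names are made up for illustration, and compiling the patterns with Pattern.MULTILINE (so that ^ matches at the start of each line) is an assumption about how the annotator treats them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration; not part of cTAKES.
class ListRegexSketch {

    // The example line from the message, with backslashes escaped for Java.
    static final String EXAMPLE_LINE =
            "Colonized List||(?:^(?:[^\\r\\n:]+:[^\\r\\n:]+)+\\r?\\n){2,}"
            + "||(?:^(?:[^\\r\\n:]+:[^\\r\\n:]+)+\\r?\\n)";

    // Split one "NAME||LIST_REGEX||ENTRY_SEPARATOR_REGEX" line into its three fields.
    static String[] parseLine(String line) {
        String[] parts = line.split("\\|\\|", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected 3 fields separated by ||: " + line);
        }
        return parts;
    }

    // Find every list block in the text, then every entry inside each block.
    static List<String> findEntries(String text, String listRegex, String entryRegex) {
        // Assumption: MULTILINE, so that ^ anchors at the start of every line.
        Pattern listPattern = Pattern.compile(listRegex, Pattern.MULTILINE);
        Pattern entryPattern = Pattern.compile(entryRegex, Pattern.MULTILINE);
        List<String> entries = new ArrayList<>();
        Matcher listMatcher = listPattern.matcher(text);
        while (listMatcher.find()) {
            Matcher entryMatcher = entryPattern.matcher(listMatcher.group());
            while (entryMatcher.find()) {
                entries.add(entryMatcher.group().trim());
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        String[] fields = parseLine(EXAMPLE_LINE);
        String text = "Smoker Status: N\nDrinking Status: Y\nPregnant: N/A\n";
        for (String entry : findEntries(text, fields[1], fields[2])) {
            System.out.println(fields[0] + " entry: " + entry);
        }
    }
}
```

Note that the last entry only matches if the list text ends with a line break, since both expressions require \r?\n at the end of each entry.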
>
> ListEntryNegator will iterate through each ListEntry in the CAS and use a
> regular expression to determine whether or not items in the list should be
> negated.
> Right now that regex is hard-coded in the class.  There should probably be
> a mechanism to override it.  ": N" is not in there.  Also, only
> Disease/Disorder and Sign/Symptom mentions in the ListEntry are negated.
> You would need to add SmokingStatusAnnotation as a negatable.
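To illustrate the kind of check meant here, a rough sketch of entry-level negation that does treat ": N" as negative. The pattern below is invented for this example and is not the regex hard-coded in ListEntryNegator; the class and method names are hypothetical too. It returns the usual polarity values, -1 for negated and 1 otherwise.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; not the ListEntryNegator implementation.
class EntryNegationSketch {

    // Assumed pattern: the entry's value (after the colon) is N, No, None or Negative.
    private static final Pattern NEGATIVE_VALUE =
            Pattern.compile(":\\s*(?:no|none|negative|n)\\s*$", Pattern.CASE_INSENSITIVE);

    // Return -1 when the list entry's value reads as a negative answer, else 1.
    static int polarityOf(String listEntry) {
        return NEGATIVE_VALUE.matcher(listEntry).find() ? -1 : 1;
    }

    public static void main(String[] args) {
        String[] entries = {"Smoker Status: N", "Drinking Status: Y", "Pregnant: N/A"};
        for (String entry : entries) {
            System.out.println(entry + " -> polarity " + polarityOf(entry));
        }
    }
}
```

"Pregnant: N/A" stays at polarity 1 because the value must end right after the negative word.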
>
> I don't know if any of this is helpful, but I thought that I would throw
> it out there.
>
> Sean
> ________________________________________
> From: Sreejith Pk <sr...@gmail.com>
> Sent: Friday, July 24, 2020 4:09 AM
> To: dev@ctakes.apache.org
> Subject: Re: Clarification regarding NegationFSM [EXTERNAL]
>
> Hi Peter, Thanks a lot for the reply.
>
> Let me elaborate on the changes I have made so far. I have added
> KuRuleBasedClassifierAnnotator to the pipeline in order to fetch
> smoking-related keywords from the document. I have modified
> KuRuleBasedClassifierAnnotator so that it iterates through the identified
> tokens and checks whether each token matches any smoking-related word
> configured in a keyword.txt file. The matching tokens are then set as
> SmokerNamedEntityAnnotation and can thus be read from the output XMI.
> In my scenario, the sentence I am passing to cTAKES is "Smoking
> status: N". As "Smoking" is configured in keywords.txt, it comes out as
> a SmokerNamedEntityAnnotation node in the output. I parse only its
> polarity in my parser logic. The polarity of the
> SmokerNamedEntityAnnotation for the "Smoking" token comes out as 1
> instead of the expected -1.
> (NB: I have removed ":" from the boundary words set in
> NamedEntityContextAnalyzer.java.)
>
> Thanks and Regards,
> Sreejith
>
>
> On Thu, Jul 23, 2020 at 11:20 PM Peter Abramowitsch <
> pabramowitsch@gmail.com>
> wrote:
>
> > Check and see if the identified annotation you get for "Smoking status: N"
> > without your change is actually "Non Smoker" with polarity 1.
> > Nonsmoker is a separate concept from a Smoker with polarity -1.  Instead
> > of looking at range text, check the canonical text for the concept you
> > have.
> > Having said that, there are many issues with negation in all of the
> > negation annotators.  Some are too eager, others are too cautious.
> >
> > Peter
> >
> > On Thu, Jul 23, 2020 at 10:17 AM Sreejith Pk <sr...@gmail.com> wrote:
> >
> > > Hi Team,
> > >
> > > We are using cTAKES 4.0.0 as the NLP engine in our application. I have
> > > added ContextAnnotator to the pipeline to get the correct polarity for
> > > the tokens.
> > > After analysing the ContextAnnotator code, I understand that the
> > > negation-determining condition is written in the NegationFSM class.
> > > In my requirement, I have a sentence "Smoking status: N" and I want to
> > > set polarity -1 on the token "Smoking" because of the occurrence of "N".
> > > To achieve this, I tried adding "N" to the existing HashSet in the
> > > NegationFSM constructor, like iv_negVerbsSet.add("N"); but the
> > > polarity of the word token "Smoking" still comes out as 1.
> > > With the same configuration, if I pass "Smoking status: denies", I
> > > get the polarity of the token "Smoking" as -1. Kindly help.
> > >
> > > Thanks & Regards
> > > Sreejith
> > >
> >
>

Re: Clarification regarding NegationFSM [EXTERNAL] [EXTERNAL] [EXTERNAL] [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Jeff,

The process for getting a release made is pretty simple.  In short ...
1.  Somebody in the Project Management Committee (PMC) proposes that a release be made.
2.  The PMC discusses and votes on the idea.
3.  People volunteer for certain duties: Release Manager (RM), testing, updating docs, fixes, etc.
4.  The RM makes the SVN branch and tag.

It has been a while since cTAKES has had a release, and people have been asking about the possibility.  So ... I am going to push forward with #1 right now ...

The GitHub mirror has had issues in the past.  We aren't directly in charge of it, as the overworked Apache Infra team owns such things.  I can certainly alert them to the repos being out of sync.

Thanks,
Sean
________________________________________
From: Jeffrey Miller <je...@gmail.com>
Sent: Friday, July 31, 2020 10:51 AM
To: dev@ctakes.apache.org
Subject: Re: Clarification regarding NegationFSM [EXTERNAL] [EXTERNAL] [EXTERNAL] [EXTERNAL]

Sean,

When I use cTAKES I'd like to be able to refer to the version number for
reproducibility. If I run just the latest trunk (to get access to a new
feature), it is not easily referenced. How is it decided to make a new
cTAKES release? Do you think there will be any future releases or would it
be better to begin referring to cTAKES by svn commit rather than version?

Also, unrelatedly, I am not sure when this happened, but the github mirror
for cTAKES (https://github.com/apache/ctakes) doesn't seem to be updating.
It doesn't have dockhand (as an example).

Thanks,
Jeff


Re: Clarification regarding NegationFSM [EXTERNAL] [EXTERNAL] [EXTERNAL]

Posted by Jeffrey Miller <je...@gmail.com>.
Sean,

When I use cTAKES I'd like to be able to refer to the version number for
reproducibility. If I run just the latest trunk (to get access to a new
feature), it is not easily referenced. How is it decided to make a new
cTAKES release? Do you think there will be any future releases or would it
be better to begin referring to cTAKES by svn commit rather than version?

Also, unrelatedly, I am not sure when this happened, but the github mirror
for cTAKES (https://github.com/apache/ctakes) doesn't seem to be updating.
It doesn't have dockhand (as an example).

Thanks,
Jeff
