Posted to dev@ctakes.apache.org by Yu Liang <li...@gmail.com> on 2014/12/31 15:55:28 UTC

cTakes polarity problem

I have a quick question about cTAKES.
I am using the AE “AggregatePlaintextUMLSProcessor.xml” and want to get negation results from the polarity attribute.
However, “Negative for hepatitis”, for example, is not negated. That seemed odd, so I tried “No hepatitis” and “Denies hepatitis”, which return “polarity=-1”, but “Deny hepatitis.” returns “polarity=1”.

Could anyone give me a clue about what is wrong? Thank you!
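
For readers checking the same thing, here is a minimal sketch of pulling polarity values out of serialized output. It assumes the annotations were written as XMI and that each mention element carries begin, end, and polarity attributes. The sample fragment, its element names, and the helper polarity_report are hypothetical, made up for illustration rather than copied from a real cTAKES run. A polarity of -1 is read as negated and 1 as not negated, matching the values quoted above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed XMI fragment. Element and attribute names are
# assumptions for illustration, not output captured from a real pipeline run.
SAMPLE_XMI = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI"
    xmlns:textsem="http:///org/apache/ctakes/typesystem/type/textsem.ecore">
  <textsem:DiseaseDisorderMention xmi:id="1" begin="3" end="12" polarity="-1"/>
  <textsem:DiseaseDisorderMention xmi:id="2" begin="19" end="28" polarity="1"/>
</xmi:XMI>"""

TEXT = "No hepatitis. Deny hepatitis."


def polarity_report(xmi_string, text):
    """Return (covered_text, polarity) for every element with a polarity attribute."""
    root = ET.fromstring(xmi_string)
    rows = []
    for element in root.iter():
        polarity = element.get("polarity")
        if polarity is None:
            continue  # not an annotation that carries polarity
        begin, end = int(element.get("begin")), int(element.get("end"))
        rows.append((text[begin:end], int(polarity)))
    return rows


for covered, polarity in polarity_report(SAMPLE_XMI, TEXT):
    label = "negated" if polarity == -1 else "not negated"
    print(f"{covered!r}: polarity={polarity} ({label})")
```

Listing every mention with its polarity side by side makes it easier to collect the missed cases as whole sentences when reporting them.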

RE: cTakes polarity problem

Posted by "Savova, Guergana" <Gu...@childrens.harvard.edu>.
cTAKES also implements a rule-based approach to the negation/polarity problem. It was the default until the latest release. You are free to use the rule-based implementation and compare results with the ML approach.
--Guergana

-----Original Message-----
From: Michael J Gurley [mailto:m-gurley@northwestern.edu] 
Sent: Wednesday, December 31, 2014 11:22 AM
To: dev@ctakes.apache.org
Subject: Re: cTakes polarity problem

I think this demonstrates that machine learning is not the right approach to the negation/polarity problem.



Re: cTakes polarity problem

Posted by vijay garla <vn...@gmail.com>.
As Guergana mentioned, cTAKES has a rule-based negation detection module.
In addition, YTEX adds a NegEx-based analysis engine. Both approaches are
very sensitive to sentence splitting (see previous threads on alternative
sentence splitters).
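
To illustrate why sentence splitting matters so much for rule-based negation, here is a toy sketch in the spirit of NegEx. It is not the cTAKES or YTEX implementation: the three-entry trigger list, the naive splitter, and the function names are all assumptions made for this example. A trigger negates a term only if it appears earlier in the same sentence, so a missed sentence boundary lets a trigger leak onto a later finding.

```python
import re

# Toy trigger list (an assumption for this sketch); real lexicons are far larger.
TRIGGERS = ("no", "denies", "negative for")


def split_sentences(text):
    """Naive splitter: break on whitespace that follows '.', '?' or '!'."""
    return [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]


def is_negated(term, sentences):
    """True if a trigger precedes `term` inside a sentence that contains it."""
    term = term.lower()
    for sentence in sentences:
        lowered = sentence.lower()
        position = lowered.find(term)
        if position == -1:
            continue  # term is not in this sentence
        prefix = lowered[:position]
        if any(re.search(rf"\b{re.escape(t)}\b", prefix) for t in TRIGGERS):
            return True
    return False


note = "Denies fever. Cough for two days."
print(is_negated("cough", split_sentences(note)))  # False: boundary respected
print(is_negated("cough", [note]))                 # True: boundary missed
```

With this toy trigger list, "Deny hepatitis." is also left un-negated until "deny" is added, which shows how directly a rule-based result depends on the lexicon.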

An additional advantage of rule-based negation is that you don't need some
of the memory- and CPU-intensive analysis engines required by the ML-based
negation detection AE.

Hth

Vj


Re: cTakes polarity problem

Posted by John Green <jo...@gmail.com>.
As I was reading this thread I had the same thought as Tim: perhaps a
combination. With a perfect training corpus this wouldn't be necessary,
but as a stopgap the "ensemble" approach might help those using your
training data but working on a different corpus (not that I really have the
time to write anything here, just spitballing because it's an interesting
thread). I'm still bootstrapping myself in ML so I may not have followed
David's reasoning perfectly, but couldn't a simple approach be that
anything that isn't negated by the new algorithm gets passed to NegEx as a
fallback? I think that was what you were saying, Tim.

One area where I can comment more meaningfully is Tim's remarks on the
legitimacy of the phrase "Deny hepatitis": I agree, my clinical intuition
says it's an unlikely phrase. More probably it was a typo. "Negative for
hepatitis" would be more reasonable after, say, serology for HepB markers,
though strictly speaking it would be less likely to appear in a phrase
reporting the results of just that specific test (that would more likely be
something along the lines of "hep panel negative" or simply "the labs were
unremarkable"). However, I could see this phrase in something like "the std
screen was negative for hep but positive for hiv".

The latter is definitely just one clinical opinion; people talk all kinds
of ways on the wards, good and bad, and it ends up in their notes too.

Best,
JG


Re: cTakes polarity problem

Posted by David Kincaid <ki...@gmail.com>.
Tim, I like your idea of a hybrid approach. I've thought about trying a
hybrid approach in the past myself, but haven't had a chance to try it or
seen any papers on it. It seems you could do it by either treating the
NegEx output simply as a feature in the ML model or combining the output of
NegEx and the ML model as an ensemble of sorts. The former would probably
have the problem of the NegEx "feature" overwhelming any other features
since it would be right most of the time. If I were doing it I think I'd
start with the latter approach.

In any event, it seems like right now people will need to see how the two
systems (NegEx and ML) work on their particular data and go with whichever
is best.
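
One way to make that comparison concrete is to score both systems against hand-labeled mentions from your own notes. The sketch below is only an illustration: the function name prf and the label lists are made up to exercise the code and are not results from cTAKES or NegEx. It treats polarity -1 as the positive (negated) class and reports precision, recall, and F1 for each system.

```python
def prf(gold, predicted):
    """Precision, recall and F1 for the negated class (polarity == -1)."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == -1 and p == -1)
    fp = sum(1 for g, p in pairs if g == 1 and p == -1)
    fn = sum(1 for g, p in pairs if g == -1 and p == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up labels, one entry per mention, purely to exercise the function.
gold = [-1, -1, 1, 1, -1, 1]
rule_based = [-1, -1, -1, 1, 1, 1]
ml_based = [-1, 1, 1, 1, -1, 1]

for name, predicted in (("rule-based", rule_based), ("ML", ml_based)):
    p, r, f = prf(gold, predicted)
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Scoring the negated class specifically matters because most mentions are not negated, so plain accuracy can look good even when many negations are missed.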

- Dave


Re: cTakes polarity problem

Posted by "Miller, Timothy" <Ti...@childrens.harvard.edu>.
Hi Michael,
I'm somewhat sympathetic to that opinion. But we did a bunch of
experiments and it seemed to us that NegEx was too hand-tailored for a
specific dataset and that our new module did better across datasets and
overall. The tradeoff is that it is harder to improve and it sometimes
gives unexpected results on the kind of inputs people type in by hand for
preliminary testing. That is a tradeoff people will have to consider and,
as Guergana said, the rule-based module is still part of cTAKES.
(FWIW, I believe it is possible to engineer examples that make NegEx
fail in unintuitive ways as well.) If you are interested in these
experiments, please check out our paper in PLOS ONE, where we look at the
difficulty of the polarity problem, specifically porting systems to new
domains:
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0112774

I've been wondering if some hybrid approach might be useful. For
example, maybe a system that runs both the ML module and NegEx and adds in
all the negated terms that NegEx finds over and above the ML.
This would probably fix some of the issues with test sentences but does
not solve the problem of being hard to debug. Another possibility is
using a more transparent ML method such as decision trees.
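
The hybrid idea can be sketched as a union over negations: a mention counts as negated if either system negates it. This is a minimal illustration, not an existing cTAKES component. The names hybrid_polarity and combine, and the offset-keyed dictionaries standing in for each system's output, are assumptions made for the example.

```python
NEGATED, NOT_NEGATED = -1, 1


def hybrid_polarity(ml_polarity, rule_polarity):
    """Union of negations: negated if either system says negated."""
    if ml_polarity == NEGATED or rule_polarity == NEGATED:
        return NEGATED
    return NOT_NEGATED


def combine(ml_output, rule_output):
    """Merge two {mention: polarity} maps.

    Mentions the rule-based system did not label keep the ML label.
    """
    return {
        mention: hybrid_polarity(ml, rule_output.get(mention, NOT_NEGATED))
        for mention, ml in ml_output.items()
    }


# Hypothetical outputs keyed by (begin, end) character offsets.
ml_output = {(3, 12): NEGATED, (19, 28): NOT_NEGATED, (40, 45): NOT_NEGATED}
rule_output = {(3, 12): NEGATED, (19, 28): NEGATED}

print(combine(ml_output, rule_output))
# {(3, 12): -1, (19, 28): -1, (40, 45): 1}
```

A union can only add negations, so it should raise recall on the negated class but may cost precision wherever the rules over-fire. That cost would need to be measured on real notes.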

Tim






Re: cTakes polarity problem

Posted by Michael J Gurley <m-...@northwestern.edu>.
I think this demonstrates that machine learning is not the right approach
to the negation/polarity problem.


Michael Gurley
m-gurley@northwestern.edu
312 925 3268
Northwestern University Clinical and Translational Sciences Institute
(NUCATS)
http://www.nucats.northwestern.edu
Rubloff Building
750 N Lake Shore Drive, 11th Floor
Chicago, IL 60611








Re: cTakes polarity problem

Posted by "Miller, Timothy" <Ti...@childrens.harvard.edu>.
Hi Yu,

The new polarity module is machine-learning based so it is not always
easy to diagnose accuracy issues. But generally it might mean there was
no example like that in the training data. It was trained on multiple
corpora, but sometimes certain phrases slip through the cracks, and
"Deny hepatitis," while possible in the truncated language of clinical
notes, seems like an unlikely phrase and so it may not be in our data.
Is that a real example you saw or just a minimum (not) working example?
If it is not real, do you have a real example (i.e. a whole sentence) where
"deny" should cause a negation but does not? If so I will look into it. We have
had a few reports like this so it may be worth keeping track of missed
examples for future iterations of the module. It is important that they
be real examples "from the wild" though.

(As an aside, machine learning methods don't understand language the way
people do so even if it seems obvious to a human that "Deny <disease>."
should be negated, if it looks different enough from the context of an
example from the training data the ML will sometimes fall back to the
majority class of "Not negated".)

Tim

