
Posted to user@ctakes.apache.org by "Miller, Timothy" <Ti...@childrens.harvard.edu> on 2019/01/16 15:58:30 UTC

Re: Question about negation [EXTERNAL]

It uses an SVM model. The training data comes from a project called SHARPn and consists of notes from Mayo Clinic, with a variety of note types and specialties represented.

As for the example, did someone actually write "Deny hepatitis"? That sounds more like a command than documentation of a negated concept (I would expect "denies" or "denied" to be more common). Even if it is a real example, I think it is unusual enough that the training data probably contains no examples of "Deny X".
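Tim's point about unseen wording can be illustrated with a toy linear classifier. This is a hedged sketch, not cTAKES code: the training phrases are invented (not SHARPn data), the features are plain bag-of-words tokens rather than the features cTAKES actually extracts, and a perceptron stands in for the SVM because it is short and deterministic. Both learn one weight per feature and predict from the sign of a weighted sum, so the effect carries over: a token that never appears in training, such as "deny", gets no weight at all, and the decision falls back on the remaining features and the bias, which in this toy data favor the non-negated class (polarity 1).

```python
from collections import defaultdict

# Hypothetical toy training data, NOT the SHARPn corpus: tokens around a
# concept mention, labeled with cTAKES-style polarity (-1 negated, 1 not).
TRAIN = [
    (["denies", "hepatitis"], -1),
    (["hepatitis"], 1),
    (["denies", "fever"], -1),
    (["fever"], 1),
    (["chronic", "hepatitis"], 1),
    (["reports", "fever"], 1),
]

BIAS = "<bias>"


def featurize(tokens):
    """Lowercased bag-of-words features plus a constant bias feature."""
    return [t.lower() for t in tokens] + [BIAS]


def score(weights, tokens):
    # Features never seen in training are absent from the weight table,
    # so they contribute 0 to the decision value.
    return sum(weights.get(f, 0.0) for f in featurize(tokens))


def train_perceptron(data, epochs=10):
    """Classic perceptron (a stand-in for the SVM): update when y * score <= 0."""
    weights = defaultdict(float)
    for _ in range(epochs):
        mistakes = 0
        for tokens, y in data:
            if y * score(weights, tokens) <= 0:
                for f in featurize(tokens):
                    weights[f] += y
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return dict(weights)


def polarity(weights, tokens):
    return 1 if score(weights, tokens) > 0 else -1


weights = train_perceptron(TRAIN)
for phrase in (["Denies", "hepatitis"], ["Deny", "hepatitis"]):
    print(" ".join(phrase), "->", polarity(weights, phrase))
print("weight of 'deny':", weights.get("deny", 0.0))
```

With this data the perceptron converges after three passes and prints "Denies hepatitis -> -1", "Deny hepatitis -> 1" and a weight of 0.0 for "deny".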

Tim


-----Original Message-----
From: ouyeyu panyu <ouyeyu@gmail.com>
Reply-to: <us...@ctakes.apache.org>
To: user@ctakes.apache.org, dev@ctakes.apache.org
Subject: Question about negation [EXTERNAL]
Date: Wed, 16 Jan 2019 07:51:20 -0800

Hi cTAKES dev team,

I have a question and hope someone can help me with it.
For negation, "Denies hepatitis" returns polarity=-1, but "Deny hepatitis" returns polarity=1.
I understand that cTAKES uses the ClearTK-based PolarityCleartkAnalysisEngine for negation, which is machine-learning based.
It seems this issue is caused by the training data. Is that true? Also, what is the training data, and which machine learning algorithm is used: logistic regression, SVM, random forest, or something else?
Thanks.

Re: Question about negation [EXTERNAL]

Posted by ouyeyu panyu <ou...@gmail.com>.

Thanks a lot, Timothy.
I really appreciate your help.


On Wed, Jan 16, 2019 at 8:16 AM Miller, Timothy <
Timothy.Miller@childrens.harvard.edu> wrote:

> No, SHARPn was a later project. I'm not sure if there is any overlap in
> the datasets.
>
> There are two ways to look at the features. One is to read this paper:
> https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0112774
>
> The other is to look at the source:
>
> http://svn.apache.org/viewvc/ctakes/trunk/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/cleartk/AssertionCleartkAnalysisEngine.java?view=markup
>
> Tim

Re: Question about negation [EXTERNAL]

Posted by "Miller, Timothy" <Ti...@childrens.harvard.edu>.

No, SHARPn was a later project. I'm not sure if there is any overlap in the datasets.

There are two ways to look at the features. One is to read this paper:
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0112774

The other is to look at the source:
http://svn.apache.org/viewvc/ctakes/trunk/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/cleartk/AssertionCleartkAnalysisEngine.java?view=markup
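If you want to inspect weights directly rather than through the paper or the source, one option is to read the model file. The thread does not say how the polarity model is stored; the sketch below assumes a two-class model in LIBLINEAR's plain-text format (as written by LIBLINEAR's save_model, which ClearTK can wrap), and that assumption should be checked against the actual model files. LIBLINEAR records only numeric feature indexes, so the names mapping is hypothetical and would need to come from the feature encoder used at training time.

```python
from dataclasses import dataclass


@dataclass
class BinaryLinearModel:
    labels: list        # a positive decision value predicts labels[0]
    weights: dict       # 1-based feature index -> weight
    bias_weight: float  # 0.0 if the model was trained without a bias term


def parse_liblinear_model(text):
    """Parse a two-class LIBLINEAR text model: header lines, 'w', then weights."""
    lines = text.splitlines()
    header, i = {}, 0
    while lines[i].strip() != "w":
        key, *values = lines[i].split()
        header[key] = values
        i += 1
    if header.get("nr_class") != ["2"]:
        raise ValueError("this sketch only handles two-class models")
    nr_feature = int(header["nr_feature"][0])
    has_bias = float(header["bias"][0]) >= 0
    rows = [line.split() for line in lines[i + 1:] if line.strip()]
    # Two-class models (except the Crammer-Singer solver) store one weight per
    # row, one row per feature, plus a final bias row when bias >= 0.
    if len(rows) != nr_feature + int(has_bias) or any(len(r) != 1 for r in rows):
        raise ValueError("unexpected weight layout")
    values = [float(r[0]) for r in rows]
    return BinaryLinearModel(
        labels=[int(v) for v in header["label"]],
        weights={j + 1: w for j, w in enumerate(values[:nr_feature])},
        bias_weight=values[nr_feature] if has_bias else 0.0,
    )


def top_features(model, names, k=5, toward_first_label=True):
    """Return the k (name, weight) pairs that push hardest toward one label."""
    sign = 1 if toward_first_label else -1
    ranked = sorted(model.weights.items(), key=lambda kv: sign * kv[1], reverse=True)
    return [(names.get(idx, f"feature#{idx}"), w) for idx, w in ranked[:k]]


# Tiny hand-written model in LIBLINEAR's format (illustrative values only).
EXAMPLE_MODEL = """solver_type L2R_L2LOSS_SVC_DUAL
nr_class 2
label 1 -1
nr_feature 3
bias 1
w
0.5
-1.25
0.1
0.3
"""

# Hypothetical index -> feature-name mapping; not stored in the model file.
names = {1: "hepatitis", 2: "denies", 3: "fever"}

model = parse_liblinear_model(EXAMPLE_MODEL)
print(top_features(model, names, k=1, toward_first_label=False))
```

For a two-class LIBLINEAR model a positive decision value predicts labels[0], so the largest positive weights push toward the first listed label and the most negative weights toward the second.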

Tim

-----Original Message-----
From: ouyeyu panyu <ouyeyu@gmail.com>
Reply-to: <us...@ctakes.apache.org>
To: user@ctakes.apache.org
Cc: dev@ctakes.apache.org
Subject: Re: Question about negation [EXTERNAL]
Date: Wed, 16 Jan 2019 08:09:06 -0800

Hi Timothy,

Thank you very much for the quick response.

https://pdfs.semanticscholar.org/8f2c/a8b638d216a3e9ec10cd1c21bdaeaa74a229.pdf says:
The Mayo-derived linguistically annotated corpus (Mayo) was developed in-house and consisted of 273 clinical notes (100 650 tokens; 7299 sentences; 61 consult; 1 discharge summary; 4 educational visit; 4 general medical examination; 48 limited exam; 19 multi-system evaluation; 43 miscellaneous; 1 preoperative medical evaluation; 3 report; 3 specialty evaluation; 5 dismissal summary; 73 subsequent visit; 5 therapy; 3 test-oriented miscellaneous).

Is SHARPn based on the aforementioned 273 clinical notes?
Also, is there a way for me to look into the trained SVM model? For example, what are its features and their weights?

Best,
Yu Pan


Re: Question about negation [EXTERNAL]

Posted by ouyeyu panyu <ou...@gmail.com>.

Hi Timothy,

Thank you very much for the quick response.

https://pdfs.semanticscholar.org/8f2c/a8b638d216a3e9ec10cd1c21bdaeaa74a229.pdf
says
The Mayo-derived linguistically annotated corpus (Mayo) was developed
in-house and consisted of 273 clinical notes (100 650 tokens; 7299
sentences; 61 consult; 1 discharge summary; 4 educational visit; 4 general
medical examination; 48 limited exam; 19 multi-system evaluation; 43
miscellaneous; 1 preoperative medical evaluation; 3 report; 3 specialty
evaluation; 5 dismissal summary; 73 subsequent visit; 5 therapy; 3
test-oriented miscellaneous).
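The note-type counts in the quoted description do add up to the stated 273 notes. The short tally below uses only the numbers from the quote and also derives average note and sentence lengths.

```python
# Note-type counts copied from the quoted corpus description.
NOTE_TYPES = {
    "consult": 61,
    "discharge summary": 1,
    "educational visit": 4,
    "general medical examination": 4,
    "limited exam": 48,
    "multi-system evaluation": 19,
    "miscellaneous": 43,
    "preoperative medical evaluation": 1,
    "report": 3,
    "specialty evaluation": 3,
    "dismissal summary": 5,
    "subsequent visit": 73,
    "therapy": 5,
    "test-oriented miscellaneous": 3,
}
TOKENS, SENTENCES = 100_650, 7_299

total_notes = sum(NOTE_TYPES.values())
print(total_notes)                          # 273
print(round(TOKENS / total_notes, 1))       # 368.7 tokens per note
print(round(SENTENCES / total_notes, 1))    # 26.7 sentences per note
print(round(TOKENS / SENTENCES, 1))         # 13.8 tokens per sentence
```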

Is SHARPn based on the aforementioned 273 clinical notes?
Also, is there a way for me to look into the trained SVM model? For example,
what are its features and their weights?

Best,
Yu Pan

On Wed, Jan 16, 2019 at 7:58 AM Miller, Timothy <
Timothy.Miller@childrens.harvard.edu> wrote:

> It uses an SVM model. The training data comes from a project called SHARPn
> and consists of notes from Mayo Clinic, with a variety of note types and
> specialties represented.
>
> As for the example, did someone actually write "Deny hepatitis"? That
> sounds more like a command than documentation of a negated concept (I would
> expect "denies" or "denied" to be more common). Even if it is a real
> example, I think it is unusual enough that the training data probably
> contains no examples of "Deny X".
>
> Tim