Posted to dev@ctakes.apache.org by "Assur, Ted" <Th...@providence.org> on 2013/09/04 02:24:07 UTC

specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

I'm trying to understand what would prevent the AggregatePlaintextUMLSProcessor AE from correctly parsing specific problems that are defined in the UMLS version used by cTAKES.

For example,
CIN (Cervical Intraepithelial Neoplasia) in its general usage is parsed out as UMLS CUI C0206708.

CIN comes in 3 grades: 1, 2, and 3. Sometimes these are reported with Roman numerals: I, II, and III.

cTAKES correctly identifies "CIN 3" and "CIN III" with UMLS CUI C0851140: "Carcinoma in situ of uterine cervix."

However, I cannot get it to recognize CIN 1, CIN I, CIN 2, or CIN II as their correct concepts, "Cervical intraepithelial neoplasia grade 1" and "Cervical intraepithelial neoplasia grade 2" respectively.

Is there a way to tune the detection of UMLS concepts?




--------------------------------------------
Ted Assur
IT Solutions Architect for Cancer Research
Providence Health & Services
ted.assur@providence.org
503-215-6476

Crede, ut intelligas.
Intellego, ut credam.




  ________________________________

This message is intended for the sole use of the addressee, and may contain information that is privileged, confidential and exempt from disclosure under applicable law. If you are not the addressee you are hereby notified that you may not use, copy, disclose, or distribute to anyone the message or any information contained in the message. If you have received this message in error, please immediately advise the sender by reply email and delete this message.

RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.

This may sound strange, but SNOMED does not contain the term "CIN I".  It contains the terms "CIN I - Cervical intraepithelial neoplasia 1" and "CIN I - mild dyskaryosis".
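
A minimal sketch of why that distinction matters, assuming the lookup only succeeds when the candidate text equals a whole dictionary entry. That is a simplification and not the actual cTAKES algorithm. The term strings and SNOMED codes are the ones quoted in this thread; the helper name find_code and the lowercasing are hypothetical.

```python
# Simplified whole-term lookup; NOT the real cTAKES dictionary lookup.
# Terms and SNOMED codes below are the ones quoted in this thread.
SNOMED_TERMS = {
    "cin i - cervical intraepithelial neoplasia 1": "285836003",
    "cin 2 - cervical intraepithelial neoplasia 2": "285838002",
    "cin iii": "20365006",
}


def find_code(span):
    """Return a code only if the whole span equals a dictionary term (hypothetical helper)."""
    return SNOMED_TERMS.get(span.strip().lower())


for span in ("CIN III", "CIN I", "CIN II"):
    print(span, "->", find_code(span))
```

Under this assumption "CIN III" resolves because it is an entry by itself, while "CIN I" only appears as the prefix of longer entries and so finds nothing.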

-----Original Message-----
From: Pei Chen [mailto:chenpei@apache.org] 
Sent: Tuesday, September 03, 2013 10:13 PM
To: dev@ctakes.apache.org
Subject: Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

You're right, it should have gotten "CIN I"- that's a strange one, probably needs to be debugged/looked into further...

On Tue, Sep 3, 2013 at 10:05 PM, Miller, Timothy <Ti...@childrens.harvard.edu> wrote:
> Ah. So it will get
> CIN 2 (in SNOMED)
> CIN III (in SNOMED)
> CIN 3 (in SNOMED)
>
> but the rest are not in SNOMED?
>
> I wonder why it doesn't get CIN I? It looks like that exists in SNOMED 
> (though I don't fully understand what all the symbols mean in the umls 
> browser).
>
>> CIN I - Cervical intraepithelial neoplasia 1 
>> [A3002690/SNOMEDCT/SY/285836003]
>
>
> On 09/03/2013 09:55 PM, Pei Chen wrote:
>> It has the correct parse (POS, chunks, and lookupwindow)- but some of 
>> the terms do not exist in SNOMED- CIN 2 - Cervical intraepithelial 
>> neoplasia 2 [A3002688/SNOMEDCT/SY/285838002] exists but not CIN II.
>> CIN III [A3333965/SNOMEDCT/SY/20365006] also exists that's why it was 
>> able to perform the lookup successfully.
>> Note that CIN II synonyms do exist in other umls thesauri such as 
>> MEDCIN, CCPSS though.  However, the bundled cTAKES dictionaries only 
>> contain (MeSH, SNOMEDCT, RxNORM, NCI, ICD9) IIRC.
>>
>> --Pei
>>
>> On Tue, Sep 3, 2013 at 9:44 PM, Miller, Timothy 
>> <Ti...@childrens.harvard.edu> wrote:
>>> That is a good question, Ted!
>>>
>>> I tried it with a simple context: "The patient has a CIN III." I'm 
>>> not sure if that is a correct context but I was able to duplicate 
>>> your findings. (Finds a CUI for CIN III but not if you change it to 
>>> CIN II)
>>>
>>> My first thought was that it is the chunker. But the chunker seems 
>>> to get it right, as CIN II and CIN III are both called NPs, and 
>>> similarly the LookupWindowAnnotator handles them both identically. 
>>> So that suggests it is a problem with the actual lookup of the 
>>> tokens in the LookupWindow.
>>>
>>> That's all I can do for now but maybe someone else who knows more 
>>> about its behavior offhand will have an idea.
>>>
>>> Tim

RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.

Hi Ted,

In addition to performing searches, the hyperSql ( http://hsqldb.org/ ) database tool should allow you to perform inserts into the umls dictionary database used by cTakes.
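
As a rough sketch of what such an insert could look like, here is the "CIN 1" term (CUI C0349458) from Ted's question added to a stand-in table. Everything about the storage is an assumption: an in-memory SQLite table replaces the real HSQLDB file, and apart from the table name UMLS_MS_2011AB and the FWORD column mentioned in this thread, the column names (CUI, TERM) and the add_term helper are hypothetical. Check the real schema with the hsql tool before inserting anything.

```python
import sqlite3

# Stand-in for the cTAKES HSQLDB dictionary (assumption: simplified schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE UMLS_MS_2011AB (CUI TEXT, FWORD TEXT, TERM TEXT)")


def add_term(conn, cui, term):
    """Insert one term, indexing it by its first word (hypothetical helper)."""
    fword = term.split()[0]
    conn.execute(
        "INSERT INTO UMLS_MS_2011AB (CUI, FWORD, TERM) VALUES (?, ?, ?)",
        (cui, fword, term),
    )


add_term(conn, "C0349458", "CIN 1")
conn.commit()

rows = conn.execute(
    "SELECT CUI, FWORD, TERM FROM UMLS_MS_2011AB WHERE FWORD = ?", ("CIN",)
).fetchall()
print(rows)
```

The real table most likely carries more columns (source vocabulary, code, semantic type), so an actual insert would need values for those as well.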

You can also create your own customized dictionary and run cTakes using only that dictionary or with umls plus that dictionary.  There are several ways to create a custom dictionary, and I think that you can start by looking in the resources/ ... /dictionary/lookup/ directory for examples.  It can be a little overwhelming if you just want to add one or two terms, and I am in the process of trying to make this a little easier for any user.  It may be a while before I can add my work to the trunk.   Until then, if you decide to go with the csv approach you can probably make it through with the examples in cTakes resources.  If you want to create a new hsql database then I can send you my (old) instructions on that process - but it might be overkill.

If you really want to know what lies behind the mask of the cTakes umls dictionary then I highly recommend that you just interface with it directly using the hsql tool.

Sean

RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

Posted by "Assur, Ted" <Th...@providence.org>.

OK, kind of resurfacing the original topic on this one, after I redirected it towards ICD codes last month:

I have several examples, like the one below, where it would be very helpful to be able to include UMLS terms that are in the UMLS 2011AB release, e.g. "CIN 1" (CUI = C0349458).

So if I have particular UMLS concepts I want to make sure and include, is there a way for me to *add* them to the umls dictionary used by cTAKES?

Ted

RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.

I don't know if this is exactly what you want, but you can use the hyperSql ( http://hsqldb.org/ ) database tool to perform searches on the umls dictionary used by cTakes.  
For instance " select * from UMLS_MS_2011AB where FWORD = 'CIN' " will provide all the available terms starting with CIN.  In the result you'll see that there is no term "CIN I", and you'll also see that the only listing from ICD9 is for "CIN III" [C0851140, T191, MTHICD9 233.1]
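
To make the shape of that query concrete, here is a small sketch that runs the same FWORD filter against a stand-in table. The assumptions are loud ones: an in-memory SQLite table replaces the real HSQLDB file, only the table name and the FWORD column come from the query above, the CUI and TERM column names are hypothetical, and the few rows are built from terms and CUIs mentioned in this thread rather than copied from the real dictionary.

```python
import sqlite3

# Stand-in for the cTAKES HSQLDB dictionary (assumption: simplified schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE UMLS_MS_2011AB (CUI TEXT, FWORD TEXT, TERM TEXT)")
conn.executemany(
    "INSERT INTO UMLS_MS_2011AB (CUI, FWORD, TERM) VALUES (?, ?, ?)",
    [
        ("C0206708", "CIN", "CIN"),
        ("C0851140", "CIN", "CIN 3"),
        ("C0851140", "CIN", "CIN III"),
        ("C0206708", "Cervical", "Cervical Intraepithelial Neoplasia"),
    ],
)

# Same filter as the query in the message: every term whose first word is CIN.
terms = [
    row[0]
    for row in conn.execute(
        "SELECT TERM FROM UMLS_MS_2011AB WHERE FWORD = ? ORDER BY TERM", ("CIN",)
    )
]
print(terms)
print("CIN I" in terms)
```

Listing every entry that shares a first word is a quick way to see which exact strings the dictionary can and cannot match.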

If you want an icd9 code that isn't in the cTakes umls dictionary then you can find it online ... but that won't do you much good wrt cTakes.

Sean

-----Original Message-----
From: Assur, Ted [mailto:Theodore.Assur@providence.org] 
Sent: Wednesday, September 04, 2013 11:56 AM
To: dev@ctakes.apache.org
Subject: RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

Thanks for looking into this, it's been puzzling me.

On another note, I know the cTAKES dictionary uses ICD9, but I'm not familiar with how to access that information: In the example I've described below, where would I locate the ICD9 for a specific entity?

Thank you

Ted

RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

Posted by "Masanz, James J." <Ma...@mayo.edu>.

Although cTAKES uses ICD9 entries when finding Named Entities, out of the box it doesn't assign ICD9 codes to the named entities; it assigns SNOMED-CT codes.
If some text matches an ICD9 term, and the ICD9 term has the same CUI as one or more SNOMED-CT terms, the SNOMED-CT codes for those terms are assigned to the annotation (along with the UMLS CUI), even if the SNOMED-CT term and the ICD9 term don't share any words.
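
A small sketch of that behaviour, under stated assumptions: the two dictionaries and the annotate helper are hypothetical, and pairing CUI C0851140 with SNOMED-CT code 20365006 is inferred from values quoted elsewhere in this thread rather than read from the cTAKES database.

```python
# Term text -> CUI, for a term that comes from ICD9 (values quoted in this thread).
ICD9_TERMS = {"cin iii": "C0851140"}

# CUI -> SNOMED-CT codes sharing that CUI (assumed pairing, see lead-in).
CUI_TO_SNOMED = {"C0851140": ["20365006"]}


def annotate(text):
    """Match text against ICD9 terms, then report SNOMED-CT codes via the CUI (hypothetical helper)."""
    cui = ICD9_TERMS.get(text.lower())
    if cui is None:
        return None
    # The ICD9 code itself is never attached; only the CUI and SNOMED-CT codes are.
    return {"cui": cui, "snomed_codes": CUI_TO_SNOMED.get(cui, [])}


print(annotate("CIN III"))
print(annotate("CIN I"))
```

The point of the sketch is that the match and the reported code can come from different vocabularies, linked only by the shared CUI.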

Hope that helps

-- James




RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

Posted by "Vogel, James" <JV...@activehealth.net>.

Is anyone able to provide any more detailed guidance on what I'd need to change to add the ICD9 codes as tags, e.g., where do I look for the tables in the hsql database that would contain the ICD9 data?

Thanks.

-----Original Message-----
From: Miller, Timothy [mailto:Timothy.Miller@childrens.harvard.edu]
Sent: Monday, September 16, 2013 7:25 AM
To: dev@ctakes.apache.org
Subject: Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

James,
I haven't done it myself, so I don't know exactly how the config
changes, but I know roughly where to look.  In LookupDesc_Db.xml, find
the <lookupBinding> tag with idRef = DICT_UMLS_MS. Then look under
the <lookupConsumer> section, and you'll see the codingScheme is SNOMED.
I believe this is where the actual dictionary filtering is done. There
is also a consumer class called
org.apache.ctakes.dictionary.lookup.ae.UmlsToSnomedDbConsumerImpl and a
mapPrepStmt field with a SQL query that might need changing. That is
where I would start looking. I'm not sure whether you would need to
write a new consumer class, or what values the codingScheme field can
take, but hopefully this helps you get started until someone else chimes
in with more detailed info!

Tim

On 09/15/2013 08:39 PM, Vogel, James wrote:
> Any more guidance you can give about the nature of the changes to the config and impl that would need to be made to get the ICD9 codes?
>
> -----Original Message-----
> From: Pei Chen [mailto:chenpei@apache.org]
> Sent: Wednesday, September 04, 2013 1:02 PM
> To: dev@ctakes.apache.org
> Subject: Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor
>
> Ted,
>
>> On another note, I know the cTAKES dictionary uses ICD9, but I'm not
>> familiar with how to access that information: In the example I've
>> described below, where would I locate the ICD9 for a specific entity?
>
> Even though ICD9 is included in the lookup, IIRC, cTAKES by default is
> configured [1] to return/store only concepts [2] that have a SNOMEDCT code or
> RxNorm code.
>
> [1]
> http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/LookupDesc_Db.xml
>
> [2]
> http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup/src/main/java/org/apache/ctakes/dictionary/lookup/ae/UmlsToSnomedConsumerImpl.java
>
>  If you would like it to return ICD9 codes, one would need to
> modify/configure the above...
>
> --Pei
>
>
> On Wed, Sep 4, 2013 at 11:55 AM, Assur, Ted
> <Th...@providence.org>wrote:
>
>> Thanks for looking into this, it's been puzzling me.
>>
>> On another note, I know the cTAKES dictionary uses ICD9, but I'm not
>> familiar with how to access that information: In the example I've described
>> below, where would I locate the ICD9 for a specific entity?
>>
>> Thank you
>>
>> Ted
>>
>> -----Original Message-----
>> From: Pei Chen [mailto:chenpei@apache.org]
>> Sent: Tuesday, September 03, 2013 7:13 PM
>> To: dev@ctakes.apache.org
>> Subject: Re: specificity in selecting EntityMentions when using
>> AggregatePlaintextUMLSProcessor
>>
>> You're right, it should have gotten "CIN I"- that's a strange one,
>> probably needs to be debugged/looked into further...
>>
>> On Tue, Sep 3, 2013 at 10:05 PM, Miller, Timothy <
>> Timothy.Miller@childrens.harvard.edu> wrote:
>>> Ah. So it will get
>>> CIN 2 (in SNOMED)
>>> CIN III (in SNOMED)
>>> CIN 3 (in SNOMED)
>>>
>>> but the rest are not in SNOMED?
>>>
>>> I wonder why it doesn't get CIN I? It looks like that exists in SNOMED
>>> (though I don't fully understand what all the symbols mean in the umls
>>> browser).
>>>
>>>> CIN I - Cervical intraepithelial neoplasia 1
>>>> [A3002690/SNOMEDCT/SY/285836003]
>>>
>>> On 09/03/2013 09:55 PM, Pei Chen wrote:
>>>> It has the correct parse (POS, chunks, and lookup window), but some of
>>>> the terms do not exist in SNOMED. "CIN 2 - Cervical intraepithelial
>>>> neoplasia 2" [A3002688/SNOMEDCT/SY/285838002] exists, but not CIN II.
>>>> CIN III [A3333965/SNOMEDCT/SY/20365006] also exists, which is why it was
>>>> able to perform the lookup successfully.
>>>> Note that CIN II synonyms do exist in other UMLS thesauri such as
>>>> MEDCIN and CCPSS.  However, IIRC the bundled cTAKES dictionaries only
>>>> contain MeSH, SNOMEDCT, RxNorm, NCI, and ICD9.
>>>>
>>>> --Pei
>>>>
>>>> On Tue, Sep 3, 2013 at 9:44 PM, Miller, Timothy
>>>> <Ti...@childrens.harvard.edu> wrote:
>>>>> That is a good question, Ted!
>>>>>
>>>>> I tried it with a simple context: "The patient has a CIN III." I'm
>>>>> not sure if that is a correct context but I was able to duplicate
>>>>> your findings. (Finds a CUI for CIN III but not if you change it to
>>>>> CIN II)
>>>>>
>>>>> My first thought was that it is the chunker. But the chunker seems
>>>>> to get it right, as CIN II and CIN III are both called NPs, and
>>>>> similarly the LookupWindowAnnotator handles them both identically.
>>>>> So that suggests it is a problem with the actual lookup of the
>>>>> tokens in the LookupWindow.
>>>>>
>>>>> That's all I can do for now but maybe someone else who knows more
>>>>> about its behavior offhand will have an idea.
>>>>>
>>>>> Tim
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 09/03/2013 08:24 PM, Assur, Ted wrote:
>>>>>> I'm trying to understand what would prevent the
>>>>>> AggregatePlaintextUMLSProcessor AE from correctly parsing specific problems
>>>>>> that are defined in the UMLS version used by cTAKES.
>>>>>> For example,
>>>>>> CIN (Cervical Intraepithelial Neoplasia) in its general usage is
>>>>>> parsed out as UMLS CUI C0206708.
>>>>>> CIN comes in 3 grades, 1, 2 and 3. Sometimes this is reported with
>>>>>> Roman Numerals, I, II, and III.
>>>>>> cTAKES correctly identifies "CIN 3" and "CIN III" with UMLS CUI
>>>>>> C0851140: "Carcinoma in situ of uterine cervix."
>>>>>> However, I cannot get it to recognize CIN 1, CIN I, CIN 2, or CIN II
>>>>>> as their correct concepts, "Cervical intraepithelial neoplasia grade 1" and
>>>>>> "Cervical intraepithelial neoplasia grade 2" respectively.
>>>>>> Is there a way to tune the detection of UMLS concepts?
>>
> IMPORTANT WARNING: Information contained in this email is intended for the use of the individual to whom it is addressed, and may contain information that is privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, or the employee or agent responsible for delivering the message to the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication is STRICTLY FORBIDDEN. If you have received this communication in error, please notify us immediately by return email and delete this document. Thank you.
>


IMPORTANT WARNING: Information contained in this email is intended for the use of the individual to whom it is addressed, and may contain information that is privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, or the employee or agent responsible for delivering the message to the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication is STRICTLY FORBIDDEN. If you have received this communication in error, please notify us immediately by return email and delete this document. Thank you.

Text not matched

Posted by "Vogel, James" <JV...@activehealth.net>.

No concept or annotation is created for "Birbeck granule deficiency" when I run the clinical pipeline in the CVD.  I see it in the UMLS Metathesaurus Browser at https://uts.nlm.nih.gov/metathesaurus.html#C3150657;0;1;CUI;2012AB;EXACT_MATCH;* and I see that its semantic type, T047, is in LookupDesc_DB.xml for the clinical pipeline. Is there additional configuration needed to match this term?

"deficiency" is annotated.


RE: Text not matched

Posted by "Vogel, James" <JV...@activehealth.net>.

Sorry if I wasted anyone's time, but this was caused by a dumb mistake.  While trying to switch to ICD9, I'd commented out the line of code that added the SNOMED hits, and I didn't fully roll back my changes.


RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

Posted by "Chen, Pei" <Pe...@childrens.harvard.edu>.

Hi James,
Glad you were able to make cTAKES work for your use case.  

The UMLS subset that is currently included in the resources should be:
*	International Classification of Diseases, Ninth Revision, Clinical Modification, 2012	ICD9CM_2012	ICD9CM	ENG	0	20997
*	International Classification of Diseases, Ninth Revision, Clinical Modification, Metathesaurus additional entry terms, 2012	MTHICD9_2012	ICD9CM	ENG	0	16304
*	Medical Subject Headings, 2012_2011_09_09	MSH2012_2011_09_09	MSH	ENG	0	321367
*	NCI Thesaurus, 2011_02D	NCI2011_02D	NCI	ENG	0	90135
*	SNOMED Clinical Terms, 2011_07_31	SNOMEDCT_2011_07_31	SNOMEDCT	ENG	9	324494

RxNorm is also included, for the rxnorm_index folder.
(I think there was a readme about it; if not, let's at least add it to the User FAQ.)

--Pei

> -----Original Message-----
> From: Vogel, James [mailto:JVogel@activehealth.net]
> Sent: Monday, September 30, 2013 11:41 AM
> To: dev@ctakes.apache.org
> Subject: RE: specificity in selecting EntityMentions when using
> AggregatePlaintextUMLSProcessor
> 
> That worked and I see how I can change the code to do both SNOMED and
> ICD9.
> I added an index by doing: CREATE INDEX 'umls_ms_2011ab_cui' ON
> umls_ms_2011ab (cui);  I needed to change the database from 'read-only', is
> that going to cause any other problems?
> 
> What subset of ICD9 is in the dictionary?

RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

Posted by "Vogel, James" <JV...@activehealth.net>.

That worked and I see how I can change the code to do both SNOMED and ICD9.
I added an index by running: CREATE INDEX 'umls_ms_2011ab_cui' ON umls_ms_2011ab (cui);  To do that I had to take the database out of 'read-only' mode. Is that going to cause any other problems?

What subset of ICD9 is in the dictionary?
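
Editor's sketch: the quoted message below notes that there may be no index on sourcetype. A minimal illustration of indexing both filter columns follows. It uses Python's built-in sqlite3 module as a stand-in for the HSQLDB dictionary, so it only shows the idea, not the behaviour of the real database. The table and column names come from the query quoted in this thread; the index name and the placeholder CUI are hypothetical.

```python
import sqlite3

# SQLite stands in for the HSQLDB dictionary database (an assumption made
# so the sketch runs anywhere). Table and column names follow the query
# quoted in the thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE umls_ms_2011ab (cui TEXT, code TEXT, sourcetype TEXT)")

# Hypothetical index name. Indexing both filter columns lets the engine
# locate matching rows without scanning the whole table.
conn.execute(
    "CREATE INDEX umls_ms_2011ab_cui_src ON umls_ms_2011ab (cui, sourcetype)"
)

MAP_QUERY = "SELECT code FROM umls_ms_2011ab WHERE cui=? AND sourcetype=?"

# Ask SQLite how it would execute the mapping query; the last column of
# each plan row is a human-readable description of the step.
plan = conn.execute(
    "EXPLAIN QUERY PLAN " + MAP_QUERY, ("C0000000", "ICD9CM")
).fetchall()
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```

If the printed plan names the index, the lookup is index-driven. Whether HSQLDB plans the same query the same way would need to be checked against the actual database.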

From: Pei Chen [mailto:chenpei@apache.org]
Sent: Friday, September 27, 2013 11:26 PM
To: dev@ctakes.apache.org
Subject: Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

James,
Obviously it would be best to customize the code and/or the dictionary for your particular case.
But if you want to try something that will work without any code changes, you can try the configuration below in your LookupDesc_Db.xml.
Essentially, it takes advantage of the fact that UmlsToSnomedDbConsumerImpl lets you specify an SQL statement that maps CUIs to codes, coupled with the fact that there is already a table called umls_ms_2011ab which contains the codes and CUIs from many different sources, including ICD9CM.
What you could do is reuse that table as the mapping table as well and specify the source, such as:
select code from umls_ms_2011ab where cui=? and sourcetype='ICD9CM'

(The downside is that I don't think there is an index on sourcetype, so performance may suck.)
I've attached an example to normalize to ICD9CM codes instead of SNOMEDCT.
<lookupConsumer className="org.apache.ctakes.dictionary.lookup.ae.UmlsToSnomedDbConsumerImpl">
  <properties>
    <property key="codingScheme" value="ICD9CM"/>
    <property key="cuiMetaField" value="cui"/>
    <property key="tuiMetaField" value="tui"/>
    <property key="anatomicalSiteTuis" value="T021,T022,T023,T024,T025,T026,T029,T030"/>
    <property key="procedureTuis" value="T059,T060,T061"/>
    <property key="disorderTuis" value="T019,T020,T037,T046,T047,T048,T049,T050,T190,T191"/>
    <property key="findingTuis" value="T033,T034,T040,T041,T042,T043,T044,T045,T046,T056,T057,T184"/>
    <property key="dbConnExtResrcKey" value="DbConnection"/>
    <property key="mapPrepStmt" value="select code from umls_ms_2011ab where cui=? and sourcetype='ICD9CM'"/>
  </properties>
</lookupConsumer>
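
Editor's sketch: the effect of the mapPrepStmt property above can be illustrated roughly as follows. This is not the cTAKES Java implementation. It assumes, as the thread describes, that the consumer runs the statement once per CUI and keeps only hits that map to at least one code. SQLite stands in for HSQLDB, the rows are placeholders rather than real UMLS content, and the helper names codes_for and keep_mapped are hypothetical.

```python
import sqlite3

# In-memory stand-in for the dictionary database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE umls_ms_2011ab (cui TEXT, code TEXT, sourcetype TEXT)")

# Placeholder rows only; these are not real CUIs or codes.
conn.executemany(
    "INSERT INTO umls_ms_2011ab VALUES (?, ?, ?)",
    [
        ("CUI_A", "icd-1", "ICD9CM"),
        ("CUI_A", "snomed-1", "SNOMEDCT"),
        ("CUI_B", "snomed-2", "SNOMEDCT"),
    ],
)

# Same statement as the mapPrepStmt property in the XML above.
MAP_PREP_STMT = "select code from umls_ms_2011ab where cui=? and sourcetype='ICD9CM'"


def codes_for(cui):
    """Run the mapping statement for one CUI and return its codes."""
    rows = conn.execute(MAP_PREP_STMT, (cui,)).fetchall()
    return [code for (code,) in rows]


def keep_mapped(hit_cuis):
    """Return {cui: codes} for the hits that map to at least one code."""
    mapped = {}
    for cui in hit_cuis:
        codes = codes_for(cui)
        if codes:  # hits without a code in the chosen source are dropped
            mapped[cui] = codes
    return mapped


result = keep_mapped(["CUI_A", "CUI_B"])
print(result)
```

Here CUI_B has no ICD9CM row, so it is dropped, which mirrors how a hit with no code in the configured coding scheme would not be stored.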

On Fri, Sep 27, 2013 at 9:58 PM, Pei Chen <ch...@apache.org> wrote:
James,
One can try NamedEntityLookupConsumerImpl instead of UmlsToSnomedDbConsumerImpl; it will not restrict the results to CUIs that have SNOMED codes.
Will you need to preserve the TUI?  One thing to note is that NamedEntityLookupConsumerImpl will return all of the hits, except that it'll create OntologyConcepts (without TUIs) instead of UMLSConcepts.  Perhaps we should make NamedEntityLookupConsumerImpl a bit more general.

--Pei

On Fri, Sep 27, 2013 at 8:29 PM, Vogel, James <JV...@activehealth.net> wrote:
I now see that I can use a query on umls_ms_2011ab where sourcetype = 'ICD9CM'.  Is there a way to use an existing AE or class to add additional ICD9CM annotations/concepts, or do I need to change the code in consumeHits() or getSnomedCodes()?
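
Editor's sketch: one way to fetch codes from more than one source for a CUI is a single query with an IN filter, grouped by source afterwards. This assumes that SNOMED codes sit in the same table under sourcetype 'SNOMEDCT', which the thread does not confirm. SQLite again stands in for HSQLDB, the rows are placeholders, and codes_by_source is a hypothetical helper, not a cTAKES method.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE umls_ms_2011ab (cui TEXT, code TEXT, sourcetype TEXT)")

# Placeholder rows only; these are not real CUIs or codes.
conn.executemany(
    "INSERT INTO umls_ms_2011ab VALUES (?, ?, ?)",
    [
        ("CUI_A", "icd-1", "ICD9CM"),
        ("CUI_A", "snomed-1", "SNOMEDCT"),
        ("CUI_A", "msh-1", "MSH"),
    ],
)

# Sources to keep. 'SNOMEDCT' as a sourcetype value is an assumption.
SOURCES = ("SNOMEDCT", "ICD9CM")
placeholders = ",".join("?" for _ in SOURCES)
QUERY = (
    "SELECT sourcetype, code FROM umls_ms_2011ab "
    f"WHERE cui=? AND sourcetype IN ({placeholders}) "
    "ORDER BY sourcetype, code"
)


def codes_by_source(cui):
    """Return {sourcetype: [codes]} for one CUI, limited to SOURCES."""
    grouped = defaultdict(list)
    for sourcetype, code in conn.execute(QUERY, (cui, *SOURCES)):
        grouped[sourcetype].append(code)
    return dict(grouped)


both = codes_by_source("CUI_A")
print(both)
```

The MSH row is filtered out by the IN clause, so only the two requested sources appear in the result.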

-----Original Message-----
From: Vogel, James
Sent: Friday, September 27, 2013 6:30 PM
To: dev@ctakes.apache.org
Subject: RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

Is anyone able to provide any more detailed guidance on what I'd need to change to add the ICD9 codes as tags, e.g., where do I look for the tables in the hsql database that would contain the ICD9 data?

Thanks.

-----Original Message-----
From: Miller, Timothy [mailto:Timothy.Miller@childrens.harvard.edu]
Sent: Monday, September 16, 2013 7:25 AM
To: dev@ctakes.apache.org
Subject: Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

James,
I haven't done it myself, so I don't know exactly how the config
changes, but I know roughly where to look.  In LookupDesc_Db.xml, find
the <lookupBinding> tag with idRef = DICT_UMLS_MS. Then look under
the <lookupConsumer> section, and you'll see the codingScheme is SNOMED.
I believe this is where the actual dictionary filtering is done. There
is also a consumer class called
org.apache.ctakes.dictionary.lookup.ae.UmlsToSnomedDbConsumerImpl and a
mapPrepStmt field with a SQL query that might need changing. That is
where I would start looking. I'm not sure whether you would need to
write a new consumer class, or what values the codingScheme field can
take, but hopefully this helps you get started until someone else chimes
in with more detailed info!

Tim

On 09/15/2013 08:39 PM, Vogel, James wrote:
> Any more guidance you can give about the nature of the changes to the config and impl that would need to be made to get the ICD9 codes?
>
> -----Original Message-----
> From: Pei Chen [mailto:chenpei@apache.org]
> Sent: Wednesday, September 04, 2013 1:02 PM
> To: dev@ctakes.apache.org
> Subject: Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor
>
> Ted,
>
>> On another note, I know the cTAKES dictionary uses ICD9, but I'm not
>> familiar with how to access that information: In the example I've
>> described below, where would I locate the ICD9 for a specific entity?
>
> Even though ICD9 is included in the lookup, IIRC, cTAKES by default is
> configured [1] to return/store only concepts [2] that have a SNOMEDCT code or
> RxNorm code.
>
> [1]
> http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/LookupDesc_Db.xml
>
> [2]
> http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup/src/main/java/org/apache/ctakes/dictionary/lookup/ae/UmlsToSnomedConsumerImpl.java
>
>  If you would like it to return ICD9 codes, one would need to
> modify/configure the above...
>
> --Pei
>
>
> On Wed, Sep 4, 2013 at 11:55 AM, Assur, Ted
> <Th...@providence.org>>wrote:
>
>> Thanks for looking into this, it's been puzzling me.
>>
>> On another note, I know the cTAKES dictionary uses ICD9, but I'm not
>> familiar with how to access that information: In the example I've described
>> below, where would I locate the ICD9 for a specific entity?
>>
>> Thank you
>>
>> Ted
>>
>> -----Original Message-----
>> From: Pei Chen [mailto:chenpei@apache.org<ma...@apache.org>]
>> Sent: Tuesday, September 03, 2013 7:13 PM
>> To: dev@ctakes.apache.org<ma...@ctakes.apache.org>
>> Subject: Re: specificity in selecting EntityMentions when using
>> AggregatePlaintextUMLSProcessor
>>
>> You're right, it should have gotten "CIN I"- that's a strange one,
>> probably needs to be debugged/looked into further...
>>
>> On Tue, Sep 3, 2013 at 10:05 PM, Miller, Timothy <
>> Timothy.Miller@childrens.harvard.edu<ma...@childrens.harvard.edu>> wrote:
>>> Ah. So it will get
>>> CIN 2 (in SNOMED)
>>> CIN III (in SNOMED)
>>> CIN 3 (in SNOMED)
>>>
>>> but the rest are not in SNOMED?
>>>
>>> I wonder why it doesn't get CIN I? It looks like that exists in SNOMED
>>> (though I don't fully understand what all the symbols mean in the umls
>>> browser).
>>>
>>>> CIN I - Cervical intraepithelial neoplasia 1
>>>> [A3002690/SNOMEDCT/SY/285836003]
>>>
>>> On 09/03/2013 09:55 PM, Pei Chen wrote:
>>>> It has the correct parse (POS, chunks, and lookupwindow)- but some of
>>>> the terms do not exist in SNOMED- CIN 2 - Cervical intraepithelial
>>>> neoplasia 2 [A3002688/SNOMEDCT/SY/285838002] exists but not CIN II.
>>>> CIN III [A3333965/SNOMEDCT/SY/20365006] also exists that's why it was
>>>> able to perform the lookup successfully.
>>>> Note that CIN II synonyms do exist in other UMLS thesauri such as
>>>> MEDCIN, CCPSS though.  However, the bundled cTAKES dictionaries only
>>>> contain (MeSH, SNOMEDCT, RxNORM, NCI, ICD9) IIRC.
>>>>
>>>> --Pei
>>>>
>>>> On Tue, Sep 3, 2013 at 9:44 PM, Miller, Timothy
>>>> <Ti...@childrens.harvard.edu>> wrote:
>>>>> That is a good question, Ted!
>>>>>
>>>>> I tried it with a simple context: "The patient has a CIN III." I'm
>>>>> not sure if that is a correct context but I was able to duplicate
>>>>> your findings. (Finds a CUI for CIN III but not if you change it to
>>>>> CIN II)
>>>>>
>>>>> My first thought was that it is the chunker. But the chunker seems
>>>>> to get it right, as CIN II and CIN III are both called NPs, and
>>>>> similarly the LookupWindowAnnotator handles them both identically.
>>>>> So that suggests it is a problem with the actual lookup of the
>>>>> tokens in the LookupWindow.
>>>>>
>>>>> That's all I can do for now but maybe someone else who knows more
>>>>> about its behavior offhand will have an idea.
>>>>>
>>>>> Tim
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 09/03/2013 08:24 PM, Assur, Ted wrote:
>>>>>> I'm trying to understand what would prevent the
>> AggregatePlaintextUMLSProcessor AE from correctly parsing specific problems
>> that are defined in the UMLS version used by cTAKES.
>>>>>> For example,
>>>>>> CIN (Cervical Intraepithelial Neoplasia) in its general usage is
>> parsed out as UMLS CUI C0206708.
>>>>>> CIN comes in 3 grades, 1, 2 and 3. Sometimes this is reported with
>> Roman Numerals, I,II, and III.
>>>>>> cTAKES correctly identifies "CIN 3" and "CIN III" with UMLS CUI
>> C0851140: "Carcinoma in situ of uterine cervix."
>>>>>> However, I cannot get it to recognize CIN 1, CIN I, CIN 2, or CIN II
>> as their correct concepts, "Cervical intraepithelial neoplasia grade 1" and
>> "Cervical intraepithelial neoplasia grade 2" respectively.
>>>>>> Is there a way to tune the detection of UMLS concepts?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --------------------------------------------
>>>>>> Ted Assur
>>>>>> IT Solutions Architect for Cancer Research Providence Health &
>>>>>> Services ted.assur@providence.org<ma...@providence.org>
>>>>>> 503-215-6476<tel:503-215-6476>
>>>>>>
>>>>>> Crede, ut intelligas.
>>>>>> Intellego, ut credam.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>   ________________________________
>>>>>>
>>>>>> This message is intended for the sole use of the addressee, and may
>> contain information that is privileged, confidential and exempt from
>> disclosure under applicable law. If you are not the addressee you are
>> hereby notified that you may not use, copy, disclose, or distribute to
>> anyone the message or any information contained in the message. If you have
>> received this message in error, please immediately advise the sender by
>> reply email and delete this message.
> IMPORTANT WARNING: Information contained in this email is intended for the use of the individual to whom it is addressed, and may contain information that is privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, or the employee or agent responsible for delivering the message to the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication is STRICTLY FORBIDDEN. If you have received this communication in error, please notify us immediately by return email and delete this document. Thank you.
>


RE: sentence number in WordToken

Posted by "Chen, Pei" <Pe...@childrens.harvard.edu>.

Agreed.  I think we can use the ASF process here and the dev mailing list seems to work nicely.
I.e. discuss changes here; call a [VOTE] only if there is contention.  
But from what I have seen, the community has been able to reach a consensus and play nicely so far.

--Pei

> -----Original Message-----
> From: Wu, Stephen T., Ph.D. [mailto:Wu.Stephen@mayo.edu]
> Sent: Wednesday, October 02, 2013 11:32 AM
> To: dev@ctakes.apache.org; samir chabou
> Subject: Re: sentence number in WordToken
> 
> Hmm, we should probably have a process to vote up or down type system
> changes like this, since they affect everyone.
> In this case I'd agree with the others: don't add it.
> 
> stephen
> 
> 
> 
> On 9/30/13 11:21 AM, "samir chabou" <sa...@yahoo.com> wrote:
> 
> >thanks for the feed back it's a good point, I did it also with
> >selectCovering but as Richard mention I'll changed to indexCovering
> >since it's faster.
> >Samir
> >
> >
> >
> >
> >________________________________
> > From: "Chen, Pei" <Pe...@childrens.harvard.edu>
> >To: "dev@ctakes.apache.org" <de...@ctakes.apache.org>; samir chabou
> ><sa...@yahoo.com>
> >Sent: Monday, September 30, 2013 12:10:45 PM
> >Subject: RE: sentence number in  WordToken
> >
> >
> >Samir,
> >I think Richard has a good point here.   What is the use to require
> >adding sentenceNumber() to BaseToken in the TypeSystem?
> >If it's only temporary, It may be a good idea to do it programmatically
> >with local variable rather than modifying the type system and having it
> >stored in the CAS...?
> >
> >Maybe something like:
> >boolean a = JCasUtil.isCovered(JCas, BaseToken1, Sentence.class);
> >Boolean b = JCasUtil.isCovered(JCas, BaseToken2, Sentence.class); --Pei
> >
> >
> >> -----Original Message-----
> >> From: Richard Eckart de Castilho [mailto:rec@apache.org]
> >> Sent: Monday, September 30, 2013 11:59 AM
> >> To: dev@ctakes.apache.org; samir chabou
> >> Subject: Re: sentence number in WordToken
> >>
> >> Hi,
> >>
> >> if you do many selectCovering calls, you may be faster using
> >>indexCovering  once and then using the lookup index it produces.
> >>
> >> IMHO type systems should not contain information that can easily be
> >> calculated at runtime (e.g. sentence number, token number, etc.).
> >>
> >> Mind, I have no say here ;) Just my personal opinion.
> >>
> >> -- Richard
> >>
> >> On 30.09.2013, at 16:17, samir chabou <sa...@yahoo.com> wrote:
> >>
> >> > Hi Pei,
> >> >
> >> > I though
> >> > this may be have some use ...
> >> >
> >> > Because I
> >> > need to know if two or more words tokens belong to the same
> >> > sentence; and since WordToken does not define the feature sentence
> >> > number. I added it to the TypeSystem. These are the steps:
> >> >
> >> > 1)      I added the sentence number
> >> > features for the type BaseToken in TypeSystem.xml file (I choose
> >> > the supper class in order that the feature be propagated to all
> >> > subclasses (wordToken,SymboleToken,NumToken ...)
> >> >
> >> > 2)      In ctakes-core I in TokenizerAnnotatorPTB.java (methode
> >> annotateRange) I set the new feature
> >> > (BaseToken.sentenceNumber = sentence.getSentenceNumber()) as
> >> shown below :
> >> >
> >> > bta.setSentenceNumber(sentence.getSentenceNumber());
> >> >       bta.addToIndexes();
> >> >
> >> > 3)      Generate the JCASGen in the tab de TypeSystem of the
> >> > aggregate
> >> >
> >> > 4)      Add the feature in the source
> >> > tab of the aggregate
> >> >
> >> > Probably I
> >> > could have used as alternative:
> >> > List<Sentence> list = JCasUtil.selectCovering(aJcas,
> >> > Sentence.class, entity1.getBegin(), entity1.getEnd()); the issue
> >> > with this is : if I have many entities to be checked at the same
> >> > time or if the entity1 is found in many places, I have to add some
> >> > if conditions to get sentence number
> >> >
> >> >
> >> > Thanks
> >> > Samir

Re: sentence number in WordToken

Posted by "Wu, Stephen T., Ph.D." <Wu...@mayo.edu>.

Hmm, we should probably have a process to vote up or down type system
changes like this, since they affect everyone.
In this case I'd agree with the others: don't add it.

stephen



On 9/30/13 11:21 AM, "samir chabou" <sa...@yahoo.com> wrote:

>thanks for the feed back it's a good point,
>I did it also with selectCovering but as Richard mention I'll changed to
>indexCovering since it's faster.
>Samir
>
>
>
>
>________________________________
> From: "Chen, Pei" <Pe...@childrens.harvard.edu>
>To: "dev@ctakes.apache.org" <de...@ctakes.apache.org>; samir chabou
><sa...@yahoo.com>
>Sent: Monday, September 30, 2013 12:10:45 PM
>Subject: RE: sentence number in  WordToken
> 
>
>Samir,
>I think Richard has a good point here.   What is the use to require
>adding sentenceNumber() to BaseToken in the TypeSystem?
>If it's only temporary, It may be a good idea to do it programmatically
>with local variable rather than modifying the type system and having it
>stored in the CAS...?
>
>Maybe something like:
>boolean a = JCasUtil.isCovered(JCas, BaseToken1, Sentence.class);
>Boolean b = JCasUtil.isCovered(JCas, BaseToken2, Sentence.class);
>--Pei
>
>
>> -----Original Message-----
>> From: Richard Eckart de Castilho [mailto:rec@apache.org]
>> Sent: Monday, September 30, 2013 11:59 AM
>> To: dev@ctakes.apache.org; samir chabou
>> Subject: Re: sentence number in WordToken
>> 
>> Hi,
>> 
>> if you do many selectCovering calls, you may be faster using
>>indexCovering
>> once and then using the lookup index it produces.
>> 
>> IMHO type systems should not contain information that can easily be
>> calculated at runtime (e.g. sentence number, token number, etc.).
>> 
>> Mind, I have no say here ;) Just my personal opinion.
>> 
>> -- Richard
>> 
>> On 30.09.2013, at 16:17, samir chabou <sa...@yahoo.com> wrote:
>> 
>> > Hi Pei,
>> >
>> > I though
>> > this may be have some use ...
>> >
>> > Because I
>> > need to know if two or more words tokens belong to the same sentence;
>> > and since WordToken does not define the feature sentence number. I
>> > added it to the TypeSystem. These are the steps:
>> >
>> > 1)      I added the sentence number
>> > features for the type BaseToken in TypeSystem.xml file (I choose the
>> > supper class in order that the feature be propagated to all subclasses
>> > (wordToken,SymboleToken,NumToken ...)
>> >
>> > 2)      In ctakes-core I in TokenizerAnnotatorPTB.java (methode
>> annotateRange) I set the new feature
>> > (BaseToken.sentenceNumber = sentence.getSentenceNumber()) as
>> shown below :
>> >
>> > bta.setSentenceNumber(sentence.getSentenceNumber());
>> >       bta.addToIndexes();
>> >
>> > 3)      Generate the JCASGen in the tab de TypeSystem of the
>> > aggregate
>> >
>> > 4)      Add the feature in the source
>> > tab of the aggregate
>> >
>> > Probably I
>> > could have used as alternative:
>> > List<Sentence> list = JCasUtil.selectCovering(aJcas, Sentence.class,
>> > entity1.getBegin(), entity1.getEnd()); the issue with this is : if I
>> > have many entities to be checked at the same time or if the entity1 is
>> > found in many places, I have to add some if conditions to get sentence
>> > number
>> >
>> >
>> > Thanks
>> > Samir

Re: sentence number in WordToken

Posted by samir chabou <sa...@yahoo.com>.

Thanks for the feedback, it's a good point.
I also did it with selectCovering, but as Richard mentioned, I'll change to indexCovering since it's faster.
Samir




________________________________
 From: "Chen, Pei" <Pe...@childrens.harvard.edu>
To: "dev@ctakes.apache.org" <de...@ctakes.apache.org>; samir chabou <sa...@yahoo.com> 
Sent: Monday, September 30, 2013 12:10:45 PM
Subject: RE: sentence number in  WordToken
 

Samir,
I think Richard has a good point here.   What is the use to require adding sentenceNumber() to BaseToken in the TypeSystem?
If it's only temporary, It may be a good idea to do it programmatically with local variable rather than modifying the type system and having it stored in the CAS...?

Maybe something like:
boolean a = JCasUtil.isCovered(JCas, BaseToken1, Sentence.class);
Boolean b = JCasUtil.isCovered(JCas, BaseToken2, Sentence.class);
--Pei


> -----Original Message-----
> From: Richard Eckart de Castilho [mailto:rec@apache.org]
> Sent: Monday, September 30, 2013 11:59 AM
> To: dev@ctakes.apache.org; samir chabou
> Subject: Re: sentence number in WordToken
> 
> Hi,
> 
> if you do many selectCovering calls, you may be faster using indexCovering
> once and then using the lookup index it produces.
> 
> IMHO type systems should not contain information that can easily be
> calculated at runtime (e.g. sentence number, token number, etc.).
> 
> Mind, I have no say here ;) Just my personal opinion.
> 
> -- Richard
> 
> On 30.09.2013, at 16:17, samir chabou <sa...@yahoo.com> wrote:
> 
> > Hi Pei,
> >
> > I though
> > this may be have some use ...
> >
> > Because I
> > need to know if two or more words tokens belong to the same sentence;
> > and since WordToken does not define the feature sentence number. I
> > added it to the TypeSystem. These are the steps:
> >
> > 1)      I added the sentence number
> > features for the type BaseToken in TypeSystem.xml file (I choose the
> > supper class in order that the feature be propagated to all subclasses
> > (wordToken,SymboleToken,NumToken ...)
> >
> > 2)      In ctakes-core I in TokenizerAnnotatorPTB.java (methode
> annotateRange) I set the new feature
> > (BaseToken.sentenceNumber = sentence.getSentenceNumber()) as
> shown below :
> >
> > bta.setSentenceNumber(sentence.getSentenceNumber());
> >       bta.addToIndexes();
> >
> > 3)      Generate the JCASGen in the tab de TypeSystem of the
> > aggregate
> >
> > 4)      Add the feature in the source
> > tab of the aggregate
> >
> > Probably I
> > could have used as alternative:
> > List<Sentence> list = JCasUtil.selectCovering(aJcas, Sentence.class,
> > entity1.getBegin(), entity1.getEnd()); the issue with this is : if I
> > have many entities to be checked at the same time or if the entity1 is
> > found in many places, I have to add some if conditions to get sentence
> > number
> >
> >
> > Thanks
> > Samir

RE: sentence number in WordToken

Posted by "Chen, Pei" <Pe...@childrens.harvard.edu>.

Samir,
I think Richard has a good point here.   What is the use case that requires adding sentenceNumber() to BaseToken in the TypeSystem?
If it's only temporary, it may be a good idea to do it programmatically with a local variable rather than modifying the type system and having it stored in the CAS...?

Maybe something like:
boolean a = JCasUtil.isCovered(jCas, baseToken1, Sentence.class);
boolean b = JCasUtil.isCovered(jCas, baseToken2, Sentence.class);
--Pei
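
To make the "compute it locally" idea concrete, here is a minimal sketch in plain Java. It does not call uimaFIT or cTAKES: Span and SameSentenceCheck are hypothetical stand-ins for the BaseToken and Sentence annotations, and the offsets are for the text "The patient has a CIN III." followed by an invented second sentence. In a real annotator the sentence list would be read from the CAS rather than built by hand.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for an annotation span (BaseToken or Sentence). */
final class Span {
    final int begin;
    final int end;

    Span(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }

    /** True when this span fully covers the other span. */
    boolean covers(Span other) {
        return begin <= other.begin && other.end <= end;
    }
}

final class SameSentenceCheck {
    /** Position of the first sentence covering the token, or -1 if none does. */
    static int sentenceIndexOf(List<Span> sentences, Span token) {
        for (int i = 0; i < sentences.size(); i++) {
            if (sentences.get(i).covers(token)) {
                return i;
            }
        }
        return -1;
    }

    /** Two tokens share a sentence when the same sentence covers both. */
    static boolean inSameSentence(List<Span> sentences, Span a, Span b) {
        int ia = sentenceIndexOf(sentences, a);
        return ia >= 0 && ia == sentenceIndexOf(sentences, b);
    }

    public static void main(String[] args) {
        // Sentence 0 is "The patient has a CIN III."; sentence 1 is invented.
        List<Span> sentences = Arrays.asList(new Span(0, 26), new Span(27, 50));
        Span cin = new Span(18, 21);    // "CIN"
        Span grade = new Span(22, 25);  // "III"
        Span later = new Span(31, 38);  // some token in the second sentence

        System.out.println(inSameSentence(sentences, cin, grade)); // true
        System.out.println(inSameSentence(sentences, cin, later)); // false
    }
}
```

The sentence position is only ever held in a local variable, so nothing new has to be added to the type system or stored in the CAS.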


> -----Original Message-----
> From: Richard Eckart de Castilho [mailto:rec@apache.org]
> Sent: Monday, September 30, 2013 11:59 AM
> To: dev@ctakes.apache.org; samir chabou
> Subject: Re: sentence number in WordToken
> 
> Hi,
> 
> if you do many selectCovering calls, you may be faster using indexCovering
> once and then using the lookup index it produces.
> 
> IMHO type systems should not contain information that can easily be
> calculated at runtime (e.g. sentence number, token number, etc.).
> 
> Mind, I have no say here ;) Just my personal opinion.
> 
> -- Richard
> 
> On 30.09.2013, at 16:17, samir chabou <sa...@yahoo.com> wrote:
> 
> > Hi Pei,
> >
> > I though
> > this may be have some use ...
> >
> > Because I
> > need to know if two or more words tokens belong to the same sentence;
> > and since WordToken does not define the feature sentence number. I
> > added it to the TypeSystem. These are the steps:
> >
> > 1)      I added the sentence number
> > features for the type BaseToken in TypeSystem.xml file (I choose the
> > supper class in order that the feature be propagated to all subclasses
> > (wordToken,SymboleToken,NumToken ...)
> >
> > 2)      In ctakes-core I in TokenizerAnnotatorPTB.java (methode
> annotateRange) I set the new feature
> > (BaseToken.sentenceNumber = sentence.getSentenceNumber()) as
> shown below :
> >
> > bta.setSentenceNumber(sentence.getSentenceNumber());
> >       bta.addToIndexes();
> >
> > 3)      Generate the JCASGen in the tab de TypeSystem of the
> > aggregate
> >
> > 4)      Add the feature in the source
> > tab of the aggregate
> >
> > Probably I
> > could have used as alternative:
> > List<Sentence> list = JCasUtil.selectCovering(aJcas, Sentence.class,
> > entity1.getBegin(), entity1.getEnd()); the issue with this is : if I
> > have many entities to be checked at the same time or if the entity1 is
> > found in many places, I have to add some if conditions to get sentence
> > number
> >
> >
> > Thanks
> > Samir

Re: sentence number in WordToken

Posted by Richard Eckart de Castilho <re...@apache.org>.

Hi,

if you do many selectCovering calls, you may be faster using
indexCovering once and then using the lookup index it produces.

IMHO type systems should not contain information that can easily
be calculated at runtime (e.g. sentence number, token number, etc.).

Mind, I have no say here ;) Just my personal opinion.

-- Richard
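
A rough sketch of the "index once, then look up" idea, in plain Java rather than uimaFIT. Offsets and CoveringIndex are hypothetical names, and the sketch assumes sentences and tokens are each sorted by begin offset and do not overlap among themselves. With uimaFIT, the map would instead come from indexCovering; this only illustrates why one pass plus cheap lookups beats repeated selectCovering calls.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for an annotation's character offsets. */
final class Offsets {
    final int begin;
    final int end;

    Offsets(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }
}

final class CoveringIndex {
    /**
     * Builds a token -> sentence position map in a single pass.
     * Assumes both lists are sorted by begin offset and do not overlap internally.
     * Tokens that no sentence fully covers are left out of the map.
     * Offsets does not override equals/hashCode, so keys are matched by identity.
     */
    static Map<Offsets, Integer> build(List<Offsets> sentences, List<Offsets> tokens) {
        Map<Offsets, Integer> index = new HashMap<>();
        int s = 0;
        for (Offsets token : tokens) {
            // Advance past sentences that end too early to cover this token.
            while (s < sentences.size() && sentences.get(s).end < token.end) {
                s++;
            }
            if (s < sentences.size() && sentences.get(s).begin <= token.begin) {
                index.put(token, s);
            }
        }
        return index;
    }

    /** After the index is built, a same-sentence check is two map lookups. */
    static boolean sameSentence(Map<Offsets, Integer> index, Offsets a, Offsets b) {
        Integer sa = index.get(a);
        return sa != null && sa.equals(index.get(b));
    }

    public static void main(String[] args) {
        List<Offsets> sentences = Arrays.asList(new Offsets(0, 26), new Offsets(27, 50));
        Offsets a = new Offsets(18, 21);
        Offsets b = new Offsets(22, 25);
        Offsets c = new Offsets(31, 38);

        Map<Offsets, Integer> index = build(sentences, Arrays.asList(a, b, c));
        System.out.println(sameSentence(index, a, b)); // true
        System.out.println(sameSentence(index, a, c)); // false
    }
}
```

The sentence number becomes a derived value that lives only as long as the map does, which is in line with keeping it out of the type system.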

On 30.09.2013, at 16:17, samir chabou <sa...@yahoo.com> wrote:

> Hi Pei,
> 
> I though
> this may be have some use …
>  
> Because I
> need to know if two or more words tokens belong to the same sentence; and
> since WordToken does not define the feature sentence number. I added it to the
> TypeSystem. These are the steps:
>  
> 1)      I added the sentence number
> features for the type BaseToken in TypeSystem.xml file (I choose the supper
> class in order that the feature be propagated to all subclasses
> (wordToken,SymboleToken,NumToken …)
>  
> 2)      In ctakes-core I in TokenizerAnnotatorPTB.java (methode annotateRange) I set the new feature
> (BaseToken.sentenceNumber = sentence.getSentenceNumber()) as shown below :
>      
> bta.setSentenceNumber(sentence.getSentenceNumber());
>       bta.addToIndexes();
>  
> 3)      Generate the JCASGen in the tab de TypeSystem of the
> aggregate
>  
> 4)      Add the feature in the source
> tab of the aggregate
>  
> Probably I
> could have used as alternative:
> List<Sentence> list = JCasUtil.selectCovering(aJcas, Sentence.class,
> entity1.getBegin(), entity1.getEnd()); the issue with this is : if I have many
> entities to be checked at the same time or if the entity1 is found in many
> places, I have to add some if conditions to get sentence number 
> 
> 
> Thanks
> Samir

sentence number in WordToken

Posted by samir chabou <sa...@yahoo.com>.

Hi Pei,

I thought this might have some use …
 
I need to know whether two or more word tokens belong to the same sentence,
and since WordToken does not define a sentence number feature, I added it to
the TypeSystem. These are the steps:
 
1)      I added the sentence number feature to the type BaseToken in the
TypeSystem.xml file (I chose the superclass so that the feature is
propagated to all subclasses: WordToken, SymbolToken, NumToken …)
 
2)      In ctakes-core, in TokenizerAnnotatorPTB.java (method annotateRange), I set the new feature
(BaseToken.sentenceNumber = sentence.getSentenceNumber()) as shown below:
     
bta.setSentenceNumber(sentence.getSentenceNumber());
bta.addToIndexes();
 
3)      Ran JCasGen from the Type System tab of the aggregate
 
4)      Added the feature in the Source tab of the aggregate
 
Alternatively, I could probably have used:
List<Sentence> list = JCasUtil.selectCovering(aJcas, Sentence.class,
entity1.getBegin(), entity1.getEnd());
The issue with this is that if I have many entities to check at the same
time, or if entity1 is found in many places, I have to add some if
conditions to get the sentence number.


Thanks
Samir

Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

Posted by Pei Chen <ch...@apache.org>.

James,
Obviously it would be best to customize the code and/or the dictionary for
your particular case.
But if you want to try something that will work without any code changes,
you can try the below in your LookupDesc_Db.xml.
Essentially, it takes advantage of the fact that the
UmlsToSnomedDbConsumerImpl allows you to specify an SQL statement that
maps the CUIs to codes.  Couple that with the fact that there already is a
table called umls_ms_2011ab which contains the codes and CUIs from many
different sources, including ICD9CM.
What you could do is just reuse that table as the mapping table as well and
specify the source, such as:
select code from umls_ms_2011ab where cui=? and sourcetype='ICD9CM'

(The downside is that I don't think there is an index on sourcetype, so
performance may suck.)
I've attached an example to normalize to ICD9CM codes instead of SNOMEDCT.

<lookupConsumer className="org.apache.ctakes.dictionary.lookup.ae.UmlsToSnomedDbConsumerImpl">
  <properties>
    <property key="codingScheme" value="ICD9CM"/>
    <property key="cuiMetaField" value="cui"/>
    <property key="tuiMetaField" value="tui"/>
    <property key="anatomicalSiteTuis" value="T021,T022,T023,T024,T025,T026,T029,T030"/>
    <property key="procedureTuis" value="T059,T060,T061"/>
    <property key="disorderTuis" value="T019,T020,T037,T046,T047,T048,T049,T050,T190,T191"/>
    <property key="findingTuis" value="T033,T034,T040,T041,T042,T043,T044,T045,T046,T056,T057,T184"/>
    <property key="dbConnExtResrcKey" value="DbConnection"/>
    <property key="mapPrepStmt" value="select code from umls_ms_2011ab where cui=? and sourcetype='ICD9CM'"/>
  </properties>
</lookupConsumer>
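
To show what the mapPrepStmt query is expected to select, here is a small in-memory model in plain Java. It is only a sketch: DictRow and CodeMapper are hypothetical names, the rows are invented, and the CUIs and codes are placeholders rather than real UMLS, ICD9CM, or SNOMEDCT values. The real consumer runs the prepared statement against the dictionary database; nothing here reflects its actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical row holding only the columns the query touches. */
final class DictRow {
    final String cui;
    final String code;
    final String sourcetype;

    DictRow(String cui, String code, String sourcetype) {
        this.cui = cui;
        this.code = code;
        this.sourcetype = sourcetype;
    }
}

final class CodeMapper {
    /**
     * Mirrors the shape of:
     *   select code from umls_ms_2011ab where cui=? and sourcetype='ICD9CM'
     * with the coding scheme passed in instead of hard-coded.
     */
    static List<String> codesFor(List<DictRow> rows, String cui, String scheme) {
        List<String> codes = new ArrayList<>();
        for (DictRow row : rows) {
            if (row.cui.equals(cui) && row.sourcetype.equals(scheme)) {
                codes.add(row.code);
            }
        }
        return codes;
    }

    public static void main(String[] args) {
        // Placeholder data: these are not real CUIs or codes.
        List<DictRow> rows = Arrays.asList(
                new DictRow("CUI_A", "ICD9-PLACEHOLDER-1", "ICD9CM"),
                new DictRow("CUI_A", "SNOMED-PLACEHOLDER-1", "SNOMEDCT"),
                new DictRow("CUI_B", "SNOMED-PLACEHOLDER-2", "SNOMEDCT"));

        System.out.println(codesFor(rows, "CUI_A", "ICD9CM")); // [ICD9-PLACEHOLDER-1]
        System.out.println(codesFor(rows, "CUI_B", "ICD9CM")); // []
    }
}
```

A CUI with no row for the chosen source gets an empty result, so it is worth checking how such concepts are treated by the consumer before relying on this configuration.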


On Fri, Sep 27, 2013 at 9:58 PM, Pei Chen <ch...@apache.org> wrote:

> James,
> One can try the NamedEntityLookupConsumerImpl instead of
> UmlsToSnomedDbConsumerImpl; it will not filter the results down to CUIs
> that have SNOMED codes.
> Will you need to preserve the TUI?  One caveat is that
> NamedEntityLookupConsumerImpl will return all of the hits, but
> it'll create OntologyConcepts (w/o TUIs) instead of UMLSConcepts.  Perhaps
> we should make the NamedEntityLookupConsumerImpl a bit more general.
>
> --Pei
>
>
> On Fri, Sep 27, 2013 at 8:29 PM, Vogel, James <JV...@activehealth.net>wrote:
>
>> I now see that I use a query on umls_ms_2011ab where sourcetype =
>> 'ICD9CM'.  Is there a way to use an existing AE or class to add additional
>> ICD9CM annotations / concepts or do I change the code in consumeHits() or
>> getSnomedCodes()?
>>
>> -----Original Message-----
>> From: Vogel, James
>> Sent: Friday, September 27, 2013 6:30 PM
>> To: dev@ctakes.apache.org
>> Subject: RE: specificity in selecting EntityMentions when using
>> AggregatePlaintextUMLSProcessor
>>
>> Is anyone able to provide any more detailed guidance on what I'd need to
>> change to add the ICD9 codes as tags, e.g., where do I look for the tables
>> in the hsql database that would contain the ICD9 data?
>>
>> Thanks.
>>
>> -----Original Message-----
>> From: Miller, Timothy [mailto:Timothy.Miller@childrens.harvard.edu]
>> Sent: Monday, September 16, 2013 7:25 AM
>> To: dev@ctakes.apache.org
>> Subject: Re: specificity in selecting EntityMentions when using
>> AggregatePlaintextUMLSProcessor
>>
>> James,
>> I haven't done it myself, so I don't know exactly how the config
>> changes, but I know roughly where to look.  In the LookupDesc_Db.xml,
>> the <lookupBinding> tag with the idRef = DICT_UMLS_MS. Then look under
>> the <lookupConsumer> section, and you'll see the codingScheme is SNOMED.
>> I believe this is where the actual dictionary filtering is done. There
>> is also a consumer class called
>> org.apache.ctakes.dictionary.lookup.ae.UmlsToSnomedDbConsumerImpl and a
>> mapPrepStmt field with a SQL query that might need changing. That is
>> where I would start looking, I'm not sure whether you would need to
>> write a new consumer class, and what values the codingScheme field can
>> take, but hopefully this helps you get started until someone else chimes
>> in with more detailed info!
>>
>> Tim
>>
>> On 09/15/2013 08:39 PM, Vogel, James wrote:
>> > Any more guidance you can give about the nature of the changes to the
>> config and impl that would need to be made to get the ICD9 codes?
>> >
>> > -----Original Message-----
>> > From: Pei Chen [mailto:chenpei@apache.org]
>> > Sent: Wednesday, September 04, 2013 1:02 PM
>> > To: dev@ctakes.apache.org
>> > Subject: Re: specificity in selecting EntityMentions when using
>> AggregatePlaintextUMLSProcessor
>> >
>> > Ted,
>> >
>> >> On another note, I know the cTAKES dictionary uses ICD9, but I'm not
>> >> familiar with how to access that information: In the example I've
>> >> described below, where would I locate the ICD9 for a specific entity?
>> >
>> > Even though ICD9 is included in the lookup, IIRC, cTAKES by default is
>> > configured [1] to return/store only concepts [2] that have a SNOMEDCT
>> > code or RxNorm code.
>> >
>> > [1]
>> >
>> http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/LookupDesc_Db.xml
>> >
>> > [2]
>> >
>> http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup/src/main/java/org/apache/ctakes/dictionary/lookup/ae/UmlsToSnomedConsumerImpl.java
>> >
>> >  If you would like it to return ICD9 codes, one would need to
>> > modify/configure the above...
>> >
>> > --Pei
>> >
>> >
>> > On Wed, Sep 4, 2013 at 11:55 AM, Assur, Ted
>> > <Th...@providence.org>wrote:
>> >
>> >> Thanks for looking into this, it's been puzzling me.
>> >>
>> >> On another note, I know the cTAKES dictionary uses ICD9, but I'm not
>> >> familiar with how to access that information: In the example I've
>> described
>> >> below, where would I locate the ICD9 for a specific entity?
>> >>
>> >> Thank you
>> >>
>> >> Ted
>> >>
>> >> -----Original Message-----
>> >> From: Pei Chen [mailto:chenpei@apache.org]
>> >> Sent: Tuesday, September 03, 2013 7:13 PM
>> >> To: dev@ctakes.apache.org
>> >> Subject: Re: specificity in selecting EntityMentions when using
>> >> AggregatePlaintextUMLSProcessor
>> >>
>> >> You're right, it should have gotten "CIN I"- that's a strange one,
>> >> probably needs to be debugged/looked into further...
>> >>
>> >> On Tue, Sep 3, 2013 at 10:05 PM, Miller, Timothy <
>> >> Timothy.Miller@childrens.harvard.edu> wrote:
>> >>> Ah. So it will get
>> >>> CIN 2 (in SNOMED)
>> >>> CIN III (in SNOMED)
>> >>> CIN 3 (in SNOMED)
>> >>>
>> >>> but the rest are not in SNOMED?
>> >>>
>> >>> I wonder why it doesn't get CIN I? It looks like that exists in SNOMED
>> >>> (though I don't fully understand what all the symbols mean in the umls
>> >>> browser).
>> >>>
>> >>>> CIN I - Cervical intraepithelial neoplasia 1
>> >>>> [A3002690/SNOMEDCT/SY/285836003]
>> >>>
>> >>> On 09/03/2013 09:55 PM, Pei Chen wrote:
>> >>>> It has the correct parse (POS, chunks, and lookupwindow)- but some of
>> >>>> the terms do not exist in SNOMED- CIN 2 - Cervical intraepithelial
>> >>>> neoplasia 2 [A3002688/SNOMEDCT/SY/285838002] exists but not CIN II.
>> >>>> CIN III [A3333965/SNOMEDCT/SY/20365006] also exists that's why it was
>> >>>> able to perform the lookup successfully.
>> >>>> Note that CIN II synonyms do exist in other UMLS thesauri such as
>> >>>> MEDCIN, CCPSS though.  However, the bundled cTAKES dictionaries only
>> >>>> contain (MeSH, SNOMEDCT, RxNORM, NCI, ICD9) IIRC.
>> >>>>
>> >>>> --Pei
>> >>>>
>> >>>> On Tue, Sep 3, 2013 at 9:44 PM, Miller, Timothy
>> >>>> <Ti...@childrens.harvard.edu> wrote:
>> >>>>> That is a good question, Ted!
>> >>>>>
>> >>>>> I tried it with a simple context: "The patient has a CIN III." I'm
>> >>>>> not sure if that is a correct context but I was able to duplicate
>> >>>>> your findings. (Finds a CUI for CIN III but not if you change it to
>> >>>>> CIN II)
>> >>>>>
>> >>>>> My first thought was that it is the chunker. But the chunker seems
>> >>>>> to get it right, as CIN II and CIN III are both called NPs, and
>> >>>>> similarly the LookupWindowAnnotator handles them both identically.
>> >>>>> So that suggests it is a problem with the actual lookup of the
>> >>>>> tokens in the LookupWindow.
>> >>>>>
>> >>>>> That's all I can do for now but maybe someone else who knows more
>> >>>>> about its behavior offhand will have an idea.
>> >>>>>
>> >>>>> Tim

Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

Posted by Pei Chen <ch...@apache.org>.

James,
One can try NamedEntityLookupConsumerImpl instead of
UmlsToSnomedDbConsumerImpl; it does not restrict the results to CUIs that
have SNOMED codes.
Will you need to preserve the TUI?  One caveat is that
NamedEntityLookupConsumerImpl returns all of the hits, but it creates
OntologyConcepts (without TUIs) instead of UMLSConcepts.  Perhaps
we should make NamedEntityLookupConsumerImpl a bit more general.

--Pei
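
For illustration, the difference between the two consumer behaviours
described above can be sketched as follows. This is a minimal Python
sketch under stated assumptions, not the cTAKES code: the Hit class, the
CODES_BY_CUI table, and both function names are hypothetical, and the
CUIs and codes are placeholders rather than real UMLS data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """A dictionary hit: the matched text and the concept it maps to."""
    text: str
    cui: str


# Hypothetical CUI -> {coding scheme: [codes]} table; every value is a placeholder.
CODES_BY_CUI = {
    "CUI_A": {"SNOMED": ["S-1"], "ICD9CM": ["I-1"]},
    "CUI_B": {"ICD9CM": ["I-2"]},  # this concept has no SNOMED code
}


def consume_filtered(hits, scheme):
    """Keep only hits whose CUI has at least one code in the given scheme."""
    kept = []
    for hit in hits:
        codes = CODES_BY_CUI.get(hit.cui, {}).get(scheme, [])
        if codes:
            kept.append((hit, codes))
    return kept


def consume_all(hits):
    """Keep every hit, without attaching scheme-specific codes."""
    return [(hit, []) for hit in hits]


hits = [Hit("term one", "CUI_A"), Hit("term two", "CUI_B")]
snomed_only = consume_filtered(hits, "SNOMED")
everything = consume_all(hits)
```

With the placeholder data, the SNOMED-filtered list keeps only the hit
for CUI_A, while the unfiltered list keeps both hits but carries no
codes, which mirrors the trade-off mentioned above.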



RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

Posted by "Vogel, James" <JV...@activehealth.net>.

I now see that I can run a query on umls_ms_2011ab where sourcetype = 'ICD9CM'.  Is there a way to use an existing AE or class to add the additional ICD9CM annotations/concepts, or do I need to change the code in consumeHits() or getSnomedCodes()?
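
As a rough illustration of the kind of query described above, here is a
minimal Python sketch. It assumes a table named umls_ms_2011ab with a
sourcetype column, as mentioned in the message; the other columns (cui,
text, code), the codes_for helper, and all row values are assumptions
made up for the example, and an in-memory SQLite database stands in for
the HSQL database that cTAKES ships with.

```python
import sqlite3

# In-memory stand-in for the dictionary database; column layout is assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE umls_ms_2011ab (cui TEXT, text TEXT, code TEXT, sourcetype TEXT)"
)
conn.executemany(
    "INSERT INTO umls_ms_2011ab VALUES (?, ?, ?, ?)",
    [
        # Placeholder rows, not real UMLS content.
        ("CUI_A", "term one", "S-1", "SNOMEDCT"),
        ("CUI_A", "term one", "I-1", "ICD9CM"),
        ("CUI_A", "term one alt", "I-1", "ICD9CM"),
        ("CUI_B", "term two", "I-2", "ICD9CM"),
    ],
)


def codes_for(connection, cui, sourcetype):
    """Return the distinct codes recorded for a CUI in one source vocabulary."""
    rows = connection.execute(
        "SELECT DISTINCT code FROM umls_ms_2011ab "
        "WHERE cui = ? AND sourcetype = ? ORDER BY code",
        (cui, sourcetype),
    ).fetchall()
    return [row[0] for row in rows]


icd9_codes = codes_for(conn, "CUI_A", "ICD9CM")
```

Passing the CUI and source type as bound parameters keeps the statement
reusable for other vocabularies.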

-----Original Message-----
From: Vogel, James
Sent: Friday, September 27, 2013 6:30 PM
To: dev@ctakes.apache.org
Subject: RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

Is anyone able to provide any more detailed guidance on what I'd need to change to add the ICD9 codes as tags, e.g., where do I look for the tables in the hsql database that would contain the ICD9 data?

Thanks.


Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

Posted by "Miller, Timothy" <Ti...@childrens.harvard.edu>.

James,
I haven't done it myself, so I don't know exactly how the config
changes, but I know roughly where to look.  In LookupDesc_Db.xml, find
the <lookupBinding> tag with the idRef = DICT_UMLS_MS. Then look under
the <lookupConsumer> section, and you'll see the codingScheme is SNOMED.
I believe this is where the actual dictionary filtering is done. There
is also a consumer class called
org.apache.ctakes.dictionary.lookup.ae.UmlsToSnomedDbConsumerImpl and a
mapPrepStmt field with a SQL query that might need changing. That is
where I would start looking. I'm not sure whether you would need to
write a new consumer class, or what values the codingScheme field can
take, but hopefully this helps you get started until someone else chimes
in with more detailed info!

Tim
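
To make the pointers above easier to follow, here is a small Python
sketch that pulls the consumer settings out of a descriptor. The XML
fragment is hypothetical: it uses the tag, idRef, and property names
mentioned in the message (lookupBinding, lookupConsumer, DICT_UMLS_MS,
codingScheme, mapPrepStmt), but the attribute layout, the dictionaryRef
element, the class name, and the SQL text are assumptions and are not
copied from the real LookupDesc_Db.xml. The consumer_properties helper
is likewise invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; the layout is assumed, not taken from cTAKES.
DESCRIPTOR = """
<lookupBindings>
  <lookupBinding>
    <dictionaryRef idRef="DICT_UMLS_MS"/>
    <lookupConsumer className="example.HypotheticalConsumer">
      <properties>
        <property key="codingScheme" value="SNOMED"/>
        <property key="mapPrepStmt" value="select code from some_map where cui=?"/>
      </properties>
    </lookupConsumer>
  </lookupBinding>
</lookupBindings>
"""


def consumer_properties(xml_text, dict_id):
    """Return the consumer key/value properties for the binding that references dict_id."""
    root = ET.fromstring(xml_text.strip())
    for binding in root.iter("lookupBinding"):
        ref = binding.find("dictionaryRef")
        if ref is not None and ref.get("idRef") == dict_id:
            consumer = binding.find("lookupConsumer")
            if consumer is None:
                return {}
            return {p.get("key"): p.get("value") for p in consumer.iter("property")}
    return {}


props = consumer_properties(DESCRIPTOR, "DICT_UMLS_MS")
```

Printing props for the real descriptor would show which coding scheme
and mapping query are currently configured before anything is changed.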


RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

Posted by "Vogel, James" <JV...@activehealth.net>.

Is there any more guidance you can give about the nature of the changes to the config and impl that would need to be made to get the ICD9 codes?

-----Original Message-----
From: Pei Chen [mailto:chenpei@apache.org]
Sent: Wednesday, September 04, 2013 1:02 PM
To: dev@ctakes.apache.org
Subject: Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

Ted,

> On another note, I know the cTAKES dictionary uses ICD9, but I'm not
> familiar with how to access that information: In the example I've
> described below, where would I locate the ICD9 for a specific entity?

Even though ICD9 is included in the lookup, IIRC, cTAKES by default is
configured [1] to return/store only concepts [2] that have a SNOMEDCT code or
RxNorm code.

[1]
http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/LookupDesc_Db.xml

[2]
http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup/src/main/java/org/apache/ctakes/dictionary/lookup/ae/UmlsToSnomedConsumerImpl.java

 If you would like it to return ICD9 codes, one would need to
modify/configure the above...

--Pei


On Wed, Sep 4, 2013 at 11:55 AM, Assur, Ted
<Th...@providence.org>wrote:

> Thanks for looking into this, it's been puzzling me.
>
> On another note, I know the cTAKES dictionary uses ICD9, but I'm not
> familiar with how to access that information: In the example I've described
> below, where would I locate the ICD9 for a specific entity?
>
> Thank you
>
> Ted
>
> -----Original Message-----
> From: Pei Chen [mailto:chenpei@apache.org]
> Sent: Tuesday, September 03, 2013 7:13 PM
> To: dev@ctakes.apache.org
> Subject: Re: specificity in selecting EntityMentions when using
> AggregatePlaintextUMLSProcessor
>
> You're right, it should have gotten "CIN I"- that's a strange one,
> probably needs to be debugged/looked into further...
>
> On Tue, Sep 3, 2013 at 10:05 PM, Miller, Timothy <
> Timothy.Miller@childrens.harvard.edu> wrote:
> > Ah. So it will get
> > CIN 2 (in SNOMED)
> > CIN III (in SNOMED)
> > CIN 3 (in SNOMED)
> >
> > but the rest are not in SNOMED?
> >
> > I wonder why it doesn't get CIN I? It looks like that exists in SNOMED
> > (though I don't fully understand what all the symbols mean in the umls
> > browser).
> >
> >> CIN I - Cervical intraepithelial neoplasia 1
> >> [A3002690/SNOMEDCT/SY/285836003]
> >
> >
> > On 09/03/2013 09:55 PM, Pei Chen wrote:
> >> It has the correct parse (POS, chunks, and lookupwindow)- but some of
> >> the terms do not exist in SNOMED- CIN 2 - Cervical intraepithelial
> >> neoplasia 2 [A3002688/SNOMEDCT/SY/285838002] exists but not CIN II.
> >> CIN III [A3333965/SNOMEDCT/SY/20365006] also exists that's why it was
> >> able to perform the lookup successfully.
> >> Note that CIN II synonyms do exist in other umls thersauses such as
> >> MEDCIN, CCPSS though.  However, the bundled cTAKES dictionaries only
> >> contain (MeSH, SNOMEDCT, RxNORM, NCI, ICD9) IRRC.
> >>
> >> --Pei
> >>
> >> On Tue, Sep 3, 2013 at 9:44 PM, Miller, Timothy
> >> <Ti...@childrens.harvard.edu> wrote:
> >>> That is a good question, Ted!
> >>>
> >>> I tried it with a simple context: "The patient has a CIN III." I'm
> >>> not sure if that is a correct context but I was able to duplicate
> >>> your findings. (Finds a CUI for CIN III but not if you change it to
> >>> CIN II)
> >>>
> >>> My first thought was that it is the chunker. But the chunker seems
> >>> to get it right, as CIN II and CIN III are both called NPs, and
> >>> similarly the LookupWindowAnnotator handles them both identically.
> >>> So that suggests it is a problem with the actual lookup of the
> >>> tokens in the LookupWindow.
> >>>
> >>> That's all I can do for now but maybe someone else who knows more
> >>> about its behavior offhand will have an idea.
> >>>
> >>> Tim
> >>>
> >>>
> >>>
> >>>
> >>> On 09/03/2013 08:24 PM, Assur, Ted wrote:
> >>>> I'm trying to understand what would prevent the
> AggregatePlaintextUMLSProcessor AE from correctly parsing specific problems
> that are defined in the UMLS version used by cTAKES.
> >>>>
> >>>> For example,
> >>>> CIN (Cervical Intraepithelial Neoplasia) in its general usage is
> parsed out as UMLS CUI C0206708.
> >>>>
> >>>> CIN comes in 3 grades, 1, 2 and 3. Sometimes this is reported with
> Roman Numerals, I,II, and III.
> >>>>
> >>>> cTAKES correctly identifies "CIN 3" and "CIN III" with UMLS CUI
> C0851140: "Carcinoma in situ of uterine cervix."
> >>>>
> >>>> However, I cannot get it to recognize CIN 1, CIN I, CIN 2, or CIN II
> as their correct concepts, "Cervical intraepithelial neoplasia grade 1" and
> "Cervical intraepithelial neoplasia grade 2" respectively.
> >>>>
> >>>> Is there a way to tune the detection of UMLS concepts?
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> --------------------------------------------
> >>>> Ted Assur
> >>>> IT Solutions Architect for Cancer Research Providence Health &
> >>>> Services ted.assur@providence.org
> >>>> 503-215-6476
> >>>>
> >>>> Crede, ut intelligas.
> >>>> Intellego, ut credam.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>   ________________________________
> >>>>
> >>>> This message is intended for the sole use of the addressee, and may
> contain information that is privileged, confidential and exempt from
> disclosure under applicable law. If you are not the addressee you are
> hereby notified that you may not use, copy, disclose, or distribute to
> anyone the message or any information contained in the message. If you have
> received this message in error, please immediately advise the sender by
> reply email and delete this message.
> >>>>
> >
>
>
> ________________________________
>
> This message is intended for the sole use of the addressee, and may
> contain information that is privileged, confidential and exempt from
> disclosure under applicable law. If you are not the addressee you are
> hereby notified that you may not use, copy, disclose, or distribute to
> anyone the message or any information contained in the message. If you have
> received this message in error, please immediately advise the sender by
> reply email and delete this message.
>
>

IMPORTANT WARNING: Information contained in this email is intended for the use of the individual to whom it is addressed, and may contain information that is privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, or the employee or agent responsible for delivering the message to the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication is STRICTLY FORBIDDEN. If you have received this communication in error, please notify us immediately by return email and delete this document. Thank you.

Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

Posted by Pei Chen <ch...@apache.org>.

Ted,

> On another note, I know the cTAKES dictionary uses ICD9, but I'm not
> familiar with how to access that information: In the example I've
> described below, where would I locate the ICD9 for a specific entity?

Even though ICD9 is included in the lookup, IIRC cTAKES by default is
configured [1] to return/store only concepts [2] that have a SNOMEDCT code
or an RxNorm code.

[1]
http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/LookupDesc_Db.xml

[2]
http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup/src/main/java/org/apache/ctakes/dictionary/lookup/ae/UmlsToSnomedConsumerImpl.java

If you would like it to return ICD9 codes, you would need to
modify/configure the above.

--Pei
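
As a rough illustration of the filtering described above (this is not cTAKES code): assume each dictionary hit carries a CUI plus a list of (coding scheme, code) pairs, and that the consumer keeps only codes whose scheme is in an allowed set. The `DictionaryHit` type, the `keep_codes` helper, the scheme labels, and the ICD9 placeholder value are all hypothetical; only the CUI and SNOMEDCT code come from this thread.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class DictionaryHit:
    """Hypothetical stand-in for one dictionary match."""
    cui: str
    codes: Tuple[Tuple[str, str], ...]  # (coding_scheme, code) pairs


def keep_codes(hit: DictionaryHit, allowed_schemes: Set[str]) -> List[Tuple[str, str]]:
    """Return only the (scheme, code) pairs whose scheme is allowed."""
    return [(scheme, code) for scheme, code in hit.codes if scheme in allowed_schemes]


hit = DictionaryHit(
    cui="C0206708",
    codes=(
        ("SNOMEDCT", "285636001"),
        ("ICD9", "ICD9-PLACEHOLDER"),  # placeholder, not a real ICD9 code
    ),
)

# Assumed default behaviour: only SNOMEDCT / RxNorm codes survive.
default_view = keep_codes(hit, {"SNOMEDCT", "RXNORM"})

# Widening the allowed set is the kind of change that would surface ICD9.
with_icd9 = keep_codes(hit, {"SNOMEDCT", "RXNORM", "ICD9"})
```

Under these assumptions the ICD9 entry is present in the lookup data but dropped by the filter, which would match the behaviour described in this message.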



RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

Posted by "Assur, Ted" <Th...@providence.org>.

Thanks for looking into this, it's been puzzling me.

On another note, I know the cTAKES dictionary uses ICD9, but I'm not familiar with how to access that information: In the example I've described below, where would I locate the ICD9 for a specific entity?

Thank you

Ted


Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

Posted by Pei Chen <ch...@apache.org>.

You're right, it should have gotten "CIN I". That's a strange one;
it probably needs to be debugged/looked into further...


Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

Posted by "Miller, Timothy" <Ti...@childrens.harvard.edu>.

Ah. So it will get
CIN 2 (in SNOMED)
CIN III (in SNOMED)
CIN 3 (in SNOMED)

but the rest are not in SNOMED?

I wonder why it doesn't get CIN I? It looks like that exists in SNOMED
(though I don't fully understand what all the symbols mean in the UMLS
browser):

> CIN I - Cervical intraepithelial neoplasia 1
> [A3002690/SNOMEDCT/SY/285836003]
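
On the bracketed notation: as far as I know, the UMLS browser shows each atom as [AUI/SAB/TTY/CODE], that is, the atom identifier, the source vocabulary, the term type (SY meaning synonym), and the code in that source. Below is a minimal sketch that splits such a label into fields. The `Atom` type and `parse_atom` helper are made up for illustration and are not part of any UMLS or cTAKES API.

```python
import re
from typing import NamedTuple


class Atom(NamedTuple):
    term: str       # the term string shown before the bracket
    aui: str        # atom unique identifier
    source: str     # source vocabulary abbreviation (SAB)
    term_type: str  # term type in source (TTY), e.g. SY = synonym
    code: str       # code within the source vocabulary


_ATOM_RE = re.compile(
    r"^(?P<term>.+?)\s*"
    r"\[(?P<aui>[^/\]]+)/(?P<source>[^/\]]+)/(?P<tty>[^/\]]+)/(?P<code>[^/\]]+)\]$"
)


def parse_atom(label: str) -> Atom:
    """Split 'TERM [AUI/SAB/TTY/CODE]' into its parts."""
    normalized = " ".join(label.split())  # collapse line breaks and extra spaces
    match = _ATOM_RE.match(normalized)
    if match is None:
        raise ValueError(f"not an atom label: {label!r}")
    return Atom(match["term"], match["aui"], match["source"], match["tty"], match["code"])


atom = parse_atom(
    "CIN I - Cervical intraepithelial neoplasia 1\n[A3002690/SNOMEDCT/SY/285836003]"
)
```

Read this way, the entry says that SNOMEDCT carries the full string "CIN I - Cervical intraepithelial neoplasia 1" as a synonym under code 285836003; it does not by itself show that the bare string "CIN I" is a term.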



Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

Posted by Pei Chen <ch...@apache.org>.

It has the correct parse (POS, chunks, and lookup window), but some of
the terms do not exist in SNOMED.
CIN 2 - Cervical intraepithelial neoplasia 2
[A3002688/SNOMEDCT/SY/285838002] exists, but not CIN II.
CIN III [A3333965/SNOMEDCT/SY/20365006] also exists, which is why it was
able to perform the lookup successfully.
Note that CIN II synonyms do exist in other UMLS thesauri such as
MEDCIN and CCPSS. However, IIRC the bundled cTAKES dictionaries only
contain MeSH, SNOMEDCT, RxNORM, NCI, and ICD9.

--Pei
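
To make the point concrete, here is a toy sketch (not cTAKES code) that assumes a lookup succeeds only when the text span equals a dictionary term exactly, ignoring case and extra whitespace. The `TERMS` table holds just the two SNOMEDCT strings and codes quoted above, and `lookup` is a hypothetical helper.

```python
from typing import Optional

# Term string -> SNOMEDCT code, limited to the entries quoted in this message.
TERMS = {
    "cin 2 - cervical intraepithelial neoplasia 2": "285838002",
    "cin iii": "20365006",
}


def lookup(span: str) -> Optional[str]:
    """Return the code for an exact (case-insensitive) term match, else None."""
    key = " ".join(span.lower().split())
    return TERMS.get(key)


found = lookup("CIN III")    # the bare string is a term in the table
missing = lookup("CIN II")   # no term with this exact string
```

With a strict match like this, a span is found only if some source in the bundled dictionary lists that exact string, which is one way to read the CIN III versus CIN II difference. Whether cTAKES applies any further normalization is not something this sketch settles.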


Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

Posted by "Miller, Timothy" <Ti...@childrens.harvard.edu>.

That is a good question, Ted!

I tried it with a simple context: "The patient has a CIN III." I'm not
sure if that is a correct context, but I was able to duplicate your
findings. (It finds a CUI for CIN III, but not if you change it to CIN II.)

My first thought was that it is the chunker. But the chunker seems to
get it right, as CIN II and CIN III are both called NPs, and similarly
the LookupWindowAnnotator handles them both identically. So that
suggests it is a problem with the actual lookup of the tokens in the
LookupWindow.

That's all I can do for now but maybe someone else who knows more about
its behavior offhand will have an idea.

Tim
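
A simplified picture of that last step, assuming the lookup tries every contiguous token span inside a window against a term table. This is for intuition only; the actual cTAKES lookup algorithm may well differ. `candidate_spans`, `find_matches`, and the two-entry table are hypothetical, and the CUIs are the ones reported in the original question.

```python
from typing import Iterator, List, Tuple


def candidate_spans(tokens: List[str]) -> Iterator[str]:
    """Yield every contiguous token span in the window, longest first."""
    n = len(tokens)
    for length in range(n, 0, -1):
        for start in range(n - length + 1):
            yield " ".join(tokens[start:start + length])


# Lower-cased term -> CUI, restricted to what the thread reports as found.
CUI_BY_TERM = {
    "cin": "C0206708",
    "cin iii": "C0851140",
}


def find_matches(window_tokens: List[str]) -> List[Tuple[str, str]]:
    """Collect (span, CUI) pairs for spans that exactly match a term."""
    matches = []
    for span in candidate_spans(window_tokens):
        cui = CUI_BY_TERM.get(span.lower())
        if cui is not None:
            matches.append((span, cui))
    return matches


grade3 = find_matches(["a", "CIN", "III"])
grade2 = find_matches(["a", "CIN", "II"])
```

In this toy setup both windows are enumerated identically, and the only difference is whether the term table contains the span, which is consistent with the chunker and window annotator not being the cause.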





Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

Posted by Pei Chen <ch...@apache.org>.

Hi Ted,
Setting aside detecting the stage/grade and other attributes and asserting
those relationships to the cancer (that's probably a separate discussion):
in your example, since there are distinct SNOMEDCT concepts and direct
matches, it was able to identify "Cervical intraepithelial neoplasia grade 1"
  cui = "C0349458"
  code = "285836003"
as well as "Cervical intraepithelial neoplasia"
  cui = "C0206708"
  code = "285636001"
etc.
It should also be able to identify "CIN 2", as there should be an exact
match in SNOMEDCT: CIN 2 - Cervical intraepithelial neoplasia 2
[A3002688/SNOMEDCT/SY/285838002].
Please see attached xml output.

I am using the out-of-the-box AggregatePlaintextUMLSProcessor from 3.1RC3.
--Pei
