Posted to dev@ctakes.apache.org by Tomasz Oliwa <ol...@uchicago.edu> on 2015/11/13 00:27:42 UTC

cTAKES dictionary lookup behavior question

Hi,

cTAKES has a dictionary lookup behavior that I cannot explain. You can verify the queries via the cTAKES demo posted at http://52.27.22.206:8080/index.jsp, but it also happens with the current 3.2.2 version and the fast dictionary UMLS lookup.

SENTENCE:  Took  the baby to  the hospital.
            VB   DT   NN  IN  DT     NN    
           |===|     |======|              
           Event     Anatomy               
                     C1305907   

It finds the "baby tooth" annotation. The only CUI texts in the default fast dictionary for C1305907 are 

C1305907|primary tooth
C1305907|milk tooth
C1305907|baby tooth

How can "baby to" trigger the "baby tooth" annotation? 

Regards,
Tomasz

RE: cTAKES dictionary lookup behavior question

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
You are welcome - thank you for finding and reporting the bug!

-----Original Message-----
From: Tomasz Oliwa [mailto:oliwa@uchicago.edu] 
Sent: Monday, November 16, 2015 4:08 PM
To: dev@ctakes.apache.org
Subject: RE: cTAKES dictionary lookup behavior question

Sean,

I checked out 'ctakes/trunk' with this fix and ran it on the examples from the Description. It no longer finds the incorrect annotations mentioned in the Description. I closed the JIRA entry. Thanks for the quick fix.

Regards,
Tomasz

________________________________________
From: Finan, Sean [Sean.Finan@childrens.harvard.edu]
Sent: Monday, November 16, 2015 11:25 AM
To: dev@ctakes.apache.org
Subject: RE: cTAKES dictionary lookup behavior question

Hi Tomasz,

I just checked in a fix.  Could you please re-run your tests and close the issue if it passes?

Thanks,
Sean

-----Original Message-----
From: Tomasz Oliwa [mailto:oliwa@uchicago.edu]
Sent: Monday, November 16, 2015 11:36 AM
To: dev@ctakes.apache.org
Subject: RE: cTAKES dictionary lookup behavior question

Sean,

I created a JIRA entry for this bug at: https://issues.apache.org/jira/browse/CTAKES-389

It would be great if you could check in a fix for it.

Regards,
Tomasz

________________________________________
From: Finan, Sean [Sean.Finan@childrens.harvard.edu]
Sent: Monday, November 16, 2015 10:20 AM
To: dev@ctakes.apache.org
Subject: RE: cTAKES dictionary lookup behavior question

Hi all,

This is not intended behavior, it is a bug.  I will check in a fix soon ...

-----Original Message-----
From: Tomasz Oliwa [mailto:oliwa@uchicago.edu]
Sent: Thursday, November 12, 2015 6:53 PM
To: britt fitch; dev@ctakes.apache.org
Subject: RE: cTAKES dictionary lookup behavior question

Britt,

I observed it also depends on what the "missed" word is.

"baby to" and "baby too" match C1305907 ("baby tooth"); however, "baby token" does not.
"electrolyte le" and "electrolyte lev" match C0428284 ("electrolyte level"), but "electrolyte dev" does not.

It seems that a match is made when the "missed" word consists of the same characters that the dictionary word starts with, i.e. when it is a prefix of the word in the fast dictionary?

Is there any way to tweak or customize this behavior?

Thanks,
Tomasz


________________________________
From: britt fitch [britt.fitch@wiredinformatics.com]
Sent: Thursday, November 12, 2015 5:36 PM
To: dev@ctakes.apache.org
Subject: Re: cTAKES dictionary lookup behavior question

The rare words, given the example terms below, are "primary", "milk", and "baby".
The lookup allows for a certain number of "misses".
The text "baby to" hits on "baby" as the rare word.
"baby to" compared to "baby tooth" is 1 "miss" and qualifies as a match. (In practice, if I recall correctly, "to" is actually discarded entirely, so the comparison is actually "baby" : "baby tooth".)

Others can correct my napkin logic though.

This is a pretty common scenario when a single term ends up matching to a larger term because of the allowance of misses.

For example:

"oxygen" > "oxygen therapy"
"pathology" > "pathology department", "pathology procedure"
"exercise" > "exercise pain management"

Those are just some quick examples. It depends heavily on what the ontology contains though.

Cheers,

Britt


Britt Fitch
Wired Informatics
265 Franklin St Ste 1702
Boston, MA 02110
http://wiredinformatics.com
Britt.Fitch@wiredinformatics.com


RE: cTAKES dictionary lookup behavior question

Posted by Tomasz Oliwa <ol...@uchicago.edu>.
Sean,

I checked out 'ctakes/trunk' with this fix and ran it on the examples from the Description. It no longer finds the incorrect annotations mentioned in the Description. I closed the JIRA entry. Thanks for the quick fix.

Regards,
Tomasz


RE: cTAKES dictionary lookup behavior question

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Tomasz,

I just checked in a fix.  Could you please re-run your tests and close the issue if it passes?

Thanks,
Sean


RE: cTAKES dictionary lookup behavior question

Posted by Tomasz Oliwa <ol...@uchicago.edu>.
Sean,

I created a JIRA entry for this bug at: https://issues.apache.org/jira/browse/CTAKES-389

It would be great if you could check in a fix for it.

Regards,
Tomasz


RE: cTAKES dictionary lookup behavior question

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi all,

This is not intended behavior, it is a bug.  I will check in a fix soon ...


RE: cTAKES dictionary lookup behavior question

Posted by Tomasz Oliwa <ol...@uchicago.edu>.
Britt,

I observed it also depends on what the "missed" word is.

"baby to" and "baby too" match C1305907 ("baby tooth"); however, "baby token" does not.
"electrolyte le" and "electrolyte lev" match C0428284 ("electrolyte level"), but "electrolyte dev" does not.

It seems that a match is made when the "missed" word consists of the same characters that the dictionary word starts with, i.e. when it is a prefix of the word in the fast dictionary?
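
The hypothesis above can be illustrated with a minimal sketch. It is not taken from the cTAKES source: the function names `prefix_match`, `exact_match`, and `phrase_matches` are hypothetical, and plain whitespace tokenization is an assumption. It only shows that comparing tokens with a prefix test instead of equality would produce the pattern observed.

```python
def prefix_match(text_token, term_token):
    # Hypothetical faulty comparison: accepts a text token that is
    # merely a prefix of the dictionary token.
    return term_token.startswith(text_token)


def exact_match(text_token, term_token):
    # Strict comparison: the tokens must be identical.
    return text_token == term_token


def phrase_matches(text, term, token_match):
    # Compare two phrases token by token using the supplied comparison.
    text_tokens = text.lower().split()
    term_tokens = term.lower().split()
    if len(text_tokens) != len(term_tokens):
        return False
    return all(token_match(a, b) for a, b in zip(text_tokens, term_tokens))


for candidate in ("baby to", "baby too", "baby token", "baby tooth"):
    print(candidate,
          phrase_matches(candidate, "baby tooth", prefix_match),
          phrase_matches(candidate, "baby tooth", exact_match))
```

Under the prefix comparison, "baby to" and "baby too" are accepted for "baby tooth" while "baby token" is rejected, which is consistent with the observations; under the exact comparison only "baby tooth" itself is accepted. Whether the actual lookup code behaved this way is not stated in this thread.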

Is there any way to tweak or customize this behavior?

Thanks,
Tomasz



Re: cTAKES dictionary lookup behavior question

Posted by britt fitch <br...@wiredinformatics.com>.
The rare words, given the example terms below, are “primary”, “milk”, and “baby”.
The lookup allows for a certain number of “misses”.
The text “baby to” hits on “baby” as the rare word.
“baby to” compared to “baby tooth” is 1 “miss” and qualifies as a match. (In practice, if I recall correctly, “to” is actually discarded entirely, so the comparison is actually “baby” : “baby tooth”.)

Others can correct my napkin logic though.

This is a pretty common scenario when a single term ends up matching to a larger term because of the allowance of misses.

For example:

“oxygen” > “oxygen therapy”
“pathology” > “pathology department”, “pathology procedure”
“exercise” > “exercise pain management”

Those are just some quick examples. It depends heavily on what the ontology contains though.
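
The description above can be sketched roughly as follows. This illustrates the rare-word idea only and is not the cTAKES implementation: the function names, the `max_misses` parameter, picking the rare word by token frequency within the given entries, and ignoring token order are all assumptions made for the example. Elsewhere in this thread the reported "baby to" case was confirmed to be a bug rather than intended behavior.

```python
from collections import Counter, defaultdict


def parse_entries(lines):
    # Parse 'CUI|text' lines into (cui, tokens) pairs.
    entries = []
    for line in lines:
        cui, text = line.strip().split("|", 1)
        entries.append((cui, text.lower().split()))
    return entries


def build_rare_word_index(entries):
    # Index each term under its least frequent token (assumed notion of "rare").
    counts = Counter(tok for _, toks in entries for tok in toks)
    index = defaultdict(list)
    for cui, toks in entries:
        rare = min(toks, key=lambda t: counts[t])
        index[rare].append((cui, toks))
    return index


def lookup(text, index, max_misses=1):
    # Report terms whose rare word occurs in the text and whose other
    # tokens are missing from the text at most max_misses times.
    text_tokens = text.lower().rstrip(".").split()
    present = set(text_tokens)
    hits = []
    for tok in text_tokens:
        for cui, term_toks in index.get(tok, []):
            misses = sum(1 for t in term_toks if t not in present)
            if misses <= max_misses:
                hits.append((cui, " ".join(term_toks)))
    return hits


entries = parse_entries([
    "C1305907|primary tooth",
    "C1305907|milk tooth",
    "C1305907|baby tooth",
])
index = build_rare_word_index(entries)
print(sorted(index))
print(lookup("Took the baby to the hospital.", index, max_misses=1))
print(lookup("Took the baby to the hospital.", index, max_misses=0))
```

In this sketch, allowing one miss lets "baby" alone pull in "baby tooth", while allowing none requires every token of the term to be present in the text.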

Cheers,

Britt


Britt Fitch
Wired Informatics
265 Franklin St Ste 1702
Boston, MA 02110
http://wiredinformatics.com
Britt.Fitch@wiredinformatics.com