Posted to user@ctakes.apache.org by "Baas,Leah" <Le...@SanfordHealth.org> on 2019/01/08 15:47:53 UTC

Re: [EXTERNAL] Re: Filtering Annotated Files

Thanks Alden!

I am learning Python and I think this will be extremely helpful.

Best,
Leah



From: Alden Gordon <al...@rubiconmd.com>
Reply-To: "user@ctakes.apache.org" <us...@ctakes.apache.org>
Date: Tuesday, January 8, 2019 at 9:43 AM
To: "user@ctakes.apache.org" <us...@ctakes.apache.org>
Subject: [EXTERNAL] Re: Filtering Annotated Files

Leah,

If you know Python, feel free to use the simple class I wrote to parse cTAKES XMI files, attached. It only pulls out the information I needed for my use case, so you may need to adapt it.

Best,
Alden
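
For readers without the attachment, the general approach of filtering an XMI file by CUI can be sketched in a few lines of Python. This is a minimal sketch and not the class mentioned above. It assumes the XMI output contains UmlsConcept elements carrying cui, codingScheme, code, and preferredText attributes. The function name find_concepts, the sample document, and the CUIs C0000001 and C0000002 are placeholders made up for illustration, so substitute the real CUIs of interest.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def find_concepts(xmi_text, wanted_cuis):
    """Return one dict per UmlsConcept element whose cui is in wanted_cuis."""
    root = ET.fromstring(xmi_text)
    hits = []
    for elem in root.iter():
        # Assumption: concepts are serialized as <...:UmlsConcept .../> elements.
        if local_name(elem.tag) != "UmlsConcept":
            continue
        if elem.get("cui") in wanted_cuis:
            hits.append({
                "cui": elem.get("cui"),
                "codingScheme": elem.get("codingScheme"),
                "code": elem.get("code"),
                "preferredText": elem.get("preferredText"),
            })
    return hits


# Tiny hand-written stand-in for a cTAKES XMI file; all values are placeholders.
SAMPLE_XMI = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI"
    xmlns:refsem="http:///org/apache/ctakes/typesystem/type/refsem.ecore">
  <refsem:UmlsConcept xmi:id="10" codingScheme="SNOMEDCT_US" code="111"
      cui="C0000001" tui="T000" preferredText="placeholder concept A"/>
  <refsem:UmlsConcept xmi:id="11" codingScheme="SNOMEDCT_US" code="222"
      cui="C0000002" tui="T000" preferredText="placeholder concept B"/>
</xmi:XMI>"""

matches = find_concepts(SAMPLE_XMI, {"C0000001"})
for m in matches:
    print(m["cui"], m["codingScheme"], m["preferredText"])
```

Matching on the local tag name avoids hard-coding the type-system namespace URI, which may differ between cTAKES versions.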

On Tue, Jan 8, 2019 at 9:53 AM Smith, Lincoln <Li...@highmark.com> wrote:
I don't know of anything other than parsing the XML text to look for your preferred terminology and CUIs of interest. It's not overly difficult in R if you google some XML parsing examples. Lincoln

Lincoln Smith, MD, MS
Director, Analytic Enablement
Customer Engagement & Insight
412-544-8043
Lincoln.Smith@highmark.com

From: Baas,Leah [mailto:Leah.Baas@SanfordHealth.org]
Sent: Tuesday, January 08, 2019 9:44 AM
To: user@ctakes.apache.org
Subject: [EXTERNAL] Filtering Annotated Files

To whom it may concern,

Hello! I am a student researcher who is new to NLP and cTAKES. I am trying to use cTAKES to extract clinical text indicative of BRCA mutations, and I’m feeling a bit lost. I’ve described my current progress below, and I am wondering if you can guide me to the next step:

So far, I’ve been able to create .xml files for each subject in my dataset, run the files through the default clinical pipeline, and view the annotated output files in the CVD. However, my goal is to “filter” the annotations for concepts relevant to BRCA mutations (such as UMLS CUIs and SNOMED CT terms), and this is where I’m getting stuck. Is there a way to isolate these specific concepts within the cTAKES system? Or does this require post-processing using a different platform?

Thanks for entertaining my amateur question!

Leah Baas


-----------------------------------------------------------------------
Confidentiality Notice: This e-mail message, including any attachments,
is for the sole use of the intended recipient(s) and may contain
privileged and confidential information.  Any unauthorized review, use,
disclosure or distribution is prohibited.  If you are not the intended
recipient, please contact the sender by reply e-mail and destroy
all copies of the original message.

________________________________

The information contained in this transmission may contain privileged and confidential information including personal information protected by federal and/or state privacy laws. It is intended only for the use of the addressee named above. If you are not the intended recipient, you are hereby notified that any review, dissemination, distribution or duplication of this communication is strictly prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message. Highmark Health is a Pennsylvania nonprofit corporation. This communication may come from Highmark Health or one of its subsidiaries or affiliated businesses.


--

Alden Gordon
Director of Data Science & Analytics
(860) 402-6572


rubiconmd.com<https://www.rubiconmd.com/>

Re: [EXTERNAL] Re: Filtering Annotated Files

Posted by "Baas,Leah" <Le...@SanfordHealth.org>.

Thank you, Tim!

I am not very familiar with Java, but I will take a look at this.

Best,
Leah

From: "Miller, Timothy" <Ti...@childrens.harvard.edu>
Reply-To: "user@ctakes.apache.org" <us...@ctakes.apache.org>
Date: Tuesday, January 8, 2019 at 9:54 AM
To: "user@ctakes.apache.org" <us...@ctakes.apache.org>
Subject: Re: [EXTERNAL] Re: Filtering Annotated Files

http://svn.apache.org/viewvc/ctakes/trunk/ctakes-web-rest/src/main/java/org/apache/ctakes/rest/service/CtakesRestController.java?view=markup

Re: [EXTERNAL] Re: Filtering Annotated Files

Posted by "Miller, Timothy" <Ti...@childrens.harvard.edu>.

If you prefer to stay in Java, the annotations are stored in a data structure called a CAS. UIMA and uimaFIT provide utility classes for extracting these annotations. The ctakes-web-rest project has some code that does this; it takes all the CUIs extracted from the input and sends back a JSON object.

Check out the methods getAnalyzedJSON(...) and process(...)
http://svn.apache.org/viewvc/ctakes/trunk/ctakes-web-rest/src/main/java/org/apache/ctakes/rest/service/CtakesRestController.java?view=markup
which will lead you to JCasParser and CuiResponse:
http://svn.apache.org/viewvc/ctakes/trunk/ctakes-web-rest/src/main/java/org/apache/ctakes/rest/util/JCasParser.java?view=markup
http://svn.apache.org/viewvc/ctakes/trunk/ctakes-web-rest/src/main/java/org/apache/ctakes/rest/service/CuiResponse.java?view=markup

Tim
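
The same idea can be tried without Java by reading the XMI file that the pipeline writes. The following Python sketch is only an illustration under assumptions: it is not the ctakes-web-rest code, and the JSON layout it prints is invented here, not the CuiResponse format. It assumes mentions reference their UmlsConcept entries through a space-separated ontologyConceptArr attribute of xmi:id values, and that the note text sits in the sofaString attribute of the Sofa element. The function name mentions_to_json and the sample data are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

XMI_ID = "{http://www.omg.org/XMI}id"  # namespaced form of the xmi:id attribute


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def mentions_to_json(xmi_text):
    """Group mentions by annotation type and attach the CUIs they point to."""
    root = ET.fromstring(xmi_text)
    text = ""
    cui_by_id = {}
    # First pass: collect the document text and an xmi:id -> CUI lookup.
    for elem in root.iter():
        name = local_name(elem.tag)
        if name == "Sofa":
            text = elem.get("sofaString", "")
        elif name == "UmlsConcept":
            cui_by_id[elem.get(XMI_ID)] = elem.get("cui")
    # Second pass: any element with ontologyConceptArr is treated as a mention.
    result = {}
    for elem in root.iter():
        refs = elem.get("ontologyConceptArr")
        if not refs:
            continue
        begin, end = int(elem.get("begin")), int(elem.get("end"))
        cuis = sorted({cui_by_id[r] for r in refs.split() if r in cui_by_id})
        result.setdefault(local_name(elem.tag), []).append(
            {"begin": begin, "end": end, "text": text[begin:end], "cuis": cuis}
        )
    return json.dumps(result, indent=2, sort_keys=True)


# Hand-written stand-in for a cTAKES XMI file; all values are placeholders.
SAMPLE_XMI = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI"
    xmlns:cas="http:///uima/cas.ecore"
    xmlns:refsem="http:///org/apache/ctakes/typesystem/type/refsem.ecore"
    xmlns:textsem="http:///org/apache/ctakes/typesystem/type/textsem.ecore">
  <cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView"
      sofaString="placeholder note text"/>
  <refsem:UmlsConcept xmi:id="10" cui="C0000001"
      preferredText="placeholder concept"/>
  <textsem:DiseaseDisorderMention xmi:id="20" sofa="1" begin="0" end="11"
      ontologyConceptArr="10"/>
</xmi:XMI>"""

print(mentions_to_json(SAMPLE_XMI))
```

Filtering for specific concepts then amounts to keeping only the entries whose cuis list intersects the set of CUIs of interest.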
