Posted to user@ctakes.apache.org by digital paula <cy...@hotmail.com> on 2013/12/05 02:58:46 UTC

RE: cTAKES Sectionizer: how to integrate it with clinical pipeline - Solved



Pei,
Okay, the sectionizer is now integrated in the clinical pipeline, and I did some preliminary testing to confirm.  I added the CDASegmentAnnotator lines that you gave to the Aggregate Descriptor and commented out the simple annotator in the flow.  In addition, I had to comment out the following for the Aggregate Descriptor to save without error:
<!--
<configurationParameter>
  <name>SegmentID</name>
  <description/>
  <type>String</type>
  <multiValued>false</multiValued>
  <mandatory>false</mandatory>
  <overrides>
    <parameter>SimpleSegmentAnnotator/SegmentID</parameter>
  </overrides>
</configurationParameter>
-->
 
I didn't see it in the trunk, so I manually added the text file ccda_sections.txt, taken from the sandbox, under src/main/resources in ctakes-core:
org/apache/ctakes/core/sections/ccda_sections.txt
I tested on a few narratives, and I'm attaching what the CVD tool returned for one of them.  The segmentID was populated with 1.3.6.1.4.1.19376.1.5.3.1.3.4.  I looked in the ccda_sections.txt file, and this was the matching line:
1.3.6.1.4.1.19376.1.5.3.1.3.4,10164-2,HISTORY OF PRESENT ILLNESS,brief history of physical illness,history of present illness,history of the present illness
I looked back in the narrative, and the heading was:  HISTORY OF PRESENT ILLNESS: The patient.....
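
For anyone reading the mapping line above, here is a minimal Java sketch of how such a line could be split and compared against a note heading. It assumes, from this single line only, that the columns are a section ID, a code, the preferred heading text, and then alternative spellings of the heading. The class SectionMappingSketch and its method names are hypothetical, and this is not how CDASegmentAnnotator itself is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of cTAKES.
class SectionMappingSketch {
    final String sectionId;      // column 1 (assumed): section ID
    final String code;           // column 2 (assumed): section code
    final String preferredText;  // column 3 (assumed): preferred heading text
    final List<String> headings = new ArrayList<>(); // columns 3 onward, lower-cased

    SectionMappingSketch(String line) {
        String[] cols = line.split(",");
        if (cols.length < 3) {
            throw new IllegalArgumentException("expected at least 3 columns: " + line);
        }
        sectionId = cols[0].trim();
        code = cols[1].trim();
        preferredText = cols[2].trim();
        for (int i = 2; i < cols.length; i++) {
            headings.add(cols[i].trim().toLowerCase(Locale.ROOT));
        }
    }

    // True if the note line starts with one of the known headings followed by a colon.
    boolean matchesHeading(String noteLine) {
        String lower = noteLine.trim().toLowerCase(Locale.ROOT);
        for (String heading : headings) {
            if (lower.startsWith(heading + ":")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        SectionMappingSketch m = new SectionMappingSketch(
            "1.3.6.1.4.1.19376.1.5.3.1.3.4,10164-2,HISTORY OF PRESENT ILLNESS,"
            + "brief history of physical illness,history of present illness,"
            + "history of the present illness");
        System.out.println("segmentID: " + m.sectionId);
        System.out.println("preferredText: " + m.preferredText);
        System.out.println(m.matchesHeading("HISTORY OF PRESENT ILLNESS: The patient....."));
    }
}
```

The case-insensitive, colon-terminated match is a simplification chosen for the sketch; the real annotator may match headings differently.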
 
I only tested on a few narratives, which hardly constitutes testing; I need to resolve a more urgent issue for my research first, but I will return to this for full testing across all of my narratives.  From a preliminary perspective, it looks good. The only thing I'd like to see is the actual text of the segment heading as another feature.  For example, what displays is this:
segmentID:  1.3.6.1.4.1.19376.1.5.3.1.3.4 
This would be good to have too:
segmentHeading:  HISTORY OF PRESENT ILLNESS: 
 
Thanks.  
 
Regards,
Paula

  > From: Pei.Chen@childrens.harvard.edu
> To: user@ctakes.apache.org
> Subject: RE: cTAKES Sectionizer:  how to integrate it with clinical pipeline
> Date: Tue, 3 Dec 2013 20:38:22 +0000
> 
> Paula,
> I moved the sectionizer to trunk now and added the xml descriptor for it.
> 
> In your Aggregate Descriptor, just add:
>     <delegateAnalysisEngine key="CDASegmentAnnotator">
>       <import location="../../../ctakes-core/desc/analysis_engine/CDASegmentAnnotator.xml"/>
>     </delegateAnalysisEngine>  
> .
> <node>CDASegmentAnnotator</node> 
> 
> If you would like to see it wired together via uimaFIT, check out the test case:
> ctakes-core/src/test/java/org/apache/ctakes/core/ae/TestCDASegmentAnnotator.java
> 
> Hope that helps.
> It might be even worthwhile defaulting to this instead of the SimpleSegment (since simple segment does nothing more than span the entire document...)
> --Pei
> 
> 
> 
> From: digital paula [mailto:cybersation@hotmail.com] 
> Sent: Tuesday, December 03, 2013 1:51 PM
> To: user@ctakes.apache.org
> Subject: cTAKES Sectionizer: how to integrate it with clinical pipeline
> 
> Hi Pei,
>  
> Last week we briefly discussed the sectionizer, and now that I have it loaded successfully I just need to integrate it in the clinical pipeline.
>  
> The sectionizer doesn't have a desc folder with an associated XML descriptor, and I understand that things are moving towards uimaFIT, so that's probably the reason why.  Can you provide some guidance on what you'd recommend for testing the sectionizer?  That is, should I just create the XML descriptor using one of the reference materials from the UIMA website, or would you recommend using uimaFIT?  If the latter, can you provide assistance with integrating the sectionizer into the AggregatePlaintext AE using uimaFIT?
>  
> Thanks.
>  
> Regards,
> Paula

RE: cTAKES Sectionizer: how to integrate it with clinical pipeline - Solved

Posted by digital paula <cy...@hotmail.com>.

Pei, 

I added a few more items: 'PMH' for previous medical history, 'FH' for family history, and 'SH' for social history.

Regards,
Paula

From: Pei.Chen@childrens.harvard.edu
To: user@ctakes.apache.org
CC: user@ctakes.apache.org
Subject: Re: cTAKES Sectionizer:  how to integrate it with clinical pipeline - Solved
Date: Fri, 6 Dec 2013 13:50:41 +0000

Would be great if you could share the header names/patterns from your data set if they're not in the mappings file.
Also, as a side note: if you use the current DrugNER component, the sections to include for the lookup need to be added.

Sent from my iPhone

On Dec 6, 2013, at 7:33 AM, "Miller, Timothy" <Ti...@childrens.harvard.edu> wrote:

Glad you didn't encounter any issues. As far as getting it running there wasn't any particular issue, my worry is just the 'unknown unknowns' as they say. I think for performance I have one worry yet, especially if it is the default sectionizer -- does it fail gracefully and will it ever skip text? I think it should be ok but that would be one thing it would be worth testing. So, Paula, what happens if you change the spelling of a section header (i.e., introduce a typo)? And just out of curiosity, what kind of notes are you running it on? Any particular dataset?

Thanks

Tim

On 12/05/2013 09:04 PM, digital paula wrote:

Pei,   

I appreciate you mentioning the preferredText feature for getting section headings to render. The first column in the mapping file should suffice.

In a previous post, Tim stated that the sectionizer would be a huge benefit to the research community once it's working, or something along those lines.  What was the problem with getting it to work?  I ask because I didn't encounter any issues during my preliminary testing.  All I did was an integration and minor configuration, as stated in my previous post.  I'd like to know so that I'm aware of any known issues in case I encounter them once I get back to using the sectionizer, which should be in a few days.

Thanks.

Regards,

Paula

From: Pei.Chen@childrens.harvard.edu
To: user@ctakes.apache.org
Subject: RE: cTAKES Sectionizer: how to integrate it with clinical pipeline - Solved
Date: Thu, 5 Dec 2013 14:26:56 +0000

Paula,
Glad to hear it’s working for you.  Please feel free to let us know how it works out for you in your use case and dataset.

> I didn't see it in the trunk so I  manually added the text file ccda_sections.txt taken from the sandbox.    I added it under src/main/resources in ctakes-core:
> org/apache/ctakes/core/sections/ccda_sections.txt
Check out:
http://svn.apache.org/r1547576 ctakes/trunk/ctakes-core-res/src/main/resources/org/apache/ctakes/core/sections/ccda_sections.txt   (with props)

> This would be good to have too: segmentHeading:  HISTORY OF PRESENT ILLNESS:

There is a field called Segment.preferredText, which should display the first text column in the mappings file.

Thanks,
Pei


Re: cTAKES Sectionizer: how to integrate it with clinical pipeline - Solved

Posted by "Chen, Pei" <Pe...@childrens.harvard.edu>.

Would be great if you could share the header names/patterns from your data set if they're not in the mappings file.
Also, as a side note: if you use the current DrugNER component, the sections to include for the lookup need to be added.

Sent from my iPhone


RE: cTAKES Sectionizer: how to integrate it with clinical pipeline - Solved (but sporadic issue unresolved for now)

Posted by digital paula <cy...@hotmail.com>.

Pei, I figured it out. The explanation is pretty much in the AssertionAnalysisEngine.java file.

Thanks.

Regards.
Paula

From: cybersation@hotmail.com
To: user@ctakes.apache.org
Subject: RE: cTAKES Sectionizer:  how to integrate it with clinical pipeline - Solved (but sporadic issue unresolved for now)
Date: Fri, 13 Dec 2013 01:10:12 -0500

Thanks Pei, but I'm going to figure out what's going on with the sectionizer and its sporadic NPE, because I take my job as the Sectionizer Component Maintainer (I think that's what Tim referred to it as) very seriously.  ;-)

In regards to other features, I'd like to know more about the features conditional, generic, and confidence.  The only information I could find on the website is this page:

https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.0+-+Assertion

1.  Conditional:
All I know from the website is that this feature is a Boolean with values true or false.  What would make an annotation get marked as true or false?

2.  Generic:
According to the website, the use of a term might be generic (e.g., "gave pt a diabetes brochure").  I searched for the word 'brochure' in the workspace and nothing was returned, so I have no idea where this is stored.  But I see that when I use the same text, "gave pt a diabetes brochure", it returns true for the generic feature.  Can you advise on where the word 'brochure' is located?  I'd like to see what other terms, when used in a certain context, will give a true value for the generic feature.

3.  Confidence:
I couldn't find any information on this feature.  Can you explain it further, including how confidence is determined?  The values appear to be just 0 or 1.  Is this the only file that pertains to determining confidence?
GoldEntityAndAttributeReader.java

Thanks.

Regards,
Paula 

From: Pei.Chen@childrens.harvard.edu
To: user@ctakes.apache.org
Subject: RE: cTAKES Sectionizer:  how to integrate it with clinical pipeline - Solved
Date: Wed, 11 Dec 2013 17:31:29 +0000

Paula,
Yes, I’ve actually seen that NPE a while back[1]… I believe it is actually coming from the medfacts assertion module (which has its own sectioning code…).
According to Matt, “I believe the newer version of the mastif-zoner remove the address conversion piece, which seems to be where the NPE was happening.”
Perhaps he can chime in a bit more on it (would be great if the newer version was available on maven central).

[1] http://mail-archives.apache.org/mod_mbox/ctakes-dev/201309.mbox/%3C924DE05C19409B438EB81DE683A942D9105A93CB%40CHEXMBX1A.CHBOSTON.ORG%3E

From: digital paula [mailto:cybersation@hotmail.com]
Sent: Monday, December 09, 2013 9:41 PM
To: user@ctakes.apache.org
Subject: RE: cTAKES Sectionizer: how to integrate it with clinical pipeline - Solved

Tim,

I'm using the datasets from i2b2.org, which require signing a user data agreement.

I've done more testing and did get an error: "current or previous sentence IS NULL!"  This is sporadic, with no identifiable trigger found yet.  When this error occurs, the program crashes.  I'm going to need to step through the code to figure it out.  By any chance did you get this error?

When I changed a section from 'hospital course' to 'xhospital course', the segment defaulted to a previous segment, which is good.
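
The fallback described above (text under an unrecognized heading stays in the section that was already open) can be sketched in plain Java as follows. This is only an illustration of the observed behavior, not the CDASegmentAnnotator code: the class SectionFallbackSketch, the heading list, and the "DEFAULT" label for text before any recognized heading are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; not part of cTAKES.
class SectionFallbackSketch {
    // Assumed heading list for the example, lower-cased.
    static final List<String> KNOWN_HEADINGS =
        Arrays.asList("history of present illness", "hospital course");

    // Returns, for each input line, the section it is assigned to.
    static List<String> assignSections(List<String> lines) {
        List<String> result = new ArrayList<>();
        String current = "DEFAULT"; // assumed label before any heading is seen
        for (String line : lines) {
            String lower = line.trim().toLowerCase(Locale.ROOT);
            for (String heading : KNOWN_HEADINGS) {
                if (lower.startsWith(heading + ":")) {
                    current = heading; // a recognized heading opens a new section
                    break;
                }
            }
            // An unrecognized heading changes nothing, so the line
            // stays in whichever section was already open.
            result.add(current);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> note = Arrays.asList(
            "HISTORY OF PRESENT ILLNESS: The patient ...",
            "xHOSPITAL COURSE: Uneventful.");
        System.out.println(assignSections(note));
    }
}
```

In this sketch no line is ever dropped: every input line receives some section label, which is the "does it ever skip text?" property Tim asked about.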

Regards,

Paula


RE: cTAKES Sectionizer: how to integrate it with clinical pipeline - Solved (but sporadic issue unresolved for now)

Posted by digital paula <cy...@hotmail.com>.

Thanks Pei, but I'm going to figure out what's going on with the sectionizer and its sporadic NPE, because I take my job as the Sectionizer Component Maintainer (I think that's what Tim referred to it as) very seriously.  ;-)

In regards to other features, I'd like to know more about the features conditional, generic, and confidence.  The only information I could find on the website is this page:

https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.0+-+Assertion

1.  Conditional:
All I know from the website is that this feature is a Boolean with values true or false.  What would make an annotation get marked as true or false?

2.  Generic:
According to the website, the use of a term might be generic (e.g., "gave pt a diabetes brochure").  I searched for the word 'brochure' in the workspace and nothing was returned, so I have no idea where this is stored.  But I see that when I use the same text, "gave pt a diabetes brochure", it returns true for the generic feature.  Can you advise on where the word 'brochure' is located?  I'd like to see what other terms, when used in a certain context, will give a true value for the generic feature.

3.  Confidence:
I couldn't find any information on this feature.  Can you explain it further, including how confidence is determined?  The values appear to be just 0 or 1.  Is this the only file that pertains to determining confidence?
GoldEntityAndAttributeReader.java

Thanks.

Regards,
Paula 

From: Pei.Chen@childrens.harvard.edu
To: user@ctakes.apache.org
Subject: RE: cTAKES Sectionizer:  how to integrate it with clinical pipeline - Solved
Date: Wed, 11 Dec 2013 17:31:29 +0000

Paula,
Yes, I’ve actually seen that NPE a while back[1]… I believe it is actually coming from the medfacts assertion module (which has its own sectioning code…).
According to Matt, “I believe the newer version of the mastif-zoner remove the address conversion piece, which seems to be where the NPE was happening.”
Perhaps he can chime in a bit more on it (would be great if the newer version was available on maven central).

[1] 
http://mail-archives.apache.org/mod_mbox/ctakes-dev/201309.mbox/%3C924DE05C19409B438EB81DE683A942D9105A93CB%40CHEXMBX1A.CHBOSTON.ORG%3E

From: digital paula [mailto:cybersation@hotmail.com]

Sent: Monday, December 09, 2013 9:41 PM

To: user@ctakes.apache.org

Subject: RE: cTAKES Sectionizer: how to integrate it with clinical pipeline - Solved

Tim,

I'm using the datasets from i2b2.org, requires signing a user data agreement.  

I've done more testing and did get an error "current or previous sentence IS NULL!"  This is sporadic, no identifiable trigger found yet.  When this error occurs, the program crashes.   I'm going to need to step through the code to figure out.  By any chance did
 you get this error? 

When I changed a section from 'hospital course' to 'xhospital course', the segment defaults to a previous segment which is good. 

Regards,

Paula

From: Timothy.Miller@childrens.harvard.edu

To: user@ctakes.apache.org

Subject: Re: cTAKES Sectionizer: how to integrate it with clinical pipeline - Solved

Date: Fri, 6 Dec 2013 12:32:36 +0000

Glad you didn't encounter any issues. As far as getting it running there wasn't any particular issue, my worry is just the 'unknown unknowns' as they say. I think for performance I have one
 worry yet, especially if it is the default sectionizer -- does it fail gracefully and will it ever skip text? I think it should be ok but that would be one thing it would be worth testing. So, Paula, what happens if you change the spelling of a section header
 (i.e., introduce a typo)? And just out of curiosity, what kind of notes are you running it on? Any particular dataset?

Thanks

Tim

On 12/05/2013 09:04 PM, digital paula wrote:

Pei,   

I appreciate you mentioning the preferredText feature for getting section headings to render, the first column in the mapping  file should suffice.

In a previous post, Tim stated that the sectionizer would be a huge benefit to the research community once it's working or something along those lines.   What was the problem with getting it to work?  I ask because I didn't encounter any issues during my preliminary testing.  All I did was an integration and minor configuration, as stated in my previous post.   The reason why I'd like to know is so I'm cognizant of any known issues in case I encounter them once I get back to using the sectionizer...should be in a few days.

Thanks.

Regards,

Paula

From: Pei.Chen@childrens.harvard.edu

To: user@ctakes.apache.org

Subject: RE: cTAKES Sectionizer: how to integrate it with clinical pipeline - Solved

Date: Thu, 5 Dec 2013 14:26:56 +0000

Paula,
Glad to hear it’s working for you.  Please feel free to let us know how it works out for you in your use case and dataset.

>I didn't see it in the trunk so I  manually added the text file ccda_sections.txt taken from the sandbox.    I added it under src/main/resources in ctakes-core:
org/apache/ctakes/core/sections/ccda_sections.txt
Check out: http://svn.apache.org/r1547576 ctakes/trunk/ctakes-core-res/src/main/resources/org/apache/ctakes/core/sections/ccda_sections.txt   (with props)

>This would be good to have too: segmentHeading:  HISTORY OF PRESENT ILLNESS:

There is a field called Segment.preferredText.  Which should display the first text column in the mappings file…
Thanks,
Pei

From: digital paula [mailto:cybersation@hotmail.com]

Sent: Wednesday, December 04, 2013 8:59 PM

To: user@ctakes.apache.org

Subject: RE: cTAKES Sectionizer: how to integrate it with clinical pipeline - Solved

Pei,

Okay, the sectionizer is now integrated in the clinical pipeline and I did some preliminary testing to confirm.  I added the CDASegmentAnnotator lines that you stated to the Aggregate Descriptor and commented the simple annotator in the flow.  In addition, I also had to comment out this in order for the Aggregate descriptor to save with no error:

<!--
<configurationParameter>
<name>SegmentID</name>
<description/>
<type>String</type>
<multiValued>false</multiValued>
<mandatory>false</mandatory>
<overrides>
<parameter>SimpleSegmentAnnotator/SegmentID</parameter>
</overrides>
</configurationParameter>
-->

I didn't see it in the trunk so I  manually added the text file ccda_sections.txt taken from the sandbox.    I added it under src/main/resources in ctakes-core:

org/apache/ctakes/core/sections/ccda_sections.txt

I tested on a few narratives and I'm attaching what returned using CVD tool for one of them.    The segmentID was populated with segmentID:  1.3.6.1.4.1.19376.1.5.3.1.3.4.     I looked in the ccda_sections.txt file and this was it:

1.3.6.1.4.1.19376.1.5.3.1.3.4,10164-2,HISTORY OF PRESENT ILLNESS,brief history of physical illness,history of present illness,history of the present illness

I looked back in the narrative and the heading was:  HISTORY OF PRESENT ILLNESS: The patient.....

I just tested on a few narratives and though that hardly constitutes  testing because I need to resolve a more urgent issue for my research but will return to this for full testing entailing the application to all of my narratives.   However, from a preliminary perspective, it looks good....only thing I'd like to see is the actual text as well for the segment heading as another feature.  For example what displays is this:

segmentID:  1.3.6.1.4.1.19376.1.5.3.1.3.4 

This would be good to have too:

segmentHeading:  HISTORY OF PRESENT ILLNESS: 

Thanks.  

Regards,

Paula

> From: Pei.Chen@childrens.harvard.edu

> To: user@ctakes.apache.org

> Subject: RE: cTAKES Sectionizer: how to integrate it with clinical pipeline

> Date: Tue, 3 Dec 2013 20:38:22 +0000

> 

> Paula,

> I moved the sectionizer to trunk now and added the xml descriptor for it.

> 

> In your Aggregate Descriptor, just add:

> <delegateAnalysisEngine key="CDASegmentAnnotator">

> <import location="../../../ctakes-core/desc/analysis_engine/CDASegmentAnnotator.xml"/>

> </delegateAnalysisEngine> 

> .

> <node>CDASegmentAnnotator</node> 

> 

> If you would like to see it wired together via uimaFIT, check out the test case:

> ctakes-core/src/test/java/org/apache/ctakes/core/ae/TestCDASegmentAnnotator.java

> 

> Hope that helps.

> It might be even worthwhile defaulting to this instead of the SimpleSegment (since simple segment does nothing more than span the entire document...)

> --Pei

> 

> 

> 

> From: digital paula [mailto:cybersation@hotmail.com]

> Sent: Tuesday, December 03, 2013 1:51 PM

> To: user@ctakes.apache.org

> Subject: cTAKES Sectionizer: how to integrate it with clinical pipeline

> 

> Hi Pei,

>  

> Last week we discussed briefly the sectionizer and now that I have it loaded successfully I just need to integrate it in the clinical pipeline.  

>  

> The sectionizer doesn't have a desc folder with associated XML descriptor and I understand that things are moving towards UIMAfit so that's probably the reason why it doesn't.  Can you provide some guidance on what you'd recommend for testing the sectionizer?  That is, should I just create the XML Descriptor using one of the reference materials from the UIMA website for creating descriptors or would you recommend  using UIMAfit?  If the latter, can you provide assistance of how to integrate the sectionizer into the AggregatePlaintext AE using UIMAfit.

>  

> Thanks.

>  

> Regards,

> Paula

RE: cTAKES Sectionizer: how to integrate it with clinical pipeline - Solved

Posted by "Chen, Pei" <Pe...@childrens.harvard.edu>.

Paula,
Yes, I've actually seen that NPE a while back[1]... I believe it is actually coming from the medfacts assertion module (which has its own sectioning code...).
According to Matt, "I believe the newer version of the mastif-zoner remove the address conversion piece, which seems to be where the NPE was happening."
Perhaps he can chime in a bit more on it (would be great if the newer version was available on maven central).

[1] http://mail-archives.apache.org/mod_mbox/ctakes-dev/201309.mbox/%3C924DE05C19409B438EB81DE683A942D9105A93CB%40CHEXMBX1A.CHBOSTON.ORG%3E

From: digital paula [mailto:cybersation@hotmail.com]
Sent: Monday, December 09, 2013 9:41 PM
To: user@ctakes.apache.org
Subject: RE: cTAKES Sectionizer: how to integrate it with clinical pipeline - Solved

Tim,

I'm using the datasets from i2b2.org, requires signing a user data agreement.

I've done more testing and did get an error "current or previous sentence IS NULL!"  This is sporadic, no identifiable trigger found yet.  When this error occurs, the program crashes.   I'm going to need to step through the code to figure out.  By any chance did you get this error?

When I changed a section from 'hospital course' to 'xhospital course', the segment defaults to a previous segment which is good.

Regards,
Paula

________________________________
From: Timothy.Miller@childrens.harvard.edu
To: user@ctakes.apache.org
Subject: Re: cTAKES Sectionizer: how to integrate it with clinical pipeline - Solved
Date: Fri, 6 Dec 2013 12:32:36 +0000
Glad you didn't encounter any issues. As far as getting it running there wasn't any particular issue, my worry is just the 'unknown unknowns' as they say. I think for performance I have one worry yet, especially if it is the default sectionizer -- does it fail gracefully and will it ever skip text? I think it should be ok but that would be one thing it would be worth testing. So, Paula, what happens if you change the spelling of a section header (i.e., introduce a typo)? And just out of curiosity, what kind of notes are you running it on? Any particular dataset?
Thanks
Tim

On 12/05/2013 09:04 PM, digital paula wrote:
Pei,

I appreciate you mentioning the preferredText feature for getting section headings to render, the first column in the mapping  file should suffice.

In a previous post, Tim stated that the sectionizer would be a huge benefit to the research community once it's working or something along those lines.   What was the problem with getting it to work?  I ask because I didn't encounter any issues during my preliminary testing.  All I did was an integration and minor configuration, as stated in my previous post.   The reason why I'd like to know is so I'm cognizant of any known issues in case I encounter them once I get back to using the sectionizer...should be in a few days.

Thanks.

Regards,
Paula

________________________________
From: Pei.Chen@childrens.harvard.edu
To: user@ctakes.apache.org
Subject: RE: cTAKES Sectionizer: how to integrate it with clinical pipeline - Solved
Date: Thu, 5 Dec 2013 14:26:56 +0000
Paula,
Glad to hear it's working for you.  Please feel free to let us know how it works out for you in your use case and dataset.

>I didn't see it in the trunk so I  manually added the text file ccda_sections.txt taken from the sandbox.    I added it under src/main/resources in ctakes-core:
org/apache/ctakes/core/sections/ccda_sections.txt
Check out: http://svn.apache.org/r1547576 ctakes/trunk/ctakes-core-res/src/main/resources/org/apache/ctakes/core/sections/ccda_sections.txt   (with props)

>This would be good to have too: segmentHeading:  HISTORY OF PRESENT ILLNESS:
There is a field called Segment.preferredText.  Which should display the first text column in the mappings file...
Thanks,
Pei

From: digital paula [mailto:cybersation@hotmail.com]
Sent: Wednesday, December 04, 2013 8:59 PM
To: user@ctakes.apache.org
Subject: RE: cTAKES Sectionizer: how to integrate it with clinical pipeline - Solved

Pei,

Okay, the sectionizer is now integrated in the clinical pipeline and I did some preliminary testing to confirm.  I added the CDASegmentAnnotator lines that you stated to the Aggregate Descriptor and commented the simple annotator in the flow.  In addition, I also had to comment out this in order for the Aggregate descriptor to save with no error:
<!--
<configurationParameter>
<name>SegmentID</name>
<description/>
<type>String</type>
<multiValued>false</multiValued>
<mandatory>false</mandatory>
<overrides>
<parameter>SimpleSegmentAnnotator/SegmentID</parameter>
</overrides>
</configurationParameter>
-->

I didn't see it in the trunk so I  manually added the text file ccda_sections.txt taken from the sandbox.    I added it under src/main/resources in ctakes-core:
org/apache/ctakes/core/sections/ccda_sections.txt
I tested on a few narratives and I'm attaching what returned using CVD tool for one of them.    The segmentID was populated with segmentID:  1.3.6.1.4.1.19376.1.5.3.1.3.4.     I looked in the ccda_sections.txt file and this was it:

1.3.6.1.4.1.19376.1.5.3.1.3.4,10164-2,HISTORY OF PRESENT ILLNESS,brief history of physical illness,history of present illness,history of the present illness

I looked back in the narrative and the heading was:  HISTORY OF PRESENT ILLNESS: The patient.....

I just tested on a few narratives and though that hardly constitutes  testing because I need to resolve a more urgent issue for my research but will return to this for full testing entailing the application to all of my narratives.   However, from a preliminary perspective, it looks good....only thing I'd like to see is the actual text as well for the segment heading as another feature.  For example what displays is this:
segmentID:  1.3.6.1.4.1.19376.1.5.3.1.3.4
This would be good to have too:
segmentHeading:  HISTORY OF PRESENT ILLNESS:

Thanks.

Regards,
Paula

> From: Pei.Chen@childrens.harvard.edu
> To: user@ctakes.apache.org
> Subject: RE: cTAKES Sectionizer: how to integrate it with clinical pipeline
> Date: Tue, 3 Dec 2013 20:38:22 +0000
>
> Paula,
> I moved the sectionizer to trunk now and added the xml descriptor for it.
>
> In your Aggregate Descriptor, just add:
> <delegateAnalysisEngine key="CDASegmentAnnotator">
> <import location="../../../ctakes-core/desc/analysis_engine/CDASegmentAnnotator.xml"/>
> </delegateAnalysisEngine>
> .
> <node>CDASegmentAnnotator</node>
>
> If you would like to see it wired together via uimaFIT, check out the test case:
> ctakes-core/src/test/java/org/apache/ctakes/core/ae/TestCDASegmentAnnotator.java
>
> Hope that helps.
> It might be even worthwhile defaulting to this instead of the SimpleSegment (since simple segment does nothing more than span the entire document...)
> --Pei
>
>
>
> From: digital paula [mailto:cybersation@hotmail.com]
> Sent: Tuesday, December 03, 2013 1:51 PM
> To: user@ctakes.apache.org
> Subject: cTAKES Sectionizer: how to integrate it with clinical pipeline
>
> Hi Pei,
>
> Last week we discussed briefly the sectionizer and now that I have it loaded successfully I just need to integrate it in the clinical pipeline.
>
> The sectionizer doesn't have a desc folder with associated XML descriptor and I understand that things are moving towards UIMAfit so that's probably the reason why it doesn't.  Can you provide some guidance on what you'd recommend for testing the sectionizer?  That is, should I just create the XML Descriptor using one of the reference materials from the UIMA website for creating descriptors or would you recommend  using UIMAfit?  If the latter, can you provide assistance of how to integrate the sectionizer into the AggregatePlaintext AE using UIMAfit.
>
> Thanks.
>
> Regards,
> Paula

RE: cTAKES Sectionizer: how to integrate it with clinical pipeline - Solved

Posted by digital paula <cy...@hotmail.com>.

Tim,

I'm using the datasets from i2b2.org, which require signing a user data agreement.

I've done more testing and did get an error: "current or previous sentence IS NULL!"  It is sporadic, and I haven't found an identifiable trigger yet.  When this error occurs, the program crashes.  I'm going to need to step through the code to figure it out.  By any chance, did you get this error?

When I changed a section header from 'hospital course' to 'xhospital course', the text defaulted to the previous segment, which is good.
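
To illustrate the behavior I saw, here is a minimal Python sketch. It is not the CDASegmentAnnotator code, only a toy model of "an unrecognized header stays in the previous segment". The header table, the short section IDs (HPI, HOSPITAL_COURSE), the function name assign_sections and the two sample sentences are all made up for illustration.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical header table: lowercase header text -> made-up section ID.
KNOWN_HEADERS: Dict[str, str] = {
    "history of present illness": "HPI",
    "hospital course": "HOSPITAL_COURSE",
}

# A candidate header is letters/spaces at the start of a line, ending in a colon.
HEADER_RE = re.compile(r"^([A-Za-z ]+):")


def assign_sections(lines: List[str], default_id: str = "DEFAULT") -> List[Tuple[str, str]]:
    """Label every line with a section ID without dropping any line."""
    current = default_id
    result = []
    for line in lines:
        match = HEADER_RE.match(line)
        if match:
            header = match.group(1).strip().lower()
            # Only a recognized header opens a new segment; anything else
            # (including a misspelled header) stays in the current one.
            current = KNOWN_HEADERS.get(header, current)
        result.append((current, line))
    return result


note = [
    "HISTORY OF PRESENT ILLNESS: The patient presented with chest pain.",
    "XHOSPITAL COURSE: The patient was admitted for observation.",
]
labeled = assign_sections(note)
for section_id, text in labeled:
    print(section_id, "|", text)
```

In this toy version the misspelled header line is kept and simply inherits the previous section ID, so no text is lost.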

Regards,
Paula

From: Timothy.Miller@childrens.harvard.edu
To: user@ctakes.apache.org
Subject: Re: cTAKES Sectionizer:  how to integrate it with clinical pipeline - Solved
Date: Fri, 6 Dec 2013 12:32:36 +0000

Glad you didn't encounter any issues. As far as getting it running there wasn't any particular issue, my worry is just the 'unknown unknowns' as they say. I think for performance I have one worry yet, especially if it is the default sectionizer -- does it fail gracefully and will it ever skip text? I think it should be ok but that would be one thing it would be worth testing. So, Paula, what happens if you change the spelling of a section header (i.e., introduce a typo)? And just out of curiosity, what kind of notes are you running it on? Any particular dataset?

Thanks

Tim

On 12/05/2013 09:04 PM, digital paula wrote:

Pei,   

I appreciate you mentioning the preferredText feature for getting section headings to render, the first column in the mapping  file should suffice.

In a previous post, Tim stated that the sectionizer would be a huge benefit to the research community once it's working or something along those lines.   What was the problem with getting it to work?  I ask because I didn't encounter any issues during my preliminary testing.  All I did was an integration and minor configuration, as stated in my previous post.   The reason why I'd like to know is so I'm cognizant of any known issues in case I encounter them once I get back to using the sectionizer...should be in a few days.

Thanks.

Regards,

Paula

From: Pei.Chen@childrens.harvard.edu

To: user@ctakes.apache.org

Subject: RE: cTAKES Sectionizer: how to integrate it with clinical pipeline - Solved

Date: Thu, 5 Dec 2013 14:26:56 +0000

Paula,
Glad to hear it’s working for you.  Please feel free to let us know how it works out for you in your use case and dataset.

>I didn't see it in the trunk so I  manually added the text file ccda_sections.txt taken from the sandbox.    I added it under src/main/resources in ctakes-core:
org/apache/ctakes/core/sections/ccda_sections.txt
Check out: http://svn.apache.org/r1547576 ctakes/trunk/ctakes-core-res/src/main/resources/org/apache/ctakes/core/sections/ccda_sections.txt   (with props)

>This would be good to have too: segmentHeading:  HISTORY OF PRESENT ILLNESS:

There is a field called Segment.preferredText.  Which should display the first text column in the mappings file…

Thanks,
Pei

From: digital paula [mailto:cybersation@hotmail.com]

Sent: Wednesday, December 04, 2013 8:59 PM

To: user@ctakes.apache.org

Subject: RE: cTAKES Sectionizer: how to integrate it with clinical pipeline - Solved

Pei,

Okay, the sectionizer is now integrated in the clinical pipeline and I did some preliminary testing to confirm.  I added the CDASegmentAnnotator lines that you stated to the Aggregate Descriptor and commented the simple annotator in the flow.  In addition, I also had to comment out this in order for the Aggregate descriptor to save with no error:

<!--
<configurationParameter>
<name>SegmentID</name>
<description/>
<type>String</type>
<multiValued>false</multiValued>
<mandatory>false</mandatory>
<overrides>
<parameter>SimpleSegmentAnnotator/SegmentID</parameter>
</overrides>
</configurationParameter>
-->

I didn't see it in the trunk so I  manually added the text file ccda_sections.txt taken from the sandbox.    I added it under src/main/resources in ctakes-core:

org/apache/ctakes/core/sections/ccda_sections.txt

I tested on a few narratives and I'm attaching what returned using CVD tool for one of them.    The segmentID was populated with segmentID:  1.3.6.1.4.1.19376.1.5.3.1.3.4.     I looked in the ccda_sections.txt file and this was it:

1.3.6.1.4.1.19376.1.5.3.1.3.4,10164-2,HISTORY OF PRESENT ILLNESS,brief history of physical illness,history of present illness,history of the present illness

I looked back in the narrative and the heading was:  HISTORY OF PRESENT ILLNESS: The patient.....

I just tested on a few narratives and though that hardly constitutes  testing because I need to resolve a more urgent issue for my research but will return to this for full testing entailing the application to all of my narratives.   However, from a preliminary perspective, it looks good....only thing I'd like to see is the actual text as well for the segment heading as another feature.  For example what displays is this:

segmentID:  1.3.6.1.4.1.19376.1.5.3.1.3.4 

This would be good to have too:

segmentHeading:  HISTORY OF PRESENT ILLNESS: 

Thanks.  

Regards,

Paula

> From: Pei.Chen@childrens.harvard.edu

> To: user@ctakes.apache.org

> Subject: RE: cTAKES Sectionizer: how to integrate it with clinical pipeline

> Date: Tue, 3 Dec 2013 20:38:22 +0000

> 

> Paula,

> I moved the sectionizer to trunk now and added the xml descriptor for it.

> 

> In your Aggregate Descriptor, just add:

> <delegateAnalysisEngine key="CDASegmentAnnotator">

> <import location="../../../ctakes-core/desc/analysis_engine/CDASegmentAnnotator.xml"/>

> </delegateAnalysisEngine> 

> .

> <node>CDASegmentAnnotator</node> 

> 

> If you would like to see it wired together via uimaFIT, check out the test case:

> ctakes-core/src/test/java/org/apache/ctakes/core/ae/TestCDASegmentAnnotator.java

> 

> Hope that helps.

> It might be even worthwhile defaulting to this instead of the SimpleSegment (since simple segment does nothing more than span the entire document...)

> --Pei

> 

> 

> 

> From: digital paula [mailto:cybersation@hotmail.com]

> Sent: Tuesday, December 03, 2013 1:51 PM

> To: user@ctakes.apache.org

> Subject: cTAKES Sectionizer: how to integrate it with clinical pipeline

> 

> Hi Pei,

>  

> Last week we discussed briefly the sectionizer and now that I have it loaded successfully I just need to integrate it in the clinical pipeline.  

>  

> The sectionizer doesn't have a desc folder with associated XML descriptor and I understand that things are moving towards UIMAfit so that's probably the reason why it doesn't.  Can you provide some guidance on what you'd recommend for testing the sectionizer?  That is, should I just create the XML Descriptor using one of the reference materials from the UIMA website for creating descriptors or would you recommend  using UIMAfit?  If the latter, can you provide assistance of how to integrate the sectionizer into the AggregatePlaintext AE using UIMAfit.

>  

> Thanks.

>  

> Regards,

> Paula

Re: cTAKES Sectionizer: how to integrate it with clinical pipeline - Solved

Posted by "Miller, Timothy" <Ti...@childrens.harvard.edu>.

Glad you didn't encounter any issues. As far as getting it running, there wasn't any particular issue; my worry is just the 'unknown unknowns', as they say. I do still have one worry about performance, especially if it becomes the default sectionizer: does it fail gracefully, and will it ever skip text? I think it should be OK, but that would be worth testing. So, Paula, what happens if you change the spelling of a section header (i.e., introduce a typo)? And just out of curiosity, what kind of notes are you running it on? Any particular dataset?
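
One rough way to check the "skip text" question is to compare the segment offsets against the document length. Below is a minimal Python sketch of that check, assuming you have already dumped each segment's begin/end character offsets into a list of pairs; the function name find_uncovered_gaps and the example offsets are hypothetical, and this is not part of cTAKES.

```python
from typing import List, Tuple


def find_uncovered_gaps(doc_length: int, spans: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return the [begin, end) character ranges not covered by any span."""
    gaps = []
    position = 0  # first character not yet known to be covered
    for begin, end in sorted(spans):
        if begin > position:
            gaps.append((position, begin))
        position = max(position, end)
    if position < doc_length:
        gaps.append((position, doc_length))
    return gaps


# Two segments that tile a 100-character note: no gaps.
print(find_uncovered_gaps(100, [(0, 40), (40, 100)]))
# Segments that leave text uncovered in the middle and at the end.
print(find_uncovered_gaps(100, [(0, 40), (55, 90)]))
```

An empty list means every character falls inside some segment. Whether a gap that contains only whitespace should count as skipped text is a judgment call.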
Thanks
Tim

On 12/05/2013 09:04 PM, digital paula wrote:
Pei,

I appreciate you mentioning the preferredText feature for getting section headings to render, the first column in the mapping  file should suffice.

In a previous post, Tim stated that the sectionizer would be a huge benefit to the research community once it's working or something along those lines.   What was the problem with getting it to work?  I ask because I didn't encounter any issues during my preliminary testing.  All I did was an integration and minor configuration, as stated in my previous post.   The reason why I'd like to know is so I'm cognizant of any known issues in case I encounter them once I get back to using the sectionizer...should be in a few days.

Thanks.

Regards,
Paula

________________________________
From: Pei.Chen@childrens.harvard.edu
To: user@ctakes.apache.org
Subject: RE: cTAKES Sectionizer: how to integrate it with clinical pipeline - Solved
Date: Thu, 5 Dec 2013 14:26:56 +0000

Paula,

Glad to hear it’s working for you.  Please feel free to let us know how it works out for you in your use case and dataset.

>I didn't see it in the trunk so I  manually added the text file ccda_sections.txt taken from the sandbox.    I added it under src/main/resources in ctakes-core:

org/apache/ctakes/core/sections/ccda_sections.txt

Check out: http://svn.apache.org/r1547576 ctakes/trunk/ctakes-core-res/src/main/resources/org/apache/ctakes/core/sections/ccda_sections.txt   (with props)

>This would be good to have too: segmentHeading:  HISTORY OF PRESENT ILLNESS:

There is a field called Segment.preferredText.  Which should display the first text column in the mappings file…

Thanks,

Pei

From: digital paula [mailto:cybersation@hotmail.com]
Sent: Wednesday, December 04, 2013 8:59 PM
To: user@ctakes.apache.org
Subject: RE: cTAKES Sectionizer: how to integrate it with clinical pipeline - Solved

Pei,

Okay, the sectionizer is now integrated in the clinical pipeline and I did some preliminary testing to confirm.  I added the CDASegmentAnnotator lines that you stated to the Aggregate Descriptor and commented the simple annotator in the flow.  In addition, I also had to comment out this in order for the Aggregate descriptor to save with no error:

<!--

<configurationParameter>
<name>SegmentID</name>
<description/>
<type>String</type>
<multiValued>false</multiValued>
<mandatory>false</mandatory>
<overrides>
<parameter>SimpleSegmentAnnotator/SegmentID</parameter>
</overrides>
</configurationParameter>
-->

I didn't see it in the trunk so I  manually added the text file ccda_sections.txt taken from the sandbox.    I added it under src/main/resources in ctakes-core:

org/apache/ctakes/core/sections/ccda_sections.txt

I tested on a few narratives and I'm attaching what returned using CVD tool for one of them.    The segmentID was populated with segmentID:  1.3.6.1.4.1.19376.1.5.3.1.3.4.     I looked in the ccda_sections.txt file and this was it:

1.3.6.1.4.1.19376.1.5.3.1.3.4,10164-2,HISTORY OF PRESENT ILLNESS,brief history of physical illness,history of present illness,history of the present illness

I looked back in the narrative and the heading was:  HISTORY OF PRESENT ILLNESS: The patient.....

I just tested on a few narratives and though that hardly constitutes  testing because I need to resolve a more urgent issue for my research but will return to this for full testing entailing the application to all of my narratives.   However, from a preliminary perspective, it looks good....only thing I'd like to see is the actual text as well for the segment heading as another feature.  For example what displays is this:
segmentID:  1.3.6.1.4.1.19376.1.5.3.1.3.4
This would be good to have too:
segmentHeading:  HISTORY OF PRESENT ILLNESS:

Thanks.

Regards,
Paula

> From: Pei.Chen@childrens.harvard.edu
> To: user@ctakes.apache.org
> Subject: RE: cTAKES Sectionizer: how to integrate it with clinical pipeline
> Date: Tue, 3 Dec 2013 20:38:22 +0000
>
> Paula,
> I moved the sectionizer to trunk now and added the xml descriptor for it.
>
> In your Aggregate Descriptor, just add:
> <delegateAnalysisEngine key="CDASegmentAnnotator">
> <import location="../../../ctakes-core/desc/analysis_engine/CDASegmentAnnotator.xml"/>
> </delegateAnalysisEngine>
> .
> <node>CDASegmentAnnotator</node>
>
> If you would like to see it wired together via uimaFIT, check out the test case:
> ctakes-core/src/test/java/org/apache/ctakes/core/ae/TestCDASegmentAnnotator.java
>
> Hope that helps.
> It might be even worthwhile defaulting to this instead of the SimpleSegment (since simple segment does nothing more than span the entire document...)
> --Pei
>
>
>
> From: digital paula [mailto:cybersation@hotmail.com]
> Sent: Tuesday, December 03, 2013 1:51 PM
> To: user@ctakes.apache.org
> Subject: cTAKES Sectionizer: how to integrate it with clinical pipeline
>
> Hi Pei,
>
> Last week we discussed briefly the sectionizer and now that I have it loaded successfully I just need to integrate it in the clinical pipeline.
>
> The sectionizer doesn't have a desc folder with associated XML descriptor and I understand that things are moving towards UIMAfit so that's probably the reason why it doesn't.  Can you provide some guidance on what you'd recommend for testing the sectionizer?  That is, should I just create the XML Descriptor using one of the reference materials from the UIMA website for creating descriptors or would you recommend  using UIMAfit?  If the latter, can you provide assistance of how to integrate the sectionizer into the AggregatePlaintext AE using UIMAfit.
>
> Thanks.
>
> Regards,
> Paula

RE: cTAKES Sectionizer: how to integrate it with clinical pipeline - Solved

Posted by digital paula <cy...@hotmail.com>.

Pei,

I appreciate you mentioning the preferredText feature for getting section headings to render; the first column in the mapping file should suffice.

In a previous post, Tim stated that the sectionizer would be a huge benefit to the research community once it's working, or something along those lines.  What was the problem with getting it to work?  I ask because I didn't encounter any issues during my preliminary testing.  All I did was an integration and minor configuration, as stated in my previous post.  I'd like to know so that I'm cognizant of any known issues in case I encounter them once I get back to using the sectionizer, which should be in a few days.
 
Thanks.
 
Regards,
Paula
 
From: Pei.Chen@childrens.harvard.edu
To: user@ctakes.apache.org
Subject: RE: cTAKES Sectionizer:  how to integrate it with clinical pipeline - Solved
Date: Thu, 5 Dec 2013 14:26:56 +0000

Paula,
Glad to hear it’s working for you.  Please feel free to let us know how it works out for you in your use case and dataset.
 
>I didn't see it in the trunk so I  manually added the text file ccda_sections.txt taken from the sandbox.    I added it under src/main/resources in ctakes-core:
org/apache/ctakes/core/sections/ccda_sections.txt
Check out: http://svn.apache.org/r1547576 ctakes/trunk/ctakes-core-res/src/main/resources/org/apache/ctakes/core/sections/ccda_sections.txt   (with props)
 
>This would be good to have too: segmentHeading:  HISTORY OF PRESENT ILLNESS:

There is a field called Segment.preferredText.  Which should display the first text column in the mappings file…

Thanks,
Pei
 



From: digital paula [mailto:cybersation@hotmail.com]


Sent: Wednesday, December 04, 2013 8:59 PM

To: user@ctakes.apache.org

Subject: RE: cTAKES Sectionizer: how to integrate it with clinical pipeline - Solved


 


Pei,

 


Okay, the sectionizer is now integrated in the clinical pipeline and I did some preliminary testing to confirm.  I added the CDASegmentAnnotator lines that you stated to the Aggregate Descriptor and commented the simple annotator in the flow.  In addition, I also had to comment out this in order for the Aggregate descriptor to save with no error:

<!--
<configurationParameter>
<name>SegmentID</name>
<description/>
<type>String</type>
<multiValued>false</multiValued>
<mandatory>false</mandatory>
<overrides>
<parameter>SimpleSegmentAnnotator/SegmentID</parameter>
</overrides>
</configurationParameter>
-->


 


I didn't see it in the trunk so I  manually added the text file ccda_sections.txt taken from the sandbox.    I added it under src/main/resources in ctakes-core:


org/apache/ctakes/core/sections/ccda_sections.txt


I tested on a few narratives and I'm attaching what returned using CVD tool for one of them.    The segmentID was populated with segmentID:  1.3.6.1.4.1.19376.1.5.3.1.3.4.     I looked in the ccda_sections.txt file and this was it:


 


1.3.6.1.4.1.19376.1.5.3.1.3.4,10164-2,HISTORY OF PRESENT ILLNESS,brief history of physical illness,history of present illness,history of the present illness


 


I looked back in the narrative and the heading was:  HISTORY OF PRESENT ILLNESS: The patient.....

 

I just tested on a few narratives and though that hardly constitutes  testing because I need to resolve a more urgent issue for my research but will return to this for full testing entailing the application to all of my narratives.   However, from a preliminary perspective, it looks good....only thing I'd like to see is the actual text as well for the segment heading as another feature.  For example what displays is this:


segmentID:  1.3.6.1.4.1.19376.1.5.3.1.3.4 

This would be good to have too:

segmentHeading:  HISTORY OF PRESENT ILLNESS: 

 

Thanks.  

 

Regards,

Paula

 

 



> From: Pei.Chen@childrens.harvard.edu
> To: user@ctakes.apache.org
> Subject: RE: cTAKES Sectionizer: how to integrate it with clinical pipeline
> Date: Tue, 3 Dec 2013 20:38:22 +0000
>
> Paula,
> I moved the sectionizer to trunk now and added the xml descriptor for it.
>
> In your Aggregate Descriptor, just add:
> <delegateAnalysisEngine key="CDASegmentAnnotator">
> <import location="../../../ctakes-core/desc/analysis_engine/CDASegmentAnnotator.xml"/>
> </delegateAnalysisEngine>
> .
> <node>CDASegmentAnnotator</node>
>
> If you would like to see it wired together via uimaFIT, check out the test case:
> ctakes-core/src/test/java/org/apache/ctakes/core/ae/TestCDASegmentAnnotator.java
>
> Hope that helps.
> It might be even worthwhile defaulting to this instead of the SimpleSegment (since simple segment does nothing more than span the entire document...)
> --Pei
>
> From: digital paula [mailto:cybersation@hotmail.com]
> Sent: Tuesday, December 03, 2013 1:51 PM
> To: user@ctakes.apache.org
> Subject: cTAKES Sectionizer: how to integrate it with clinical pipeline
>
> Hi Pei,
>
> Last week we discussed briefly the sectionizer and now that I have it loaded successfully I just need to integrate it in the clinical pipeline.
>
> The sectionizer doesn't have a desc folder with associated XML descriptor and I understand that things are moving towards UIMAfit so that's probably the reason why it doesn't.  Can you provide some guidance on what you'd recommend for testing the sectionizer?  That is, should I just create the XML Descriptor using one of the reference materials from the UIMA website for creating descriptors or would you recommend using UIMAfit?  If the latter, can you provide assistance of how to integrate the sectionizer into the AggregatePlaintext AE using UIMAfit.
>
> Thanks.
>
> Regards,
> Paula

RE: cTAKES Sectionizer: how to integrate it with clinical pipeline - Solved

Posted by "Chen, Pei" <Pe...@childrens.harvard.edu>.

Paula,
Glad to hear it's working for you.  Please feel free to let us know how it works out for you in your use case and dataset.

> I didn't see it in the trunk so I manually added the text file ccda_sections.txt taken from the sandbox.  I added it under src/main/resources in ctakes-core:
> org/apache/ctakes/core/sections/ccda_sections.txt
Check out: http://svn.apache.org/r1547576 ctakes/trunk/ctakes-core-res/src/main/resources/org/apache/ctakes/core/sections/ccda_sections.txt   (with props)

> This would be good to have too: segmentHeading:  HISTORY OF PRESENT ILLNESS:
There is a field called Segment.preferredText, which should display the first text column in the mappings file.
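
As a rough illustration of what "the first text column" means for the entry discussed in this thread, here is a minimal Java sketch. It is not cTAKES code: the class and method names are hypothetical, and it assumes the first text column is the third comma-separated field, which is inferred only from the one ccda_sections.txt entry quoted earlier.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration, not cTAKES code: builds a section ID ->
// preferred text lookup from mapping lines.
class PreferredTextLookup {

    // Assumption: field 0 is the section ID and field 2 is the first text
    // column, which is taken here as the preferred text.
    static Map<String, String> load(Iterable<String> mappingLines) {
        Map<String, String> preferredById = new LinkedHashMap<>();
        for (String raw : mappingLines) {
            String[] fields = raw.split(",");
            if (fields.length >= 3) {
                preferredById.put(fields[0].trim(), fields[2].trim());
            }
        }
        return preferredById;
    }

    public static void main(String[] args) {
        // The single entry quoted in this thread.
        String entry =
            "1.3.6.1.4.1.19376.1.5.3.1.3.4,10164-2,HISTORY OF PRESENT ILLNESS,"
            + "brief history of physical illness,history of present illness,"
            + "history of the present illness";
        Map<String, String> lookup = load(Collections.singletonList(entry));
        System.out.println(lookup.get("1.3.6.1.4.1.19376.1.5.3.1.3.4"));
    }
}
```

Under that assumption, the ID 1.3.6.1.4.1.19376.1.5.3.1.3.4 resolves to HISTORY OF PRESENT ILLNESS, which is the text one would expect to see in the preferredText field for this section.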

Thanks,
Pei
