Posted to dev@ctakes.apache.org by kishore <ka...@gmail.com> on 2017/12/29 09:46:01 UTC

Sectionheadings are not coming properly

Hello All,
        I am new to the community. To identify section headings I am
using AdvancedTokenizerPipeline.piper in my program. The segment annotations
show only "Family History:". My document has many other headings, such as
"REVIEW OF SYSTEMS:" and "SOCIAL HISTORY:", and I have seen regex patterns
for them in DefaultSectionRegex.bsv. Can anyone help me with this?

Thanks and Regards,
Kishore.

RE: Sectionheadings are not coming properly [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Kishore,

You should be able to find some examples of how to run multiple files in the ctakes-examples module under src/. There should be at least one example class that uses the AggregateBuilder and has "Files" or "Directory" in the class name.

Otherwise, you can implement your own annotator that performs your work, starting with the .indexCovered(..) call. That way you can try various pre-defined pipelines to see which gives you the best results. There should be at least one example annotator in ctakes-examples.
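
For the multiple-file case, the general shape is a loop that reads each note and hands its text to the same per-document routine. The following is only a minimal sketch using the Java standard library; the class name NoteDirectoryRunner and the perDocument callback are hypothetical and are not part of cTAKES or ctakes-examples. Inside the callback one would presumably set the text on a JCas (for example after calling reset() on a reused one), run the pipeline, and then call JCasUtil.indexCovered(..) as in the single-document code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not a cTAKES class: walks a directory of notes
// and passes each file name and its text to a per-document callback.
class NoteDirectoryRunner {

    // Returns the number of files that were handed to the callback.
    static int processDirectory(Path dir, BiConsumer<String, String> perDocument)
            throws IOException {
        List<Path> files;
        try (Stream<Path> stream = Files.list(dir)) {
            // Keep regular files only, in a stable (sorted) order.
            files = stream.filter(Files::isRegularFile)
                          .sorted()
                          .collect(Collectors.toList());
        }
        for (Path file : files) {
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            // Assumption: this is where the JCas would receive the text,
            // the pipeline would run, and the annotations would be read.
            perDocument.accept(file.getFileName().toString(), text);
        }
        return files.size();
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: NoteDirectoryRunner <directory of notes>");
            return;
        }
        processDirectory(Paths.get(args[0]),
                (name, text) -> System.out.println(name + ": " + text.length() + " characters"));
    }
}
```

The callback keeps the file handling separate from the pipeline code, so the same loop can be tried with different pre-defined pipelines.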

Sean

Re: Sectionheadings are not coming properly [EXTERNAL]

Posted by kishore <ka...@gmail.com>.
Hi Sean,
Thanks a lot. As you mentioned, my document has lines like this:
"SOCIAL HISTORY:  Patient is reticent and withdrawn .". I removed the eol ($)
and now it is working. I am able to read annotations section by section.

I have another question. Right now my code is like this:

    String note = "Hello World!  I feel no pain.  My father takes aspirin.  My sister might have a headache.";
    JCas jcas = JCasFactory.createJCas();
    jcas.setDocumentText(note);
    AggregateBuilder builder = new AggregateBuilder();
    builder.add..........
    .........
    .........
    SimplePipeline.runPipeline(jcas, builder.createAggregateDescription());
    // Group the identified annotations by the Segment (section) that covers them.
    Map<Segment,Collection<IdentifiedAnnotation>> annotationSections =
        JCasUtil.indexCovered( jcas, Segment.class, IdentifiedAnnotation.class );
    for ( Map.Entry<Segment,Collection<IdentifiedAnnotation>> entry : annotationSections.entrySet() ) {
        ---------
        ---------
    }

I am able to run this code for a single document. Can we run it for
multiple documents? How can we get a JCas object for each document to
pass to JCasUtil.indexCovered(....)?
Thank you,
Kishore.

RE: Sectionheadings are not coming properly [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Kishore,

From what you can tell, is there anything in your section headers that may not fit the regex?  Are they all on a line by themselves?  That is one requirement for section headers in that pipeline.  For instance, this will not work:

SOCIAL HISTORY:  Patient is reticent and withdrawn ...

If this is the case then you can try making a copy of the regex bsv file and removing the eol requirement from each section regex.  If this is the problem then we should probably consider adding the second regex (no eol) type as an option.
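
To make the effect of the eol requirement concrete, here is a small sketch using java.util.regex. The two patterns are illustrative assumptions written for this example; they are not copied from DefaultSectionRegex.bsv, and the class name SectionHeaderEolDemo is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the patterns are NOT the ones shipped with cTAKES.
class SectionHeaderEolDemo {

    private static final int FLAGS = Pattern.MULTILINE | Pattern.CASE_INSENSITIVE;

    // Header must be alone on its line: note the trailing $ (end of line).
    static final Pattern STRICT = Pattern.compile(
            "^[ \\t]*(FAMILY HISTORY|SOCIAL HISTORY|REVIEW OF SYSTEMS)[ \\t]*:[ \\t]*$", FLAGS);

    // Same pattern without the end-of-line requirement.
    static final Pattern RELAXED = Pattern.compile(
            "^[ \\t]*(FAMILY HISTORY|SOCIAL HISTORY|REVIEW OF SYSTEMS)[ \\t]*:", FLAGS);

    // Collects the header names that the given pattern finds in the note.
    static List<String> headers(Pattern pattern, String note) {
        List<String> found = new ArrayList<>();
        Matcher matcher = pattern.matcher(note);
        while (matcher.find()) {
            found.add(matcher.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String note = "Family History:\n"
                + "Father has diabetes.\n"
                + "SOCIAL HISTORY:  Patient is reticent and withdrawn .\n";
        System.out.println(headers(STRICT, note));  // prints [Family History]
        System.out.println(headers(RELAXED, note)); // prints [Family History, SOCIAL HISTORY]
    }
}
```

With the strict pattern only the header that sits on its own line is found, which matches the symptom of a single "Family History:" segment.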

Another thing that wouldn't work is if the section headers have a prefix of some sort, for instance an enumeration.

1) SOCIAL HISTORY:
Patient is reticent and withdrawn ...

2) REVIEW OF SYMPTOMS:
...
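
If enumerated headers like these turn out to be the cause, one possible workaround is to allow an optional numeric prefix in a copied regex. The sketch below assumes a prefix of digits followed by ")" or "."; both patterns and the class name EnumeratedHeaderDemo are hypothetical and only illustrate the idea.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the patterns are illustrative, not cTAKES defaults.
class EnumeratedHeaderDemo {

    // Header must start at the beginning of the line.
    static final Pattern PLAIN = Pattern.compile(
            "^(SOCIAL HISTORY|REVIEW OF SYMPTOMS):[ \\t]*$", Pattern.MULTILINE);

    // Same header, but an optional "1)" or "1." style prefix is tolerated.
    static final Pattern WITH_PREFIX = Pattern.compile(
            "^(?:\\d+[.)][ \\t]*)?(SOCIAL HISTORY|REVIEW OF SYMPTOMS):[ \\t]*$",
            Pattern.MULTILINE);

    // Collects the header names that the given pattern finds in the note.
    static List<String> headers(Pattern pattern, String note) {
        List<String> found = new ArrayList<>();
        Matcher matcher = pattern.matcher(note);
        while (matcher.find()) {
            found.add(matcher.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String note = "1) SOCIAL HISTORY:\n"
                + "Patient is reticent and withdrawn ...\n"
                + "\n"
                + "2) REVIEW OF SYMPTOMS:\n"
                + "...\n";
        System.out.println(headers(PLAIN, note));       // prints []
        System.out.println(headers(WITH_PREFIX, note)); // prints [SOCIAL HISTORY, REVIEW OF SYMPTOMS]
    }
}
```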

Another possibility is that the regex requires an empty line above each section header.  I am not sure if this is the case or not - I don't have ctakes open at this moment.

Lastly, do you see any "regex timed out" messages in your log?  If the note is particularly long then the regex may time out on more complex patterns.  If that is the case then we can make the timeout variable and you can retry with different values.
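
For readers unfamiliar with regex timeouts, here is a generic sketch of how a match can be given a configurable time limit in plain Java. It is not the cTAKES implementation; the class RegexTimeoutSketch, the InterruptibleText wrapper, and the findWithTimeout method are all hypothetical names used only for illustration.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.regex.Pattern;

// Hypothetical sketch, not the cTAKES code: runs a regex search on a worker
// thread and gives up after a caller-supplied number of milliseconds.
class RegexTimeoutSketch {

    // Wraps the note text so that a long-running match can be interrupted.
    static final class InterruptibleText implements CharSequence {
        private final CharSequence text;

        InterruptibleText(CharSequence text) {
            this.text = text;
        }

        @Override
        public char charAt(int index) {
            // The regex engine reads characters through this method, so
            // checking the interrupt flag here lets a cancelled match stop.
            if (Thread.currentThread().isInterrupted()) {
                throw new IllegalStateException("regex match interrupted");
            }
            return text.charAt(index);
        }

        @Override
        public int length() {
            return text.length();
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return new InterruptibleText(text.subSequence(start, end));
        }

        @Override
        public String toString() {
            return text.toString();
        }
    }

    // Returns whether the pattern was found, or Optional.empty() on timeout.
    static Optional<Boolean> findWithTimeout(Pattern pattern, String text, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Callable<Boolean> task = () -> pattern.matcher(new InterruptibleText(text)).find();
        Future<Boolean> result = executor.submit(task);
        try {
            return Optional.of(result.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            result.cancel(true); // interrupts the worker thread
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        Pattern header = Pattern.compile("^SOCIAL HISTORY:", Pattern.MULTILINE);
        String note = "Family History:\nNone.\nSOCIAL HISTORY:  Patient is reticent and withdrawn .\n";
        System.out.println(findWithTimeout(header, note, 1000)); // prints Optional[true]
    }
}
```

Passing the limit as a parameter is what makes it possible to retry a long note with a larger value.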


Sean

