Posted to dev@ctakes.apache.org by "Finan, Sean" <Se...@childrens.harvard.edu> on 2019/07/18 19:52:30 UTC

Re: cTAKES Pipeline [EXTERNAL]

Hi Maral,

This might be what you are talking about with respect to the Default Clinical Pipeline
https://cwiki.apache.org/confluence/display/CTAKES/Default+Clinical+Pipeline

That page lists a command-line method for running a set of files and getting XML output.

The default clinical pipeline configuration is actually contained in the plain text (piper) file
resources/org/apache/ctakes/clinical/pipeline/DefaultFastPipeline.piper

If you are looking at source code then the file is ctakes-clinical-pipeline-res/src/main/resources/ ...

You can also select and run a piper file with a gui
https://cwiki.apache.org/confluence/display/CTAKES/Piper+File+Submitter+GUI

Both methods are mentioned near the bottom of one of the pages detailing pipeline configuration
https://cwiki.apache.org/confluence/display/CTAKES/Piper+Files

There are several example pipelines constructed with code and/or plain text files in the ctakes-examples and ctakes-examples-res modules.  You can look at the different "Hello World" examples.

Since you are playing with maven, you can run the profile "runPiperGui".
mvn clean compile -DskipTests -PrunPiperGui

Sean


________________________________________
From: Maral Amir <ma...@gmail.com>
Sent: Thursday, July 18, 2019 2:29 PM
To: dev@ctakes.apache.org
Subject: cTAKES Pipeline [EXTERNAL]

Hi,

I just built my developer version of cTAKES with the help of the wonderful
cTAKES developers.

For my next step, I would appreciate it if somebody could direct me to the right path.
I am planning to process clinical text documents through the entire
pipeline to generate XML output. I see the website suggests walking through
the Default Clinical Pipeline. I understand there are also multiple git
repositories with command-line tools based on Apache cTAKES.
My final goal is to integrate cTAKES with some Python packages (OCR, etc.)
into one pipeline and have some form of web service at the end. I would
deeply appreciate any suggestions.

Thanks,
Maral

Re: cTAKES Pipeline [EXTERNAL]

Posted by Maral Amir <ma...@gmail.com>.

Hi Sean,

Thank you for your reply. Your response to Siamak was very helpful and it
answered many of my questions. I appreciate your kind help.

Best,
Maral

On Mon, Jul 22, 2019 at 7:25 AM Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> Hi Siamak,
>
> Just to clarify, $CTAKES_HOME isn't required by ctakes itself.  It is in
> the bin/ scripts just to make the java command at the end of the script
> more concise.
> The script attempts to set it to the directory in which you installed a
> binary installation of ctakes.
> The $CTAKES_HOME directory should contain the ctakes directories bin/ lib/
> resources/ ...

Re: cTAKES Pipeline [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.

Hi Siamak,

Just to clarify, $CTAKES_HOME isn't required by ctakes itself.  It is in the bin/ scripts just to make the java command at the end of the script more concise.  
The script attempts to set it to the directory in which you installed a binary installation of ctakes.  
The $CTAKES_HOME directory should contain the ctakes directories bin/ lib/ resources/ ...

Also, please be aware that those bin/ scripts are not meant to be used with a developer installation.  
The scripts are meant to be used with a built binary installation of ctakes - one with bin/ lib/ resources/ ... subdirectories all in one [ctakes home, ctakes root] directory.   
A developer [root] directory has subdirectories  ctakes-core/ ctakes-core-res/ ctakes-clinical-pipeline/ ctakes-clinical-pipeline-res/ ...
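
The difference between the two layouts can be checked mechanically. The following is a minimal sketch in plain Java, assuming only the subdirectory names mentioned above; the class CtakesLayout and its methods are hypothetical helpers and are not part of cTAKES.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of cTAKES: guesses what kind of directory a path is,
// using only the subdirectory names mentioned in this message.
final class CtakesLayout {

   enum Kind { BINARY_INSTALL, DEVELOPER_CHECKOUT, UNKNOWN }

   // A binary installation keeps bin/ lib/ resources/ in one root directory.
   private static final List<String> BINARY_DIRS = Arrays.asList( "bin", "lib", "resources" );
   // A developer root contains the maven modules instead.
   private static final List<String> DEVELOPER_DIRS = Arrays.asList( "ctakes-core", "ctakes-core-res" );

   static Kind detect( final Path root ) {
      if ( hasAll( root, BINARY_DIRS ) ) {
         return Kind.BINARY_INSTALL;
      }
      if ( hasAll( root, DEVELOPER_DIRS ) ) {
         return Kind.DEVELOPER_CHECKOUT;
      }
      return Kind.UNKNOWN;
   }

   // True only if every named subdirectory exists directly under root.
   private static boolean hasAll( final Path root, final List<String> names ) {
      return names.stream().allMatch( name -> Files.isDirectory( root.resolve( name ) ) );
   }

   public static void main( final String... args ) {
      final Path root = Paths.get( args.length > 0 ? args[ 0 ] : "." );
      System.out.println( root.toAbsolutePath() + " looks like: " + detect( root ) );
   }
}
```

If the directory that $CTAKES_HOME points to is reported as DEVELOPER_CHECKOUT, the bin/ scripts are probably not the right tool for it.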

Can I assume, since you are writing java code (using PipelineBuilder), that you have some experience with java?
You can put PipelineBuilder in any main(..) method and then start that main(..) from a command line just as you would any other java program.  Just like any other java program, you need to have your $CLASSPATH set correctly and, for memory use, increase your maximum memory with -Xmx .  These are VM options.

The same goes for running the org.apache.uima.tools.cvd.CVD .  It is just another regular old java class with a main(..) method.

To use the CVD to inspect output you just need to make sure that you produce XMI files.  With the PipelineBuilder this is really easy.
After adding the AEs to the pipeline, just put .writeXmis( myOutputDirectory ) at the end of your builder.

new PipelineBuilder()
      .readFiles( myInDir )
      .add( SimpleSegmentAnnotator.class )     // for instance
      // .add( ... ) for any further annotators
      .writeXmis( myOutDir )
      .run();

Or some variation of that.  See the ctakes-examples module for more, for instance org.apache.ctakes.examples.pipeline.HelloWorldBuilderRunner

-- If you do need to do something special with the actual pipeline (e.g. special AE interfaces, UIMA control) after it has been built, then I recommend that you use the PiperFileReader and a piper file.  A piper file helps make the pipeline more transparent (outside the code), and allows modification of the pipeline without recompiling (and reinstalling).
Then you can get the PipelineBuilder from the PiperFileReader.
See org.apache.ctakes.examples.pipeline.HelloWorldPiperRunner

PiperFileReader reader = new PiperFileReader( myPiperFilePath );
PipelineBuilder builder = reader.getBuilder();
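
Because a piper file is plain text, it can be produced or changed without touching any pipeline code. The following is a minimal sketch in plain Java, assuming nothing beyond the standard library; the class PiperFileSketch and its methods are hypothetical, and the piper lines are copied from the Default Clinical Pipeline listing quoted elsewhere in this thread.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of cTAKES: writes a piper file as ordinary text.
final class PiperFileSketch {

   // Command lines of the default clinical pipeline, as quoted in this thread.
   static final List<String> DEFAULT_PIPELINE = Arrays.asList(
         "load DefaultTokenizerPipeline",
         "add ContextDependentTokenizerAnnotator",
         "addDescription POSTagger",
         "load ChunkerSubPipe",
         "load DictionarySubPipe",
         "load AttributeCleartkSubPipe" );

   // Returns the default lines followed by any extra lines, e.g. writers.
   static List<String> withExtraLines( final String... extraLines ) {
      final List<String> lines = new ArrayList<>( DEFAULT_PIPELINE );
      lines.addAll( Arrays.asList( extraLines ) );
      return lines;
   }

   // Writes one piper command per line and returns the path that was written.
   static Path write( final Path piperFile, final List<String> lines ) throws IOException {
      return Files.write( piperFile, lines, StandardCharsets.UTF_8 );
   }

   public static void main( final String... args ) throws IOException {
      final Path piperFile = Files.createTempFile( "Custom", ".piper" );
      write( piperFile, withExtraLines( "add pretty.html.HtmlTextWriter SubDirectory=HTML" ) );
      System.out.println( "Wrote " + piperFile );
   }
}
```

The resulting file path is the kind of value that would be passed as myPiperFilePath above; whether the pipeline then runs still depends on the classpath and resources being set up correctly.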

--- If you don't need to do anything special with the pipeline then I recommend that you just use the PiperFileRunner, which does everything for you.

java [VM options] org.apache.ctakes.core.pipeline.PiperFileRunner -p myPiperFilePath -i myInDir --writeXmis myOutDir
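
For anyone who wants to start that command from another Java program, the pieces can be assembled as a plain list of arguments. This is a minimal sketch, assuming a binary installation; the class PiperCommandSketch and the directory /opt/ctakes are made up for illustration, while the classpath entries follow the runctakesCVD.sh line quoted in this thread.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of cTAKES: assembles the java command as a list of arguments.
final class PiperCommandSketch {

   static List<String> buildCommand( final String ctakesHome, final String maxHeap, final String... runnerArgs ) {
      // Same classpath entries as the bin/ script quoted in this thread: desc/, resources/ and every jar in lib/.
      final String classpath = String.join( File.pathSeparator,
            ctakesHome + "/desc/",
            ctakesHome + "/resources/",
            ctakesHome + "/lib/*" );
      final List<String> command = new ArrayList<>( Arrays.asList(
            "java", "-cp", classpath,
            "-Xmx" + maxHeap,                                      // VM option: maximum heap size
            "org.apache.ctakes.core.pipeline.PiperFileRunner" ) ); // main class
      // Everything after the main class is handed to PiperFileRunner unchanged.
      command.addAll( Arrays.asList( runnerArgs ) );
      return command;
   }

   public static void main( final String... args ) {
      final List<String> command = buildCommand( "/opt/ctakes", "3g", "-p", "myPiperFilePath", "-i", "myInDir" );
      System.out.println( String.join( " ", command ) );
      // To actually launch it: new ProcessBuilder( command ).inheritIO().start().waitFor();
   }
}
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.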

One final note:  CVD is not a ctakes tool and ctakes itself does not contain CVD code.  The CVD is an Apache UIMA product, and help for it can be found online.
https://uima.apache.org/d/uimaj-current/tools.html

runctakesCVD is a convenience script, and its name may be a little misleading.  Something like runUimaCvd might cause less confusion; let us know your thoughts on changing the name.  For instance, would a different name make looking for help faster or easier?

Sean

________________________________________
From: Siamak Barzegar <ba...@gmail.com>
Sent: Monday, July 22, 2019 4:38 AM
To: dev@ctakes.apache.org
Subject: Re: cTAKES Pipeline [EXTERNAL]

Hi Sean,

I have the same question: I want to run runctakesCVD or ..CPE on my
modified code and descriptors without using an IDE, so what should I set my
CTAKES_HOME variable to? I do not use the Piper GUI; I am
using PipelineBuilder on the source code.
It is important for me to see the results with runctakesCVD as well. What
should I set CTAKES_HOME to in the runctakesCVD.sh file?
java -cp $CTAKES_HOME/desc/:$CTAKES_HOME/resources/:$CTAKES_HOME/lib/*
-Dlog4j.configuration=file:$CTAKES_HOME/config/log4j.xml -Xms512M -Xmx3g
org.apache.uima.tools.cvd.CVD "$@"

With Best Wishes,
Siamak

On Fri, 19 Jul 2019 at 20:12, Maral Amir <ma...@gmail.com> wrote:

> Hi Sean,
>
> Thank you so much for your insightful response.
> I'm having a problem linking the piper files. I should mention I am using
> the command line interface. Could you please kindly let me know:
>
> 1. What should I set my CTAKES_HOME variable to? Right now I set
> CTAKES_HOME to my cTAKES user installation main folder. That is because I
> could see that the last line of runPiperFile.sh includes the class directory
> $CTAKES_HOME/lib/*, and no /lib folder is present in the
> developer's version.
>
> java -cp $CTAKES_HOME/desc/:$CTAKES_HOME/resources/:$CTAKES_HOME/resources/resources:$CTAKES_HOME/lib/*
> -Dlog4j.configuration=file:$CTAKES_HOME/config/log4j.xml -Xms512M -Xmx3g
> org.apache.ctakes.core.pipeline.PiperFileRunner "$@"
>
> Also,
>
> 2. Where is the "bin" folder in which the bash file resides? Right now I use
> this one:
> /Users/local/projects/ctakes/trunk/ctakes-distribution/src/main/bin
>
>
> Thanks,
> Maral
>
> On Fri, Jul 19, 2019 at 6:13 AM Finan, Sean <
> Sean.Finan@childrens.harvard.edu> wrote:
>
> > Hi Maral,
> >
> > You can generate different output types by adding different writers to the
> > end of the pipeline.
> > Here are the contents of the Default Clinical Pipeline piper file:
> >
> > ========================================================================================
> > // Commands and parameters to create a default plaintext document processing pipeline with UMLS lookup
> >
> > // Load a simple token processing pipeline from another pipeline file
> > load DefaultTokenizerPipeline
> >
> > // Add non-core annotators
> > add ContextDependentTokenizerAnnotator
> > addDescription POSTagger
> >
> > // Add Chunkers
> > load ChunkerSubPipe
> >
> > // Default fast dictionary lookup
> > load DictionarySubPipe
> >
> > // Add Cleartk Entity Attribute annotators
> > load AttributeCleartkSubPipe
> >
> > ========================================================================================
> >
> > I recommend that you copy those lines to a new file (for instance,
> > Maral.piper) and then add the following lines:
> >
> > ========================================================================================
> > // Write marked copy of note text in interactive html files
> > add pretty.html.HtmlTextWriter SubDirectory=HTML
> >
> > // Write Fast Healthcare Interoperability Resources (FHIR) json files.  fhir.org
> > package org.apache.ctakes.fhir.cc
> > add FhirJsonFileWriter SubDirectory=FHIR
> >
> > // Write plaintext copy of note text with cui, semantic group, POS.  Relations are listed.
> > add pretty.plaintext.PrettyTextWriterFit SubDirectory=TEXT
> >
> > // Write plaintext copy of note sentences with entity and relation discoveries listed.
> > add property.plaintext.PropertyTextWriterFit SubDirectory=PROP
> >
> > ========================================================================================
> >
> > The output directory should then contain some new output in different
> > subdirectories.  You can change the subdirectory names.
> >
> > Note: the "=================================" are just there to indicate
> > what is for the file.  Do not copy them.
> >
> > There are many more file writers, most of which write simple lists of
> > discoveries in one form or another.
> > I recommend trying the 4 above and seeing whether any fit your purposes before
> > moving on to more specialized writers.
> >
> > Sean
> >
> > ________________________________________
> > From: Maral Amir <ma...@gmail.com>
> > Sent: Thursday, July 18, 2019 7:11 PM
> > To: dev@ctakes.apache.org
> > Subject: Re: cTAKES Pipeline [EXTERNAL]
> >
> > Hi Sean,
> >
> > Thank you so much for your very helpful and comprehensive response. I was
> > able to generate the xmi results in the output directory and used the UIMA CAS
> > Visual Debugger (CVD) as suggested to view the information. I have two
> > questions:
> > 1. What is the best reference for me to study and understand the
> > annotations?
> > 2. Is there a CLI equivalent to CVD? I need the annotated outputs in a
> > readable format without the help of CVD.
> >
> > Thanks,
> > Maral
> >


--
Siamak Barzegar, PhD.
Senior Research Engineer.
Biomedical Text Mining Unit.
Barcelona Supercomputing Centre

Re: cTAKES Pipeline [EXTERNAL]

Posted by Siamak Barzegar <ba...@gmail.com>.

Hi Sean,

I have the same question: I want to run runctakesCVD or ..CPE on my
modified code and descriptors without using an IDE, so what should I set my
CTAKES_HOME variable to? I do not use the Piper GUI; I am
using PipelineBuilder on the source code.
It is important for me to see the results with runctakesCVD as well. What
should I set CTAKES_HOME to in the runctakesCVD.sh file?
java -cp $CTAKES_HOME/desc/:$CTAKES_HOME/resources/:$CTAKES_HOME/lib/*
-Dlog4j.configuration=file:$CTAKES_HOME/config/log4j.xml -Xms512M -Xmx3g
org.apache.uima.tools.cvd.CVD "$@"

With Best Wishes,
Siamak

On Fri, 19 Jul 2019 at 20:12, Maral Amir <ma...@gmail.com> wrote:

> Hi Sean,
>
> Thank you so much for your insightful response.
> I'm having a problem linking the piper files. I should mention I am using
> command line interface. Could you please kindly let me know:
>
> 1. What I should set my CTAKES_HOME variable into. Right now I set my
> CTAKES_HOME to my cTAKES user installation main folder. That is because I
> could see in the last line of the runPiperFile.sh, the class directory
> $CTAKES_HOME/lib/* is included and no /lib folder is present in the
> developer's version.
>
> java -cp
>
> $CTAKES_HOME/desc/:$CTAKES_HOME/resources/:$CTAKES_HOME/resources/resources:$CTAKES_HOME/lib/*
> -Dlog4j.configuration=file:$CTAKES_HOME/config/log4j.xml -Xms512M -Xmx3g
> org.apache.ctakes.core.pipeline.PiperFileRunner "$@"
>
>
> Also,
>
> 2. Where is the *"bin"* folder where the bash file resides. Right now I use
> this one :
> /Users/local/projects/ctakes/trunk/ctakes-distribution/src/main/bin
>
>
> Thanks,
> Maral
>
> On Fri, Jul 19, 2019 at 6:13 AM Finan, Sean <
> Sean.Finan@childrens.harvard.edu> wrote:
>
> > Hi Maral,
> >
> > You can generate different output types by adding different writers to
> the
> > end of the pipeline.
> > Here are the contents of the Default Clinical Pipeline piper file:
> >
> >
> >
> ========================================================================================
> > // Commands and parameters to create a default plaintext document
> > processing pipeline with UMLS lookup
> >
> > // Load a simple token processing pipeline from another pipeline file
> > load DefaultTokenizerPipeline
> >
> > // Add non-core annotators
> > add ContextDependentTokenizerAnnotator
> > addDescription POSTagger
> >
> > // Add Chunkers
> > load ChunkerSubPipe
> >
> > // Default fast dictionary lookup
> > load DictionarySubPipe
> >
> > // Add Cleartk Entity Attribute annotators
> > load AttributeCleartkSubPipe
> >
> >
> ========================================================================================
> >
> >
> > I recommend that you copy those lines to a new file (for instance,
> > Maral.piper) and then add the following lines:
> >
> >
> >
> ========================================================================================
> > // Write marked copy of note text in interactive html files
> > add pretty.html.HtmlTextWriter SubDirectory=HTML
> >
> > // Write Fast Health Interoperability Resources (FHIR) json files.
> > fhir.org
> > package org.apache.ctakes.fhir.cc
> > add FhirJsonFileWriter SubDirectory=FHIR
> >
> > // Write plaintext copy of note text with cui, semantic group, POS.
> > Relations are listsed.
> > add pretty.plaintext.PrettyTextWriterFit SubDirectory=TEXT
> >
> > // Write plaintext copy of note sentences with entity and relation
> > disveries listed.
> > add property.plaintext.PropertyTextWriterFit SubDirectory=PROP
> >
> >
> ========================================================================================
> >
> >
> > The output directory should then contain some new output in different
> > subdirectories.  You can change the subdirectory names.
> >
> > Note: the "=================================" are just there to indicate
> > what is for the file.  Do not copy them.
> >
> > There are many more file writers, most of which write simple lists of
> > discoveries in one form or another.
> > I recommend trying the 4 above and see if any fit your purposes before
> > moving on to more specialized writers.
> >
> > Sean


-- 
Siamak Barzegar, PhD.
Senior Research Engineer.
Biomedical Text Mining Unit.
Barcelona Supercomputing Centre

Re: cTAKES Pipeline [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.

Hi Maral,

> Are you using an IDE (Integrated Development Environment) such as IntelliJ
> or Eclipse?
> If so then you should be able to create a run profile that can run
> pipelines.  There is plenty of help online for that kind of thing, and
> people on the mailing list can probably provide examples of what they have
> used for ctakes.
>

Sean
________________________________________
From: Maral Amir <ma...@gmail.com>
Sent: Friday, July 19, 2019 4:52 PM
To: dev@ctakes.apache.org
Subject: Re: cTAKES Pipeline [EXTERNAL]

Hi Sean,

Thank you very much for your kind and prompt reply. I used the build profile
"runPiperGui" and it worked beautifully on my custom piper file. I would
appreciate it if you could direct me to the next steps on other run methods. As
I mentioned earlier, my final goal is to develop an OCR+NLP web service.

Thanks,
Maral


Re: cTAKES Pipeline [EXTERNAL]

Posted by Maral Amir <ma...@gmail.com>.

Hi Sean,

Thank you very much for your kind and prompt reply. I used the build profile
"runPiperGui" and it worked beautifully on my custom piper file. I would
appreciate it if you could direct me to the next steps on other run methods. As
I mentioned earlier, my final goal is to develop an OCR+NLP web service.

Thanks,
Maral

On Fri, Jul 19, 2019 at 12:08 PM Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> Hi Maral,
>
> There are two slightly different directory structures.  One for
> development (source structure), another for end use (installation
> structure).
>
> Since you have a copy of the source, let's start with that.
>
> Are you using an IDE (Integrated Development Environment) such as IntelliJ
> or Eclipse?
> If so then you should be able to create a run profile that can run
> pipelines.  There is plenty of help online for that kind of thing, and
> people on the mailing list can probably provide examples of what they have
> used for ctakes.
>
> If you are not using an IDE, I suggest using the -PrunPiperGui maven
> profile that I mentioned below - just to run a test pipeline and see your
> output.  After you have a successful run then we can move on to other run
> methods.
>
> When you use an IDE run profile or a maven profile you don't need to
> specify $CTAKES_HOME or worry about the classpath or bin/.
>
> Sean

Re: cTAKES Pipeline [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.

Hi Maral,

There are two slightly different directory structures: one for development (source structure), another for end use (installation structure).

Since you have a copy of the source, let's start with that.

Are you using an IDE (Integrated Development Environment) such as IntelliJ or Eclipse?
If so then you should be able to create a run profile that can run pipelines.  There is plenty of help online for that kind of thing, and people on the mailing list can probably provide examples of what they have used for ctakes.

If you are not using an IDE, I suggest using the -PrunPiperGui maven profile that I mentioned below - just to run a test pipeline and see your output.  After you have a successful run then we can move on to other run methods.

When you use an IDE run profile or a maven profile you don't need to specify $CTAKES_HOME or worry about the classpath or bin/.

Sean
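
Since the eventual goal in this thread is to drive cTAKES from Python, the maven profile run can be wrapped in a few lines. This is only a sketch: the function names are made up for illustration, and it assumes mvn is on the PATH and that source_root is the directory holding the top-level pom.xml. Note that nothing here sets CTAKES_HOME or a classpath.

```python
import subprocess
from typing import List


def maven_profile_command(profile: str) -> List[str]:
    """Return the mvn command from this thread for the given profile name."""
    # Same flags as the command quoted in the thread:
    #   mvn clean compile -DskipTests -PrunPiperGui
    return ["mvn", "clean", "compile", "-DskipTests", f"-P{profile}"]


def run_profile(source_root: str, profile: str = "runPiperGui") -> int:
    """Run the profile from the source root and return maven's exit code."""
    # cwd is assumed to be the checked-out source root (hypothetical path).
    completed = subprocess.run(maven_profile_command(profile), cwd=source_root)
    return completed.returncode
```

Calling run_profile with the path of the checked-out source should open the piper GUI in the same way as typing the command by hand.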

________________________________________
From: Maral Amir <ma...@gmail.com>
Sent: Friday, July 19, 2019 2:12 PM
To: dev@ctakes.apache.org
Subject: Re: cTAKES Pipeline [EXTERNAL]

Hi Sean,

Thank you so much for your insightful response.
I'm having a problem linking the piper files. I should mention I am using
the command line interface. Could you please let me know:

1. What should I set my CTAKES_HOME variable to? Right now I set
CTAKES_HOME to the main folder of my cTAKES user installation. That is because
the last line of runPiperFile.sh puts $CTAKES_HOME/lib/* on the classpath,
and no /lib folder is present in the developer's version.

java -cp
$CTAKES_HOME/desc/:$CTAKES_HOME/resources/:$CTAKES_HOME/resources/resources:$CTAKES_HOME/lib/*
-Dlog4j.configuration=file:$CTAKES_HOME/config/log4j.xml -Xms512M -Xmx3g
org.apache.ctakes.core.pipeline.PiperFileRunner "$@"


Also,

2. Where is the "bin" folder that holds the bash file? Right now I use
this one:
/Users/local/projects/ctakes/trunk/ctakes-distribution/src/main/bin


Thanks,
Maral


Re: cTAKES Pipeline [EXTERNAL]

Posted by Maral Amir <ma...@gmail.com>.

Hi Sean,

Thank you so much for your insightful response.
I'm having a problem linking the piper files. I should mention I am using
the command line interface. Could you please let me know:

1. What should I set my CTAKES_HOME variable to? Right now I set
CTAKES_HOME to the main folder of my cTAKES user installation. That is because
the last line of runPiperFile.sh puts $CTAKES_HOME/lib/* on the classpath,
and no /lib folder is present in the developer's version.

java -cp
$CTAKES_HOME/desc/:$CTAKES_HOME/resources/:$CTAKES_HOME/resources/resources:$CTAKES_HOME/lib/*
-Dlog4j.configuration=file:$CTAKES_HOME/config/log4j.xml -Xms512M -Xmx3g
org.apache.ctakes.core.pipeline.PiperFileRunner "$@"


Also,

2. Where is the "bin" folder that holds the bash file? Right now I use
this one:
/Users/local/projects/ctakes/trunk/ctakes-distribution/src/main/bin


Thanks,
Maral
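
For reference, the java line quoted above can also be assembled from a script. A minimal sketch in Python, assuming CTAKES_HOME points at an installed (binary) cTAKES that has desc/, resources/, lib/ and config/ under it; the function name is made up for illustration, and any runner arguments are passed through untouched, the way "$@" is in runPiperFile.sh.

```python
from typing import List, Sequence


def piper_runner_command(ctakes_home: str, runner_args: Sequence[str] = ()) -> List[str]:
    """Rebuild the java command from the last line of runPiperFile.sh."""
    # Same four classpath entries as the script, joined with ':' (the Unix
    # separator used in the .sh file). The trailing lib/* is a classpath
    # wildcard that the JVM itself expands to every jar in that folder.
    classpath = ":".join([
        f"{ctakes_home}/desc/",
        f"{ctakes_home}/resources/",
        f"{ctakes_home}/resources/resources",
        f"{ctakes_home}/lib/*",
    ])
    return [
        "java",
        "-cp", classpath,
        f"-Dlog4j.configuration=file:{ctakes_home}/config/log4j.xml",
        "-Xms512M",
        "-Xmx3g",
        "org.apache.ctakes.core.pipeline.PiperFileRunner",
        *runner_args,  # whatever the runner accepts, passed through as-is
    ]
```

The returned list can be handed to subprocess.run without a shell, so the lib/* entry reaches java literally. Which arguments the runner accepts is not covered here; check the usage text of the script itself.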

On Fri, Jul 19, 2019 at 6:13 AM Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> Hi Maral,
>
> You can generate different output types by adding different writers to the
> end of the pipeline.
> Here are the contents of the Default Clinical Pipeline piper file:
>
>
> ========================================================================================
> // Commands and parameters to create a default plaintext document processing pipeline with UMLS lookup
>
> // Load a simple token processing pipeline from another pipeline file
> load DefaultTokenizerPipeline
>
> // Add non-core annotators
> add ContextDependentTokenizerAnnotator
> addDescription POSTagger
>
> // Add Chunkers
> load ChunkerSubPipe
>
> // Default fast dictionary lookup
> load DictionarySubPipe
>
> // Add Cleartk Entity Attribute annotators
> load AttributeCleartkSubPipe
>
> ========================================================================================
>
>
> I recommend that you copy those lines to a new file (for instance,
> Maral.piper) and then add the following lines:
>
>
> ========================================================================================
> // Write marked copy of note text in interactive html files
> add pretty.html.HtmlTextWriter SubDirectory=HTML
>
> // Write Fast Healthcare Interoperability Resources (FHIR) json files.  fhir.org
> package org.apache.ctakes.fhir.cc
> add FhirJsonFileWriter SubDirectory=FHIR
>
> // Write plaintext copy of note text with cui, semantic group, POS.  Relations are listed.
> add pretty.plaintext.PrettyTextWriterFit SubDirectory=TEXT
>
> // Write plaintext copy of note sentences with entity and relation discoveries listed.
> add property.plaintext.PropertyTextWriterFit SubDirectory=PROP
>
> ========================================================================================
>
>
> The output directory should then contain some new output in different
> subdirectories.  You can change the subdirectory names.
>
> Note: the "=================================" lines are just there to mark
> what belongs in the file.  Do not copy them.
>
> There are many more file writers, most of which write simple lists of
> discoveries in one form or another.
> I recommend trying the 4 above to see if any fit your purposes before
> moving on to more specialized writers.
>
> Sean
>
> ________________________________________
> From: Maral Amir <ma...@gmail.com>
> Sent: Thursday, July 18, 2019 7:11 PM
> To: dev@ctakes.apache.org
> Subject: Re: cTAKES Pipeline [EXTERNAL]
>
> Hi Sean,
>
> Thank you so much for your very helpful and comprehensive response. I was
> able to generate the xmi results in the output directory and used UIMA Cas
> Visual Debugger (CVD) as suggested to view the information. I have two
> questions:
> 1. What is the best reference for me to study and understand the
> annotations?
> 2. Is there a CLI equivalent to CVD? I need the annotated outputs in a
> readable format without the help of CVD.
>
> Thanks,
> Maral
>
>
> On Thu, Jul 18, 2019 at 12:52 PM Finan, Sean <
> Sean.Finan@childrens.harvard.edu> wrote:
>
> > Hi Maral,
> >
> > This might be what you are talking about with respect to the Default
> > Clinical Pipeline
> >
> >
> > https://cwiki.apache.org/confluence/display/CTAKES/Default+Clinical+Pipeline
> >
> > That lists a command line method for running a set of files and getting
> > xml output.
> >
> > The default clinical pipeline configuration is actually contained in the
> > plain text (piper) file
> > resources/org/apache/ctakes/clinical/pipeline/DefaultFastPipeline.piper
> >
> > If you are looking at source code then the file is
> > ctakes-clinical-pipeline-res/src/main/resources/ ...
> >
> > You can also select and run a piper file with a gui
> >
> > https://cwiki.apache.org/confluence/display/CTAKES/Piper+File+Submitter+GUI
> >
> > Both methods are mentioned near the bottom of one of the pages detailing
> > pipeline configuration
> >
> > https://cwiki.apache.org/confluence/display/CTAKES/Piper+Files
> >
> > There are several example pipelines constructed with code and/or plain
> > text files in the ctakes-examples and ctakes-examples-res modules.  You can
> > look at the different "Hello World" examples.
> >
> > Since you are playing with maven, you can run the profile "runPiperGui".
> > mvn clean compile -DskipTests -PrunPiperGui
> >
> > Sean
> >
> >
> > ________________________________________
> > From: Maral Amir <ma...@gmail.com>
> > Sent: Thursday, July 18, 2019 2:29 PM
> > To: dev@ctakes.apache.org
> > Subject: cTAKES Pipeline [EXTERNAL]
> >
> > Hi,
> >
> > I just built my developer version of cTAKES with the help of the wonderful
> > cTAKES developers.
> >
> > For my next step, I would appreciate it if somebody could direct me to the
> > right path. I am planning to process clinical text documents through the
> > entire pipeline to generate xml output. I see the website suggests walking
> > through the Default Clinical Pipeline. I understand there are also multiple
> > git repositories of command line tools based on Apache cTAKES.
> > My final goal is to integrate cTAKES with some Python packages (OCR, etc.)
> > into one pipeline and have some form of web service at the end. I would
> > deeply appreciate any suggestions.
> >
> > Thanks,
> > Maral
> >
>

Re: cTAKES Pipeline [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.

Hi Maral,

You can generate different output types by adding different writers to the end of the pipeline.
Here are the contents of the Default Clinical Pipeline piper file:

========================================================================================
// Commands and parameters to create a default plaintext document processing pipeline with UMLS lookup

// Load a simple token processing pipeline from another pipeline file
load DefaultTokenizerPipeline

// Add non-core annotators
add ContextDependentTokenizerAnnotator
addDescription POSTagger

// Add Chunkers
load ChunkerSubPipe

// Default fast dictionary lookup
load DictionarySubPipe

// Add Cleartk Entity Attribute annotators
load AttributeCleartkSubPipe
========================================================================================

I recommend that you copy those lines to a new file (for instance, Maral.piper) and then add the following lines:

========================================================================================
// Write marked copy of note text in interactive html files
add pretty.html.HtmlTextWriter SubDirectory=HTML

// Write Fast Healthcare Interoperability Resources (FHIR) json files.  fhir.org
package org.apache.ctakes.fhir.cc
add FhirJsonFileWriter SubDirectory=FHIR

// Write plaintext copy of note text with cui, semantic group, POS.  Relations are listed.
add pretty.plaintext.PrettyTextWriterFit SubDirectory=TEXT

// Write plaintext copy of note sentences with entity and relation discoveries listed.
add property.plaintext.PropertyTextWriterFit SubDirectory=PROP
========================================================================================

The output directory should then contain some new output in different subdirectories.  You can change the subdirectory names.

Note: the "=================================" lines only mark where the file contents begin and end.  Do not copy them.
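
Put together, the new file would look roughly like the sketch below. It is only the two blocks above joined in order; Maral.piper is just the example name suggested above, and the SubDirectory values are the ones already shown and can be renamed.

```text
// Maral.piper (example name): Default Clinical Pipeline plus four extra writers

// Load a simple token processing pipeline from another pipeline file
load DefaultTokenizerPipeline

// Add non-core annotators
add ContextDependentTokenizerAnnotator
addDescription POSTagger

// Add Chunkers
load ChunkerSubPipe

// Default fast dictionary lookup
load DictionarySubPipe

// Add Cleartk Entity Attribute annotators
load AttributeCleartkSubPipe

// Write marked copy of note text in interactive html files
add pretty.html.HtmlTextWriter SubDirectory=HTML

// Write Fast Healthcare Interoperability Resources (FHIR) json files.  fhir.org
package org.apache.ctakes.fhir.cc
add FhirJsonFileWriter SubDirectory=FHIR

// Write plaintext copy of note text with cui, semantic group, POS.  Relations are listed.
add pretty.plaintext.PrettyTextWriterFit SubDirectory=TEXT

// Write plaintext copy of note sentences with entity and relation discoveries listed.
add property.plaintext.PropertyTextWriterFit SubDirectory=PROP
```

If a mail client re-wraps the text, check that every comment still sits on a single line starting with //, since a wrapped fragment would presumably no longer be read as a comment.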

There are many more file writers, most of which write simple lists of discoveries in one form or another.
I recommend trying the 4 above to see if any fit your purposes before moving on to more specialized writers.

Sean

________________________________________
From: Maral Amir <ma...@gmail.com>
Sent: Thursday, July 18, 2019 7:11 PM
To: dev@ctakes.apache.org
Subject: Re: cTAKES Pipeline [EXTERNAL]

Hi Sean,

Thank you so much for your very helpful and comprehensive response. I was
able to generate the xmi results in the output directory and used UIMA Cas
Visual Debugger (CVD) as suggested to view the information. I have two
questions:
1. What is the best reference for me to study and understand the
annotations?
2. Is there a CLI equivalent to CVD? I need the annotated outputs in a
readable format without the help of CVD.

Thanks,
Maral

On Thu, Jul 18, 2019 at 12:52 PM Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> Hi Maral,
>
> This might be what you are talking about with respect to the Default
> Clinical Pipeline
>
> https://cwiki.apache.org/confluence/display/CTAKES/Default+Clinical+Pipeline
>
> That lists a command line method for running a set of files and getting
> xml output.
>
> The default clinical pipeline configuration is actually contained in the
> plain text (piper) file
> resources/org/apache/ctakes/clinical/pipeline/DefaultFastPipeline.piper
>
> If you are looking at source code then the file is
> ctakes-clinical-pipeline-res/src/main/resources/ ...
>
> You can also select and run a piper file with a gui
> https://cwiki.apache.org/confluence/display/CTAKES/Piper+File+Submitter+GUI
>
> Both methods are mentioned near the bottom of one of the pages detailing
> pipeline configuration
> https://cwiki.apache.org/confluence/display/CTAKES/Piper+Files
>
> There are several example pipelines constructed with code and/or plain
> text files in the ctakes-examples and ctakes-examples-res modules.  You can
> look at the different "Hello World" examples.
>
> Since you are playing with maven, you can run the profile "runPiperGui".
> mvn clean compile -DskipTests -PrunPiperGui
>
> Sean
>
>
> ________________________________________
> From: Maral Amir <ma...@gmail.com>
> Sent: Thursday, July 18, 2019 2:29 PM
> To: dev@ctakes.apache.org
> Subject: cTAKES Pipeline [EXTERNAL]
>
> Hi,
>
> I just built my developer version of cTAKES with the help of the wonderful
> cTAKES developers.
>
> For my next step, I would appreciate it if somebody could direct me to the
> right path. I am planning to process clinical text documents through the
> entire pipeline to generate xml output. I see the website suggests walking
> through the Default Clinical Pipeline. I understand there are also multiple
> git repositories of command line tools based on Apache cTAKES.
> My final goal is to integrate cTAKES with some Python packages (OCR, etc.)
> into one pipeline and have some form of web service at the end. I would
> deeply appreciate any suggestions.
>
> Thanks,
> Maral
>

Re: cTAKES Pipeline [EXTERNAL]

Posted by Maral Amir <ma...@gmail.com>.

Hi Sean,

Thank you so much for your very helpful and comprehensive response. I was
able to generate the xmi results in the output directory and used UIMA Cas
Visual Debugger (CVD) as suggested to view the information. I have two
questions:
1. What is the best reference for me to study and understand the
annotations?
2. Is there a CLI equivalent to CVD? I need the annotated outputs in a
readable format without the help of CVD.

Thanks,
Maral


On Thu, Jul 18, 2019 at 12:52 PM Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> Hi Maral,
>
> This might be what you are talking about with respect to the Default
> Clinical Pipeline
>
> https://cwiki.apache.org/confluence/display/CTAKES/Default+Clinical+Pipeline
>
> That lists a command line method for running a set of files and getting
> xml output.
>
> The default clinical pipeline configuration is actually contained in the
> plain text (piper) file
> resources/org/apache/ctakes/clinical/pipeline/DefaultFastPipeline.piper
>
> If you are looking at source code then the file is
> ctakes-clinical-pipeline-res/src/main/resources/ ...
>
> You can also select and run a piper file with a gui
> https://cwiki.apache.org/confluence/display/CTAKES/Piper+File+Submitter+GUI
>
> Both methods are mentioned near the bottom of one of the pages detailing
> pipeline configuration
> https://cwiki.apache.org/confluence/display/CTAKES/Piper+Files
>
> There are several example pipelines constructed with code and/or plain
> text files in the ctakes-examples and ctakes-examples-res modules.  You can
> look at the different "Hello World" examples.
>
> Since you are playing with maven, you can run the profile "runPiperGui".
> mvn clean compile -DskipTests -PrunPiperGui
>
> Sean
>
>
> ________________________________________
> From: Maral Amir <ma...@gmail.com>
> Sent: Thursday, July 18, 2019 2:29 PM
> To: dev@ctakes.apache.org
> Subject: cTAKES Pipeline [EXTERNAL]
>
> Hi,
>
> I just built my developer version of cTAKES with the help of the wonderful
> cTAKES developers.
>
> For my next step, I would appreciate it if somebody could direct me to the
> right path. I am planning to process clinical text documents through the
> entire pipeline to generate xml output. I see the website suggests walking
> through the Default Clinical Pipeline. I understand there are also multiple
> git repositories of command line tools based on Apache cTAKES.
> My final goal is to integrate cTAKES with some Python packages (OCR, etc.)
> into one pipeline and have some form of web service at the end. I would
> deeply appreciate any suggestions.
>
> Thanks,
> Maral
>