Posted to dev@ctakes.apache.org by Sandy Ryza <sa...@cloudera.com> on 2013/06/19 21:17:57 UTC

docs on running Clinical Document Pipeline from Java?

Hi cTAKES folks,

I am trying to figure out how to run the Clinical Document Pipeline from
Java.  All the documentation I have found so far has been about how to do
this through a GUI.  Is there anything on how to run the pipeline
programmatically?

thanks for any help!
Sandy

RE: docs on running Clinical Document Pipeline from Java?

Posted by "Chen, Pei" <Pe...@childrens.harvard.edu>.

Hi,
One can include the cTAKES code as a Maven dependency. However, there is a current limitation: in order to run the pipeline, the /desc and /resources directories need to be unpacked somewhere on disk.  There is an effort to streamline resource loading so that the modules are easier to integrate.
Until then, one will need to run "mvn package -DskipTests" to bundle everything into a single package under ctakes-distribution/target.
Maybe there are other ways...
--Pei
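
A minimal sketch of a fail-fast check for the limitation described above, assuming the distribution has been unpacked into a single directory that holds desc/ and resources/. The class name CtakesLayoutCheck and the convention of passing that directory as the first argument are illustrative only and not part of cTAKES:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of cTAKES.
final class CtakesLayoutCheck {

    // Directory names come from the message above; the single-root layout is an assumption.
    static final String[] REQUIRED_DIRS = {"desc", "resources"};

    /** Returns the required sub-directories that are missing under the given root. */
    static List<String> missingDirs(Path root) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED_DIRS) {
            if (!Files.isDirectory(root.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Use the first argument as the unpacked directory, or the working directory.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        List<String> missing = missingDirs(root);
        if (missing.isEmpty()) {
            System.out.println("Found desc/ and resources/ under " + root.toAbsolutePath());
        } else {
            System.out.println("Missing under " + root.toAbsolutePath() + ": " + missing);
        }
    }
}
```

Running a check like this before building the pipeline turns a late descriptor-loading failure into an early, readable message.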

> -----Original Message-----
> From: Sandy Ryza [mailto:sandy.ryza@cloudera.com]
> Sent: Thursday, June 20, 2013 5:24 PM
> To: dev@ctakes.apache.org
> Subject: Re: docs on running Clinical Document Pipeline from Java?
> 
> Thanks for the help!  Is there any advice on the best way to include ctakes as
> a dependency?  I've tried writing some code that points to
> AggregatePlaintextUMLSProcessor.xml, but it doesn't know where to find
> the other files that are referred to.  Is there any good way to package ctakes
> up and refer to a unit?  We want to be able to distribute something that
> relies on ctakes in a cluster.
> 
> (Here's the error I'm getting)
> Import failed.  Could not read from URL
> file:/home/sandy/ctakes-dependency-parser/desc/analysis_engine/ClearParserDependencyParserAE.xml.
> (Descriptor:
> file:/home/sandy/datascience/Mayo_cTAKES/mr/AggregatePlaintextUMLSProcessor.xml)
> 
> -Sandy
> 
> 
> On Wed, Jun 19, 2013 at 2:30 PM, Andy McMurry
> <mc...@gmail.com>wrote:
> 
> > Note: The WEKA gui reports the command line arguments for any GUI task.
> > It could be a very helpful timesaver if cTAKES had a similar feature.
> >
> > Otherwise, I fear we will be writing Main methods and docs for each
> > and every cTAKES task.
> > What do you all think?
> >
> > -------
> >
> > Real world example of how this works in Weka.
> > Say you wanted to run Adaboost on a C4.5 decision tree with cost
> > sensitive classification.
> > Weka reports the arguments, which I can re-run from command line
> >
> > Classifier csc = new CostSensitiveClassifier();
> >
> >         String[] adaboost = {
> >                 "-cost-matrix", costMatrix,
> >                 "-S", "1",
> >                 "-W", "weka.classifiers.meta.AdaBoostM1",
> >                 "--",
> >                 "-P", "100",
> >                 "-S", "1",
> >                 "-I", "30",
> >                 //
> >                 "-W", "weka.classifiers.trees.J48",
> >                 "--",
> >                 "-C", String.valueOf(j48Confidence),
> >                 "-M", String.valueOf(j48MinObjects)
> >         };
> >
> > csc.setOptions(adaboost);
> >
> > On Jun 19, 2013, at 5:20 PM, "Chen, Pei"
> > <Pe...@childrens.harvard.edu>
> > wrote:
> >
> > > Also,
> > > Tim recently just checked in a Main class that essentially could be
> > > the beginnings of a Driver program.
> > > Check the main() out at:
> > > http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-clinical-pipeline/src/main/java/org/apache/ctakes/clinicalpipeline/runtime/BagOfCUIsGenerator.java
> > >
> > > --Pei
> > >
> > >
> > >> -----Original Message-----
> > >> From: Girivaraprasad Nambari [mailto:girinambari@gmail.com]
> > >> Sent: Wednesday, June 19, 2013 3:47 PM
> > >> To: dev@ctakes.apache.org
> > >> Subject: Re: docs on running Clinical Document Pipeline from Java?
> > >>
> > >> Hi,
> > >>
> > >> Welcome to ctakes.
> > >>
> > >> I started a similar discussion a few months ago (you may be able to
> > >> find it if you browse through the old discussions). Here is the
> > >> response from Pei Chen & the cTAKES community:
> > >>
> > >> It is not quite ready for prime time, but take a peek at the code
> > >> below (it uses uimaFIT to do the above):
> > >>
> > >> http://svn.apache.org/repos/asf/ctakes/sandbox/ctakes-gui/src/main/java/org/chboston/cnlp/ctakes/gui/service/LauncherService.java
> > >>
> > >> Essentially, it boils down to a few lines of code:
> > >>
> > >> AnalysisEngine aggregateAE = AnalysisEngineFactory.createAggregate(
> > >>               engines, componentNames, typeSystemDescription, null,
> > >>               new SofaMapping[0]);
> > >>
> > >> JCas jcas = aggregateAE.newJCas();
> > >> jcas.setDocumentText(doc.getText());
> > >> aggregateAE.process(jcas);
> > >>
> > >>
> > >> We need to start from UIMA and UIMAfit to get some basic
> > >> understanding, then using ctakes component will be easy.
> > >>
> > >> Good luck!
> > >>
> > >> Thank you,
> > >>
> > >> Giri
> > >>
> > >>
> > >> On Wed, Jun 19, 2013 at 3:17 PM, Sandy Ryza
> > >> <sa...@cloudera.com>
> > >> wrote:
> > >>
> > >>> Hi cTAKES folks,
> > >>>
> > >>> I am trying to figure out how to run the Clinical Document
> > >>> Pipeline from Java.  All the documentation I have found so far has
> > >>> been about how to do this through a GUI.  Is there anything on how
> > >>> to run the pipeline programmatically?
> > >>>
> > >>> thanks for any help!
> > >>> Sandy
> > >>>
> >
> >

Re: docs on running Clinical Document Pipeline from Java?

Posted by Sandy Ryza <sa...@cloudera.com>.

Thanks for the help!  Is there any advice on the best way to include cTAKES
as a dependency?  I've tried writing some code that points
to AggregatePlaintextUMLSProcessor.xml, but it can't find the other
descriptor files that it refers to.  Is there any good way to package
cTAKES up and refer to it as a unit?  We want to be able to distribute
something that relies on cTAKES across a cluster.

Here's the error I'm getting:
Import failed.  Could not read from URL
file:/home/sandy/ctakes-dependency-parser/desc/analysis_engine/ClearParserDependencyParserAE.xml.
(Descriptor:
file:/home/sandy/datascience/Mayo_cTAKES/mr/AggregatePlaintextUMLSProcessor.xml)

-Sandy
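
The URL in the error suggests that the aggregate descriptor pulls in its delegates through imports that are resolved relative to the descriptor's own location, so a copied descriptor can end up pointing at files that are not there. A rough diagnostic sketch, assuming the imports are written as <import location="..."/> elements with file paths; the class name DescriptorImportCheck is made up for illustration, and this uses only the JDK XML parser rather than any UIMA API:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical diagnostic, not part of cTAKES or UIMA.
final class DescriptorImportCheck {

    /** Lists every <import location="..."> target that does not exist on disk. */
    static List<File> unresolvedImports(File descriptor) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(descriptor);
        NodeList imports = doc.getElementsByTagName("import");
        File baseDir = descriptor.getAbsoluteFile().getParentFile();

        List<File> missing = new ArrayList<>();
        for (int i = 0; i < imports.getLength(); i++) {
            Element imp = (Element) imports.item(i);
            String location = imp.getAttribute("location");
            if (location.isEmpty()) {
                continue; // imports without a location attribute are not checked here
            }
            // Assumption: the location is a file path; relative paths are
            // resolved against the directory of the importing descriptor.
            File target = new File(location);
            if (!target.isAbsolute()) {
                target = new File(baseDir, location);
            }
            if (!target.isFile()) {
                missing.add(target.toPath().normalize().toFile());
            }
        }
        return missing;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java DescriptorImportCheck <descriptor.xml>");
            return;
        }
        for (File f : unresolvedImports(new File(args[0]))) {
            System.out.println("unresolved import: " + f);
        }
    }
}
```

The check only looks one level deep; imported descriptors may contain imports of their own, so it would need to be run on each of them as well.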



Re: docs on running Clinical Document Pipeline from Java?

Posted by Andy McMurry <mc...@gmail.com>.

Note: The Weka GUI reports the command-line arguments for any GUI task.
It could be a very helpful timesaver if cTAKES had a similar feature.

Otherwise, I fear we will be writing main methods and docs for each and every cTAKES task.
What do you all think?

-------

Real-world example of how this works in Weka.
Say you wanted to run AdaBoost on a C4.5 decision tree with cost-sensitive classification.
Weka reports the arguments, which I can re-run from the command line or pass to setOptions() in code:

// costMatrix, j48Confidence and j48MinObjects are defined elsewhere
CostSensitiveClassifier csc = new CostSensitiveClassifier();

        String[] adaboost = {
                // CostSensitiveClassifier options; -W names the wrapped classifier
                "-cost-matrix", costMatrix,
                "-S", "1",
                "-W", "weka.classifiers.meta.AdaBoostM1",
                "--",
                // AdaBoostM1 options; -W names its base classifier
                "-P", "100",
                "-S", "1",
                "-I", "30",
                "-W", "weka.classifiers.trees.J48",
                "--",
                // J48 (C4.5) options: pruning confidence and minimum instances per leaf
                "-C", String.valueOf(j48Confidence),
                "-M", String.valueOf(j48MinObjects)
        };

csc.setOptions(adaboost);
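
To make the proposal concrete, here is a small sketch of what "report the arguments" could look like on the cTAKES side: whatever the GUI collects is kept as flag/value pairs and can be printed as the equivalent command line. Everything named here (TaskOptions, the --descriptor and --input flags, org.example.PipelineDriver) is hypothetical; no such driver or flags exist in cTAKES today:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical holder for the settings a GUI task was run with.
final class TaskOptions {

    // LinkedHashMap keeps the flags in the order they were set.
    private final Map<String, String> options = new LinkedHashMap<>();

    TaskOptions set(String flag, String value) {
        options.put(flag, value);
        return this;
    }

    /** Flattens the options into the flag/value array a main(String[]) would receive. */
    String[] toArgs() {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> e : options.entrySet()) {
            args.add(e.getKey());
            args.add(e.getValue());
        }
        return args.toArray(new String[0]);
    }

    /** Renders a copy-pasteable command line; values containing whitespace are quoted. */
    String toCommandLine(String mainClass) {
        StringBuilder sb = new StringBuilder("java ").append(mainClass);
        for (String arg : toArgs()) {
            sb.append(' ');
            // Simplification: embedded double quotes are not escaped.
            sb.append(arg.matches(".*\\s.*") ? "\"" + arg + "\"" : arg);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        TaskOptions opts = new TaskOptions()
                .set("--descriptor", "desc/AggregatePlaintextUMLSProcessor.xml")
                .set("--input", "notes dir");
        System.out.println(opts.toCommandLine("org.example.PipelineDriver"));
    }
}
```

The same toArgs() array could be handed straight to a single generic main method, which is the point of the suggestion: one driver and one reporting mechanism instead of a main method per task.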


RE: docs on running Clinical Document Pipeline from Java?

Posted by "Chen, Pei" <Pe...@childrens.harvard.edu>.

Also,
Tim recently checked in a Main class that could essentially be the beginnings of a driver program.
Check out its main() at:
http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-clinical-pipeline/src/main/java/org/apache/ctakes/clinicalpipeline/runtime/BagOfCUIsGenerator.java

--Pei



Re: docs on running Clinical Document Pipeline from Java?

Posted by Girivaraprasad Nambari <gi...@gmail.com>.

Hi,

Welcome to cTAKES.

I started a similar discussion a few months ago (you may be able to find
it if you browse through the old discussions). Here is the response from
Pei Chen & the cTAKES community:

It is not quite ready for prime time, but take a peek at the code below (it
uses uimaFIT to do the above):

http://svn.apache.org/repos/asf/ctakes/sandbox/ctakes-gui/src/main/java/org/chboston/cnlp/ctakes/gui/service/LauncherService.java

Essentially, it boils down to a few lines of code:

// Build one aggregate engine from the individual component descriptions
AnalysisEngine aggregateAE = AnalysisEngineFactory.createAggregate(
               engines, componentNames, typeSystemDescription, null,
               new SofaMapping[0]);

// Create a CAS, set the document text, and run the pipeline on it
JCas jcas = aggregateAE.newJCas();
jcas.setDocumentText(doc.getText());
aggregateAE.process(jcas);

We need to start from UIMA and uimaFIT to get a basic understanding;
after that, using the cTAKES components will be easy.

Good luck!

Thank you,

Giri
