Posted to dev@ctakes.apache.org by John Green <jo...@gmail.com> on 2014/07/02 02:15:02 UTC

ytex DBconsumer and groovy parser

If someone has a free minute, which, judging from my own life, is probably
not the case - where in the groovy scripts in sandbox do you define the
consumer to use? There is one comment that says "dont put the .xml here"
then there is a path to the dictionary ae. I'm working by ssh from the
hospital a lot in my "free time" in the ICU and running GUI CPEs isn't
gonna cut it.

Apropos the ytex dbconsumer - I should be able to just tack this on to the
end of the ytex aggregate pipeline?

I'm probably still asking very naive questions but to date I still haven't
had the time to dive into UIMA's base very well, so I apologize.

My goal is to run the full ytex pipeline from the command line with the
ytex dbconsumer ...

Thanks for everyone's patience,
John

Re: ytex DBconsumer and groovy parser

Posted by John Green <jo...@gmail.com>.
Thanks for the wonderful explanation Richard! That was very well said.

Such a learning curve!

JG


On Wed, Jul 2, 2014 at 1:11 AM, Richard Eckart de Castilho <re...@apache.org>
wrote:

> Hi John,
>
> there is actually no great difference between analysis engines and
> consumers.
>
> By default, a UIMA runtime may create multiple instances of an analysis
> engine and run them in parallel (if the runtime supports that),
> but a "consumer" must see all data going through the pipeline, so there
> can only be one instance.
>
> The default value of the flag that allows or disallows multiple instances
> is the only real difference.
>
> Basically, any analysis engine that only reads annotations from the CAS
> and does not add or change anything is a consumer. Consequently, a consumer
> can be added anywhere in the pipeline, not only at the end (I sometimes do
> that to see intermediate results).
>
> If a component has the "allow multiple instances" flag set to "false"
> (which is usually what you want), then runtimes may react to that
> differently. E.g. the Collection Processing Engine (CPE) will single-thread
> all components (analysis engines or consumers) after it hits the first
> component with "allow multiple instances" set to false (which is typically
> a consumer). So to make optimal use of the CPE's multi-threading
> capabilities, such components should be placed towards the end of the CPE
> pipeline.
>
> I believe UIMA has a Java interface declaration and base classes for
> "CasConsumers" - I haven't used these in years. The uimaFIT API doesn't
> even support these because everything can also be (and is within uimaFIT)
> nicely modeled using analysis engines and the "allow multiple instances"
> flag.
>
> Cheers,
>
> -- Richard

Re: ytex DBconsumer and groovy parser

Posted by Richard Eckart de Castilho <re...@apache.org>.
Hi John,

there is actually no great difference between analysis engines and consumers.

By default, a UIMA runtime may create multiple instances of an analysis engine and run them in parallel (if the runtime supports that),
but a "consumer" must see all data going through the pipeline, so there can only be one instance.

The default value of the flag that allows or disallows multiple instances is the only real difference.

Basically, any analysis engine that only reads annotations from the CAS and does not add or change anything is a consumer. Consequently, a consumer can be added anywhere in the pipeline, not only at the end (I sometimes do that to see intermediate results).

If a component has the "allow multiple instances" flag set to "false" (which is usually what you want), then runtimes may react to that differently. E.g. the Collection Processing Engine (CPE) will single-thread all components (analysis engines or consumers) after it hits the first component with "allow multiple instances" set to false (which is typically a consumer). So to make optimal use of the CPE's multi-threading capabilities, such components should be placed towards the end of the CPE pipeline.

I believe UIMA has a Java interface declaration and base classes for "CasConsumers" - I haven't used these in years. The uimaFIT API doesn't even support these because everything can also be (and is within uimaFIT) nicely modeled using analysis engines and the "allow multiple instances" flag.
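
To make the CPE rule above concrete, here is a minimal sketch in plain Java. It is a toy model, not UIMA code: the Component class, the parallelPrefix method and the component names are all made up for illustration, and real CPE scheduling is more involved. It only shows why a component with "allow multiple instances" set to false limits parallelism for everything that follows it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only; none of these types are UIMA classes. */
final class CpeThreadingSketch {

    /** Hypothetical stand-in for a pipeline component and its flag. */
    static final class Component {
        final String name;
        final boolean allowMultipleInstances;

        Component(String name, boolean allowMultipleInstances) {
            this.name = name;
            this.allowMultipleInstances = allowMultipleInstances;
        }
    }

    /**
     * Returns the names of the components that could still be replicated
     * across threads under the rule described above: everything before the
     * first component whose flag is false.
     */
    static List<String> parallelPrefix(List<Component> pipeline) {
        List<String> parallel = new ArrayList<>();
        for (Component c : pipeline) {
            if (!c.allowMultipleInstances) {
                break; // this component and all later ones are single-threaded
            }
            parallel.add(c.name);
        }
        return parallel;
    }

    public static void main(String[] args) {
        // Single-instance writer at the end of the pipeline.
        List<Component> consumerLast = Arrays.asList(
                new Component("tokenizer", true),
                new Component("dictionaryLookup", true),
                new Component("dbWriter", false));
        // Single-instance consumer inserted early to inspect intermediate results.
        List<Component> consumerEarly = Arrays.asList(
                new Component("tokenizer", true),
                new Component("debugWriter", false),
                new Component("dictionaryLookup", true));

        System.out.println(parallelPrefix(consumerLast));  // prints [tokenizer, dictionaryLookup]
        System.out.println(parallelPrefix(consumerEarly)); // prints [tokenizer]
    }
}
```

With the writer last, both annotators stay in the parallel part; with a debugging consumer in second position, only the first component does.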

Cheers,

-- Richard

On 02.07.2014, at 04:01, Masanz, James J. <Ma...@mayo.edu> wrote:

> Hi John,
> 
> Not positive this is the line you are referring to, but there is a line in cTAKES_clinical_pipeline.groovy (which is not in sandbox, btw) that has a comment about 
> 
> "createAnalysisEngineDescription  expects name to not end in .xml even though filename actually does"
> 
> I am guessing the comment you see is trying to say the same thing. 
> 
> cTAKES_clinical_pipeline.groovy is in  ctakes-core/scripts/groovy
> 
> In that script, line 321 is where the writer is specified. There is no separately defined "consumer" in the same sense that the CPE GUI has consumers that are separate from annotators. The script just uses the last "annotator" as a consumer, and the convention, AFAIK, is to call them writers in this case.
> 
> Hope that helps,
> -- James


Re: ytex DBconsumer and groovy parser

Posted by John Green <jo...@gmail.com>.
Thanks James, that was a forehead slapper - I was looking in a partial
build (the ytex download).

JG


On Tue, Jul 1, 2014 at 10:01 PM, Masanz, James J. <Ma...@mayo.edu>
wrote:

> Hi John,
>
> Not positive this is the line you are referring to, but there is a line in
> cTAKES_clinical_pipeline.groovy (which is not in sandbox, btw) that has a
> comment about
>
> "createAnalysisEngineDescription  expects name to not end in .xml even
> though filename actually does"
>
> I am guessing the comment you see is trying to say the same thing.
>
> cTAKES_clinical_pipeline.groovy is in  ctakes-core/scripts/groovy
>
> In that script, line 321 is where the writer is specified. There is no
> separately defined "consumer" in the same sense that the CPE GUI has
> consumers that are separate from annotators. The script just uses the last
> "annotator" as a consumer, and the convention, AFAIK, is to call them
> writers in this case.
>
> Hope that helps,
> -- James

RE: ytex DBconsumer and groovy parser

Posted by "Masanz, James J." <Ma...@mayo.edu>.
Hi John,

Not positive this is the line you are referring to, but there is a line in cTAKES_clinical_pipeline.groovy (which is not in sandbox, btw) that has a comment about 

"createAnalysisEngineDescription  expects name to not end in .xml even though filename actually does"

I am guessing the comment you see is trying to say the same thing. 
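
As a rough sketch of what that comment implies, a script could strip the extension before passing the name on. The helper below is hypothetical (it is not part of cTAKES or uimaFIT), and the example path is made up; it only illustrates dropping a trailing .xml from a descriptor file name.

```java
import java.util.Locale;

/** Illustration only; not a cTAKES or uimaFIT class. */
final class DescriptorNameSketch {

    /**
     * Hypothetical helper: returns the descriptor name without a trailing
     * ".xml" (case-insensitive); any other input is returned unchanged.
     */
    static String toDescriptorName(String descriptorFile) {
        String suffix = ".xml";
        if (descriptorFile.toLowerCase(Locale.ROOT).endsWith(suffix)) {
            return descriptorFile.substring(0, descriptorFile.length() - suffix.length());
        }
        return descriptorFile;
    }

    public static void main(String[] args) {
        // "desc/ExampleWriter.xml" is a made-up path, not a real cTAKES descriptor.
        System.out.println(toDescriptorName("desc/ExampleWriter.xml")); // prints desc/ExampleWriter
        System.out.println(toDescriptorName("desc/ExampleWriter"));     // prints desc/ExampleWriter
    }
}
```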

cTAKES_clinical_pipeline.groovy is in  ctakes-core/scripts/groovy

In that script, line 321 is where the writer is specified. There is no separately defined "consumer" in the same sense that the CPE GUI has consumers that are separate from annotators. The script just uses the last "annotator" as a consumer, and the convention, AFAIK, is to call them writers in this case.

Hope that helps,
-- James
 