Posted to dev@ctakes.apache.org by "Miller, Timothy" <Ti...@childrens.harvard.edu> on 2014/04/15 15:52:51 UTC

suggestion for default pipelines

The discussion in the other thread with Abraham Tom gave me an idea I
wanted to float to the list. We have been using some uimaFIT pipeline
builders in the temporal project that could perhaps be moved into
clinical-pipeline. For example, look at this file:

http://svn.apache.org/viewvc/ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/pipelines/TemporalExtractionPipeline_ImplBase.java?view=markup

with the static methods getPreprocessorAggregateBuilder() and
getLightweightPreprocessorAggregateBuilder() [no UMLS].

So my idea would be to create a class in clinical-pipeline
(CTakesPipelines) with static methods for some standard pipelines
(returning AnalysisEngineDescriptions instead of AggregateBuilders?):

getStandardUMLSPipeline() -- builds the pipeline currently defined in
  AggregatePlaintextUMLSProcessor.xml
getFullPipeline() -- same as above but with SRL, constituency parsing,
  etc., i.e. every component in cTAKES
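Below is a minimal sketch of what such a class could look like. It uses only the JDK. PipelineDescription is a hypothetical stand-in for AnalysisEngineDescription. The component names are placeholders, not the actual contents of AggregatePlaintextUMLSProcessor.xml. The sketch shows that both methods are static and parameterless, and that they share one list of standard components, so the full pipeline always extends the standard one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for UIMA's AnalysisEngineDescription: just the ordered
// names of the primitives in a pipeline. Real code would return something like
// the result of uimaFIT's AggregateBuilder.createAggregateDescription().
final class PipelineDescription {
    private final List<String> components;

    PipelineDescription(List<String> components) {
        this.components = Collections.unmodifiableList(new ArrayList<>(components));
    }

    List<String> components() {
        return components;
    }
}

// Sketch of the proposed CTakesPipelines class: parameterless static methods,
// each returning a complete pipeline description.
final class CTakesPipelines {
    private CTakesPipelines() {} // static methods only; never instantiated

    // Placeholder component names (assumption, not the real XML contents).
    static PipelineDescription getStandardUMLSPipeline() {
        return new PipelineDescription(standardComponents());
    }

    // The standard pipeline plus heavier components (SRL, constituency parsing).
    static PipelineDescription getFullPipeline() {
        List<String> all = standardComponents();
        all.add("ConstituencyParser");
        all.add("SemanticRoleLabeler");
        return new PipelineDescription(all);
    }

    // Shared prefix, so the two pipelines cannot drift apart.
    private static List<String> standardComponents() {
        List<String> c = new ArrayList<>();
        c.add("SegmentAnnotator");
        c.add("SentenceDetector");
        c.add("Tokenizer");
        c.add("POSTagger");
        c.add("Chunker");
        c.add("LookupWindowAnnotator");
        c.add("UmlsDictionaryLookup");
        return c;
    }
}
```

Building both pipelines from one private helper is a design choice. It avoids the synchronization problem that separate XML descriptors tend to have.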

We could then potentially merge our entry points. I think Abraham's
experience shows that they are currently confusing, and probably not
optimally implemented either. For example, ClinicalPipelineWithUmls or
BagOfCUIsGenerator would simply call that static method to run a
uimaFIT-style pipeline. Maybe we can slowly deprecate our XML
descriptors too, unless people feel strongly about keeping them around.

Another benefit is that the cTAKES API then becomes trivial: if you add
cTAKES as a dependency in your pom file, getting a UIMA pipeline is one
uimaFIT call:

AggregateBuilder builder = new AggregateBuilder();
builder.add(CTakesPipelines.getStandardUMLSPipeline());


I think this would actually be pretty easy to implement, but I'm hoping
to get some feedback on whether this is a good direction.

Tim

Re: suggestion for default pipelines

Posted by "Chen, Pei" <Pe...@childrens.harvard.edu>.

Yes. I was thinking of a use case like this one: the YTEX component needs SentenceDetectorA, but the dictionary lookup component expects SentenceDetectorB. It's probably not too common, but it's something to consider with the cool dynamic/plug-and-play pipelines idea.
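The situation above can be made concrete with a small conflict check. This sketch assumes that each component declares which provider it needs for each capability. PinnedComponent and ProviderConflicts are hypothetical names, not cTAKES classes. When two components pin the same capability to different providers, the check reports it instead of silently merging them into one pipeline.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: a component pins each capability it needs (e.g. "Sentence")
// to the specific provider it was built against (e.g. "SentenceDetectorA").
final class PinnedComponent {
    final String name;
    final Map<String, String> requiredProviders; // capability -> provider

    PinnedComponent(String name, Map<String, String> requiredProviders) {
        this.name = name;
        this.requiredProviders = requiredProviders;
    }
}

final class ProviderConflicts {
    private ProviderConflicts() {}

    // Returns one message per clash where two components pin the same capability
    // to different providers; an empty list means one provider can serve everyone.
    static List<String> find(List<PinnedComponent> components) {
        Map<String, PinnedComponent> firstClaim = new LinkedHashMap<>();
        List<String> conflicts = new ArrayList<>();
        for (PinnedComponent c : components) {
            for (Map.Entry<String, String> req : c.requiredProviders.entrySet()) {
                // putIfAbsent returns the earlier claimant, or null if this is the first.
                PinnedComponent earlier = firstClaim.putIfAbsent(req.getKey(), c);
                if (earlier == null) {
                    continue;
                }
                String earlierProvider = earlier.requiredProviders.get(req.getKey());
                if (!earlierProvider.equals(req.getValue())) {
                    conflicts.add(req.getKey() + ": " + earlier.name + " needs "
                            + earlierProvider + " but " + c.name + " needs " + req.getValue());
                }
            }
        }
        return conflicts;
    }
}
```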

Sent from my iPhone

> On Apr 28, 2014, at 5:46 AM, "Richard Eckart de Castilho" <re...@apache.org> wrote:
> 
> At the time a factory method becomes callable, the Maven/Ivy-magic should already have taken place, no?
> 
> -- Richard

Re: suggestion for default pipelines

Posted by Richard Eckart de Castilho <re...@apache.org>.

At the time a factory method becomes callable, the Maven/Ivy-magic should already have taken place, no?

-- Richard

On 27.04.2014, at 17:52, Chen, Pei <Pe...@childrens.harvard.edu> wrote:

> My vote would be for the latter. Have the "Factory" create pipelines instead. It could just be a naming thing though...
> 
> +1 for building dynamic pipelines. I think this idea has been thrown around for some time, but it hasn't really been worked on, so it would be cool to see it in action. I think the tricky part is handling pipeline dependencies, i.e. a concept similar to dependency resolution in Maven/Ivy.

Re: suggestion for default pipelines

Posted by "Chen, Pei" <Pe...@childrens.harvard.edu>.

My vote would be for the latter. Have the "Factory" create pipelines instead. It could just be a naming thing though...

+1 for building dynamic pipelines. I think this idea has been thrown around for some time, but it hasn't really been worked on, so it would be cool to see it in action. I think the tricky part is handling pipeline dependencies, i.e. a concept similar to dependency resolution in Maven/Ivy.
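One common way to handle such dependencies is a topological sort, which is also how build tools order modules. The sketch below uses Kahn's algorithm to order components so that each one runs after everything it declares as a dependency. It fails loudly on unknown dependencies or cycles. The class and component names are hypothetical, and only the JDK is used.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: order pipeline components so that each runs after the
// components it depends on (Kahn's topological sort).
final class PipelineOrder {
    private PipelineOrder() {}

    // deps maps each component to the components that must run before it.
    // Every dependency must itself be a key. Throws on unknown names or cycles.
    static List<String> order(Map<String, List<String>> deps) {
        Map<String, Integer> unmet = new LinkedHashMap<>();      // remaining deps per component
        Map<String, List<String>> dependents = new HashMap<>();  // reverse edges
        for (Map.Entry<String, List<String>> e : deps.entrySet()) {
            unmet.put(e.getKey(), e.getValue().size());
            for (String d : e.getValue()) {
                if (!deps.containsKey(d)) {
                    throw new IllegalArgumentException("Unknown dependency: " + d);
                }
                dependents.computeIfAbsent(d, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        unmet.forEach((name, n) -> { if (n == 0) ready.add(name); });
        List<String> result = new ArrayList<>();
        while (!ready.isEmpty()) {
            String next = ready.poll();
            result.add(next);
            for (String dependent : dependents.getOrDefault(next, new ArrayList<>())) {
                // merge returns the decremented count; zero means all deps are placed.
                if (unmet.merge(dependent, -1, Integer::sum) == 0) {
                    ready.add(dependent);
                }
            }
        }
        if (result.size() != deps.size()) {
            throw new IllegalStateException("Dependency cycle among components");
        }
        return result;
    }
}
```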

> On Apr 24, 2014, at 5:48 PM, "Miller, Timothy" <Ti...@childrens.harvard.edu> wrote:
> 
> Any preference for separate factory classes:
> 
> class SentenceDetectorAnnotatorFactory:
> 
> static AnalysisEngineDescription getSentenceDetectorAnnotator()
> 
> VS
> 
> static methods added to primitive annotators:
> 
> class SentenceDetector (existing)
> 
> static AnalysisEngineDescription getSentenceDetectorAnnotator()
> 
> ?
> 
> The former can clutter up the class space while the latter extends the
> length of classes, especially if there are multiple versions
> (getUMLSDictionaryAnnotator(), getICD9DictionaryAnnotator(),
> getMeshDictionaryAnnotator(), etc.)
> 
> Tim

RE: suggestion for default pipelines

Posted by "Miller, Timothy" <Ti...@childrens.harvard.edu>.

Please take a look at what I've done so far if you're interested:
http://svn.apache.org/viewvc/ctakes/trunk/ctakes-clinical-pipeline/src/main/java/org/apache/ctakes/clinicalpipeline/ClinicalPipelineFactory.java?view=markup

I don't have the full pipeline supported yet, but every component up to the dictionary now has its own static method for creating its primitive description, and there are some methods for getting common aggregates. If anyone has any concerns at this point, please let me know; otherwise I'll keep going along this track.

Thanks
Tim


PS Sean Finan and I had an offline discussion about interesting next steps. First, self-building pipelines, where you get aggregates from primitives by having each primitive build a pipeline containing its own prerequisites. Specifically, a method in the dictionary annotator would build a pipeline with the lookup window annotator and add itself at the end, where the lookup window annotator builds itself in a similar way, and so on recursively. Second, it would be cool to have the API programmer just specify which types they want (EventMention, EntityMention) and have the pipeline built automatically to produce those types. That requires a bit more infrastructure, but would be really cool!
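The two ideas above can be sketched together. Each component declares the types it reads and the types it writes. Asking for a type then recursively pulls in the producers of that component's inputs, and only afterwards the component itself. This is a hypothetical, JDK-only model. TypedComponent, TypeDrivenBuilder and the type names are illustrative assumptions, not the cTAKES type system. A real version would also need a policy for choosing between several producers of the same type.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical component: the types it reads from the CAS and the types it adds.
final class TypedComponent {
    final String name;
    final List<String> inputs;
    final List<String> outputs;

    TypedComponent(String name, List<String> inputs, List<String> outputs) {
        this.name = name;
        this.inputs = inputs;
        this.outputs = outputs;
    }
}

// "Give me these types": each component first adds the producers of its own
// inputs, then appends itself (a depth-first, post-order walk).
final class TypeDrivenBuilder {
    private final Map<String, TypedComponent> producerOf = new HashMap<>();

    TypeDrivenBuilder(List<TypedComponent> registry) {
        for (TypedComponent c : registry) {
            for (String type : c.outputs) {
                producerOf.putIfAbsent(type, c); // first registered producer wins
            }
        }
    }

    List<String> build(List<String> requestedTypes) {
        Set<String> added = new LinkedHashSet<>(); // pipeline order, no duplicates
        Set<String> inProgress = new HashSet<>();  // guards against cyclic requirements
        for (String type : requestedTypes) {
            addProducer(type, added, inProgress);
        }
        return new ArrayList<>(added);
    }

    private void addProducer(String type, Set<String> added, Set<String> inProgress) {
        TypedComponent c = producerOf.get(type);
        if (c == null) {
            throw new IllegalArgumentException("No component produces " + type);
        }
        if (added.contains(c.name)) {
            return; // already in the pipeline
        }
        if (!inProgress.add(c.name)) {
            throw new IllegalStateException("Cyclic requirement at " + c.name);
        }
        for (String input : c.inputs) {
            addProducer(input, added, inProgress); // prerequisites go first
        }
        inProgress.remove(c.name);
        added.add(c.name);
    }
}
```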



________________________________________
From: Miller, Timothy [Timothy.Miller@childrens.harvard.edu]
Sent: Thursday, April 24, 2014 5:48 PM
To: dev@ctakes.apache.org
Subject: Re: suggestion for default pipelines

Any preference for separate factory classes:

class SentenceDetectorAnnotatorFactory:

static AnalysisEngineDescription getSentenceDetectorAnnotator()

VS

static methods added to primitive annotators:

class SentenceDetector (existing)

static AnalysisEngineDescription getSentenceDetectorAnnotator()

?

The former can clutter up the class space while the latter extends the
length of classes, especially if there are multiple versions
(getUMLSDictionaryAnnotator(), getICD9DictionaryAnnotator(),
getMeshDictionaryAnnotator(), etc.)

Tim

Re: suggestion for default pipelines

Posted by Richard Eckart de Castilho <re...@apache.org>.

The "generate" goal of the uimafit-maven-plugin already has code that
scans for annotations. It may not take much effort to extend it to also
scan for and invoke such methods.

-- Richard

On 27.04.2014, at 10:39, Richard Eckart de Castilho <re...@apache.org> wrote:

> Maybe that choice should be left to the user. Factory methods could be marked using
> a special Java annotation that is scanned for at build time, e.g. something along the
> lines of
> 
> @DescriptionGenerator
> static AnalysisEngineDescription getSentenceDetectorAnnotator()
> 
> -- Richard

Re: suggestion for default pipelines

Posted by Richard Eckart de Castilho <re...@apache.org>.

Maybe that choice should be left to the user. Factory methods could be marked using
a special Java annotation that is scanned for at build time, e.g. something along the
lines of

@DescriptionGenerator
static AnalysisEngineDescription getSentenceDetectorAnnotator()
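Here is a rough sketch of how such a marker could work. It assumes RUNTIME retention so that plain reflection can find the marker; a Maven plugin would more likely inspect compiled classes at build time. DescriptionGenerator, Description and GeneratorScanner are hypothetical names. Description stands in for AnalysisEngineDescription so the example compiles without UIMA. The scanner invokes only marked methods. It rejects any marked method that is not static, takes arguments, or returns the wrong type.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical marker for factory methods whose result should become a descriptor.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface DescriptionGenerator {}

// Stand-in for AnalysisEngineDescription.
final class Description {
    final String name;

    Description(String name) {
        this.name = name;
    }
}

final class SentenceDetectorFactory {
    private SentenceDetectorFactory() {}

    @DescriptionGenerator
    static Description getSentenceDetectorAnnotator() {
        return new Description("SentenceDetector");
    }

    // Not marked, so the scanner ignores it even though the signature fits.
    static Description getExperimentalVariant() {
        return new Description("SentenceDetectorExperimental");
    }
}

final class GeneratorScanner {
    private GeneratorScanner() {}

    // Invokes every @DescriptionGenerator method on the given classes and returns
    // the results keyed by "ClassName.methodName" (sorted for stable output).
    static Map<String, Description> collect(List<Class<?>> classes) {
        Map<String, Description> found = new TreeMap<>();
        for (Class<?> cls : classes) {
            for (Method m : cls.getDeclaredMethods()) {
                if (!m.isAnnotationPresent(DescriptionGenerator.class)) {
                    continue;
                }
                if (!Modifier.isStatic(m.getModifiers()) || m.getParameterCount() != 0
                        || !Description.class.isAssignableFrom(m.getReturnType())) {
                    throw new IllegalStateException(
                            m + " must be static, take no arguments and return a Description");
                }
                try {
                    m.setAccessible(true);
                    found.put(cls.getSimpleName() + "." + m.getName(), (Description) m.invoke(null));
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("Could not invoke " + m, e);
                }
            }
        }
        return found;
    }
}
```

A build step would then serialize each collected description to an XML file. A real AnalysisEngineDescription can do this itself through its toXML methods.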

-- Richard

On 24.04.2014, at 23:41, Miller, Timothy <Ti...@childrens.harvard.edu> wrote:

> Any preference for separate factory classes:
> 
> class SentenceDetectorAnnotatorFactory:
> 
> static AnalysisEngineDescription getSentenceDetectorAnnotator()
> 
> VS
> 
> static methods added to primitive annotators:
> 
> class SentenceDetector (existing)
> 
> static AnalysisEngineDescription getSentenceDetectorAnnotator()
> 
> ?
> 
> The former can clutter up the class space while the latter extends the
> length of classes, especially if there are multiple versions
> (getUMLSDictionaryAnnotator(), getICD9DictionaryAnnotator(),
> getMeshDictionaryAnnotator(), etc.)
> 
> Tim

Re: suggestion for default pipelines

Posted by "Miller, Timothy" <Ti...@childrens.harvard.edu>.

Any preference for separate factory classes:

class SentenceDetectorAnnotatorFactory:

static AnalysisEngineDescription getSentenceDetectorAnnotator()

VS

static methods added to primitive annotators:

class SentenceDetector (existing)

static AnalysisEngineDescription getSentenceDetectorAnnotator()

?

The former can clutter up the class space while the latter extends the
length of classes, especially if there are multiple versions
(getUMLSDictionaryAnnotator(), getICD9DictionaryAnnotator(),
getMeshDictionaryAnnotator(), etc.)
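To make the trade-off concrete, here is a sketch of both layouts. A hypothetical EngineDescription stand-in replaces AnalysisEngineDescription. In uimaFIT, the method bodies would typically call AnalysisEngineFactory.createEngineDescription(...). The classes here are placeholders for the real annotators, and the configuration strings are placeholders too.

```java
// Stand-in for UIMA's AnalysisEngineDescription: which annotator to run and
// with which (hypothetical) configuration.
final class EngineDescription {
    final String implementation;
    final String configuration;

    EngineDescription(Class<?> implementation, String configuration) {
        this.implementation = implementation.getSimpleName();
        this.configuration = configuration;
    }
}

// Option 1: a separate factory class. The annotator itself is untouched,
// at the cost of one more class per component.
final class SentenceDetectorAnnotatorFactory {
    private SentenceDetectorAnnotatorFactory() {}

    static EngineDescription getSentenceDetectorAnnotator() {
        return new EngineDescription(SentenceDetector.class, "default");
    }
}

// Option 2: the factory method lives on the (placeholder) annotator class.
class SentenceDetector {
    static EngineDescription getSentenceDetectorAnnotator() {
        return new EngineDescription(SentenceDetector.class, "default");
    }
}

// Option 2 with several variants: each dictionary variant adds one more method
// to the same class, which is the "longer classes" cost.
class DictionaryLookupAnnotator {
    static EngineDescription getUMLSDictionaryAnnotator() {
        return new EngineDescription(DictionaryLookupAnnotator.class, "umls");
    }

    static EngineDescription getICD9DictionaryAnnotator() {
        return new EngineDescription(DictionaryLookupAnnotator.class, "icd9");
    }

    static EngineDescription getMeshDictionaryAnnotator() {
        return new EngineDescription(DictionaryLookupAnnotator.class, "mesh");
    }
}
```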

Tim

On 04/16/2014 04:48 AM, Richard Eckart de Castilho wrote:
> It would be nice if uimaFIT provided a Maven plugin to automatically
> generate descriptors for aggregates. If we come up with a convention
> for factories, e.g. a "class with static methods that do not take any
> parameters and that return descriptors", or "methods that bear a
> specific Java annotation" (e.g. @AutoGenerateDescriptor), it should be
> possible to implement such a Maven plugin.
>
> Cheers,
>
> -- Richard

-- 
Tim Miller
Instructor
Boston Children's Hospital and Harvard Medical School
timothy.miller@childrens.harvard.edu
617-919-1223

Re: suggestion for default pipelines

Posted by Richard Eckart de Castilho <re...@apache.org>.

It would be nice if uimaFIT provided a Maven plugin to automatically
generate descriptors for aggregates. Maybe if we come up with a
convention for factories, e.g. a "class with static methods that do
not take any parameters and that return descriptors" or "methods
that bear a specific Java annotation" (e.g. @AutoGenerateDescriptor),
it should be possible to implement such a Maven plugin.
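A minimal sketch of the discovery half of such a plugin, assuming the annotation-based convention above. `@AutoGenerateDescriptor` is hypothetical (it is not an existing uimaFIT annotation), and `DescriptorScanner` and `ExamplePipelines` are illustrative names. To stay runnable with only the JDK, the factory methods return strings in place of `AnalysisEngineDescription` objects; a real plugin would instead call `toXML(OutputStream)` on each returned description and write the result into the build output.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class DescriptorScanner {

    // Hypothetical marker annotation; RUNTIME retention so reflection can see it.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface AutoGenerateDescriptor {}

    // Illustrative factory class. Strings stand in for AnalysisEngineDescriptions.
    public static class ExamplePipelines {
        @AutoGenerateDescriptor
        public static String getStandardUMLSPipeline() { return "standard-umls"; }

        @AutoGenerateDescriptor
        public static String getFullPipeline() { return "full"; }

        // Not annotated: skipped by the scanner.
        public static String helper() { return "ignored"; }

        // Annotated but takes a parameter, so it breaks the convention and is skipped.
        @AutoGenerateDescriptor
        public static String configurable(String option) { return option; }
    }

    /** Names of public, static, parameterless, annotated methods, sorted for stable output. */
    public static List<String> findFactoryMethods(Class<?> factoryClass) {
        List<String> names = new ArrayList<>();
        for (Method m : factoryClass.getMethods()) {
            boolean isStatic = Modifier.isStatic(m.getModifiers());
            if (isStatic && m.getParameterCount() == 0
                    && m.isAnnotationPresent(AutoGenerateDescriptor.class)) {
                names.add(m.getName());
            }
        }
        names.sort(Comparator.naturalOrder());
        return names;
    }

    /** Invokes every conforming factory method and collects what it returns. */
    public static List<Object> generateAll(Class<?> factoryClass) throws Exception {
        List<Object> results = new ArrayList<>();
        for (String name : findFactoryMethods(factoryClass)) {
            // Static method, so there is no receiver object.
            results.add(factoryClass.getMethod(name).invoke(null));
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(findFactoryMethods(ExamplePipelines.class));
        System.out.println(generateAll(ExamplePipelines.class));
    }
}
```

Only `getFullPipeline` and `getStandardUMLSPipeline` are picked up; `helper` lacks the annotation and `configurable` takes a parameter. The "static methods with no parameters" variant of the convention would be the same loop without the annotation check.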

Cheers,

-- Richard

On 16.04.2014, at 05:21, Steven Bethard <st...@gmail.com> wrote:

> +1. And note that once you have a descriptor, you can generate the
> XML, so we should arrange to replace the current XML descriptors with
> ones generated automatically from the uimaFIT code. That should reduce
> some synchronization problems when the Java code was changed but the
> XML descriptor was not.
> 
> Steve

Re: suggestion for default pipelines

Posted by Steven Bethard <st...@gmail.com>.

+1. And note that once you have a descriptor, you can generate the
XML, so we should arrange to replace the current XML descriptors with
ones generated automatically from the uimaFIT code. That should reduce
some synchronization problems when the Java code was changed but the
XML descriptor was not.
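A minimal sketch of the single-source-of-truth idea, under stated assumptions: the ordered component list lives only in Java code, and any XML is derived from it. The component names are placeholders, not the contents of any real cTAKES descriptor, and `toXml` emits a simplified format that is not the UIMA descriptor schema; in uimaFIT code that step would be a call to `toXML(OutputStream)` on the `AnalysisEngineDescription`. The `isStale` check shows how a committed XML copy that no longer matches the code could be flagged during a transition period.

```java
import java.util.Arrays;
import java.util.List;

public class GeneratedDescriptorCheck {

    // Hypothetical single source of truth; placeholder component names.
    public static List<String> standardPipeline() {
        return Arrays.asList("SegmentStep", "SentenceStep", "TokenStep", "DictionaryLookupStep");
    }

    // Simplified XML rendering (NOT the UIMA schema), derived entirely from the code.
    public static String toXml(List<String> components) {
        StringBuilder sb = new StringBuilder("<aggregate>\n");
        for (String c : components) {
            sb.append("  <delegate name=\"").append(c).append("\"/>\n");
        }
        return sb.append("</aggregate>\n").toString();
    }

    // A checked-in XML file is stale when it differs from what the code generates now.
    public static boolean isStale(String checkedInXml, List<String> components) {
        return !checkedInXml.equals(toXml(components));
    }

    public static void main(String[] args) {
        String generated = toXml(standardPipeline());
        System.out.print(generated);

        // Simulate the Java code gaining a component while an old XML copy was left behind.
        String oldCopy = toXml(Arrays.asList("SegmentStep", "SentenceStep", "TokenStep"));
        System.out.println("fresh copy stale? " + isStale(generated, standardPipeline()));
        System.out.println("old copy stale? " + isStale(oldCopy, standardPipeline()));
    }
}
```

If the build always regenerates the XML and nobody edits it by hand, the staleness check becomes unnecessary; it is only useful while hand-maintained descriptors are still committed alongside the code.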

Steve

RE: suggestion for default pipelines

Posted by "Masanz, James J." <Ma...@mayo.edu>.

+1

-----Original Message-----
From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu] 
Sent: Tuesday, April 15, 2014 9:05 AM
To: dev@ctakes.apache.org
Subject: RE: suggestion for default pipelines

+1 I think that a factory is a great idea.

I (personally) dislike the descriptor schema, but I think that deprecation is the way to go until a replacement comes along.  

RE: suggestion for default pipelines

Posted by Abraham Tom <at...@practicefusion.com>.

+1



Best regards,

Abraham Tom
____________________________
Abraham Tom
Data Warehouse Engineer
415.757.4674 (p) | 415.356.0950 (f)
atom@practicefusion.com
http://www.practicefusion.com
www.facebook.com/practicefusion

-----Original Message-----
From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu] 
Sent: Tuesday, April 15, 2014 7:05 AM
To: dev@ctakes.apache.org
Subject: RE: suggestion for default pipelines

+1 I think that a factory is a great idea.

I (personally) dislike the descriptor schema, but I think that deprecation is the way to go until a replacement comes along.  



RE: suggestion for default pipelines

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.

+1 I think that a factory is a great idea.

I (personally) dislike the descriptor schema, but I think that deprecation is the way to go until a replacement comes along.  
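A minimal sketch of the deprecation route, assuming a factory class named `CTakesPipelines` as in the original proposal. The method `fromDescriptorName`, the component labels, and the use of `List<String>` in place of `AnalysisEngineDescription` are hypothetical simplifications that keep the sketch runnable with only the JDK. The old, descriptor-named entry point keeps working but is marked `@Deprecated` and delegates to the factory, so both routes build the same pipeline until the XML descriptors are retired.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public final class CTakesPipelines {

    private CTakesPipelines() {} // static factory methods only

    /** Stand-in for the standard UMLS pipeline; placeholder component labels. */
    public static List<String> getStandardUMLSPipeline() {
        return Collections.unmodifiableList(Arrays.asList("preprocessing", "umls-lookup"));
    }

    /** Stand-in for the full pipeline: the standard one plus extra components. */
    public static List<String> getFullPipeline() {
        List<String> full = new ArrayList<>(getStandardUMLSPipeline());
        full.add("srl");
        full.add("constituency-parser");
        return Collections.unmodifiableList(full);
    }

    /**
     * Hypothetical legacy lookup by XML descriptor name, kept working during the transition.
     *
     * @deprecated use the static factory methods instead
     */
    @Deprecated
    public static List<String> fromDescriptorName(String descriptorFile) {
        switch (descriptorFile) {
            case "AggregatePlaintextUMLSProcessor.xml":
                return getStandardUMLSPipeline(); // delegate; no second copy of the component list
            default:
                throw new IllegalArgumentException("No factory equivalent for " + descriptorFile);
        }
    }
}
```

Because the deprecated method delegates rather than holding its own component list, it cannot drift from the factory, and javac reports a deprecation warning to callers that still use it.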
