Posted to user@uima.apache.org by Roberto Franchini <ro...@gmail.com> on 2008/07/09 17:10:05 UTC

Spring factoryBean for producing AE: processors, consumer, readers and PEAR

Hi,
I wrote some components useful for integrating UIMA components into the
Spring framework.
These components are Spring FactoryBeans that can produce
CasProcessors/Consumers, CollectionReaders and type systems.
A component can be produced entirely programmatically, from a descriptor,
or from a PEAR.
I would like to release these components to the community, if that sounds good.
This work builds on code posted by Steven Bethard on this mailing list.
Thanks a lot, Steven!

Here are some usage examples:

<!-- collection reader -->
	<bean name="cr" class="it.celi.uima.bean.CollectionReaderFactoryBean" parent="baseAnnotator">
		<property name="componentClass" value="it.celi.components.collection.RecursiveFileSytemCollectionReader" />
		<property name="configurationParameters">
			<map>
				<entry key="application" value="language" />
				<entry key="language" value="it" />
			</map>
		</property>
	</bean>

where baseAnnotator is:
	<bean name="baseAnnotator" class="it.celi.uima.bean.AbstractUIMAComponentsFactoryBean" abstract="true">
		<property name="typeSystem" ref="typeSystem" />
	</bean>

	<bean name="typeSystem" class="it.celi.uima.bean.TypeSytemFactoryBean">
		<property name="typeSytemPath" value="file:../dd4-typeSystem/src/main/resources/CeliTypeSystem.xml" />
	</bean>
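
As a rough illustration of the idea behind these beans (this is not the
it.celi.uima.bean code, whose source is not shown in this message), a factory
with the same two properties could look like the sketch below. Configurable,
DemoReader and ComponentFactory are hypothetical stand-ins, and nothing here
depends on Spring or UIMA. In a real setup the class would implement Spring's
FactoryBean interface and getObject() would return the UIMA component.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: hypothetical stand-ins, no Spring or UIMA dependency.
class ComponentFactorySketch {

    /** Hypothetical stand-in for a component that accepts configuration parameters. */
    public interface Configurable {
        void configure(Map<String, Object> parameters);
    }

    /** Hypothetical component that simply remembers the parameters it was given. */
    public static class DemoReader implements Configurable {
        public final Map<String, Object> parameters = new LinkedHashMap<>();

        @Override
        public void configure(Map<String, Object> parameters) {
            this.parameters.putAll(parameters);
        }
    }

    /** Mirrors the two XML properties: componentClass and configurationParameters. */
    public static class ComponentFactory {
        private String componentClass;
        private Map<String, Object> configurationParameters = new LinkedHashMap<>();

        public void setComponentClass(String componentClass) {
            this.componentClass = componentClass;
        }

        public void setConfigurationParameters(Map<String, Object> parameters) {
            this.configurationParameters = new LinkedHashMap<>(parameters);
        }

        /** Plays the role of getObject(): instantiate the class, then configure it. */
        public Configurable getObject() {
            try {
                Object instance = Class.forName(componentClass)
                        .getDeclaredConstructor().newInstance();
                Configurable component = (Configurable) instance;
                component.configure(configurationParameters);
                return component;
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("cannot create " + componentClass, e);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> parameters = new LinkedHashMap<>();
        parameters.put("application", "language");
        parameters.put("language", "it");

        ComponentFactory factory = new ComponentFactory();
        factory.setComponentClass(DemoReader.class.getName());
        factory.setConfigurationParameters(parameters);

        DemoReader reader = (DemoReader) factory.getObject();
        System.out.println(reader.parameters);
    }
}
```

The setters are what Spring would call for each <property> element; the
container would then hand out whatever getObject() returns instead of the
factory itself.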
	

Processors/consumers:

	<bean name="sentenceAnnotator" class="it.celi.uima.bean.CasProcessorFactoryBean" parent="baseAnnotator">
		<property name="componentClass" value="it.celi.annotators.language.SentenceAnnotator" />
		<property name="configurationParameters">
			<map>
				<entry key="abbreviationsFiles" value="abbreviations_*.txt" />
				<entry key="additionalSeparatorsFiles" value="sentenceSeparators_*.txt" />
			</map>
		</property>
	</bean>

	<bean name="xslSerializerCasConsumer" class="it.celi.uima.bean.CasConsumerFactoryBean" parent="baseAnnotator">
		<property name="componentClass" value="it.celi.components.consumer.XslSerializerCasConsumer" />
		<property name="configurationParameters">
			<map>
				<entry key="fileExtension" value=".xml" />
			</map>
		</property>
	</bean>


PEAR files (overriding configuration parameters is not allowed!):

	<bean name="japeAnnotator" class="it.celi.uima.bean.CasProcessorFactoryBean">
		<property name="descriptorPath" value="file:./pears/JapeAnnotator.pear" />
		<property name="redeployPear" value="true"/>

		<property name="configurationParameters">
			<map>
			</map>
		</property>
	</bean>

From a descriptor, with parameter overrides:

	<bean name="regExpTokenizer" class="it.celi.uima.bean.CasProcessorFactoryBean">
		<property name="descriptorPath" value="file:./desc/RegExpTokenizer.xml" />
		<property name="configurationParameters">
			<map>
				<entry key="commandsFileName" value="commands_tokenizer_*.xml" />
			</map>
		</property>
	</bean>
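
The message does not say how the factory bean tells a PEAR from a plain
descriptor. One plausible approach, assumed here purely for illustration, is
to look at the extension of descriptorPath and to reject parameter overrides
for a PEAR. DescriptorSourceSketch, SourceKind, classify and checkOverrides
are hypothetical names, not part of the posted components.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;

// Sketch only: the extension-based rule is an assumption, not the author's code.
class DescriptorSourceSketch {

    public enum SourceKind { PEAR, XML_DESCRIPTOR }

    /** Classifies a descriptorPath value by its file extension. */
    public static SourceKind classify(String descriptorPath) {
        String lower = descriptorPath.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".pear")) {
            return SourceKind.PEAR;
        }
        if (lower.endsWith(".xml")) {
            return SourceKind.XML_DESCRIPTOR;
        }
        throw new IllegalArgumentException("unsupported descriptor path: " + descriptorPath);
    }

    /** Fails fast when overrides are supplied for a PEAR, where they are not allowed. */
    public static void checkOverrides(String descriptorPath, Map<String, ?> overrides) {
        if (classify(descriptorPath) == SourceKind.PEAR && !overrides.isEmpty()) {
            throw new IllegalArgumentException(
                    "configuration parameters cannot be overridden for a PEAR: " + descriptorPath);
        }
    }

    public static void main(String[] args) {
        // An empty map is accepted for a PEAR.
        checkOverrides("file:./pears/JapeAnnotator.pear", Collections.emptyMap());
        // Overrides are accepted for an XML descriptor.
        checkOverrides("file:./desc/RegExpTokenizer.xml",
                Collections.singletonMap("commandsFileName", "commands_tokenizer_*.xml"));
        System.out.println(classify("file:./pears/JapeAnnotator.pear"));
    }
}
```

Failing at configuration time keeps a silently ignored override from going
unnoticed until the pipeline runs.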


A simple use case could be:

Configuration:

	<bean name="cpm" class="org.apache.uima.UIMAFramework" factory-method="newCollectionProcessingManager" />

	<bean name="uimaCPM" class="it.celi.uima.engine.CpmUIMAEngine">
		<property name="cpm" ref="cpm" />
		<property name="listeners">
			<list />
		</property>
		<property name="readers">

			<list>
				<ref bean="cr" />
			</list>
		</property>
		<property name="processors">
			<list>
				<ref bean="sentenceAnnotator" />
				<ref bean="regExpTokenizer" />
				<ref bean="japeAnnotator" />

			</list>
		</property>
		<property name="consumers">
			<list>
				<ref bean="xslSerializerCasConsumer" />
			</list>
		</property>
	</bean>


The last bean is a CPM wrapper that internally does the following.

Methods to add consumers and processors to the CPM (the lists are injected
by the configuration above):

	// Adds every injected consumer to the CPM; a failure is logged and the loop continues.
	private void addAllConsumersToCpm() {
		for (CasConsumer casConsumer : consumers) {
			String name = casConsumer.getProcessingResourceMetaData().getName();
			try {
				logger.info("adding consumer to pipeline::" + name);
				cpm.addCasConsumer(casConsumer);
			} catch (ResourceConfigurationException e) {
				logger.error("unable to add consumer  :: " + name, e);
			}
		}
	}

	// Adds every injected processor to the CPM; a failure is logged and the loop continues.
	private void addAllProcessorToCpm() {
		for (CasProcessor casProcessor : processors) {
			String name = casProcessor.getProcessingResourceMetaData().getName();
			try {
				logger.info("adding processor to pipeline::" + name);
				cpm.addCasProcessor(casProcessor);
			} catch (ResourceConfigurationException e) {
				logger.error("unable to add processor  :: " + name, e);
			}
		}
	}

and then, in a method, you can do:

			cpm.setCollectionReader(reader);
			cpm.process();
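
The two add methods above log a failure and carry on, so the CPM can start
with a component missing. If the caller should be able to notice that, one
option is a single helper that returns the components it could not add. This
is only a sketch of that variation: PipelineAssemblySketch, Adder and addAll
are hypothetical names, strings stand in for the UIMA components, and
System.err stands in for the logger.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: strings stand in for UIMA components, System.err for the logger.
class PipelineAssemblySketch {

    /** Hypothetical stand-in for an add operation that may throw a checked exception. */
    public interface Adder<T> {
        void add(T component) throws Exception;
    }

    /** Tries to add every component and returns the ones that were rejected. */
    public static <T> List<T> addAll(List<T> components, Adder<T> adder) {
        List<T> failed = new ArrayList<>();
        for (T component : components) {
            try {
                adder.add(component);
            } catch (Exception e) {
                System.err.println("unable to add component :: " + component
                        + " (" + e.getMessage() + ")");
                failed.add(component);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<String> pipeline = new ArrayList<>();
        List<String> failed = addAll(
                Arrays.asList("sentenceAnnotator", "broken", "japeAnnotator"),
                name -> {
                    if (name.equals("broken")) {
                        throw new IllegalStateException("bad configuration");
                    }
                    pipeline.add(name);
                });
        System.out.println("added=" + pipeline + " failed=" + failed);
    }
}
```

With the real wrapper, the same helper could be called once with
cpm::addCasConsumer and once with cpm::addCasProcessor, and processing could
be skipped when the returned list is not empty.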


Some advantages:
-only one simple file to configure a CPM
-easy to inject components
-easy to embed a CPM/AE inside existing applications
-SpringIDE can be used inside Eclipse
-...anything else?
Disadvantages:
-if you don't use Spring, it is another framework to learn
-you can't use the Eclipse UIMA plugins to edit/manage descriptors
-aggregates are not supported programmatically (via descriptors there's
no problem)
-...anything else?

Is it interesting? Let me know.

Roberto
-- 
Roberto Franchini
http://www.celi.it
http://www.blogmeter.it
http://www.memesphere.it
Tel +39-011-6600814
jabber:ro.franchini@gmail.com skype:ro.franchini

Re: Spring factoryBean for producing AE: processors, consumer, readers and PEAR

Posted by Roberto Franchini <ro...@gmail.com>.
On Thu, Jul 10, 2008 at 3:31 PM, Eddie Epstein <ea...@gmail.com> wrote:
> Look in
> http://svn.apache.org/repos/asf/incubator/uima/sandbox/trunk/uima-as/uima-as-distr/src/main/scripts
> dd2spring.bat/sh are command line scripts to do the conversion. They point
> at the source used.
>
> One can manually write the Spring context file, but UIMA AS code has many
> assumptions about the beans to be deployed and the parameters to be
> specified. This option is more for debugging and experimentation.
>
> Eddie

Thanks, I'll take a look and send some feedback.

Re: Spring factoryBean for producing AE: processors, consumer, readers and PEAR

Posted by Eddie Epstein <ea...@gmail.com>.
Look in
http://svn.apache.org/repos/asf/incubator/uima/sandbox/trunk/uima-as/uima-as-distr/src/main/scripts
The dd2spring.bat/sh files there are command-line scripts that do the
conversion. They point at the source used.

One can manually write the Spring context file, but UIMA AS code has many
assumptions about the beans to be deployed and the parameters to be
specified. This option is more for debugging and experimentation.

Eddie


Re: Spring factoryBean for producing AE: processors, consumer, readers and PEAR

Posted by Roberto Franchini <ro...@gmail.com>.
On Thu, Jul 10, 2008 at 3:00 PM, Eddie Epstein <ea...@gmail.com> wrote:
> Hi Roberto,
>
> FYI, Spring plays a central role in the implementation of UIMA AS. In UIMA
> AS the CPE descriptor is eliminated. The standard UIMA component descriptor
> aggregates all components (including collection readers), and a new
> "deployment descriptor" is used to specify deployment and error handling
> options for each UIMA component. At runtime the deployment descriptor is
> transparently converted into a Spring context file. Users don't have to
> learn anything about Spring, just about the new deployment descriptor which
> is supported by the Component Descriptor Editor.
>
> Eddie

It sounds good to me. Can you tell me which parts of the UIMA AS source are
involved in this process?
Maybe we can integrate or improve something in this Spring/UIMA bridge.
My interest is in the "Spring context file": is it possible to write
it directly?
Thanks
Roberto


Re: Spring factoryBean for producing AE: processors, consumer, readers and PEAR

Posted by Eddie Epstein <ea...@gmail.com>.
Hi Roberto,

FYI, Spring plays a central role in the implementation of UIMA AS. In UIMA
AS the CPE descriptor is eliminated. The standard UIMA component descriptor
aggregates all components (including collection readers), and a new
"deployment descriptor" is used to specify deployment and error handling
options for each UIMA component. At runtime the deployment descriptor is
transparently converted into a Spring context file. Users don't have to
learn anything about Spring, just about the new deployment descriptor which
is supported by the Component Descriptor Editor.

Eddie

