Posted to user@uima.apache.org by swirl <sw...@yahoo.com> on 2013/05/07 03:47:24 UTC

Re: How to create and use a repository for UIMA annotators?

Richard Eckart de Castilho <ec...@...> writes:

> 
> Hello Greg,
> 
> > It's sort of a "maven-like" model (i.e. when using a Nexus server). Or
> > maybe I should just actually use maven and nexus?
> >
> > Has anyone out there tried to create a "UIMA Repository" that can be
> > directly referenced from a component descriptor file? How did you make
> > it work?
> 
> We consider ourselves to have a "UIMA Repository" based on Maven - cf.
> DKPro Core http://code.google.com/p/dkpro-core-asl/
>
> I would like to point out that we have largely abandoned static UIMA
> descriptors (except type descriptors).
>
> We feel very comfortable programming on the Java level, dynamically
> creating descriptors using uimaFIT and running our pipelines directly
> from within Java (no CPE GUI or such). For this scenario, Maven works
> like a charm for us. We do not even worry too much about type systems,
> because we have packaged their XML descriptors and JCas wrappers in JARs
> as well and can simply add them as Maven dependencies. We use uimaFIT's
> automatic type system detection feature to dynamically construct a
> global type system description from all type system description files
> that could be found in a well-defined location in the classpath (that
> is, in the aforementioned JARs). A short example:
> 
>   * add dependency on de.tudarmstadt.ukp.dkpro.core.io.text-asl
>     (for TextReader)
>   * add dependency on de.tudarmstadt.ukp.dkpro.core.tokit-asl
>     (for BreakIteratorSegmenter)
>   * add dependency on de.tudarmstadt.ukp.dkpro.core.dictionaryannotator-asl
>     (for DictionaryAnnotator)
>   * dependency on uimaFIT automatically added (for CASDumpWriter)
>   * dependencies on type systems and JCas wrappers automatically added
>     by Maven
> 
> Then we can immediately assemble and run a pipeline:
> 
>     CollectionReader reader = createCollectionReader(TextReader.class,
>         TextReader.PARAM_PATH, "src/test/resources/text",
>         TextReader.PARAM_PATTERNS, new String[] { "[+]*.txt", "[-]broken.txt" },
>         TextReader.PARAM_LANGUAGE, "en");
>
>     AnalysisEngine tokenizer = createPrimitive(BreakIteratorSegmenter.class);
>
>     AnalysisEngine nameFinder = createPrimitive(DictionaryAnnotator.class,
>         DictionaryAnnotator.PARAM_PHRASE_FILE, "src/test/resources/dictionaries/names.txt",
>         DictionaryAnnotator.PARAM_ANNOTATION_TYPE, Name.class.getName());
>
>     AnalysisEngine writer = createPrimitive(CASDumpWriter.class,
>         CASDumpWriter.PARAM_OUTPUT_FILE, "target/output.txt");
>
>     SimplePipeline.runPipeline(reader, tokenizer, nameFinder, writer);
> 
> Notice that no line references a type system whatsoever. This is because
> we let uimaFIT automatically scan the classpath and simply make all
> types it finds available to every created component.
>
> Our approach seems to work great for our researchers to assemble and run
> pipelines on a single machine. We do not currently scale out UIMA.
> 
> Cheers,
> 
> Richard
> 


Hi Richard,
Would you mind showing me how uimaFIT is able to "dynamically construct
a global type system description from all type system description files that
could be found in a well-defined location in the classpath"?

Do you rely on the uimafit/types.txt file?
If so, how do you specify it so that it picks up the type
description files in the classpath? I looked into
de.tudarmstadt.ukp.dkpro.core.tokit-asl-1.4.0.jar, but there is no type
description file inside the JAR itself.

Re: How to create and use a repository for UIMA annotators?

Posted by Richard Eckart de Castilho <ri...@gmail.com>.

On 07.05.2013 at 03:47, swirl <sw...@yahoo.com> wrote:

> Hi Richard,
> Would you mind showing me how uimaFIT is able to "dynamically construct
> a global type system description from all type system description files that
> could be found in a well-defined location in the classpath"?
>
> Do you rely on the uimafit/types.txt file?
> If so, how do you specify it so that it picks up the type
> description files in the classpath? I looked into
> de.tudarmstadt.ukp.dkpro.core.tokit-asl-1.4.0.jar, but there is no type
> description file inside the JAR itself.

Check out

https://code.google.com/p/uimafit/wiki/TypeDescriptorDetection
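
For a rough feel of the mechanism before following the link, here is a minimal sketch in plain Java. It is not uimaFIT's code. It assumes, as far as I understand the uimaFIT 1.x convention, that each JAR contributing types ships a text file at META-INF/org.uimafit/types.txt whose lines are location patterns such as classpath*:desc/type/**/*.xml. The class TypeManifestSketch and its methods are made up for illustration, and resolving the patterns to actual descriptor files is left out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

final class TypeManifestSketch {

    // Assumed manifest location (uimaFIT 1.x convention); other versions may differ.
    static final String MANIFEST = "META-INF/org.uimafit/types.txt";

    /** Treats every non-blank line of a manifest as one location pattern. */
    static List<String> parsePatterns(String manifestText) {
        List<String> patterns = new ArrayList<>();
        for (String line : manifestText.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                patterns.add(trimmed);
            }
        }
        return patterns;
    }

    /** Collects the patterns from every manifest the class loader can see. */
    static List<String> collectPatterns(ClassLoader loader) throws IOException {
        List<String> patterns = new ArrayList<>();
        // getResources returns one URL per JAR or directory containing the file.
        Enumeration<URL> manifests = loader.getResources(MANIFEST);
        while (manifests.hasMoreElements()) {
            URL url = manifests.nextElement();
            StringBuilder text = new StringBuilder();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    text.append(line).append('\n');
                }
            }
            patterns.addAll(parsePatterns(text.toString()));
        }
        return patterns;
    }

    public static void main(String[] args) throws IOException {
        // Prints the patterns declared by all manifests on the current classpath.
        ClassLoader loader = TypeManifestSketch.class.getClassLoader();
        for (String pattern : collectPatterns(loader)) {
            System.out.println(pattern);
        }
    }
}
```

Run with the DKPro Core dependencies on the classpath, main should print whatever patterns their JARs declare. Since the type systems and JCas wrappers are pulled in as separate Maven dependencies, the manifest and the descriptors would be expected in those JARs rather than in the tokit JAR, which would explain why none were found there.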

Cheers,

-- Richard

Re: How to create and use a repository for UIMA annotators?

Posted by Ar...@bka.bund.de.

Hi,

In my opinion, the best way to do it is to use an empty type system with the collection reader. You can create one with TypeSystemDescriptionFactory.createTypeSystemDescription(). If one of your annotators needs a type system, pass it when creating that annotator, e.g. AnalysisEngineFactory.createPrimitiveDescription(Annotator.class, typeSystem, ...). As one of the last components of your pipeline, use a consumer that writes the overall type system description to a file. The core of the consumer is:

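// Needs java.io.{File, FileOutputStream, IOException, OutputStream} and org.xml.sax.SAXException.
public final void process(final CAS cas) throws AnalysisEngineProcessException {
	// Write the type system of the current CAS to mOutputFilePathString as an XML descriptor.
	try (OutputStream out = new FileOutputStream(new File(mOutputFilePathString))) {
		TypeSystemUtil.typeSystem2TypeSystemDescription(cas.getTypeSystem()).toXML(out);
	} catch (IOException | SAXException e) {
		throw new AnalysisEngineProcessException(e);
	}
}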

Cheers,
Armin
