Posted to user@uima.apache.org by swirl <sw...@yahoo.com> on 2013/05/07 03:47:24 UTC
Re: How to create and use a repository for UIMA annotators?
Richard Eckart de Castilho <ec...@...> writes:
>
> Hello Greg,
>
> > It's sort of a "maven-like" model (i.e. when using a Nexus server). Or
> > maybe I should just actually use maven and nexus?
> >
> > Has anyone out there tried to create a "UIMA Repository" that can be
> > directly referenced from a component descriptor file? How did you make
> > it work?
>
> We consider ourselves to have a "UIMA Repository" based on Maven - cf.
> DKPro Core http://code.google.com/p/dkpro-core-asl/
>
> I would like to point out that we have largely abandoned static UIMA
> descriptors (except type descriptors).
>
> We feel very comfortable programming on the Java level, dynamically
> creating descriptors using uimaFIT and running our pipelines directly
> from within Java (no CPE GUI or such). For this scenario, Maven works
> like a charm for us. We do not even worry too much about type systems,
> because we have packaged their XML descriptors and JCas wrappers in JARs
> as well and can simply add them as Maven dependencies. We use uimaFIT's
> automatic type system detection feature to dynamically construct a
> global type system description from all type system description files
> that could be found in a well-defined location in the classpath (that
> is, in the aforementioned JARs). A short example:
>
> * add dependency on de.tudarmstadt.ukp.dkpro.core.io.text-asl
>   (for TextReader)
> * add dependency on de.tudarmstadt.ukp.dkpro.core.tokit-asl
>   (for BreakIteratorSegmenter)
> * add dependency on de.tudarmstadt.ukp.dkpro.core.dictionaryannotator-asl
>   (for DictionaryAnnotator)
> * dependency on uimaFIT automatically added (for CASDumpWriter)
> * dependencies on type systems and JCas wrappers automatically added by
>   Maven
>
> Then we can immediately assemble and run a pipeline:
>
> CollectionReader reader = createCollectionReader(TextReader.class,
>     TextReader.PARAM_PATH, "src/test/resources/text",
>     TextReader.PARAM_PATTERNS, new String[] { "[+]*.txt", "[-]broken.txt" },
>     TextReader.PARAM_LANGUAGE, "en");
>
> AnalysisEngine tokenizer = createPrimitive(BreakIteratorSegmenter.class);
>
> AnalysisEngine nameFinder = createPrimitive(DictionaryAnnotator.class,
>     DictionaryAnnotator.PARAM_PHRASE_FILE,
>         "src/test/resources/dictionaries/names.txt",
>     DictionaryAnnotator.PARAM_ANNOTATION_TYPE, Name.class.getName());
>
> AnalysisEngine writer = createPrimitive(CASDumpWriter.class,
>     CASDumpWriter.PARAM_OUTPUT_FILE, "target/output.txt");
>
> SimplePipeline.runPipeline(reader, tokenizer, nameFinder, writer);
>
> Notice that no line references a type system whatsoever. This is because
> we let uimaFIT automatically scan the classpath and simply make all
> types it finds available to every created component.
>
> Our approach seems to work great for our researchers to assemble and run
> pipelines on a single machine. We currently do not scale out UIMA.
>
> Cheers,
>
> Richard
>
Hi Richard,
Would you mind showing me how uimaFIT is able to "dynamically construct
a global type system description from all type system description files that
could be found in a well-defined location in the classpath"?
Do you rely on the uimafit/types.txt file?
If so, how do you specify it so that it picks up the type
description files in the classpath? I looked into
de.tudarmstadt.ukp.dkpro.core.tokit-asl-1.4.0.jar, but there is no type
description file inside the JAR itself.
Re: How to create and use a repository for UIMA annotators?
Posted by Richard Eckart de Castilho <ri...@gmail.com>.
On 07.05.2013 at 03:47, swirl <sw...@yahoo.com> wrote:
> Hi Richard,
> Would you mind showing me how uimaFIT is able to "dynamically construct
> a global type system description from all type system description files that
> could be found in a well-defined location in the classpath"?
>
> Do you rely on the uimafit/types.txt file?
> If so, how do you specify it so that it picks up the type
> description files in the classpath? I looked into
> de.tudarmstadt.ukp.dkpro.core.tokit-asl-1.4.0.jar, but there is no type
> description file inside the JAR itself.
Check out
https://code.google.com/p/uimafit/wiki/TypeDescriptorDetection
Cheers,
-- Richard
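For readers following along, here is a rough sketch of the kind of classpath scan that wiki page describes. It is not uimaFIT's actual code. It assumes each JAR carries a small manifest at META-INF/org.uimafit/types.txt that lists location patterns, one per line, such as classpath*:desc/type/**/*.xml. Both the manifest path and the pattern are assumptions to be checked against the wiki page, and the class and method names are hypothetical. The sketch only collects the patterns; resolving them to descriptor files is left out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

public class TypeDescriptorLocator {
    // Assumed manifest location (uimaFIT 1.x); verify against the wiki page.
    static final String MANIFEST = "META-INF/org.uimafit/types.txt";

    /** Reads one manifest: one location pattern per line, blank lines skipped. */
    static List<String> parsePatterns(BufferedReader reader) throws IOException {
        List<String> patterns = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                patterns.add(trimmed);
            }
        }
        return patterns;
    }

    /** Collects the patterns from every copy of the manifest on the classpath. */
    static List<String> findPatterns(ClassLoader loader) throws IOException {
        List<String> all = new ArrayList<>();
        // getResources returns one URL per JAR or directory containing the file.
        Enumeration<URL> manifests = loader.getResources(MANIFEST);
        while (manifests.hasMoreElements()) {
            URL url = manifests.nextElement();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                all.addAll(parsePatterns(reader));
            }
        }
        return all;
    }

    public static void main(String[] args) throws IOException {
        // Prints every pattern declared by the JARs on the current classpath.
        for (String pattern : findPatterns(TypeDescriptorLocator.class.getClassLoader())) {
            System.out.println(pattern);
        }
    }
}
```

Because every JAR can ship its own copy of the manifest, adding a Maven dependency is enough to make its patterns visible to the scan. This would also be consistent with a component JAR containing no type descriptors itself, if those live in a separate JAR that it depends on.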
Re: How to create and use a repository for UIMA annotators?
Posted by Ar...@bka.bund.de.
Hi,
In my opinion, the best way to do this is to use an empty type system with the collection reader. You can create one with TypeSystemDescriptionFactory.createTypeSystemDescription(). If one of your annotators needs a type system, add it there, e.g. AnalysisEngineFactory.createPrimitiveDescription(Annotator.class, typeSystem, ...). As one of the last components of your pipeline, use a consumer that writes the overall type system description to a file. The core of the consumer is:

    // Needs java.io.* and org.xml.sax.SAXException in addition to the UIMA imports.
    public final void process(final CAS cas) throws AnalysisEngineProcessException {
        // Convert the CAS's runtime type system into a description and write it as XML;
        // try-with-resources closes the stream even if serialization fails.
        try (OutputStream out = new FileOutputStream(new File(mOutputFilePathString))) {
            TypeSystemUtil.typeSystem2TypeSystemDescription(cas.getTypeSystem()).toXML(out);
        } catch (IOException | SAXException e) {
            throw new AnalysisEngineProcessException(e);
        }
    }
Cheers,
Armin