Posted to user@uima.apache.org by Aleksandar Dimitrov <al...@gmail.com> on 2015/01/25 22:59:45 UTC

Using OpenNLP type annotations with UIMAfit

Hi,

The UIMAfit manual (5.1) states that the preferred way to iterate over tokens in
the CAS is the following:

    // JCas version
    for (Token token : JCasUtil.select(jcas, Token.class)) {
      ...
    }

This assumes a Token.class is importable somewhere. But I'm using the OpenNLP
tools, which don't provide such a type. Instead, it seems to be generated at run
time during configuration steps, and is not accessible as a class in the AE (to
my knowledge.)

Additionally, when extending o.a.u.fit.component.JCasAnnotator_ImplBase instead
of o.a.u.component.JCasAnnotator_ImplBase, the method void typeSystemInit(TypeSystem)
is not provided, which makes instantiating the type system the same way OpenNLP
does it rather cumbersome (I generate an empty CAS with the typeSystemDescription,
then get its TypeSystem and provide the Type and Feature objects from this
TypeSystem instance as UIMAfit configuration parameters before deploying my AE.)

Even then, I can only use the less type-safe method of iterating over
annotations: for (AnnotationFS token : cas.getAnnotationIndex(tokenType)) where
tokenType is the Type instance I acquired from the TypeSystem either during
typeSystemInit() or during configuration with the above hack.

Is there some good way of solving this dilemma while still using UIMAfit's
classes? Obviously, I could go back to using just plain UIMA, but I quite like
UIMAfit's way of dealing with external resources! And I don't like the
type-system-through-cas hack.

I'm using opennlp-uima 1.5.3 and uima-fit 2.1.0, uima 2.6.0.

Cheers,
Aleks

Re: missing initTypeSystem() in UIMAfit's JCasAnnotator_ImplBase (was: Using OpenNLP type annotations with UIMAfit)

Posted by Richard Eckart de Castilho <re...@apache.org>.
On 25.01.2015, at 23:08, Aleksandar Dimitrov <al...@gmail.com> wrote:

> Scrap that, it doesn't work this way. So my question becomes: can I instantiate
> the type system of a UIMAfit annotator the same way OpenNLP does it (working
> around the missing typeSystemInit method somehow) or do I have to give up
> UIMAfit components? (Or maybe there's an alternative? I'd prefer not to have to
> run JCasGen on opennlp's TypeSystem.xml)

Why don't you want to run JCasGen?

OpenNLP uses a CAS-based API that doesn't use the generated JCas classes. But as you have noticed, the CAS-based API is a bit verbose and not type-safe. It's like reflection in Java.
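
To make the reflection analogy concrete, here is a minimal plain-Java sketch. It does not touch UIMA at all, and the class and method names (ReflectionAnalogy, typedLength, callByName) are made up for illustration. It only shows the general trade-off: a name passed as a string is checked when the code runs, while a direct call is checked by the compiler.

```java
import java.lang.reflect.Method;

public class ReflectionAnalogy {

    // Typed access: a misspelled method name here would not compile.
    static int typedLength(String s) {
        return s.length();
    }

    // Name-based access: the method is looked up by a string at run time,
    // so a typo in methodName only shows up when this code executes.
    static Object callByName(Object target, String methodName) {
        try {
            Method m = target.getClass().getMethod(methodName);
            return m.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("No usable method: " + methodName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(typedLength("token"));
        System.out.println(callByName("token", "length"));
        try {
            callByName("token", "lenght"); // the typo compiles without complaint
        } catch (IllegalArgumentException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

Looking up a Type or Feature by name in the CAS-based API is in roughly the same position as callByName above: the name is only validated when the lookup actually runs.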

Cheers,

-- Richard

missing initTypeSystem() in UIMAfit's JCasAnnotator_ImplBase (was: Using OpenNLP type annotations with UIMAfit)

Posted by Aleksandar Dimitrov <al...@gmail.com>.
On Sun, Jan 25, 2015 at 10:59:45PM +0100, Aleksandar Dimitrov wrote:
> Hi,
> 
> The UIMAfit manual (5.1) states that the preferred way to iterate over tokens in
> the CAS is the following:
> 
>     // JCas version
>     for (Token token : JCasUtil.select(jcas, Token.class)) {
>       ...
>     }
> 
> This assumes a Token.class is importable somewhere. But I'm using the OpenNLP
> tools, which don't provide such a type. Instead, it seems to be generated at run
> time during configuration steps, and is not accessible as a class in the AE (to
> my knowledge.)
> 
> Additionally, when extending o.a.u.fit.component.JCasAnnotator_ImplBase instead
> of o.a.u.component.JCasAnnotator_ImplBase, the method void typeSystemInit(TypeSystem)
> is not provided, which makes instantiating the type system the same way OpenNLP
> does it rather cumbersome (I generate an empty CAS with the typeSystemDescription,
> then get its TypeSystem and provide the Type and Feature objects from this
> TypeSystem instance as UIMAfit configuration parameters before deploying my AE.)

Scrap that, it doesn't work this way. So my question becomes: can I instantiate
the type system of a UIMAfit annotator the same way OpenNLP does it (working
around the missing typeSystemInit method somehow) or do I have to give up
UIMAfit components? (Or maybe there's an alternative? I'd prefer not to have to
run JCasGen on opennlp's TypeSystem.xml)

Cheers,
Aleks

Re: Using OpenNLP type annotations with UIMAfit

Posted by Ar...@bka.bund.de.
Hi Aleksandar!

For full flexibility I use CAS (not JCas). It's a bit inelegant to use, but you can introduce new types at runtime. Together with UIMAfit it is very nice in JUnit tests. And you can set types (type names) as annotator parameters. For example, you can choose the input and output types of an annotator at runtime. But, as Richard has already mentioned, you are responsible for type checking yourself. Giving a wrong or nonexistent type name can cause hard-to-find NullPointerExceptions. Check for null and handle exceptions carefully.
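
The advice to check for null can be sketched in plain Java as follows. This is an illustration only: TypeLookup is a hypothetical stand-in for a type system (it is not a UIMA class), "example.Token" is a made-up type name, and the lookup is modelled on the assumption that an unknown name yields null rather than an exception. The point is to turn that null into an error that names the bad parameter value right at the lookup, instead of a NullPointerException somewhere later.

```java
import java.util.HashMap;
import java.util.Map;

public class TypeLookup {

    // Hypothetical stand-in for a type system: maps type names to type handles.
    private final Map<String, Object> types = new HashMap<>();

    public void register(String name) {
        types.put(name, new Object());
    }

    // Plain lookup: returns null when the name is unknown.
    public Object getType(String name) {
        return types.get(name);
    }

    // Fail-fast lookup: reports the offending name immediately.
    public Object requireType(String name) {
        Object type = getType(name);
        if (type == null) {
            throw new IllegalArgumentException("Unknown type name: " + name);
        }
        return type;
    }

    public static void main(String[] args) {
        TypeLookup typeSystem = new TypeLookup();
        typeSystem.register("example.Token");

        System.out.println(typeSystem.requireType("example.Token") != null);
        try {
            typeSystem.requireType("example.Tokn"); // misspelled parameter value
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In an annotator, a check like requireType would typically run once when the configured type names are resolved, so a misconfigured parameter is reported before any document is processed.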

Cheers,
Armin

-----Original Message-----
From: Aleksandar Dimitrov [mailto:aleks.dimitrov@gmail.com]
Sent: Monday, 26 January 2015 00:36
To: user@uima.apache.org
Subject: Re: Using OpenNLP type annotations with UIMAfit [signature invalid]

Re: Using OpenNLP type annotations with UIMAfit

Posted by Aleksandar Dimitrov <al...@gmail.com>.
Hi,

Thanks for taking the time to answer! Your mail helped clear up quite a bit of
confusion I had.

> No, it is not generated at runtime. It is generated manually or at build-time,
> e.g. using the maven-jcasgen-plugin.

Right, I was wondering when that happened, and just thought it would be
run-time, since I never saw the familiar <typeName>.java and
<typeName>_Type.java files anywhere.

> OpenNLP aims to be configurable with regards to types. So you must have *some*
> type system that you configure OpenNLP to use, right?

Yes, the OpenNLP type system that ships with the OpenNlp source. Though I think
after our discussion, I might just switch over to my own type system (most of
which will be a verbatim copy.)

> Open it in the Eclipse
> UIMA Type-System Editor and hit the "JCasGen" button - it will generate the
> JCas classes that you can use with uimaFIT JCasUtil.

I'm not using Eclipse, but I believe that maven-jcasgen-plugin would help me
here.

> typeSystemInit() is meant for CAS-based analysis engines, not for JCas-based annotators. 

Oh, that's interesting! I was confused, because it's an overridable method for
JCasAnnotator.

> You need the CAS-based API only if you want to configure your components at
> runtime with regards to the annotation types they should use.

So that's the way OpenNLP does it?

> If you can stick to a specific type system, use the JCas-based analysis
> engines.

I can, and I think I shall.

> The CAS-API is not type-safe. Neither is the UIMA-JCas API, but the uimaFIT JCas-API is ;)

Right, so I usually dislike having the JCasGen-generated files lying
around (no real reason, except that it adds another step to the source setup
and compilation process, and I'm not using Eclipse) but I *very* much enjoy
uimaFIT's type safety in AnalysisEngines, so I think I'll bite the bullet.

> Alternatively, you could use a different OpenNLP binding for UIMA, e.g. the
> one provided by DKPro Core [1] (not an Apache project, but one I'm working on
> too).

I've been looking at and reading the DKPro code extensively. It's nice and easy
to read, and has helped me along with finding solutions to some problems
(I've been using it mostly as example code for an uimaFIT setup!)
Just this one I couldn't figure out :-) And sometimes I chose different avenues,
for example I ended up using a URLStreamHandlerFactory instead of dkpro's
ResourceUtils.resolveLocation() to load my resource files from within JARs (or
indeed any other location.)

Cheers,
Aleks

Re: Using OpenNLP type annotations with UIMAfit

Posted by Richard Eckart de Castilho <re...@apache.org>.
Hi there,

> Hi,
> 
> The UIMAfit manual (5.1) states that the preferred way to iterate over tokens in
> the CAS is the following:
> 
>    // JCas version
>    for (Token token : JCasUtil.select(jcas, Token.class)) {
>      ...
>    }
> 
> This assumes a Token.class is importable somewhere. But I'm using the OpenNLP
> tools, which don't provide such a type. Instead, it seems to be generated at run
> time during configuration steps, and is not accessible as a class in the AE (to
> my knowledge.)

No, it is not generated at runtime. It is generated manually or at build-time, e.g. using the maven-jcasgen-plugin. 

OpenNLP aims to be configurable with regards to types. So you must have *some* type system that you configure OpenNLP to use, right? Open it in the Eclipse UIMA Type-System Editor and hit the "JCasGen" button - it will generate the JCas classes that you can use with uimaFIT JCasUtil.

> Additionally, when extending o.a.u.fit.component.JCasAnnotator_ImplBase instead
> of o.a.u.component.JCasAnnotator_ImplBase, the method void typeSystemInit(TypeSystem)
> is not provided, which makes instantiating the type system the same way OpenNLP
> does it rather cumbersome (I generate an empty CAS with the typeSystemDescription,
> then get its TypeSystem and provide the Type and Feature objects from this
> TypeSystem instance as UIMAfit configuration parameters before deploying my AE.)

typeSystemInit() is meant for CAS-based analysis engines, not for JCas-based annotators. 
You need the CAS-based API only if you want to configure your components at runtime with regards to the annotation types they should use. If you can stick to a specific type system, use the JCas-based analysis engines.

> Even then, I can only use the less type-safe method of iterating over
> annotations: for (AnnotationFS token : cas.getAnnotationIndex(tokenType)) where
> tokenType is the Type instance I acquired from the TypeSystem either during
> typeSystemInit() or during configuration with the above hack.

The CAS-API is not type-safe. Neither is the UIMA-JCas API, but the uimaFIT JCas-API is ;)
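
As a rough illustration of why passing a class such as Token.class makes iteration type-safe, here is a plain-Java sketch of the underlying idiom. It is not uimaFIT's implementation: TypedSelect and its select method are hypothetical names, and the input is an ordinary collection rather than a CAS. The Class<T> argument lets the compiler infer the element type of the result, so the caller needs no casts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class TypedSelect {

    // Returns only the elements that are instances of the requested class.
    // Because the class is passed as Class<T>, the result is a List<T>.
    static <T> List<T> select(Collection<?> items, Class<T> type) {
        List<T> result = new ArrayList<>();
        for (Object item : items) {
            if (type.isInstance(item)) {
                result.add(type.cast(item));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Object> mixed = Arrays.<Object>asList("a", 1, "b", 2.5);

        // The loop variable is a String; no cast is needed at the call site.
        for (String s : select(mixed, String.class)) {
            System.out.println(s.toUpperCase());
        }
    }
}
```

With a name-based lookup there is no such class object to hand over, so the loop variable has to be a general annotation type and any narrowing is left to the caller.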

> Is there some good way of solving this dilemma while still using UIMAfit's
> classes? Obviously, I could go back to using just plain UIMA, but I quite like
> UIMAfit's way of dealing with external resources! And I don't like the
> type-system-through-cas hack.

Generate the JCas classes for your type system and you should be fine.

Alternatively, you could use a different OpenNLP binding for UIMA, e.g. the one provided by DKPro Core [1] (not an Apache project, but one I'm working on too).

Cheers,

-- Richard

[1] https://code.google.com/p/dkpro-core-asl/