
Posted to user@uima.apache.org by "Cawley, Tim" <Ti...@dsto.defence.gov.au> on 2009/05/14 02:29:40 UTC

Parameters in uima descriptors

Talk about future releases on the dev list got me thinking about the
future of parameters.

I find a common problem is that parameter names in a descriptor can be
misspelt. While the problem settles down once the code and descriptors
are stable, it still got me thinking about two things. 
 
	1. Could parameter names in a component descriptor be a
reference to a "public static final String" in the implementing class?

	2. Do we need a way of setting acceptable values for a
parameter?  If these acceptable values are accessible both in the
descriptor and in the code, then it probably doesn't matter where they
are defined (i.e. descriptor or code). If I had a list or enumeration of
acceptable values available, then my parameter validation would become
neater to write, or might even disappear entirely. Obviously this
could not be mandatory, but there are situations where I see it
being very useful.
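
As a rough illustration of both ideas, here is a minimal Java sketch. It
does not use any UIMA API: the class TokenizerParams, the parameter name
"Mode", the enum values, and the plain Map standing in for the descriptor
settings are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical annotator-side declarations; none of these names come from UIMA.
final class TokenizerParams {
    // Idea 1: the parameter name is declared once, as a constant.
    public static final String PARAM_MODE = "Mode";

    // Idea 2: the acceptable values are declared once, as an enum.
    enum Mode { WHITESPACE, PUNCTUATION, STATISTICAL }

    private TokenizerParams() {}

    // Reads the raw setting and validates it against the enum.
    static Mode readMode(Map<String, String> settings) {
        String raw = settings.get(PARAM_MODE);
        if (raw == null) {
            throw new IllegalArgumentException("Missing parameter: " + PARAM_MODE);
        }
        try {
            return Mode.valueOf(raw);
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException(
                "Invalid value '" + raw + "' for " + PARAM_MODE
                    + "; expected one of " + Arrays.toString(Mode.values()), e);
        }
    }

    public static void main(String[] args) {
        // The Map stands in for settings read from a descriptor.
        Map<String, String> settings = new HashMap<>();
        settings.put(PARAM_MODE, "PUNCTUATION");
        System.out.println(readMode(settings));
    }
}
```

With the values in an enum, the hand-written validation shrinks to a
single valueOf call, which is roughly the "neater to write" case.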

What do people think?

Tim



Re: Parameters in uima descriptors

Posted by Steven Bethard <st...@gmail.com>.

On Wed, Jun 3, 2009 at 6:19 AM, Thilo Goetz <tw...@gmx.de> wrote:
> Steven Bethard wrote:
>> Steven Bethard wrote:
>>> On Thu, May 14, 2009 at 6:29 AM, Thilo Goetz <tw...@gmx.de> wrote:
>>>> I don't know the details of what you did, but it sounds to me like
>>>> you threw many advantages of UIMA (reusability, transparent remotability
>>>> etc.) right out of the window.
>>> I don't see why that would be the case. We're still creating
>>> AnalysisEngineDescriptions, just in Java code, not XML.
>> [snip]
>>> I haven't played around with "transparent remotability", but I can't
>>> see why using Java descriptors instead of XML descriptors would make
>>> that any harder. Maybe you can elaborate?
>>
>> I would like to hear an answer to this if anyone knows. Is there
>> something you can do with an XML descriptor that you can't do with an
>> AnalysisEngineDescription object?
>
> You can't do any of the things that our documentation
> says you can do with a descriptor.  You can't read it,
> for example.

Sorry, what does "read it" mean?

Steve
-- 
Where did you get the preposterous hypothesis?
Did Steve tell you that?
        --- The Hiphopopotamus

Re: Parameters in uima descriptors

Posted by Chris Roeder <ch...@ucdenver.edu>.

Thilo Goetz wrote:
> Steven Bethard wrote:
>> Steven Bethard wrote:
>>> On Thu, May 14, 2009 at 6:29 AM, Thilo Goetz <tw...@gmx.de> wrote:
>>>> I don't know the details of what you did, but it sounds to me like
>>>> you threw many advantages of UIMA (reusability, transparent remotability
>>>> etc.) right out of the window.
>>> I don't see why that would be the case. We're still creating
>>> AnalysisEngineDescriptions, just in Java code, not XML.
>> [snip]
>>> I haven't played around with "transparent remotability", but I can't
>>> see why using Java descriptors instead of XML descriptors would make
>>> that any harder. Maybe you can elaborate?
>>
>> I would like to hear an answer to this if anyone knows. Is there
>> something you can do with an XML descriptor that you can't do with an
>> AnalysisEngineDescription object?
>
> You can't do any of the things that our documentation
> says you can do with a descriptor.  You can't read it,
> for example.
>
The IT/Java industry has grappled for quite a while now with the issues
involved in building systems by plugging modules into a central
integration platform. UIMA AEs are analogous to either servlets or EJBs
(beans), and the CPM would be called a "container", as Tomcat and
WebSphere are. The configuration in these cases started out in XML, as in
UIMA. In newer releases, some of the metadata is migrating to Java
annotations. The benefit of Java annotations is that the metadata stored
in XML can be written in the Java source, so you don't get the problems
associated with having two places that define, for example, the name of
a parameter. The AE descriptor information would be in the annotations,
where the CPE GUI could read it when creating the CPE XML descriptor.

While there may be some Resume-Driven Design that prefers the complexity
of these metadata-driven approaches to the simplicity of doing it all in
Java, I think the motivation comes from the desire for a clear code
separation between container and "bean". The need is clear to me in the
extreme case of a commercial, vendor-supplied container, where you don't
have access to the source and so can't code your pipeline construction in
Java. The vendor could provide an initialization callback instead, but a
descriptor is in effect the same thing, only with more restrictions.

This is discussed in more detail under either "Inversion of Control" or
"Dependency Injection."

http://www.martinfowler.com/articles/injection.html

-Chris
>> Steve
>>

Re: Parameters in uima descriptors

Posted by Thilo Goetz <tw...@gmx.de>.

Steven Bethard wrote:
> Steven Bethard wrote:
>> On Thu, May 14, 2009 at 6:29 AM, Thilo Goetz <tw...@gmx.de> wrote:
>>> I don't know the details of what you did, but it sounds to me like
>>> you threw many advantages of UIMA (reusability, transparent remotability
>>> etc.) right out of the window.
>> I don't see why that would be the case. We're still creating
>> AnalysisEngineDescriptions, just in Java code, not XML.
> [snip]
>> I haven't played around with "transparent remotability", but I can't
>> see why using Java descriptors instead of XML descriptors would make
>> that any harder. Maybe you can elaborate?
> 
> I would like to hear an answer to this if anyone knows. Is there
> something you can do with an XML descriptor that you can't do with an
> AnalysisEngineDescription object?

You can't do any of the things that our documentation
says you can do with a descriptor.  You can't read it,
for example.

> 
> Steve

Re: Parameters in uima descriptors

Posted by Steven Bethard <st...@gmail.com>.

Steven Bethard wrote:
> On Thu, May 14, 2009 at 6:29 AM, Thilo Goetz <tw...@gmx.de> wrote:
>> I don't know the details of what you did, but it sounds to me like
>> you threw many advantages of UIMA (reusability, transparent remotability
>> etc.) right out of the window.
>
> I don't see why that would be the case. We're still creating
> AnalysisEngineDescriptions, just in Java code, not XML.
[snip]
> I haven't played around with "transparent remotability", but I can't
> see why using Java descriptors instead of XML descriptors would make
> that any harder. Maybe you can elaborate?

I would like to hear an answer to this if anyone knows. Is there
something you can do with an XML descriptor that you can't do with an
AnalysisEngineDescription object?

Steve
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

Re: Parameters in uima descriptors

Posted by Steven Bethard <st...@gmail.com>.

On Thu, May 14, 2009 at 8:56 AM, Thilo Goetz <tw...@gmx.de> wrote:
> Steven Bethard wrote:
>> On Thu, May 14, 2009 at 6:29 AM, Thilo Goetz <tw...@gmx.de> wrote:
>>> I don't know the details of what you did, but it sounds to me like
>>> you threw many advantages of UIMA (reusability, transparent remotability
>>> etc.) right out of the window.
>>
>> I don't see why that would be the case. We're still creating
>> AnalysisEngineDescriptions, just in Java code, not XML. See, for
>> example line 184 of:
>>
>> http://code.google.com/p/cleartk/source/browse/trunk/src/org/cleartk/token/TokenAnnotator.java
>>
>> The components are still just as reusable and pluggable. See, for example:
>>
>> http://code.google.com/p/cleartk/source/browse/trunk/src/org/cleartk/example/pos/BuildTestExamplePosModel.java
>> http://code.google.com/p/cleartk/source/browse/trunk/src/org/cleartk/example/pos/RunExamplePOSAnnotator.java
>>
>> And of course, anyone who wants to use our components through XML
>> descriptors can easily write their own.
>>
>> I haven't played around with "transparent remotability", but I can't
>> see why using Java descriptors instead of XML descriptors would make
>> that any harder. Maybe you can elaborate?
>
> In my opinion, having to write your own descriptor to an annotator
> that you're not familiar with (and don't necessarily want to become
> familiar with) makes reuse harder.

Sorry, I didn't understand this answer to the "transparent
remotability" question. What part of the "transparent remotability" in
UIMA is not accessible through Java code?

I agree that writing a descriptor for a component you're unfamiliar
with could be daunting, but my experience so far with other peoples'
descriptors was that I didn't really understand what the various
parameters were supposed to be without looking at the code anyway. I
contend that either way, this is a documentation problem. ;-)

Steve
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

Re: Parameters in uima descriptors

Posted by Thilo Goetz <tw...@gmx.de>.

Steven Bethard wrote:
> On Thu, May 14, 2009 at 6:29 AM, Thilo Goetz <tw...@gmx.de> wrote:
>> Steven Bethard wrote:
>>> On Wed, May 13, 2009 at 5:29 PM, Cawley, Tim
>>> <Ti...@dsto.defence.gov.au> wrote:
>>>> I find a common problem is that parameter names in a descriptor can be
>>>> misspelt. While the problem settles down once the code and descriptors
>>>> are stable, it still got me thinking about two things.
>>>>
>>>>        1. Could parameter names in a component descriptor be a
>>>> reference to a "public static final String" in the implementing class?
>>> For what it's worth, in ClearTK we've stopped using analysis engine
>>> and CPE descriptors entirely because of the issues with keeping them
>>> in sync. Replacing them with code means that we can now just refer to
>>> our static final variables directly, and the compiler enforces that we
>>> didn't misspell anything (and fixes references automatically when we
>>> refactor).
>> I don't know the details of what you did, but it sounds to me like
>> you threw many advantages of UIMA (reusability, transparent remotability
>> etc.) right out of the window.
> 
> I don't see why that would be the case. We're still creating
> AnalysisEngineDescriptions, just in Java code, not XML. See, for
> example line 184 of:
> 
> http://code.google.com/p/cleartk/source/browse/trunk/src/org/cleartk/token/TokenAnnotator.java
> 
> The components are still just as reusable and pluggable. See, for example:
> 
> http://code.google.com/p/cleartk/source/browse/trunk/src/org/cleartk/example/pos/BuildTestExamplePosModel.java
> http://code.google.com/p/cleartk/source/browse/trunk/src/org/cleartk/example/pos/RunExamplePOSAnnotator.java
> 
> And of course, anyone who wants to use our components through XML
> descriptors can easily write their own.
> 
> I haven't played around with "transparent remotability", but I can't
> see why using Java descriptors instead of XML descriptors would make
> that any harder. Maybe you can elaborate?

In my opinion, having to write your own descriptor to an annotator
that you're not familiar with (and don't necessarily want to become
familiar with) makes reuse harder.

--Thilo

> 
> Steve

Re: Parameters in uima descriptors

Posted by Steven Bethard <st...@gmail.com>.

On Thu, May 14, 2009 at 6:29 AM, Thilo Goetz <tw...@gmx.de> wrote:
> Steven Bethard wrote:
>> On Wed, May 13, 2009 at 5:29 PM, Cawley, Tim
>> <Ti...@dsto.defence.gov.au> wrote:
>>> I find a common problem is that parameter names in a descriptor can be
>>> misspelt. While the problem settles down once the code and descriptors
>>> are stable, it still got me thinking about two things.
>>>
>>>        1. Could parameter names in a component descriptor be a
>>> reference to a "public static final String" in the implementing class?
>>
>> For what it's worth, in ClearTK we've stopped using analysis engine
>> and CPE descriptors entirely because of the issues with keeping them
>> in sync. Replacing them with code means that we can now just refer to
>> our static final variables directly, and the compiler enforces that we
>> didn't misspell anything (and fixes references automatically when we
>> refactor).
>
> I don't know the details of what you did, but it sounds to me like
> you threw many advantages of UIMA (reusability, transparent remotability
> etc.) right out of the window.

I don't see why that would be the case. We're still creating
AnalysisEngineDescriptions, just in Java code, not XML. See, for
example line 184 of:

http://code.google.com/p/cleartk/source/browse/trunk/src/org/cleartk/token/TokenAnnotator.java

The components are still just as reusable and pluggable. See, for example:

http://code.google.com/p/cleartk/source/browse/trunk/src/org/cleartk/example/pos/BuildTestExamplePosModel.java
http://code.google.com/p/cleartk/source/browse/trunk/src/org/cleartk/example/pos/RunExamplePOSAnnotator.java

And of course, anyone who wants to use our components through XML
descriptors can easily write their own.

I haven't played around with "transparent remotability", but I can't
see why using Java descriptors instead of XML descriptors would make
that any harder. Maybe you can elaborate?

Steve
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

Re: Parameters in uima descriptors

Posted by Jörn Kottmann <ko...@gmail.com>.

> We might want to take a look at what other frameworks are doing
> these days.  For example, one could imagine UIMA annotations
> (as in Java annotations) in source code, and tooling that
> creates descriptors automatically from those.  We should be able
> to reconcile the need for external descriptors with the desire
> to declare a parameter in just one place.
>   
In the OpenNLP UIMA wrapper, the implementation code checks whether
mandatory parameters are really declared. These checks are also
performed by UIMA, but that's error-prone because it's really easy to
break the descriptor.

If we now use annotations inside the AEs to declare the parameters, the
configurationParameterSettings element in the descriptor is no longer
necessary and so can no longer break. In the case of the OpenNLP
annotators, it would then be possible to remove the second check, since
the implementation code has full control over the parameters and only
has to trust UIMA, not the user.

Another interesting thing could be to inject the configuration
parameters directly into the annotators based on the annotations.
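
A minimal sketch of what annotation-based declaration and injection
could look like, using only the Java standard library. The @ConfigParam
annotation, the SentenceDetector class, the parameter names, and the
ParamInjector helper are all hypothetical; nothing here is an existing
UIMA or OpenNLP API.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;

// Hypothetical annotation that declares a configuration parameter on a field.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface ConfigParam {
    String name();
    boolean mandatory() default false;
}

// Hypothetical annotator: parameters are declared exactly once, on the fields.
class SentenceDetector {
    @ConfigParam(name = "ModelName", mandatory = true)
    String modelName;

    @ConfigParam(name = "BeamSize")
    Integer beamSize = 3; // default used when no value is supplied
}

// Hypothetical framework-side helper that injects settings via reflection.
final class ParamInjector {
    private ParamInjector() {}

    static void inject(Object component, Map<String, Object> settings) {
        for (Field field : component.getClass().getDeclaredFields()) {
            ConfigParam param = field.getAnnotation(ConfigParam.class);
            if (param == null) {
                continue; // not a configuration parameter
            }
            Object value = settings.get(param.name());
            if (value == null) {
                if (param.mandatory()) {
                    throw new IllegalStateException(
                        "Missing mandatory parameter: " + param.name());
                }
                continue; // keep the field's default value
            }
            try {
                field.setAccessible(true);
                field.set(component, value);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(
                    "Cannot inject parameter: " + param.name(), e);
            }
        }
    }
}
```

In this arrangement the mandatory check lives in the injector, so the
annotator itself would not need to repeat it.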

Jörn

Re: Parameters in uima descriptors

Posted by Thilo Goetz <tw...@gmx.de>.

Steven Bethard wrote:
> On Wed, May 13, 2009 at 5:29 PM, Cawley, Tim
> <Ti...@dsto.defence.gov.au> wrote:
>> I find a common problem is that parameter names in a descriptor can be
>> misspelt. While the problem settles down once the code and descriptors
>> are stable, it still got me thinking about two things.
>>
>>        1. Could parameter names in a component descriptor be a
>> reference to a "public static final String" in the implementing class.
> 
> For what it's worth, in ClearTK we've stopped using analysis engine
> and CPE descriptors entirely because of the issues with keeping them
> in sync. Replacing them with code means that we can now just refer to
> our static final variables directly, and the compiler enforces that we
> didn't misspell anything (and fixes references automatically when we
> refactor).

I don't know the details of what you did, but it sounds to me like
you threw many advantages of UIMA (reusability, transparent remotability
etc.) right out of the window.

> 
>>        2. Do we need a way of setting acceptable values for a
>> parameter?  If these acceptable values are accessible both in the
>> descriptor and in the code, then it probably doesn't matter where they
>> are defined (i.e. descriptor or code). If I had a list or enumeration of
>> acceptable values available, then my parameter validation would become
>> neater to write, or might even disappear entirely. Obviously this
>> could not be mandatory, but there are situations where I see it
>> being very useful.
>>
>> What do people think?
> 
> Personally, I think trying to fit extra name and type checking to
> descriptors is a lost cause - at some point you'll basically end up
> having to write the Java compiler for XML. ;-)
> 
> Steve

We might want to take a look at what other frameworks are doing
these days.  For example, one could imagine UIMA annotations
(as in Java annotations) in source code, and tooling that
creates descriptors automatically from those.  We should be able
to reconcile the need for external descriptors with the desire
to declare a parameter in just one place.
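
To make the idea concrete, here is a minimal sketch of such tooling in
plain Java. The @DescriptorParam annotation, the LanguageTagger class,
and the DescriptorFragmentWriter helper are hypothetical, and the
element names in the generated fragment are an assumption about the
descriptor layout that should be checked against the UIMA descriptor
schema.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// Hypothetical annotation carrying the metadata a descriptor would need.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface DescriptorParam {
    String name();
    String type() default "String";
    boolean mandatory() default false;
}

// Hypothetical annotator with a single declared parameter.
class LanguageTagger {
    @DescriptorParam(name = "Language", mandatory = true)
    String language;
}

// Hypothetical tool that derives a descriptor fragment from the annotations.
final class DescriptorFragmentWriter {
    private DescriptorFragmentWriter() {}

    // Note: values are not XML-escaped; this is a sketch, not production code.
    static String configurationParameters(Class<?> annotatorClass) {
        StringBuilder xml = new StringBuilder("<configurationParameters>\n");
        for (Field field : annotatorClass.getDeclaredFields()) {
            DescriptorParam p = field.getAnnotation(DescriptorParam.class);
            if (p == null) {
                continue; // field is not a declared parameter
            }
            xml.append("  <configurationParameter>\n")
               .append("    <name>").append(p.name()).append("</name>\n")
               .append("    <type>").append(p.type()).append("</type>\n")
               .append("    <multiValued>false</multiValued>\n")
               .append("    <mandatory>").append(p.mandatory()).append("</mandatory>\n")
               .append("  </configurationParameter>\n");
        }
        return xml.append("</configurationParameters>\n").toString();
    }

    public static void main(String[] args) {
        System.out.print(configurationParameters(LanguageTagger.class));
    }
}
```

The parameter is declared only in the Java source, and the external
descriptor becomes a generated artifact rather than a second definition.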

--Thilo

Re: Parameters in uima descriptors

Posted by Steven Bethard <st...@gmail.com>.

On Wed, May 13, 2009 at 5:29 PM, Cawley, Tim
<Ti...@dsto.defence.gov.au> wrote:
> I find a common problem is that parameter names in a descriptor can be
> misspelt. While the problem settles down once the code and descriptors
> are stable, it still got me thinking about two things.
>
>        1. Could parameter names in a component descriptor be a
> reference to a "public static final String" in the implementing class?

For what it's worth, in ClearTK we've stopped using analysis engine
and CPE descriptors entirely because of the issues with keeping them
in sync. Replacing them with code means that we can now just refer to
our static final variables directly, and the compiler enforces that we
didn't misspell anything (and fixes references automatically when we
refactor).
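
The general pattern can be sketched as follows. This is not ClearTK
code and it deliberately avoids the UIMA API: SentenceSplitter,
PipelineInCode, the parameter names, and the Map standing in for the
configuration settings are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical component; stands in for an annotator class.
final class SentenceSplitter {
    public static final String PARAM_LANGUAGE = "Language";
    public static final String PARAM_MAX_LENGTH = "MaxSentenceLength";

    final String language;
    final int maxLength;

    // Reads its configuration using the same constants the caller used.
    SentenceSplitter(Map<String, Object> settings) {
        this.language = (String) settings.get(PARAM_LANGUAGE);
        this.maxLength = (Integer) settings.get(PARAM_MAX_LENGTH);
    }
}

// Hypothetical pipeline-building code that replaces an XML descriptor.
final class PipelineInCode {
    private PipelineInCode() {}

    static Map<String, Object> splitterSettings() {
        Map<String, Object> settings = new LinkedHashMap<>();
        // A typo such as SentenceSplitter.PARAM_LANGAUGE would not compile,
        // whereas a misspelt name in XML is only noticed at run time, if at all.
        settings.put(SentenceSplitter.PARAM_LANGUAGE, "en");
        settings.put(SentenceSplitter.PARAM_MAX_LENGTH, 200);
        return settings;
    }

    public static void main(String[] args) {
        SentenceSplitter splitter = new SentenceSplitter(splitterSettings());
        System.out.println(splitter.language + " " + splitter.maxLength);
    }
}
```

Renaming PARAM_LANGUAGE with an IDE refactoring updates both the
component and the pipeline code in one step.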

>        2. Do we need a way of setting acceptable values for a
> parameter?  If these acceptable values are accessible both in the
> descriptor and in the code, then it probably doesn't matter where they
> are defined (i.e. descriptor or code). If I had a list or enumeration of
> acceptable values available, then my parameter validation would become
> neater to write, or might even disappear entirely. Obviously this
> could not be mandatory, but there are situations where I see it
> being very useful.
>
> What do people think?

Personally, I think trying to fit extra name and type checking to
descriptors is a lost cause - at some point you'll basically end up
having to write the Java compiler for XML. ;-)

Steve
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy