Posted to dev@uima.apache.org by Adam Lally <al...@alum.rpi.edu> on 2011/07/22 18:21:48 UTC

Re: Configuration parameters (was Working on a new API to enable creation of UIMA AS deployment descriptors programmatically)

On Wed, Apr 6, 2011 at 8:17 AM, Eddie Epstein <ea...@gmail.com> wrote:

> On Tue, Apr 5, 2011 at 5:57 PM, Richard Eckart de Castilho
> <ec...@tk.informatik.tu-darmstadt.de> wrote:
> > It sounds like this could easily be implemented as a post-processing step
> on an (aggregate) descriptor.
> >
> > 1) load aggregate (descriptor) - AnalysisEngineDescriptor desc = ...
> > 2) load properties - Properties config = ...
> > 3) set configuration parameters in descriptor from values used in
> properties - applyConfiguration(desc, properties);
> >
> > Such a feature could then be integrated into an execution engine like the
> CPE or UIMA-AS.
> >
> Implementing as a post-processing step does sound right. It would be
> nice if the logic was in the core so that every execution engine could
> just make a single method call. This would also allow an execution
> environment to use separate properties files when instantiating
> multiple AEs, as for example when the UIMA-AS client API does in
> process deployment of services.
>
>
I think I will soon be ready to try implementing something like this
(allowing setting configuration parameters from properties files).  I am
thinking there are a couple of extensions I may want to make in core UIMA.

1) I think one issue with the postprocessing-only approach is that for
non-string parameters, you cannot write something like this in your descriptor:
<value>
  <integer>${myPropertyName}</integer>
</value>
because it would cause a parse error.

So I am thinking of adding a new element in the descriptor syntax:
<value>
  <propertyref>myPropertyName</propertyref>
</value>

This could be used for any parameter type.  Also there is no ambiguity in
case someone actually wanted to set their parameter to a string of the form
${....}.

2) It would be good if there was a way for existing UIMA applications
(DocumentAnalyzer, CVD, and also our users' own applications) to support
such descriptors, without having to add an extra line of code in all of them
to resolve the properties.  So, I'm imagining that the property replacement
happens automatically in the core, when the AE is instantiated.  We could
define a system property to specify the configuration file (e.g.
-Dorg.apache.uima.configuration_parameter_file=my.properties).  I would also
have a programmatic approach (like something in the additionalParams map
when instantiating the AE), to support applications that need multiple
configuration properties files in the same JVM.
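
To make the two points above concrete, here is a minimal sketch in plain Java. It is only an illustration under stated assumptions: the class PropertyRefSketch and its methods are hypothetical and not part of the UIMA API, the "programmatic" argument merely stands in for whatever would be passed via the additionalParams map, and the type names follow UIMA's String/Integer/Float/Boolean parameter types. The system property name is the one suggested above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

/** Hypothetical helper for illustration only; not part of the UIMA API. */
final class PropertyRefSketch {

    /** System property name suggested in the message above. */
    static final String FILE_PROPERTY = "org.apache.uima.configuration_parameter_file";

    /**
     * Returns the programmatically supplied settings if present; otherwise
     * loads the file named by the system property (empty if it is not set).
     */
    static Properties loadSettings(Properties programmatic) throws IOException {
        if (programmatic != null) {
            return programmatic;
        }
        Properties fromFile = new Properties();
        String path = System.getProperty(FILE_PROPERTY);
        if (path != null) {
            try (InputStream in = Files.newInputStream(Paths.get(path))) {
                fromFile.load(in);
            }
        }
        return fromFile;
    }

    /** Looks up the referenced property and converts it to the declared parameter type. */
    static Object resolve(String propertyName, String declaredType, Properties settings) {
        String raw = settings.getProperty(propertyName);
        if (raw == null) {
            throw new IllegalArgumentException("No setting for property: " + propertyName);
        }
        String text = raw.trim();
        switch (declaredType) {
            case "String":  return raw;
            case "Integer": return Integer.valueOf(text);
            case "Float":   return Float.valueOf(text);
            case "Boolean": return Boolean.valueOf(text);
            default:
                throw new IllegalArgumentException("Unsupported type: " + declaredType);
        }
    }
}
```

Because the conversion happens after the descriptor has been parsed, a reference can stand in for an integer parameter without the parser ever seeing the text ${...} inside an <integer> element.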

Thoughts?

 -Adam

Re: Configuration parameters (was Working on a new API to enable creation of UIMA AS deployment descriptors programmatically)

Posted by Marshall Schor <ms...@schor.com>.

oops, I see you already answered this.  I failed to scroll the email to the
bottom and see that.

Sorry about that.

-Marshall

On 7/22/2011 4:50 PM, Marshall Schor wrote:
> What are the arguments/reasons to prefer the ${xxx} style? 
>
> One I can think of is that it makes it explicit in the descriptor what value is
> being potentially overridden by a top level properties file.  Unless the
> descriptor was altered to include this, it would prevent a top level properties
> file from overriding.
>
> Are there other reasons to prefer this style over the other one?
>
> -Marshall
>
> On 7/22/2011 2:44 PM, Adam Lally wrote:
>> On Fri, Jul 22, 2011 at 2:09 PM, Marshall Schor <ms...@schor.com> wrote:
>>
>>> The proposal to add ${xxxx} kinds of things into descriptors is one way to
>>> specify the parameter being overridden.
>>>
>>> Another way is to have the properties file, itself, contain a direct
>>> specification of the parameter being overridden, perhaps using the same
>>> syntax
>>> we already use:
>>>
>>>  keyname1/keyname2/.../keynamen/propertyName = value
>>>
>>> where the keyname1 ... n are the key names of the delegates in the
>>> aggregates.
>>>
>>> There are pluses and minuses for both approaches.  Here are some:
>>>
>>> The same annotator component, using the same XML descriptor, might be
>>> included
>>> in two different places in a big aggregate.  With the ${xxx} approach, the
>>> override would affect both, and it would not be possible to have different
>>> overrides for each one (unless of course, you changed that component's
>>> descriptor to, for example, use ${xxx} in one instance, and ${yyy} in the
>>> other).
>>>
>> It would still be possible to use the existing UIMA parameter override
>> mechanism in this case.  In the aggregate that imports these descriptors, we
>> can introduce an overriding parameter and set its value to ${xxx} in one
>> case and ${yyy} in another.
>>
>>
>>> Using the keyname1/keyname2/... approach makes it pretty explicit, in one
>>> spot,
>>> what is being overridden.  The ${xxx} approach creates a bit of an
>>> indirection -
>>> it would be necessary to grep through all the xml files, wherever they may
>>> be
>>> located, to see what was being overridden.
>>>
>>>
>> In my use case I have a requirement (which I should have mentioned, sorry)
>> that I want to be able to have a single properties file that I can use with
>> multiple different aggregates.
>>
>> Consider the case where I have one top level aggregate, AE1, that contains
>> two components, AE2 and AE3, themselves aggregates.  Right now AE1 has all
>> my parameter settings, via parameter overrides.  But sometimes, I want to
>> separately run AE2 or AE3 (for debugging, or for service deployment).  In
>> order to do that now I have to manually make sure all the parameter
>> overrides that I set in AE1 are set separately in AE2 and AE3, which makes
>> things very difficult to maintain.
>>
>> So, I am thinking I could have one place where I keep all my settings (like a
>> properties file, though the exact format is not too important), and I could
>> use that same file whether I run AE1, AE2, or AE3.  If the configuration
>> file has to specify the exact path down from the top of the aggregate, this
>> would not be possible.
>>
>> -Adam
>>

Re: Configuration parameters (was Working on a new API to enable creation of UIMA AS deployment descriptors programmatically)

Posted by Marshall Schor <ms...@schor.com>.

What are the arguments/reasons to prefer the ${xxx} style? 

One I can think of is that it makes it explicit in the descriptor what value is
being potentially overridden by a top level properties file.  Unless the
descriptor was altered to include this, it would prevent a top level properties
file from overriding.

Are there other reasons to prefer this style over the other one?

-Marshall

On 7/22/2011 2:44 PM, Adam Lally wrote:
> On Fri, Jul 22, 2011 at 2:09 PM, Marshall Schor <ms...@schor.com> wrote:
>
>> The proposal to add ${xxxx} kinds of things into descriptors is one way to
>> specify the parameter being overridden.
>>
>> Another way is to have the properties file, itself, contain a direct
>> specification of the parameter being overridden, perhaps using the same
>> syntax
>> we already use:
>>
>>  keyname1/keyname2/.../keynamen/propertyName = value
>>
>> where the keyname1 ... n are the key names of the delegates in the
>> aggregates.
>>
>> There are pluses and minuses for both approaches.  Here are some:
>>
>> The same annotator component, using the same XML descriptor, might be
>> included
>> in two different places in a big aggregate.  With the ${xxx} approach, the
>> override would affect both, and it would not be possible to have different
>> overrides for each one (unless of course, you changed that component's
>> descriptor to, for example, use ${xxx} in one instance, and ${yyy} in the
>> other).
>>
> It would still be possible to use the existing UIMA parameter override
> mechanism in this case.  In the aggregate that imports these descriptors, we
> can introduce an overriding parameter and set its value to ${xxx} in one
> case and ${yyy} in another.
>
>
>> Using the keyname1/keyname2/... approach makes it pretty explicit, in one
>> spot,
>> what is being overridden.  The ${xxx} approach creates a bit of an
>> indirection -
>> it would be necessary to grep through all the xml files, wherever they may
>> be
>> located, to see what was being overridden.
>>
>>
> In my use case I have a requirement (which I should have mentioned, sorry)
> that I want to be able to have a single properties file that I can use with
> multiple different aggregates.
>
> Consider the case where I have one top level aggregate, AE1, that contains
> two components, AE2 and AE3, themselves aggregates.  Right now AE1 has all
> my parameter settings, via parameter overrides.  But sometimes, I want to
> separately run AE2 or AE3 (for debugging, or for service deployment).  In
> order to do that now I have to manually make sure all the parameter
> overrides that I set in AE1 are set separately in AE2 and AE3, which makes
> things very difficult to maintain.
>
> So, I am thinking I could have one place where I keep all my settings (like a
> properties file, though the exact format is not too important), and I could
> use that same file whether I run AE1, AE2, or AE3.  If the configuration
> file has to specify the exact path down from the top of the aggregate, this
> would not be possible.
>
> -Adam
>

Re: Configuration parameters (was Working on a new API to enable creation of UIMA AS deployment descriptors programmatically)

Posted by Adam Lally <al...@alum.rpi.edu>.

On Fri, Jul 22, 2011 at 2:09 PM, Marshall Schor <ms...@schor.com> wrote:

> The proposal to add ${xxxx} kinds of things into descriptors is one way to
> specify the parameter being overridden.
>
> Another way is to have the properties file, itself, contain a direct
> specification of the parameter being overridden, perhaps using the same
> syntax
> we already use:
>
>  keyname1/keyname2/.../keynamen/propertyName = value
>
> where the keyname1 ... n are the key names of the delegates in the
> aggregates.
>
> There are pluses and minuses for both approaches.  Here are some:
>
> The same annotator component, using the same XML descriptor, might be
> included
> in two different places in a big aggregate.  With the ${xxx} approach, the
> override would affect both, and it would not be possible to have different
> overrides for each one (unless of course, you changed that component's
> descriptor to, for example, use ${xxx} in one instance, and ${yyy} in the
> other).
>

It would still be possible to use the existing UIMA parameter override
mechanism in this case.  In the aggregate that imports these descriptors, we
can introduce an overriding parameter and set its value to ${xxx} in one
case and ${yyy} in another.

> Using the keyname1/keyname2/... approach makes it pretty explicit, in one
> spot,
> what is being overridden.  The ${xxx} approach creates a bit of an
> indirection -
> it would be necessary to grep through all the xml files, wherever they may
> be
> located, to see what was being overridden.
>
>
In my use case I have a requirement (which I should have mentioned, sorry)
that I want to be able to have a single properties file that I can use with
multiple different aggregates.

Consider the case where I have one top level aggregate, AE1, that contains
two components, AE2 and AE3, themselves aggregates.  Right now AE1 has all
my parameter settings, via parameter overrides.  But sometimes, I want to
separately run AE2 or AE3 (for debugging, or for service deployment).  In
order to do that now I have to manually make sure all the parameter
overrides that I set in AE1 are set separately in AE2 and AE3, which makes
things very difficult to maintain.

So, I am thinking I could have one place where I keep all my settings (like a
properties file, though the exact format is not too important), and I could
use that same file whether I run AE1, AE2, or AE3.  If the configuration
file has to specify the exact path down from the top of the aggregate, this
would not be possible.

-Adam

Re: Configuration parameters (was Working on a new API to enable creation of UIMA AS deployment descriptors programmatically)

Posted by Marshall Schor <ms...@schor.com>.

The proposal to add ${xxxx} kinds of things into descriptors is one way to
specify the parameter being overridden.

Another way is to have the properties file, itself, contain a direct
specification of the parameter being overridden, perhaps using the same syntax
we already use:

  keyname1/keyname2/.../keynamen/propertyName = value

where the keyname1 ... n are the key names of the delegates in the aggregates.

There are pluses and minuses for both approaches.  Here are some:

The same annotator component, using the same XML descriptor, might be included
in two different places in a big aggregate.  With the ${xxx} approach, the
override would affect both, and it would not be possible to have different
overrides for each one (unless of course, you changed that component's
descriptor to, for example, use ${xxx} in one instance, and ${yyy} in the other).

Using the keyname1/keyname2/... approach makes it pretty explicit, in one spot,
what is being overridden.  The ${xxx} approach creates a bit of an indirection -
it would be necessary to grep through all the xml files, wherever they may be
located, to see what was being overridden.
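
As a rough illustration of the path-style alternative, the sketch below splits a key of the form keyname1/keyname2/.../propertyName into the delegate key path and the parameter name. It is plain Java under stated assumptions: PathKeySketch and PathOverride are hypothetical names, not UIMA classes, and rejecting empty path segments is a choice made for this example only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper for illustration only; not part of the UIMA API. */
final class PathKeySketch {

    /** One override: delegate keys from the top aggregate down, then the parameter name and value. */
    static final class PathOverride {
        final List<String> delegatePath;
        final String parameterName;
        final String value;

        PathOverride(List<String> delegatePath, String parameterName, String value) {
            this.delegatePath = delegatePath;
            this.parameterName = parameterName;
            this.value = value;
        }
    }

    /** Parses "keyname1/keyname2/.../propertyName" and pairs it with its value. */
    static PathOverride parse(String key, String value) {
        // The -1 limit keeps trailing empty segments so that "a/b/" is rejected below.
        String[] parts = key.trim().split("/", -1);
        for (String part : parts) {
            if (part.isEmpty()) {
                throw new IllegalArgumentException("Empty segment in key: " + key);
            }
        }
        List<String> path = new ArrayList<>(Arrays.asList(parts).subList(0, parts.length - 1));
        return new PathOverride(Collections.unmodifiableList(path), parts[parts.length - 1], value);
    }
}
```

A key with no slash yields an empty delegate path, meaning the parameter belongs to the top-level descriptor itself.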

-Marshall

On 7/22/2011 12:21 PM, Adam Lally wrote:
> On Wed, Apr 6, 2011 at 8:17 AM, Eddie Epstein <ea...@gmail.com> wrote:
>
>> On Tue, Apr 5, 2011 at 5:57 PM, Richard Eckart de Castilho
>> <ec...@tk.informatik.tu-darmstadt.de> wrote:
>>> It sounds like this could easily be implemented as a post-processing step
>> on an (aggregate) descriptor.
>>> 1) load aggregate (descriptor) - AnalysisEngineDescriptor desc = ...
>>> 2) load properties - Properties config = ...
>>> 3) set configuration parameters in descriptor from values used in
>> properties - applyConfiguration(desc, properties);
>>> Such a feature could then be integrated into an execution engine like the
>> CPE or UIMA-AS.
>> Implementing as a post-processing step does sound right. It would be
>> nice if the logic was in the core so that every execution engine could
>> just make a single method call. This would also allow an execution
>> environment to use separate properties files when instantiating
>> multiple AEs, as for example when the UIMA-AS client API does in
>> process deployment of services.
>>
>>
> I think I will soon be ready to try implementing something like this
> (allowing setting configuration parameters from properties files).  I am
> thinking there are a couple of extensions I may want to make in core UIMA.
>
> 1) I think one issue with the postprocessing-only approach is that for
> non-string parameters, you cannot in your descriptor write something like:
> <value>
>   <integer>${myPropertyName}</integer>
> </value>
> which would cause a parse error.
>
> So I am thinking of adding a new element in the descriptor syntax:
> <value>
>   <propertyref>myPropertyName</propertyref>
> </value>
>
> This could be used for any parameter type.  Also there is no ambiguity in
> case someone actually wanted to set their parameter to a string of the form
> ${....}.
>
> 2) It would be good if there was a way for existing UIMA applications
> (DocumentAnalyzer, CVD, and also our users' own applications) to support
> such descriptors, without having to add an extra line of code in all of them
> to resolve the properties.  So, I'm imagining that the property replacement
> happens automatically in the core, when the AE is instantiated.  We could
> define a system property to specify the configuration file (e.g.
> -Dorg.apache.uima.configuration_parameter_file=my.properties).  I would also
> have a programmatic approach (like something in the additionalParams map
> when instantiating the AE), to support applications that need multiple
> configuration properties files in the same JVM.
>
> Thoughts?
>
>  -Adam
>

Re: Configuration parameters (was Working on a new API to enable creation of UIMA AS deployment descriptors programmatically)

Posted by Burn Lewis <bu...@gmail.com>.

Adam reminded me that we don't want these variables expanded in the xml
parser, to allow the unexpanded form to be edited safely with the CDE.  So
the original idea of a new element meaning string-with-variables seems the
best.  But I'd suggest restricting ourselves to just one easily-described
syntax, namely ${variable}

~Burn

Re: Configuration parameters (was Working on a new API to enable creation of UIMA AS deployment descriptors programmatically)

Posted by Burn Lewis <bu...@gmail.com>.

<definition><term>oxymoron</term><example>readable
xml</example></definition>

Re: Configuration parameters (was Working on a new API to enable creation of UIMA AS deployment descriptors programmatically)

Posted by Thilo Götz <tw...@gmx.de>.

On 07/08/11 00:00, Marshall Schor wrote:
> 
> 
> On 8/4/2011 6:16 PM, Richard Eckart de Castilho wrote:
>> Am 04.08.2011 um 23:27 schrieb Marshall Schor:
>>
>>> Many other languages allow both ${xxx} and $xxx - the latter for cases where the
>>> xxx's are limited to chars + numbers + maybe underscores, dashes, and periods;
>>> the first character not allowed "stops" the parsing of the name.  You still need
>>> the {} form for such things like ${xxx}yyy or for ${x!_($*} (if you want to
>>> allow that as a "name" - property files do, apparently).
>> How about supporting Commons EL [1] or the Spring Expression Language [2] or JEXL [3]?
> 
> All of these are potentially quite complex expression languages, each with their
> own particulars. 
> 
> I think one of the goals of the use cases is to have the "xml" readable by a
> non-expert - someone who's only familiar with XML, perhaps.  Simple string
> substitution I think is enough to give the flexibility needed for the use cases.

Sorry, can't resist: XML readable by a non-expert seems like a
contradiction in terms.

> 
> -Marshall
>>
>> -- Richard 
>>
>> [1] http://commons.apache.org/el/
>> [2] http://static.springsource.org/spring/docs/3.0.x/reference/expressions.html
>> [3] http://commons.apache.org/jexl/reference/syntax.html
>>

Re: Configuration parameters (was Working on a new API to enable creation of UIMA AS deployment descriptors programmatically)

Posted by Marshall Schor <ms...@schor.com>.


On 8/4/2011 6:16 PM, Richard Eckart de Castilho wrote:
> Am 04.08.2011 um 23:27 schrieb Marshall Schor:
>
>> Many other languages allow both ${xxx} and $xxx - the latter for cases where the
>> xxx's are limited to chars + numbers + maybe underscores, dashes, and periods;
>> the first character not allowed "stops" the parsing of the name.  You still need
>> the {} form for such things like ${xxx}yyy or for ${x!_($*} (if you want to
>> allow that as a "name" - property files do, apparently).
> How about supporting Commons EL [1] or the Spring Expression Language [2] or JEXL [3]?

All of these are potentially quite complex expression languages, each with their
own particulars. 

I think one of the goals of the use cases is to have the "xml" readable by a
non-expert - someone who's only familiar with XML, perhaps.  Simple string
substitution I think is enough to give the flexibility needed for the use cases.

-Marshall
>
> -- Richard 
>
> [1] http://commons.apache.org/el/
> [2] http://static.springsource.org/spring/docs/3.0.x/reference/expressions.html
> [3] http://commons.apache.org/jexl/reference/syntax.html
>

Re: Configuration parameters (was Working on a new API to enable creation of UIMA AS deployment descriptors programmatically)

Posted by Burn Lewis <bu...@gmail.com>.

We already have envVarRef support, e.g.

  <string>
    <envVarRef>MODELS_HOME</envVarRef>/master.ddinf
  </string>

which allows multiple substitutions anywhere in the string.  It is currently
restricted to the System properties but we could let a -D  specified
properties file override or augment that.

Personally I'd like to see a simpler syntax although it would require more
code changes. e.g.

  <string prop="MODELS_HOME">
    MODELS_HOME/master.ddinf
  </string>

This avoids the need for a new element and for documenting which characters
terminate the variable name, although the $NAME form is a familiar syntax, e.g.

  <stringWithVariables>
    $MODELS_HOME/master.ddinf
  </stringWithVariables>

~Burn

Re: Configuration parameters (was Working on a new API to enable creation of UIMA AS deployment descriptors programmatically)

Posted by Richard Eckart de Castilho <ec...@tk.informatik.tu-darmstadt.de>.

Am 04.08.2011 um 23:27 schrieb Marshall Schor:

> Many other languages allow both ${xxx} and $xxx - the latter for cases where the
> xxx's are limited to chars + numbers + maybe underscores, dashes, and periods;
> the first character not allowed "stops" the parsing of the name.  You still need
> the {} form for such things like ${xxx}yyy or for ${x!_($*} (if you want to
> allow that as a "name" - property files do, apparently).

How about supporting Commons EL [1] or the Spring Expression Language [2] or JEXL [3]?

-- Richard 

[1] http://commons.apache.org/el/
[2] http://static.springsource.org/spring/docs/3.0.x/reference/expressions.html
[3] http://commons.apache.org/jexl/reference/syntax.html

-- 
------------------------------------------------------------------- 
Richard Eckart de Castilho
Technical Lead
Ubiquitous Knowledge Processing Lab 
FB 20 Computer Science Department      
Technische Universität Darmstadt 
Hochschulstr. 10, D-64289 Darmstadt, Germany 
phone [+49] (0)6151 16-7477, fax -5455, room S2/02/B117
eckartde@tk.informatik.tu-darmstadt.de 
www.ukp.tu-darmstadt.de 
Web Research at TU Darmstadt (WeRC) www.werc.tu-darmstadt.de
-------------------------------------------------------------------

Re: Configuration parameters (was Working on a new API to enable creation of UIMA AS deployment descriptors programmatically)

Posted by Marshall Schor <ms...@schor.com>.

Some more thoughts on the details of the design:

The use of <propertyref> *in addition to* <value> in our descriptor XML seems
good in that it could be defined so that <value> provides a "default" value to
use in case <propertyref> didn't have a setting.

But in this case, it might be good to issue a warning - in case the setting was
accidentally omitted.
(You might also want to have a warning if some properties were defined but never
used).

If the parameter is overridden by a containing aggregate, I assume that override
would still be in force? Or is the intent to allow "reaching in" to contained
delegates and having that override containing aggregate specifications?

A little utility that showed the actual parameters in force for a particular
aggregate + override properties file would be good - something like
maven's help:effective-pom.  This utility could also show which properties were
defined but not used, and which keys specified in the properties file were
not matched in the descriptor.
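
A minimal sketch of such a report, in plain Java, might look like the following. Everything here is an assumption made for illustration: EffectiveSettingsSketch, ParamRef and Report are hypothetical names rather than UIMA classes, the descriptor is modelled as a simple map from parameter name to a property reference plus optional default, and containing-aggregate overrides are not modelled at all.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper for illustration only; not part of the UIMA API. */
final class EffectiveSettingsSketch {

    /** A parameter that references a property and may carry a default from <value>. */
    static final class ParamRef {
        final String propertyName;
        final String defaultValue; // null when the descriptor gives no default

        ParamRef(String propertyName, String defaultValue) {
            this.propertyName = propertyName;
            this.defaultValue = defaultValue;
        }
    }

    /** Effective values per parameter, plus warnings worth showing to the user. */
    static final class Report {
        final Map<String, String> effective = new LinkedHashMap<>();
        final List<String> warnings = new ArrayList<>();
    }

    static Report compute(Map<String, ParamRef> params, Properties settings) {
        Report report = new Report();
        Set<String> unused = new TreeSet<>(settings.stringPropertyNames());
        for (Map.Entry<String, ParamRef> entry : params.entrySet()) {
            ParamRef ref = entry.getValue();
            unused.remove(ref.propertyName);
            String value = settings.getProperty(ref.propertyName);
            if (value == null) {
                // Fall back to the default, but warn in case the setting was omitted by accident.
                value = ref.defaultValue;
                report.warnings.add(value == null
                        ? "no value for parameter " + entry.getKey()
                                + ": property " + ref.propertyName + " is not set"
                        : "parameter " + entry.getKey() + " uses its default: property "
                                + ref.propertyName + " is not set");
            }
            report.effective.put(entry.getKey(), value);
        }
        for (String name : unused) {
            report.warnings.add("property defined but never used: " + name);
        }
        return report;
    }
}
```

The same pass yields both kinds of warning mentioned above: references that fell back to a default, and properties that no parameter referenced.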

(From discussion with others) it might be nice to allow a <propertyref>
value to be a string which contains ${xxx}'s in it - so one could write
<propertyref>${baseDirectory}/a/b/${model}</propertyref> as a value - in other
words, allowing string concatenation and multiple ${xxx}s to be used to make up
the string representation of the value.

Many other languages allow both ${xxx} and $xxx - the latter for cases where the
xxx's are limited to chars + numbers + maybe underscores, dashes, and periods;
the first character not allowed "stops" the parsing of the name.  You still need
the {} form for such things like ${xxx}yyy or for ${x!_($*} (if you want to
allow that as a "name" - property files do, apparently).
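
The substitution described in the last two paragraphs can be sketched with a regular expression. This is a plain-Java illustration under stated assumptions: VariableExpansionSketch is a hypothetical name, the bare $xxx form accepts letters, digits, underscores, dashes and periods as described above, and throwing on an undefined name is a choice made for this example (leaving the text unexpanded would be equally defensible).

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not part of the UIMA API. */
final class VariableExpansionSketch {

    // Group 1: name inside ${...}, any characters except '}'.
    // Group 2: bare $name, ended by the first character that is not allowed in a name.
    private static final Pattern VARIABLE =
            Pattern.compile("\\$\\{([^}]+)\\}|\\$([A-Za-z0-9_.-]+)");

    /** Replaces every ${name} or $name in the text with the value of that property. */
    static String expand(String text, Properties settings) {
        Matcher matcher = VARIABLE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String name = matcher.group(1) != null ? matcher.group(1) : matcher.group(2);
            String value = settings.getProperty(name);
            if (value == null) {
                throw new IllegalArgumentException("Undefined variable: " + name);
            }
            // quoteReplacement keeps '$' and '\' in the value from being treated specially.
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

Note that with periods allowed in bare names, $model.bin would be read as a reference to a property called "model.bin", which is exactly the case where the ${model}.bin form is still needed.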

-Marshall Schor

On 7/22/2011 12:21 PM, Adam Lally wrote:
> On Wed, Apr 6, 2011 at 8:17 AM, Eddie Epstein <ea...@gmail.com> wrote:
>
>> On Tue, Apr 5, 2011 at 5:57 PM, Richard Eckart de Castilho
>> <ec...@tk.informatik.tu-darmstadt.de> wrote:
>>> It sounds like this could easily be implemented as a post-processing step
>> on an (aggregate) descriptor.
>>> 1) load aggregate (descriptor) - AnalysisEngineDescriptor desc = ...
>>> 2) load properties - Properties config = ...
>>> 3) set configuration parameters in descriptor from values used in
>> properties - applyConfiguration(desc, properties);
>>> Such a feature could then be integrated into an execution engine like the
>> CPE or UIMA-AS.
>> Implementing as a post-processing step does sound right. It would be
>> nice if the logic was in the core so that every execution engine could
>> just make a single method call. This would also allow an execution
>> environment to use separate properties files when instantiating
>> multiple AEs, as for example when the UIMA-AS client API does in
>> process deployment of services.
>>
>>
> I think I will soon be ready to try implementing something like this
> (allowing setting configuration parameters from properties files).  I am
> thinking there are a couple of extensions I may want to make in core UIMA.
>
> 1) I think one issue with the postprocessing-only approach is that for
> non-string parameters, you cannot in your descriptor write something like:
> <value>
>   <integer>${myPropertyName}</integer>
> </value>
> which would cause a parse error.
>
> So I am thinking of adding a new element in the descriptor syntax:
> <value>
>   <propertyref>myPropertyName</propertyref>
> </value>
>
> This could be used for any parameter type.  Also there is no ambiguity in
> case someone actually wanted to set their parameter to a string of the form
> ${....}.
>
> 2) It would be good if there was a way for existing UIMA applications
> (DocumentAnalyzer, CVD, and also our users' own applications) to support
> such descriptors, without having to add an extra line of code in all of them
> to resolve the properties.  So, I'm imagining that the property replacement
> happens automatically in the core, when the AE is instantiated.  We could
> define a system property to specify the configuration file (e.g.
> -Dorg.apache.uima.configuration_parameter_file=my.properties).  I would also
> have a programmatic approach (like something in the additionalParams map
> when instantiating the AE), to support applications that need multiple
> configuration properties files in the same JVM.
>
> Thoughts?
>
>  -Adam
>