
Posted to dev@uima.apache.org by Michael Baessler <mb...@michael-baessler.de> on 2007/03/01 02:17:20 UTC

Re: Thoughts on extending FlowController API

Adam Lally wrote:
> On 2/26/07, Adam Lally <al...@alum.rpi.edu> wrote:
>> 3) Notification of errors to allow continuing after a failure.  This
>> would support an action like the current CPM's "continue" action.
>> There would be a new API:
>> Flow.onFailure(String failedAnalysisEngineKey, Throwable failure)
>>
>> If the runtime wanted to continue after a failure, it would call this
>> method on the Flow Controller, and then would go back to calling
>> hasNext/next.  Without this notification, a "continue" action wouldn't
>> make much sense, because a dynamic FlowController may make an
>> assumption that the last step it issued completed successfully.
>>
>> Note that for #2 and #3 I'm not intending to have the existing framework
>> call these methods yet.  These Flow Controller extensions are a
>> prerequisite for doing more advanced flow things like parallel flows
>> and error recovery.
>>
>
> Actually as I think about it more I wonder if it would be better if
> when I add the Flow.onFailure() I also change the framework to call
> this method when an error occurs.  The existing FlowControllers
> (such as fixed flow) could just refuse to continue, so the default
> behavior would be unchanged.  This could be a configuration parameter
> on the FixedFlowController so people could configure their AEs to
> continue after errors.  It seems like this would provide some value so
> may be worth doing, rather than just adding a method that's never
> called.
>
> Possibly we might want to allow the application to control whether the
> FlowController is consulted when an error occurs.  This could be made
> configurable through the additionalParams map when the Aggregate AE is
> constructed.  Then an application could always use "terminate on
> error" mode if desired, regardless of the FlowController being used. 

That sounds more reasonable to me as well... but I have some additional
comments/questions from my side.

How would the additionalParams map be used to configure my
application to 'continue' or 'terminate' in case of errors? Will it be
configurable for each analysis engine separately?
I think that would be very useful, since the right error handling
depends on the analysis engine. When using the additionalParams map,
does the application have to take care of obtaining the configuration
itself, or will that be part of one of the common descriptors?
I think a good place to specify this would be the flowConstraints
section of an aggregate descriptor.

With a built-in flow, it could look like this:
    <flowConstraints>
      <fixedFlow>
        <node errorAction="continue" >ae1</node>
        <node errorAction="terminate">ae2</node>
      </fixedFlow>
    </flowConstraints>

But when a FlowController is plugged in, this section is missing.
I wonder why. I think the order of the analysis engines can also be
relevant for these flows. How does this work currently? I assume the
order in which the analysis engines are defined is used, right?
Why don't we have a section like:

    <flowConstraints>
      <customFlow> <!-- indicates using the imported FlowController -->
        <node errorAction="continue" >ae1</node>
        <node errorAction="terminate">ae2</node>
      </customFlow>
    </flowConstraints>

to specify the customFlow items in an order of our choice? That would
make it easy to pass additional information about each analysis engine
to the FlowController.
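As a rough illustration of how a flow controller could pick up the proposed per-node errorAction attribute, here is a Java sketch using the JDK's built-in DOM parser. The errorAction attribute and the <customFlow> element are proposals from this message rather than existing descriptor syntax, and ErrorActionReader is a hypothetical helper. A node without the attribute is treated as "terminate" so that the default behavior stays unchanged.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: reads <node errorAction="..."> entries from a
// <flowConstraints> fragment containing either <fixedFlow> or <customFlow>.
class ErrorActionReader {
    enum ErrorAction { CONTINUE, TERMINATE }

    // Returns delegate keys in document order, mapped to their error action.
    static Map<String, ErrorAction> read(String flowConstraintsXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(flowConstraintsXml.getBytes(StandardCharsets.UTF_8)));
        Map<String, ErrorAction> result = new LinkedHashMap<>();
        NodeList nodes = doc.getElementsByTagName("node"); // document order
        for (int i = 0; i < nodes.getLength(); i++) {
            Element node = (Element) nodes.item(i);
            String attr = node.getAttribute("errorAction"); // "" if absent
            // Missing attribute keeps today's behavior; unknown values throw.
            ErrorAction action = attr.isEmpty()
                ? ErrorAction.TERMINATE
                : ErrorAction.valueOf(attr.trim().toUpperCase(Locale.ROOT));
            result.put(node.getTextContent().trim(), action);
        }
        return result;
    }
}
```

Because the keys are collected into a LinkedHashMap, the order in which the nodes are listed is preserved, which is the ordering information a custom FlowController would need in addition to the per-engine setting.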

-- Michael

Re: Thoughts on extending FlowController API

Posted by Adam Lally <al...@alum.rpi.edu>.

On 3/1/07, Adam Lally <al...@alum.rpi.edu> wrote:
> While we're on that topic, since I added a ParallelStep that the Flow
> Controller can return, I wonder if we also want to extend <fixedFlow>
> to allow including a parallel step.  So something like:
>       <fixedFlow>
>         <node errorAction="continue" >ae1</node>
>         <parallel>
>           <node errorAction="terminate">ae2</node>
>           <node errorAction="continue">ae3</node>
>         </parallel>
>       </fixedFlow>
>
> If we don't do this then people who want to configure a parallel flow
> would need a custom flow controller, which seems a little bit like
> overkill.
>
> A concern is that we'd be adding complexity to what used to be a very
> simple concept for the <fixedFlow>, but I think we can hide this from
> most users until they start to care about more complex flow options.
>

On second thought, it may be messy to try to extend the Java interface
for FixedFlow while keeping it backwards compatible (it currently has a
getFixedFlow() method that returns a String array).

We could add another built-in flow type that allows the
errorAction/continueOnError flag as well as parallel steps.  But what
to call it?  complexFlow?
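One way to picture the backwards-compatibility concern is a Java sketch in which the existing flat interface is left untouched and the richer flow type lives beside it. FixedFlowLike, ComplexFlowLike and ComplexFlowStep are hypothetical names (complexFlow is only the name floated above), and FixedFlowLike is a simplified stand-in for the real FixedFlow interface. The point is that every existing fixed flow converts into the new type, so nothing that calls getFixedFlow() has to change.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified shape of the existing interface: a flat array of delegate keys.
// It is left as-is so current callers of getFixedFlow() keep working.
interface FixedFlowLike {
    String[] getFixedFlow();
}

// One step of the hypothetical new flow type: one or more delegate keys
// (more than one means the delegates may run in parallel).
final class ComplexFlowStep {
    final List<String> keys;
    final boolean continueOnError;

    ComplexFlowStep(List<String> keys, boolean continueOnError) {
        if (keys.isEmpty()) {
            throw new IllegalArgumentException("a step needs at least one delegate key");
        }
        this.keys = Collections.unmodifiableList(new ArrayList<>(keys));
        this.continueOnError = continueOnError;
    }

    boolean isParallel() { return keys.size() > 1; }
}

// Hypothetical richer flow type living beside FixedFlowLike.
interface ComplexFlowLike {
    List<ComplexFlowStep> getSteps();

    // Every existing fixed flow maps onto single-key steps that terminate on
    // error, so the new type is a superset and needs no change to the old one.
    static ComplexFlowLike fromFixedFlow(FixedFlowLike fixed) {
        List<ComplexFlowStep> steps = new ArrayList<>();
        for (String key : fixed.getFixedFlow()) {
            steps.add(new ComplexFlowStep(Collections.singletonList(key), false));
        }
        List<ComplexFlowStep> result = Collections.unmodifiableList(steps);
        return () -> result;
    }
}
```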

-Adam

Re: Thoughts on extending FlowController API

Posted by Michael Baessler <mb...@michael-baessler.de>.

Adam Lally wrote:
> More thought on the flow controller / flowConstraints topic:
>
> I think there's a fundamental question here as to how the flow ought
> to be specified, now that we've opened things up so that the flow
> specification might take a variety of forms, not just a flat list.
>
> Do we want to:
> (a) support specifying the flow through the FlowController's
> configuration parameters
>
> OR
>
> (b) support extending the <flowConstraints> section of the aggregate
> descriptor with new kinds of flows in addition to <fixedFlow> and
> <capabilityLanguageFlow>.  We might even imagine a <customFlow> that
> could be filled in with arbitrary XML, it being the FlowController's
> job to make sense out of this.
>
>
> An advantage of (a) is that we use the common configuration parameter
> mechanisms we already have, so for example we could use the same GUIs
> we use for setting other parameters to also set the parameters on the
> flow controller.  (In contrast, if we allow arbitrary XML, the user
> would need an XML editor to be able to edit the flow.)
>
> Advantages of (b): It's closer to what the user already knows.  It can
> be much less verbose than using configuration parameters (which also
> require overrides in the aggregate if the flow is to be specified
> there).  If there's already an XML syntax for the flow it could
> potentially be used directly.  (Although this last could also be done
> with an external resource referring to a separate file containing the
> flow definitions.)

I prefer (b). Users already know how to specify a flow, and it seems
easier to me to specify the flow using the flowConstraints section than
using configuration parameters.

I'm not yet sure what the best way to do this is, but we should try to
get feedback from our users/community on what they think about it. Do
we know anyone who is already using custom flow controllers?

-- Michael

Re: Thoughts on extending FlowController API

Posted by Adam Lally <al...@alum.rpi.edu>.

More thought on the flow controller / flowConstraints topic:

I think there's a fundamental question here as to how the flow ought
to be specified, now that we've opened things up so that the flow
specification might take a variety of forms, not just a flat list.

Do we want to:
(a) support specifying the flow through the FlowController's
configuration parameters

OR

(b) support extending the <flowConstraints> section of the aggregate
descriptor with new kinds of flows in addition to <fixedFlow> and
<capabilityLanguageFlow>.  We might even imagine a <customFlow> that
could be filled in with arbitrary XML, it being the FlowController's
job to make sense out of this.


An advantage of (a) is that we use the common configuration parameter
mechanisms we already have, so for example we could use the same GUIs
we use for setting other parameters to also set the parameters on the
flow controller.  (In contrast, if we allow arbitrary XML, the user
would need an XML editor to be able to edit the flow.)

Advantages of (b): It's closer to what the user already knows.  It can
be much less verbose than using configuration parameters (which also
require overrides in the aggregate if the flow is to be specified
there).  If there's already an XML syntax for the flow it could
potentially be used directly.  (Although this last could also be done
with an external resource referring to a separate file containing the
flow definitions.)
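For comparison, option (a) might look roughly like the following Java sketch, in which the flow order is an array-valued configuration parameter of the flow controller. The parameter name FlowOrder, the "a2|a3" notation for a parallel group, and the Map standing in for the controller's parameter lookup are all hypothetical; a real controller would read its parameters through its UIMA context.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of option (a): the flow is a String[] configuration parameter of the
// flow controller. HYPOTHETICAL: the name "FlowOrder", the "a2|a3" notation for
// a parallel group, and the Map standing in for parameter lookup.
class ConfigParamFlow {
    static List<List<String>> parse(Map<String, Object> params, Set<String> delegateKeys) {
        Object raw = params.get("FlowOrder");
        if (!(raw instanceof String[])) {
            throw new IllegalArgumentException("FlowOrder must be a String[] parameter");
        }
        List<List<String>> steps = new ArrayList<>();
        for (String entry : (String[]) raw) {
            List<String> group = new ArrayList<>();
            for (String key : entry.split("\\|")) {
                String trimmed = key.trim();
                // Catch typos early: every key must name a declared delegate.
                if (!delegateKeys.contains(trimmed)) {
                    throw new IllegalArgumentException("Unknown delegate key: " + trimmed);
                }
                group.add(trimmed);
            }
            steps.add(group);
        }
        return steps;
    }
}
```

The validation against the declared delegate keys has to be written by each controller under this option, since the framework only sees an opaque string array.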

Thoughts?

-Adam

Re: Thoughts on extending FlowController API

Posted by Adam Lally <al...@alum.rpi.edu>.

On 3/7/07, Thilo Goetz <tw...@gmx.de> wrote:
> Let me make sure I'm following this.  MyFlowController cannot know the
> order in which analysis engines were specified in the
> delegateAnalysisEngineSpecifiers tag, as those are order independent,
> right?  You can then specify a fixedFlow constraint that the flow
> controller is free to follow or not, also correct?
>

Yes, that's right.

> If that is right, it seems really weird.  All we want is to pass a
> suggestion to the custom flow controller: in the absence of other
> information, choose this order.  Calling that fixedFlow is more than
> just confusing, it's misleading.  You should not be able to specify a
> fixedFlow with a custom flow controller, for why would you need one when
> you have a fixed flow?
>

I agree, it's weird.  What are the other options?

Giving the FC access to the order of declaration of the delegates
would require some work, since that is parsed into a HashMap right
now, and presented to the FC using the Map interface.
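The ordering problem comes down to a documented property of java.util.HashMap: it makes no guarantee about iteration order, whereas java.util.LinkedHashMap iterates in insertion order. The Java sketch below shows the difference. Switching the parsed delegate map to a LinkedHashMap is only one possible approach (an assumption, not something decided here), but it would keep the type handed to the FC a plain Map. DelegateOrder is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: fill whatever Map implementation the parser chooses
// with key -> import location pairs, in declaration order.
class DelegateOrder {
    static Map<String, String> parseInto(Map<String, String> target, String[][] declared) {
        for (String[] d : declared) {
            target.put(d[0], d[1]);
        }
        return target;
    }

    // What a FlowController would see when iterating the Map it is given.
    static List<String> keysInIterationOrder(Map<String, String> delegates) {
        return new ArrayList<>(delegates.keySet());
    }

    public static void main(String[] args) {
        String[][] declared = { {"a2", "Annotator2.xml"}, {"a1", "Annotator1.xml"} };
        // Iteration order is unspecified for a HashMap...
        System.out.println(keysInIterationOrder(parseInto(new HashMap<>(), declared)));
        // ...but is declaration order for a LinkedHashMap: prints [a2, a1]
        System.out.println(keysInIterationOrder(parseInto(new LinkedHashMap<>(), declared)));
    }
}
```

A HashMap may happen to return a small key set in declaration order, so code that needs the order should rely only on the LinkedHashMap guarantee.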

We could allow a <customFlow> element.  I'm just not sure that isn't
also confusing - won't people ask how to use this element to specify
how a custom flow works, not just as a "suggestion"?

Or we could just deprecate the whole idea of a flow element and move
towards using the flow controller's configuration parameters to do
everything.  A lot of custom flow controllers are going to need other
parameters anyway.

BTW, consider that the FixedFlowController is also a FlowController.
It implements the FlowController interface and doesn't have any
special powers that other flow controllers don't have.  Same for the
CapabilityLanguageFlowController.  So they need to be able to access
the <fixedFlow> or <capabilityLanguageFlow> elements.

> On the other hand, I'm not even sure what this whole discussion is
> about, so maybe this is all beside the point ;-)
>

The "whole discussion" has migrated.  I started it to discuss some
changes to the FlowController API, but at this point we're discussing
things that have been the way they are since v2.0.

-Adam

Re: Thoughts on extending FlowController API

Posted by Thilo Goetz <tw...@gmx.de>.

Adam Lally wrote:
> On 3/2/07, Michael Baessler <mb...@michael-baessler.de> wrote:
>> Adam Lally wrote:
>> > You can specify both a custom FlowController AND a <fixedFlow> (or
>> > <capabilityLanguageFlow>) element in your descriptor.  Your
>> > FlowController will be called but it can access all of the
>> > AnalysisEngineMetaData from the aggregate descriptor, including the
>> > fixedFlow and capabilityLanguageFlow sections.
>>
>> But how can I specify an order for my custom flow? Is this also possible
>> or do I have to use configuration parameter settings?
>>
>>
> 
> Like this:
> 
> <analysisEngineDescription 
> xmlns="http://uima.apache.org/resourceSpecifier">
> <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
> <primitive>false</primitive>
> 
> <delegateAnalysisEngineSpecifiers>
>  <delegateAnalysisEngine key="a1">
>    <import location="Annotator1.xml"/>
>  </delegateAnalysisEngine>
>  <delegateAnalysisEngine key="a2">
>    <import location="Annotator2.xml"/>
>  </delegateAnalysisEngine>
> </delegateAnalysisEngineSpecifiers>
> <flowController key="MyFlowController">
>  <import location="MyFlowController.xml"/>
> </flowController>
> 
> <analysisEngineMetaData>
> <name>Aggregate with custom flow controller</name>
> <version>1.0</version>
> <vendor>The Apache Software Foundation</vendor>
> 
> <flowConstraints>
>  <fixedFlow>
>    <node>a1</node>
>    <node>a2</node>
>  </fixedFlow>
> </flowConstraints>
> 
> </analysisEngineMetaData>
> </analysisEngineDescription>
> 
> 
> Your custom flow controller in its initialization can query the
> <flowConstraints> and find out that the sequence is supposed to be a1,
> a2; and it can act accordingly.
> 
> -Adam

Let me make sure I'm following this.  MyFlowController cannot know the
order in which analysis engines were specified in the
delegateAnalysisEngineSpecifiers tag, as those are order independent,
right?  You can then specify a fixedFlow constraint that the flow 
controller is free to follow or not, also correct?

If that is right, it seems really weird.  All we want is to pass a
suggestion to the custom flow controller: in the absence of other
information, choose this order.  Calling that fixedFlow is more than
just confusing, it's misleading.  You should not be able to specify a 
fixedFlow with a custom flow controller, for why would you need one when 
you have a fixed flow?

On the other hand, I'm not even sure what this whole discussion is 
about, so maybe this is all beside the point ;-)

--Thilo

Re: Thoughts on extending FlowController API

Posted by Marshall Schor <ms...@schor.com>.

Adam Lally wrote:
> On 3/6/07, Michael Baessler <mb...@michael-baessler.de> wrote:
>> Adam Lally wrote:
>> > Your custom flow controller in its initialization can query the
>> > <flowConstraints> and find out that the sequence is supposed to be a1,
>> > a2; and it can act accordingly.
>> But isn't it a little confusing for our users to use a fixedFlow
>> constraint just to configure a custom flow?
>>
>> I think it would be better to have an additional flowConstraint tag like
>> <customFlow> to specify the order of the custom flow, so that fixedFlow
>> and capabilityLanguageFlow are only used when those flows are actually
>> in use.
>>
>
> Perhaps it is a little confusing... but we have to find the path of
> least-confusion. :)
>
> To me it is also confusing to define a new tag <customFlow> and then
> say that the only thing you're allowed to have in it is a flat list of
> nodes.  The general idea of a custom flow is that it can make any kind
> of flow decisions it wants.
>
> Others, any opinions on what is the least confusing thing to do here?

We had some user confusion about this when using the custom flow
controller, too, because of the way the CDE works. In the CDE, when you
add a delegate there is an option to also add it to the flow
"automatically", and there are buttons to reorder the flow and to
"remove" items from it. When a custom flow controller is used, the CDE
builds a <fixedFlow> element to contain this specification, just like
Adam's example in this thread.

So the model exposed in the CDE is that there is a flow section with an
ordered list of delegates, and it can be used for any of the built-in
flows or for a custom one.

As Adam said, the custom flow controller is not required to use the
flow order.

If I had it to do over again, I'd refactor the XML descriptors along
more orthogonal axes (putting all the flow sequencing lists into one
standard XML element used by all the flows), but because of a desire to
limit user impact, I'm OK with the way it is now, especially because the
CDE hides this bit of verboseness.

-Marshall

Re: Thoughts on extending FlowController API

Posted by Adam Lally <al...@alum.rpi.edu>.

On 3/6/07, Michael Baessler <mb...@michael-baessler.de> wrote:
> Adam Lally wrote:
> > Your custom flow controller in its initialization can query the
> > <flowConstraints> and find out that the sequence is supposed to be a1,
> > a2; and it can act accordingly.
> But isn't it a little confusing for our users to use a fixedFlow
> constraint just to configure a custom flow?
>
> I think it would be better to have an additional flowConstraint tag like
> <customFlow> to specify the order of the custom flow, so that fixedFlow
> and capabilityLanguageFlow are only used when those flows are actually
> in use.
>

Perhaps it is a little confusing... but we have to find the path of
least-confusion. :)

To me it is also confusing to define a new tag <customFlow> and then
say that the only thing you're allowed to have in it is a flat list of
nodes.  The general idea of a custom flow is that it can make any kind
of flow decisions it wants.

Others, any opinions on what is the least confusing thing to do here?

-Adam

Re: Thoughts on extending FlowController API

Posted by Michael Baessler <mb...@michael-baessler.de>.

Adam Lally wrote:
> On 3/2/07, Michael Baessler <mb...@michael-baessler.de> wrote:
>> But how can I specify an order for my custom flow? Is this also possible
>> or do I have to use configuration parameter settings?
> Like this:
>
> <analysisEngineDescription 
> xmlns="http://uima.apache.org/resourceSpecifier">
> <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
> <primitive>false</primitive>
>
> <delegateAnalysisEngineSpecifiers>
>  <delegateAnalysisEngine key="a1">
>    <import location="Annotator1.xml"/>
>  </delegateAnalysisEngine>
>  <delegateAnalysisEngine key="a2">
>    <import location="Annotator2.xml"/>
>  </delegateAnalysisEngine>
> </delegateAnalysisEngineSpecifiers>
> <flowController key="MyFlowController">
>  <import location="MyFlowController.xml"/>
> </flowController>
>
> <analysisEngineMetaData>
> <name>Aggregate with custom flow controller</name>
> <version>1.0</version>
> <vendor>The Apache Software Foundation</vendor>
>
> <flowConstraints>
>  <fixedFlow>
>    <node>a1</node>
>    <node>a2</node>
>  </fixedFlow>
> </flowConstraints>
>
> </analysisEngineMetaData>
> </analysisEngineDescription>
>
>
> Your custom flow controller in its initialization can query the
> <flowConstraints> and find out that the sequence is supposed to be a1,
> a2; and it can act accordingly.
But isn't it a little confusing for our users to use a fixedFlow
constraint just to configure a custom flow?

I think it would be better to have an additional flowConstraint tag like
<customFlow> to specify the order of the custom flow, so that fixedFlow
and capabilityLanguageFlow are only used when those flows are actually
in use.

-- Michael

Re: Thoughts on extending FlowController API

Posted by Adam Lally <al...@alum.rpi.edu>.

On 3/2/07, Michael Baessler <mb...@michael-baessler.de> wrote:
> Adam Lally wrote:
> > You can specify both a custom FlowController AND a <fixedFlow> (or
> > <capabilityLanguageFlow>) element in your descriptor.  Your
> > FlowController will be called but it can access all of the
> > AnalysisEngineMetaData from the aggregate descriptor, including the
> > fixedFlow and capabilityLanguageFlow sections.
>
> But how can I specify an order for my custom flow? Is this also possible
> or do I have to use configuration parameter settings?
>
>

Like this:

<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
<frameworkImplementation>org.apache.uima.java</frameworkImplementation>
<primitive>false</primitive>

<delegateAnalysisEngineSpecifiers>
  <delegateAnalysisEngine key="a1">
    <import location="Annotator1.xml"/>
  </delegateAnalysisEngine>
  <delegateAnalysisEngine key="a2">
    <import location="Annotator2.xml"/>
  </delegateAnalysisEngine>
</delegateAnalysisEngineSpecifiers>
<flowController key="MyFlowController">
  <import location="MyFlowController.xml"/>
</flowController>

<analysisEngineMetaData>
<name>Aggregate with custom flow controller</name>
<version>1.0</version>
<vendor>The Apache Software Foundation</vendor>

<flowConstraints>
  <fixedFlow>
    <node>a1</node>
    <node>a2</node>
  </fixedFlow>
</flowConstraints>

</analysisEngineMetaData>
</analysisEngineDescription>


Your custom flow controller in its initialization can query the
<flowConstraints> and find out that the sequence is supposed to be a1,
a2; and it can act accordingly.
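The case Michael raises in this thread (a CapabilityLanguage-like custom flow that may skip engines but must respect a given order) could be handled along these lines. In this Java sketch the node list that a real controller would read from the aggregate's <fixedFlow> during initialization is simply passed to the constructor, and the Predicate stands in for whatever per-document test the controller applies. SuggestedOrderFlow and its methods are hypothetical, and returning null to signal the end of the flow is a simplification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical custom flow that treats the <fixedFlow> node list as an
// ordering hint: it may skip delegates, but never reorders them.
class SuggestedOrderFlow {
    private final List<String> suggestedOrder;
    private final Predicate<String> shouldRun; // per-document decision
    private int pos = 0;

    SuggestedOrderFlow(List<String> suggestedOrder, Predicate<String> shouldRun) {
        this.suggestedOrder = new ArrayList<>(suggestedOrder);
        this.shouldRun = shouldRun;
    }

    // Returns the next delegate key to route to, or null when finished.
    String nextKey() {
        while (pos < suggestedOrder.size()) {
            String key = suggestedOrder.get(pos++);
            if (shouldRun.test(key)) {
                return key;
            }
        }
        return null;
    }

    // Convenience for illustration: drain the whole route for one document.
    List<String> route() {
        List<String> keys = new ArrayList<>();
        for (String k = nextKey(); k != null; k = nextKey()) {
            keys.add(k);
        }
        return keys;
    }
}
```

For example, with the order a1, a2, a3 and a predicate that rejects a2, the route is a1 followed by a3; the relative order from the descriptor is never changed.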

-Adam

Re: Thoughts on extending FlowController API

Posted by Michael Baessler <mb...@michael-baessler.de>.

Adam Lally wrote:
> On 3/2/07, Michael Baessler <mb...@michael-baessler.de> wrote:
>> But how does it work if I want to implement something like the
>> CapabilityLanguage flow that we already have as a built-in flow? When I
>> implement this as a custom flow, I would like to specify the possible
>> order of the analysis engines. The custom flow can then decide whether
>> each engine is called, but if it is called, it should follow the order
>> I have specified. So in this case, what the user specifies for the
>> custom flow is not really a fixed flow.
>>
>
> You can specify both a custom FlowController AND a <fixedFlow> (or
> <capabilityLanguageFlow>) element in your descriptor.  Your
> FlowController will be called but it can access all of the
> AnalysisEngineMetaData from the aggregate descriptor, including the
> fixedFlow and capabilityLanguageFlow sections.
But how can I specify an order for my custom flow? Is this also possible 
or do I have to use configuration parameter settings?

-- Michael

Re: Thoughts on extending FlowController API

Posted by Adam Lally <al...@alum.rpi.edu>.

On 3/2/07, Michael Baessler <mb...@michael-baessler.de> wrote:
> But how does it work if I want to implement something like the
> CapabilityLanguage flow that we already have as a built-in flow? When I
> implement this as a custom flow, I would like to specify the possible
> order of the analysis engines. The custom flow can then decide whether
> each engine is called, but if it is called, it should follow the order
> I have specified. So in this case, what the user specifies for the
> custom flow is not really a fixed flow.
>

You can specify both a custom FlowController AND a <fixedFlow> (or
<capabilityLanguageFlow>) element in your descriptor.  Your
FlowController will be called but it can access all of the
AnalysisEngineMetaData from the aggregate descriptor, including the
fixedFlow and capabilityLanguageFlow sections.

-Adam

Re: Thoughts on extending FlowController API

Posted by Michael Baessler <mb...@michael-baessler.de>.

Adam Lally wrote:
> On 2/28/07, Michael Baessler <mb...@michael-baessler.de> wrote:
>> But when a FlowController is plugged in, this section is missing.
>
> Actually it is possible, but not required, to have a <fixedFlow>
> section when using a custom FlowController.  (I think the CDE supports
> this too, but I'm not sure.)
>
>> I wonder why. I think the order of the analysis engines can also be
>> relevant for these flows. How does this work currently? I assume the
>> order in which the analysis engines are defined is used, right?
>
> The reason it's optional is that a custom FlowController often
> wouldn't use a fixed ordering of AnalysisEngines - it may make dynamic
> flow decisions based on other criteria.
>
> Note that FlowControllers can define configuration parameters just
> like AEs can, so whatever information the FlowController needs to make
> routing decisions can be provided that way, if it can't be represented
> by the <fixedFlow> object.
>
> The ordering of analysis engine definitions can't be used to make flow
> decisions.  These are put into a HashMap and the ordering is lost
> before it gets to the FlowController.
>
But how does it work if I want to implement something like the
CapabilityLanguage flow that we already have as a built-in flow? When I
implement this as a custom flow, I would like to specify the possible
order of the analysis engines. The custom flow can then decide whether
each engine is called, but if it is called, it should follow the order
I have specified. So in this case, what the user specifies for the
custom flow is not really a fixed flow.

-- Michael

Re: Thoughts on extending FlowController API

Posted by Adam Lally <al...@alum.rpi.edu>.

On 2/28/07, Michael Baessler <mb...@michael-baessler.de> wrote:
> How would the additionalParams map be used to configure my
> application to 'continue' or 'terminate' in case of errors? Will it be
> configurable for each analysis engine separately?
> I think that would be very useful, since the right error handling
> depends on the analysis engine. When using the additionalParams map,
> does the application have to take care of obtaining the configuration
> itself, or will that be part of one of the common descriptors?

The FlowController could decide, based on its configuration (see
below), whether to continue or terminate depending on which Analysis
Engine failed (some might be more important to the end result than
others).

I intended the additionalParams suggestion just to be a global switch
to cause an abort on _any_ error, just in case a deployer wanted to
override the flow controller's decision in that way.  (I'm not sure
this is a worthwhile thing to do, it was just an idea.)  Of course
there are many more possible kinds of error handling configuration
settings that the user might want to specify, but I don't want to get
into how to specify them all in the aggregate descriptor.
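The decision described here could be sketched in Java as follows: the flow controller's own configuration names the delegates whose failure is tolerable, and an application-level switch (for example a flag passed via additionalParams) can override that and terminate on any error. ErrorDecision and both of its inputs are hypothetical; neither corresponds to an existing UIMA setting.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical error policy combining per-delegate configuration with a
// global "terminate on any error" override supplied by the application.
class ErrorDecision {
    private final Set<String> continueOnErrorKeys; // from the FC's configuration
    private final boolean terminateOnAnyError;     // deployer/application override

    ErrorDecision(Set<String> continueOnErrorKeys, boolean terminateOnAnyError) {
        this.continueOnErrorKeys = Collections.unmodifiableSet(new HashSet<>(continueOnErrorKeys));
        this.terminateOnAnyError = terminateOnAnyError;
    }

    // True if processing should go on after the given delegate failed.
    boolean shouldContinue(String failedKey) {
        if (terminateOnAnyError) {
            return false; // the global switch wins regardless of the controller
        }
        return continueOnErrorKeys.contains(failedKey);
    }
}
```

Keeping the override as a separate input, rather than folding it into the controller's parameters, mirrors the idea that a deployer could force "terminate on error" without knowing which FlowController is in use.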

> I think a good place to specify this would be the flowConstraints
> section of an aggregate descriptor.
>
> With a built-in flow, it could look like this:
>     <flowConstraints>
>       <fixedFlow>
>         <node errorAction="continue" >ae1</node>
>         <node errorAction="terminate">ae2</node>
>       </fixedFlow>
>     </flowConstraints>
>

Yes, we could consider extending the <fixedFlow> in this way.  That
would let people who are using the existing FixedFlowController easily
configure whether to continue or terminate.  The default would be
terminate, so maybe the attribute should be a boolean
continueOnError="true" in order to override the default.

While we're on that topic, since I added a ParallelStep that the Flow
Controller can return, I wonder if we also want to extend <fixedFlow>
to allow including a parallel step.  So something like:
      <fixedFlow>
        <node errorAction="continue" >ae1</node>
        <parallel>
          <node errorAction="terminate">ae2</node>
          <node errorAction="continue">ae3</node>
        </parallel>
      </fixedFlow>

If we don't do this then people who want to configure a parallel flow
would need a custom flow controller, which seems a little bit like
overkill.

A concern is that we'd be adding complexity to what used to be a very
simple concept for the <fixedFlow>, but I think we can hide this from
most users until they start to care about more complex flow options.
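To make the intended meaning of a <parallel> group concrete, here is a Java sketch of the ordering guarantee: delegates in one group may run concurrently, and the flow only moves on once all of them have finished. Steps are modeled as lists of delegate keys and a Consumer stands in for invoking a delegate. ParallelFixedFlowRunner is hypothetical and says nothing about how the framework actually executes a ParallelStep or protects shared CAS data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical runner: each inner list is one step; keys in the same step
// may execute concurrently, and steps execute strictly in sequence.
class ParallelFixedFlowRunner {
    static void run(List<List<String>> steps, Consumer<String> invoke) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            for (List<String> step : steps) {
                List<Callable<Void>> tasks = new ArrayList<>();
                for (String key : step) {
                    tasks.add(() -> { invoke.accept(key); return null; });
                }
                // invokeAll blocks until every task in this step has finished.
                for (Future<Void> f : pool.invokeAll(tasks)) {
                    f.get(); // rethrows a delegate failure wrapped in ExecutionException
                }
            }
        } finally {
            pool.shutdown();
        }
    }
}
```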

Changes to the definition of FixedFlow would require CDE support, though.

> But when a FlowController is plugged in, this section is missing.

Actually it is possible, but not required, to have a <fixedFlow>
section when using a custom FlowController.  (I think the CDE supports
this too, but I'm not sure.)

> I wonder why. I think the order of the analysis engines can also be
> relevant for these flows. How does this work currently? I assume the
> order in which the analysis engines are defined is used, right?

The reason it's optional is that a custom FlowController often
wouldn't use a fixed ordering of AnalysisEngines - it may make dynamic
flow decisions based on other criteria.

Note that FlowControllers can define configuration parameters just
like AEs can, so whatever information the FlowController needs to make
routing decisions can be provided that way, if it can't be represented
by the <fixedFlow> object.

The ordering of analysis engine definitions can't be used to make flow
decisions.  These are put into a HashMap and the ordering is lost
before it gets to the FlowController.

So in summary I think I like the idea of extending <fixedFlow> to
support two other things:
1) replace a <node> with a parallel step that is a collection of
<node>s that could be run in parallel.
2) each <node> has an optional boolean attribute continueOnError,
which defaults to false.  If set to true, in the case of an error in
this AE, processing will continue on to the next element of the flow.
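A Java sketch of reading the two proposed extensions follows, using the JDK's built-in DOM parser. The syntax is the one proposed in this message, not the current descriptor schema, and ExtendedFixedFlowParser is a hypothetical helper. Element.getAttribute returns an empty string for a missing attribute and Boolean.parseBoolean returns false for it, which gives the stated default of continueOnError=false.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical parser for the proposed <fixedFlow> extensions:
// <parallel> groups of <node>s, and an optional continueOnError attribute.
class ExtendedFixedFlowParser {
    static final class FlowNode {
        final String key;
        final boolean continueOnError;
        FlowNode(String key, boolean continueOnError) {
            this.key = key;
            this.continueOnError = continueOnError;
        }
    }

    // Each inner list is one step; a list with several nodes is a parallel step.
    static List<List<FlowNode>> parse(String fixedFlowXml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(fixedFlowXml.getBytes(StandardCharsets.UTF_8)))
            .getDocumentElement();
        List<List<FlowNode>> steps = new ArrayList<>();
        for (Node child = root.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() != Node.ELEMENT_NODE) continue; // skip whitespace text
            Element e = (Element) child;
            List<FlowNode> step = new ArrayList<>();
            if (e.getTagName().equals("parallel")) {
                for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
                    if (n.getNodeType() == Node.ELEMENT_NODE) step.add(toFlowNode((Element) n));
                }
            } else if (e.getTagName().equals("node")) {
                step.add(toFlowNode(e));
            } else {
                throw new IllegalArgumentException("Unexpected element: " + e.getTagName());
            }
            steps.add(step);
        }
        return steps;
    }

    private static FlowNode toFlowNode(Element node) {
        // A missing attribute yields "", and parseBoolean("") is false (terminate).
        return new FlowNode(node.getTextContent().trim(),
                            Boolean.parseBoolean(node.getAttribute("continueOnError")));
    }
}
```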

-Adam