Posted to dev@uima.apache.org by "Eddie Epstein (JIRA)" <ui...@incubator.apache.org> on 2008/12/03 17:55:44 UTC

[jira] Created: (UIMA-1245) Processing order of parent CAS different on UIMA and UIMA AS

Processing order of parent CAS different on UIMA and UIMA AS
------------------------------------------------------------

                 Key: UIMA-1245
                 URL: https://issues.apache.org/jira/browse/UIMA-1245
             Project: UIMA
          Issue Type: Bug
          Components: Async Scaleout
            Reporter: Eddie Epstein


Arron Kaplan raised the question of when parent CASes are processed relative to their children. See http://markmail.org/message/5cop7iv2nshouhgs  As of now, the processing order for a multi-threaded UIMA AS aggregate is different than that for a single-threaded UIMA aggregate.

A discussion with Burn, Adam, Jerry, Marshall and myself concluded that the default processing order for UIMA AS should be changed to be the same as in UIMA, in order to have the same application behavior for both. This will be done by suspending flow of a parent CAS after it is returned from a CasMultiplier delegate until all its children CASes have finished processing.

However, there also needs to be a UIMA AS deployment option for CasMultiplier delegates that allows the parent CAS to resume processing immediately after being returned from the CM. This option is needed to enable parallel processing.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (UIMA-1245) Processing order of parent CAS different on UIMA and UIMA AS

Posted by "Jerry Cwiklik (JIRA)" <ui...@incubator.apache.org>.
    [ https://issues.apache.org/jira/browse/UIMA-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676661#action_12676661 ] 

Jerry Cwiklik commented on UIMA-1245:
-------------------------------------

After the last round of discussions, here is *my* understanding of how things should work:

1) In both the remote and colocated cases, the input CAS will *not* be held in the primitive CM. The input CAS will immediately follow its last child.

2) The aggregate will hold the input CAS (if configured to do so) until all its children *leave* the aggregate. This contrasts with the other idea, in which the CAS would be held until all its children were processed in all aggregates. The latter requires a complicated notification mechanism. As I understand it, we are opting for the simpler solution, which is the former.
The aggregate will decrement the input CAS child count as soon as the child CAS is either dropped in the aggregate or the child CAS is sent to the aggregate's client. As soon as the child count hits zero, the input CAS will be allowed to continue to the next step in the flow.

3) The parameter processParentLast will be defined on the containing aggregate, not on the individual CM. Is this reasonable?
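
The child counting described in point 2 could be sketched roughly as follows. This is only an illustration: `ParentCasGate` and its method names are hypothetical and are not UIMA AS classes, and the sketch assumes one gate object per input CAS per aggregate.

```java
/** Hypothetical helper (not a UIMA AS class): tracks the children of one input CAS in one aggregate. */
final class ParentCasGate {
    private int outstandingChildren = 0;
    private boolean parentReturnedFromCm = false;

    /** Called each time the CM produces a child CAS from the input CAS. */
    synchronized void childCreated() {
        outstandingChildren++;
    }

    /**
     * Called when a child CAS is dropped in the aggregate or sent to the aggregate's client.
     * Returns true if the input CAS may now continue to the next step in the flow.
     */
    synchronized boolean childLeftAggregate() {
        if (outstandingChildren == 0) {
            throw new IllegalStateException("no outstanding children");
        }
        outstandingChildren--;
        return parentMayContinue();
    }

    /** Called when the input CAS comes back from the CM, following its last child. */
    synchronized boolean parentReturned() {
        parentReturnedFromCm = true;
        return parentMayContinue();
    }

    /** The parent continues only after it has returned from the CM and the child count is zero. */
    synchronized boolean parentMayContinue() {
        return parentReturnedFromCm && outstandingChildren == 0;
    }
}
```

Because the count only follows children until they leave this aggregate, no notification from other aggregates is needed, which is the simplification point 2 argues for.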





[jira] Assigned: (UIMA-1245) Processing order of parent CAS different on UIMA and UIMA AS

Posted by "Marshall Schor (JIRA)" <ui...@incubator.apache.org>.
     [ https://issues.apache.org/jira/browse/UIMA-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marshall Schor reassigned UIMA-1245:
------------------------------------

    Assignee: Eddie Epstein

Fixed in dd2spring.xsl and test cases, but needs doc updates. Assigning to Eddie to do doc updates :-)



[jira] Commented: (UIMA-1245) Processing order of parent CAS different on UIMA and UIMA AS

Posted by "Jerry Cwiklik (JIRA)" <ui...@incubator.apache.org>.
    [ https://issues.apache.org/jira/browse/UIMA-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12674228#action_12674228 ] 

Jerry Cwiklik commented on UIMA-1245:
-------------------------------------

It seems that the general agreement is that the CM should hold the input CAS until ALL of its children are processed. Only then would the input CAS be returned to the client. This would be the default. To override it, the user would add a new attribute in the dd for the Cas Multiplier. Eddie votes for processParentEarly="true". If this is provided, the parent CAS would be returned from the CM as soon as all children are generated. This would enable processing of the input CAS and its children at the same time. The input CAS would still be held in the final state until all its children are processed. Assuming everyone agrees, I need Marshall to modify dd2spring to support the new CM parameter.



[jira] Updated: (UIMA-1245) Processing order of parent CAS different on UIMA and UIMA AS

Posted by "Jerry Cwiklik (JIRA)" <ui...@incubator.apache.org>.
     [ https://issues.apache.org/jira/browse/UIMA-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerry Cwiklik updated UIMA-1245:
--------------------------------

    Fix Version/s: 2.3AS



[jira] Commented: (UIMA-1245) Processing order of parent CAS different on UIMA and UIMA AS

Posted by "Adam Lally (JIRA)" <ui...@incubator.apache.org>.
    [ https://issues.apache.org/jira/browse/UIMA-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654598#action_12654598 ] 

Adam Lally commented on UIMA-1245:
----------------------------------

Actually I think your proposal to block the parent upon return to the AGGR ~outer~  is already implied by the original proposal.  That's because AGGR ~inner~ is a CAS Multiplier, and therefore its parent CAS should be blocked from further processing until its children have finished processing in AGGR ~outer~.  

It's a good point, though, that this wouldn't be exactly identical to the single-threaded case.  The relative order of 2 components being executed would be the same as the synchronous case whenever those 2 components were "siblings" (contained in the same aggregate, at the same level of nesting), but not when they were at different levels of nesting.  I think that's still good enough, though.



[jira] Commented: (UIMA-1245) Processing order of parent CAS different on UIMA and UIMA AS

Posted by "Marshall Schor (JIRA)" <ui...@incubator.apache.org>.
    [ https://issues.apache.org/jira/browse/UIMA-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654586#action_12654586 ] 

Marshall Schor commented on UIMA-1245:
--------------------------------------

I wonder if it is a good goal to try and have the multi-threaded UIMA-AS aggregate work the same as a single-threaded UIMA aggregate.  Consider the following example: 

An aggregate, AGGR ~inner~ , containing a delegate Cas Multiplier (CM) whose children go thru some subsequent Analysis Engines (AE ~children~ ) and whose main parent CAS subsequently goes thru some subsequent AEs (AE ~parent~) before exiting the aggregate.

Now imagine this aggregate, AGGR ~inner~ , contained in another aggregate, AGGR ~outer~ , where AGGR ~inner~ is considered to be a Cas Multiplier (that is, the "children" CASes produced by CM exit that aggregate and are input back into AGGR ~outer~).

Now imagine that AGGR ~inner~ is "remote", working off of a JMS queue.

How would the proposed solution determine when "all its children CASes have finished processing"?  This would require some new signaling from  AGGR ~outer~ back to AGGR ~inner~ , which we don't currently have.  This is because the processing of the child CASes could continue in AGGR ~outer~, and AGGR ~outer~ would need to signal (in the general case through many levels of nesting) when a child CAS was finished processing. 

If the idea was to suspend the flow of the parent CAS until all of its children CASes had left AGGR ~inner~ , and then return the parent CAS to AGGR ~outer~, this wouldn't require new signaling, but it would potentially change the order of processing from the comparable single-threaded plain UIMA case - where the parent would be held until the child CASes had finished their processing in all containing aggregate levels.   

Perhaps the thinking in this case is to 
# *block* the parent from going thru any AE ~parent~ in the AGGR ~inner~ until all of the child CASes have exited AGGR ~inner~ (or gone to "final state"),
# *release* the parent, allowing it to go through all of its AEs ~parent~ in the AGGR ~inner~, and then be returned to AGGR ~outer~
# *block* the parent upon return to AGGR ~outer~ until all of its children have finished processing in AGGR ~outer~ .

with suitable extensions for multi-levels of nesting :-)
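
For a single level of nesting, the three steps above might be modeled as a small state machine. This is only an illustrative sketch: `NestedParentHold`, its phases, and its method names are made up here and do not correspond to UIMA AS classes, and the model ignores threading and error handling.

```java
/** Hypothetical model of one parent CAS in AGGR inner, which is nested in AGGR outer. */
final class NestedParentHold {
    enum Phase { BLOCKED_IN_INNER, RUNNING_IN_INNER, BLOCKED_IN_OUTER, RELEASED }

    private Phase phase = Phase.BLOCKED_IN_INNER;
    private int childrenInInner;
    private int childrenInOuter = 0;

    NestedParentHold(int childCount) {
        childrenInInner = childCount;
        unblockInnerIfDrained();          // a parent with no children is never blocked in inner
    }

    Phase phase() {
        return phase;
    }

    /** A child CAS exits AGGR inner and is input back into AGGR outer. */
    void childExitedInner() {
        childrenInInner--;
        childrenInOuter++;
        unblockInnerIfDrained();          // step 1 ends, step 2 begins
    }

    /** The parent has gone through its AEs in AGGR inner and is returned to AGGR outer. */
    void parentFinishedInner() {
        if (phase != Phase.RUNNING_IN_INNER) {
            throw new IllegalStateException("parent is still blocked in inner");
        }
        phase = Phase.BLOCKED_IN_OUTER;   // step 3 begins
        releaseIfOuterDrained();
    }

    /** A child CAS finishes processing in AGGR outer. */
    void childFinishedInOuter() {
        childrenInOuter--;
        releaseIfOuterDrained();
    }

    private void unblockInnerIfDrained() {
        if (phase == Phase.BLOCKED_IN_INNER && childrenInInner == 0) {
            phase = Phase.RUNNING_IN_INNER;
        }
    }

    private void releaseIfOuterDrained() {
        if (phase == Phase.BLOCKED_IN_OUTER && childrenInOuter == 0) {
            phase = Phase.RELEASED;
        }
    }
}
```

Even this one-level model needs two counters and four phases, which gives some feel for how the bookkeeping grows with deeper nesting.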

This would not be _identical_ to the single-threaded case, but it might be close enough.  But my feeling is this is getting very complex, and some simpler (to explain) approach that gives up on the goal of having the UIMA-AS and single-threaded UIMA cases operate the same might be better.  

One thing complicating the current design approaches is the overloading of these kinds of flow decisions with the way errors are "passed back" - the current design for errors is using the Parent CAS to signal the failure of a Child Cas in some cases, so the parent CAS needs to be kept around until all the child CASes have exited a level.  



[jira] Commented: (UIMA-1245) Processing order of parent CAS different on UIMA and UIMA AS

Posted by "Eddie Epstein (JIRA)" <ui...@incubator.apache.org>.
    [ https://issues.apache.org/jira/browse/UIMA-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677506#action_12677506 ] 

Eddie Epstein commented on UIMA-1245:
-------------------------------------

+1 for Jerry's last proposal also.

As for the parameter name, the behavior is to "complete processing of children CASes before any further processing of their parent CAS". A CAS sent to a CasMultiplier may not have any children, so a wording referring to ParentCas is better than to InputCas. With that in mind, processParentLast is good, and processParentAfterChildren is even longer :)



[jira] Commented: (UIMA-1245) Processing order of parent CAS different on UIMA and UIMA AS

Posted by "Burn Lewis (JIRA)" <ui...@incubator.apache.org>.
    [ https://issues.apache.org/jira/browse/UIMA-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676308#action_12676308 ] 

Burn Lewis commented on UIMA-1245:
----------------------------------

To summarize my understanding of recent discussions ...

First I'd like to suggest that the default should not change.  Processing the parent last does not guarantee that UIMA-AS will act like core UIMA ... in addition the size of all downstream pools must be set to 1 to ensure that each child is processed sequentially.  We should document the settings needed for UIMA-like processing but I think the default should be UIMA-AS style processing, i.e. processParentLast="false".

With the current design parents are held in the final step of an aggregate until all children have completed processing in that aggregate.  This ensures that any child errors can be reported on the input CAS, and that aggregate CMs satisfy the CM contract of not processing the parent until all children have been returned.  If this aggregate is nested in another, the same conditions hold at the final step of the outer aggregate.

But with this new processParentLast="true" option the parent must be held after the CM until all of its children have completed processing in all aggregates, i.e. have been returned to their pool.  Unlike the previous case we must track the number of children active in any of the nested aggregates.



[jira] Commented: (UIMA-1245) Processing order of parent CAS different on UIMA and UIMA AS

Posted by "Eddie Epstein (JIRA)" <ui...@incubator.apache.org>.
    [ https://issues.apache.org/jira/browse/UIMA-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655241#action_12655241 ] 

Eddie Epstein commented on UIMA-1245:
-------------------------------------

{quote}
I assume this would be a per-CM option, not application wide .... where should we specify it and how? Should we make this new behaviour the default? Should the default be true or false? Perhaps in the deployment descriptor:

<casMultiplier poolSize="1" processParentLast="true">
or
<casMultiplier poolSize="1" processParentEarly="false">
or ....
{quote}

As per the original intent of this issue, the default has to be to process the parent last. I'd vote for the processParentEarly="false" syntax. Yes, this has to be per CM.

{quote}
As Burn points out, the "new signalling" I mentioned above isn't new - it's already implemented, and serves (among other uses) to keep an inner remote CAS multiplier "throttled" from generating too many CASes too rapidly.
{quote}

The "signalling" mechanism, which prevents a parent CAS from being returned before all processing on its children is completed, was created so that any errors while "working" on that CAS could be reported back to the client that sent it. CAS multiplier throttling is accomplished only by the size of its CAS pool.

{quote}
Now, since this signal is there, it allows some alternatives, for when the parent is being blocked. Should the block be lifted when all the child CASes at this level have been processed, or only when all the child CASes at this level have been released (by virtue of a signal from containing aggregates, perhaps, as described above)?
{quote}

The block is for child CASes still in process. When all processing on the children is done, the parent is unblocked.

{quote}
In thinking more about this, perhaps the option should be to allow processing the "parent-last" on a specific delegate, instead of having the block be at the CAS Multiplier. This would allow the parent to flow through additional AEs in its path up to some specified point, where it would then be forced to wait for all of its children to be released. This would satisfy the use case of ensuring at some point in the flow that the parent was blocked until its children were released.
{quote}

Interesting, but is certainly a more complicated design. Also, if the default is to have the same behavior as core UIMA, then every subsequent AE would have the parent-last block on, and unblocking would require lots of overrides.

Eddie



[jira] Closed: (UIMA-1245) Processing order of parent CAS different on UIMA and UIMA AS

Posted by "Jerry Cwiklik (JIRA)" <ui...@incubator.apache.org>.
     [ https://issues.apache.org/jira/browse/UIMA-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerry Cwiklik closed UIMA-1245.
-------------------------------

    Resolution: Fixed



[jira] Commented: (UIMA-1245) Processing order of parent CAS different on UIMA and UIMA AS

Posted by "Marshall Schor (JIRA)" <ui...@incubator.apache.org>.
    [ https://issues.apache.org/jira/browse/UIMA-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654949#action_12654949 ] 

Marshall Schor commented on UIMA-1245:
--------------------------------------

As Burn points out, the "new signalling" I mentioned above isn't new - it's already implemented, and serves (among other uses) to keep an inner remote CAS multiplier "throttled" from generating too many CASes too rapidly.   

Now, since this signal is there, it allows some alternatives, for when the parent is being blocked.  Should the block be lifted when all the child CASes _*at this level*_ have been processed, or only when all the child CASes at this level have been released (by virtue of a signal from containing aggregates, perhaps, as described above)?  

In thinking more about this, perhaps the option should be to allow processing the "parent-last" on a specific delegate, instead of having the block be at the CAS Multiplier.  This would allow the parent to flow through additional AEs in its path up to some specified point, where it would then be forced to wait for all of its children to be released.  This would satisfy the use case of ensuring at some point in the flow that the parent was blocked until its children were released.



[jira] Commented: (UIMA-1245) Processing order of parent CAS different on UIMA and UIMA AS

Posted by "Burn Lewis (JIRA)" <ui...@incubator.apache.org>.
    [ https://issues.apache.org/jira/browse/UIMA-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654817#action_12654817 ] 

Burn Lewis commented on UIMA-1245:
----------------------------------

I think we could get the same behaviour as single-threaded UIMA if in addition to delaying the processing of the parent we set the size of all CAS pools to 1.  This would ensure that only one child at a time would be active and the next would not start until the previous had been returned to the pool.  If the CM is remote then when the child CAS is returned to the AGGR ~outer~ pool it signals back to AGGR ~inner~ that a CAS has been released which would then release its copy of the child and so allow the CM in AGGR ~inner~ to generate the next child.  Then when all children have been released the held-up parent CAS would be allowed to continue in the flow.
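
The effect of a pool of size 1 can be shown with a small stand-alone simulation. This is a sketch under assumptions, not UIMA AS code: the CAS pool is stood in for by a `java.util.concurrent.Semaphore`, the downstream delegates by a thread pool, and `SingleCasPoolDemo` and the event strings are invented for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical simulation: a semaphore plays the role of the CM's CAS pool. */
final class SingleCasPoolDemo {
    /** Pushes childCount children through multi-threaded workers and returns the event log. */
    static List<String> run(int poolSize, int childCount) {
        Semaphore casPool = new Semaphore(poolSize);
        List<String> events = Collections.synchronizedList(new ArrayList<>());
        ExecutorService downstream = Executors.newFixedThreadPool(4);
        try {
            for (int i = 0; i < childCount; i++) {
                final int id = i;
                casPool.acquireUninterruptibly();   // the CM waits until a CAS is free
                events.add("generate " + id);
                downstream.submit(() -> {
                    try {
                        events.add("process " + id);
                    } finally {
                        events.add("release " + id);
                        casPool.release();          // child CAS goes back to the pool
                    }
                });
            }
        } finally {
            downstream.shutdown();
            try {
                downstream.awaitTermination(10, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        events.add("parent continues");             // the held-up parent resumes last
        return events;
    }
}
```

With `poolSize` set to 1 the log is strictly generate, process, release for each child in turn, even though the workers are multi-threaded. With a larger pool the children may interleave.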

Also, as we discussed, this "process-parent-last" option would be useful for more than just emulating single-threaded operation.  An aggregate could use a CM to split its work into many parallel pieces, and then use a final CM to combine the results of the children and merge them into the parent before it exits the aggregate.  The new option would ensure that the parent reaches the final CM only after all its children have. This would let the aggregate process its children asynchronously but appear to its caller as a simple non-CM aggregate.
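
The bookkeeping behind "process-parent-last" could look something like the sketch below. It only illustrates the counting logic under the assumption that the framework is told about each child's creation and release; ParentHold and its method names are hypothetical and do not exist in UIMA AS.

```java
/** Hypothetical per-parent bookkeeping; names are illustrative only. */
class ParentHold {
    private int activeChildren = 0;
    private boolean returnedFromCm = false;

    /** The CM produced a new child CAS from this parent. */
    synchronized void childCreated() {
        activeChildren++;
    }

    /** A child finished its flow; returns true if the parent may now continue. */
    synchronized boolean childReleased() {
        if (activeChildren == 0) {
            throw new IllegalStateException("no active children");
        }
        activeChildren--;
        return canContinue();
    }

    /** The CM handed the parent back, so no more children will be produced. */
    synchronized boolean parentReturnedFromCm() {
        returnedFromCm = true;
        return canContinue();
    }

    /** The parent is held until the CM is done with it and no child is active. */
    synchronized boolean canContinue() {
        return returnedFromCm && activeChildren == 0;
    }
}

class ParentHoldDemo {
    public static void main(String[] args) {
        ParentHold hold = new ParentHold();
        hold.childCreated();
        hold.childCreated();
        System.out.println(hold.parentReturnedFromCm()); // false: two children still active
        System.out.println(hold.childReleased());        // false: one child still active
        System.out.println(hold.childReleased());        // true: parent can go to the final CM
    }
}
```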

I assume this would be a per-CM option, not application-wide. Where should we specify it, and how?  Should we make this new behaviour the default?  Should the default be true or false?  Perhaps in the deployment descriptor:

     <casMultiplier poolSize="1" processParentLast="true"/>
or
     <casMultiplier poolSize="1" processParentEarly="false"/>
or ....




[jira] Commented: (UIMA-1245) Processing order of parent CAS different on UIMA and UIMA AS

Posted by "Burn Lewis (JIRA)" <ui...@incubator.apache.org>.
    [ https://issues.apache.org/jira/browse/UIMA-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676750#action_12676750 ] 

Burn Lewis commented on UIMA-1245:
----------------------------------

My suggestion that we should attempt to emulate core UIMA's single-threaded handling of parents was clearly over-design, when all that appears to be needed is a way to optionally ensure that parents follow children in the flow.

I assume that in 3) you mean that the parameter is part of the CasMultiplier specification in the deployment descriptor for the aggregate, i.e. that it applies per CasMultiplier and not to all CasMultipliers in the aggregate.  Since the default (I hope) will be to let the parent continue in the flow as soon as its last child has entered the flow, perhaps the parameter for this new feature could indicate that the parent's processing will be suspended until all its child CASes have completed their processing in the aggregate.  Since the docs use the terms input/output instead of parent/child, perhaps:

<casMultiplier poolSize="6" suspendInputCas="true"/>
or
<casMultiplier poolSize="6" processInputCasAfterOutputs="true"/>

+1 for the simpler per-aggregate design




[jira] Commented: (UIMA-1245) Processing order of parent CAS different on UIMA and UIMA AS

Posted by "Burn Lewis (JIRA)" <ui...@incubator.apache.org>.
    [ https://issues.apache.org/jira/browse/UIMA-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12656131#action_12656131 ] 

Burn Lewis commented on UIMA-1245:
----------------------------------

Marshall's generalization to apply process-parent-last to any delegate, not just the first after a CM, would probably be useful only with a custom Flow Controller, so perhaps it should be controlled via the FC.  If we added a "wait-for-children" flag to each Step object, then when it is set to true the framework could suspend the flow of any CAS with active children.  When all children are released, the controller would resume processing the parent's flow.  We're currently doing this just for the FinalStep, but generalizing to any Step would be easy.  (The flag would always be set for FinalStep, as all children must complete without errors before the parent is released.)
We could add a new "wait" choice to the ActionAfterCasMultiplier configuration parameter in the Fixed & Advanced FCs.
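
To make the "wait-for-children" idea concrete, here is a minimal sketch of the check the framework would make before routing a CAS to its next step. SketchStep and SketchController are hypothetical classes invented for illustration; they are not the real UIMA Step classes, and the flag is only the proposal above, not an existing API.

```java
/** Hypothetical step carrying the proposed flag; not the UIMA Step class. */
class SketchStep {
    final String target;           // delegate key, or null for the final step
    final boolean waitForChildren; // the proposed "wait-for-children" flag

    SketchStep(String target, boolean waitForChildren) {
        this.target = target;
        this.waitForChildren = waitForChildren;
    }

    /** The final step always waits, since all children must complete first. */
    static SketchStep finalStep() {
        return new SketchStep(null, true);
    }
}

class SketchController {
    /** Suspend the CAS only if the step asks for it and children are still active. */
    static boolean mustSuspend(SketchStep next, int activeChildren) {
        return next.waitForChildren && activeChildren > 0;
    }

    public static void main(String[] args) {
        System.out.println(mustSuspend(new SketchStep("merger", true), 2));     // true
        System.out.println(mustSuspend(new SketchStep("annotator", false), 2)); // false
        System.out.println(mustSuspend(SketchStep.finalStep(), 1));             // true
        System.out.println(mustSuspend(SketchStep.finalStep(), 0));             // false
    }
}
```

When the last child is released, the controller would re-run the same check and let the suspended parent proceed.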

