Posted to dev@uima.apache.org by Marshall Schor <ms...@schor.com> on 2011/03/14 16:02:12 UTC

Working on shared UIMA Context issues

Adam noted that the issue https://issues.apache.org/jira/browse/UIMA-2078
suggests there are other issues around Cas Multipliers in base UIMA, when using
shared UIMA Contexts.

This is because the getEmptyCas method in the (shared) UimaContext checks
whether the pool size is exceeded.  If the pool size is 1 but you have 5
pipelines sharing the UimaContext, this check would throw an exception on the
2nd pipeline.
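
To make the pool-size check concrete, here is a minimal sketch in plain Java.
It does not use the real UIMA classes: ToyContext, SharedContextDemo and the
method names are hypothetical stand-ins, and the assumption is simply that
each pipeline asks for one CAS and does not release it before the next
pipeline asks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a UimaContext; NOT the real UIMA class.
class ToyContext {
    private final int casPoolSize;     // pool size declared for the CAS Multiplier
    private int outstandingCases = 0;  // CASes handed out and not yet released

    ToyContext(int casPoolSize) {
        this.casPoolSize = casPoolSize;
    }

    // Models the check described above: refuse a request beyond the pool size.
    synchronized Object getEmptyCas() {
        if (outstandingCases >= casPoolSize) {
            throw new IllegalStateException(
                "requested more CASes than pool size " + casPoolSize);
        }
        outstandingCases++;
        return new Object(); // placeholder for a CAS
    }
}

class SharedContextDemo {
    // Each pipeline requests one CAS. Returns the 0-based indices of the
    // pipelines whose request was rejected.
    static List<Integer> failedRequests(int pipelines, boolean shareContext) {
        ToyContext shared = new ToyContext(1);
        List<Integer> failed = new ArrayList<>();
        for (int i = 0; i < pipelines; i++) {
            ToyContext ctx = shareContext ? shared : new ToyContext(1);
            try {
                ctx.getEmptyCas();
            } catch (IllegalStateException e) {
                failed.add(i);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        System.out.println("shared context:    " + failedRequests(5, true));
        System.out.println("separate contexts: " + failedRequests(5, false));
    }
}
```

With one shared toy context and a pool size of 1, every pipeline after the
first is rejected; with one toy context per pipeline, none is.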

So, one approach to "fix" this problem would be to not share UimaContexts in
this case.  The consequence would be that the contexts are no longer shared
among the pipelines.  This could be a good or bad thing, depending on the use
case(s).

Good thing: The contexts are very large, but read-only.

Bad thing: The contexts are used by the pipeline for things like storing data
via the external resource manager, in structures like HashMaps, which are not
thread safe.  The user could have designed a pipeline where some upstream
annotators wrote some data into a map, and some downstream annotators later
accessed that data, presuming that what they would see would be just what the
upstream annotator put there.  With shared UIMA Contexts, this presumption would
not hold even if the user used a thread-safe map, quite apart from the thread
safety issue with HashMaps.
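
The following plain-Java sketch illustrates why a thread-safe map does not
rescue the upstream/downstream presumption.  ToyPipeline and SharedMapDemo are
hypothetical names, the map stands in for data held by an external resource,
and the interleaving (A writes, B writes, A reads) is fixed by hand rather
than produced by real threads so that the outcome is deterministic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical pipeline with one upstream and one downstream step.
class ToyPipeline {
    private final String name;
    private final Map<String, String> resource; // stand-in for an external resource

    ToyPipeline(String name, Map<String, String> resource) {
        this.name = name;
        this.resource = resource;
    }

    // Upstream annotator: records which document this pipeline is working on.
    void upstream(String docId) {
        resource.put("currentDoc", name + ":" + docId);
    }

    // Downstream annotator: expects to read what its own upstream step wrote.
    String downstream() {
        return resource.get("currentDoc");
    }
}

class SharedMapDemo {
    // Interleaves two pipelines: A writes, B writes, then A reads.
    static String whatPipelineASees(boolean shareResource) {
        Map<String, String> mapA = new ConcurrentHashMap<>();
        Map<String, String> mapB = shareResource ? mapA : new ConcurrentHashMap<>();
        ToyPipeline a = new ToyPipeline("A", mapA);
        ToyPipeline b = new ToyPipeline("B", mapB);
        a.upstream("doc1");
        b.upstream("doc2");
        return a.downstream();
    }

    public static void main(String[] args) {
        System.out.println("shared:   " + whatPipelineASees(true));
        System.out.println("separate: " + whatPipelineASees(false));
    }
}
```

The map itself is never corrupted, yet with a shared map pipeline A reads the
value written by pipeline B; only with separate maps does A see its own data.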

There is a built-in framework method (produceAnalysisEngine with 2 int
arguments) that instantiates multiple analysis engines
(MultiprocessingAnalysisEngine_impl), used by (for example) the SOAP service
adapter.  In light of this, it seems faulty in several ways.

1) MultiprocessingAnalysisEngine_impl shares the UimaContext among the pool of
resources, and has the above issues including the CasMultiplier / pool size issue.

2) When UIMA-AS was being debugged, one issue that came up was that some
annotators had been written with a presumption that the thread used to call the
initialize method needed to be the same thread used to call the process method
(these annotators made use of ThreadLocal variables, IIRC).
See https://issues.apache.org/jira/browse/UIMA-1223 .  UIMA-AS was updated to
ensure in its multi-pipeline setup that this presumption was met.
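
As an illustration of the ThreadLocal presumption in item 2, here is a minimal
plain-Java sketch.  ToyThreadLocalAnnotator and ThreadAffinityDemo are
hypothetical and are not taken from UIMA or UIMA-AS; the single-thread
executor is just one assumed way of keeping initialize and process on the
same thread, not a description of how UIMA-AS does it.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical annotator that keeps per-thread state set up in initialize().
class ToyThreadLocalAnnotator {
    private final ThreadLocal<StringBuilder> buffer = new ThreadLocal<>();

    void initialize() {
        buffer.set(new StringBuilder());
    }

    // Returns false when the state from initialize() is not visible
    // on the calling thread.
    boolean process(String text) {
        StringBuilder sb = buffer.get();
        if (sb == null) {
            return false;
        }
        sb.append(text);
        return true;
    }
}

class ThreadAffinityDemo {
    // Runs a task on the worker thread and waits for its result.
    private static boolean runOn(ExecutorService worker, Callable<Boolean> task) {
        try {
            return worker.submit(task).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    // initialize() on the caller's thread, process() on a worker thread.
    static boolean processAfterInitOnOtherThread() {
        ToyThreadLocalAnnotator annotator = new ToyThreadLocalAnnotator();
        annotator.initialize();
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            return runOn(worker, () -> annotator.process("x"));
        } finally {
            worker.shutdown();
        }
    }

    // Both calls are submitted to the same single worker thread.
    static boolean processAfterInitOnSameThread() {
        ToyThreadLocalAnnotator annotator = new ToyThreadLocalAnnotator();
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            runOn(worker, () -> { annotator.initialize(); return true; });
            return runOn(worker, () -> annotator.process("x"));
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("different threads: " + processAfterInitOnOtherThread());
        System.out.println("same thread:       " + processAfterInitOnSameThread());
    }
}
```

In the first case process finds no per-thread state and reports failure; in
the second case it finds the state that initialize created.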

Shouldn't this same presumption be met with the base UIMA implementation?

-Marshall






Re: Working on shared UIMA Context issues

Posted by Adam Lally <al...@alum.rpi.edu>.
On Mon, Mar 14, 2011 at 1:45 PM, Marshall Schor <ms...@schor.com> wrote:

> I think I may have found a way to have the individual pipelines created by the
> MultiprocessingAnalysisEngine stop sharing the UIMA contexts.  This would
> alleviate the CasMultiplier issue, but at a cost of changing the behavior for
> existing users of this facility - in that "external resources" managed by UIMA
> would no longer be shared across the multiple pipelines.
>
> An alternative would be to document this behavior and warn against using this
> facility with Cas Multipliers.
>
> Any preferences?  I think I'm slightly in favor of no longer implicitly sharing
> UIMA Context across multiple pipelines.
>
I think it should be possible (although perhaps more difficult to implement)
for the CAS Multipliers to have different UimaContexts while still sharing a
ResourceManager.  This would fix the CAS Multiplier issue without changing
the sharing of external resources.  I'm a bit leery of having external
resources not be shared.  If someone has large resources they may be relying
on them not being replicated among pipelines.  I would prefer to stay
backwards compatible on this.  If we feel we need to address a potential
problem with non-thread-safe resources, we could consider adding a
deployment option to allow the user to explicitly request non-shared
resources.
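
One way to read this proposal is sketched below in plain Java: each pipeline
gets its own context (and therefore its own CAS accounting), while all
contexts point at a single resource manager.  ToyResourceManager,
ToyPipelineContext and SeparateContextsDemo are hypothetical names; nothing
here reflects how the real UimaContext or ResourceManager are implemented.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a resource manager holding external resources.
class ToyResourceManager {
    final Map<String, Object> externalResources = new ConcurrentHashMap<>();
}

// Hypothetical per-pipeline context: private CAS accounting, shared resources.
class ToyPipelineContext {
    private final ToyResourceManager resourceManager; // shared by all pipelines
    private final int casPoolSize;
    private int outstandingCases = 0;                 // counted per context

    ToyPipelineContext(ToyResourceManager resourceManager, int casPoolSize) {
        this.resourceManager = resourceManager;
        this.casPoolSize = casPoolSize;
    }

    // Returns false instead of throwing when the pool size would be exceeded.
    synchronized boolean tryGetEmptyCas() {
        if (outstandingCases >= casPoolSize) {
            return false;
        }
        outstandingCases++;
        return true;
    }

    Object getResource(String key) {
        return resourceManager.externalResources.get(key);
    }
}

class SeparateContextsDemo {
    // True if every pipeline obtains a CAS and sees the same resource instance.
    static boolean allPipelinesServed(int pipelines) {
        ToyResourceManager shared = new ToyResourceManager();
        Object dictionary = new Object(); // stand-in for a large resource
        shared.externalResources.put("dictionary", dictionary);
        for (int i = 0; i < pipelines; i++) {
            ToyPipelineContext ctx = new ToyPipelineContext(shared, 1);
            if (!ctx.tryGetEmptyCas() || ctx.getResource("dictionary") != dictionary) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("all pipelines served: " + allPipelinesServed(5));
    }
}
```

In this toy model the large resource exists once, and the pool-size check is
applied per pipeline rather than across all of them.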

 -Adam

Re: Working on shared UIMA Context issues

Posted by Marshall Schor <ms...@schor.com>.
I think I may have found a way to have the individual pipelines created by the
MultiprocessingAnalysisEngine stop sharing the UIMA contexts.  This would
alleviate the CasMultiplier issue, but at a cost of changing the behavior for
existing users of this facility - in that "external resources" managed by UIMA
would no longer be shared across the multiple pipelines.

An alternative would be to document this behavior and warn against using this
facility with Cas Multipliers.

Any preferences?  I think I'm slightly in favor of no longer implicitly sharing
UIMA Context across multiple pipelines.

-Marshall



On 3/14/2011 11:17 AM, Marshall Schor wrote:
> On 3/14/2011 11:02 AM, Marshall Schor wrote:
>> Adam noted that the issue https://issues.apache.org/jira/browse/UIMA-2078
>> suggests there are other issues around Cas Multipliers in base UIMA, when using
>> shared UIMA Contexts.
>>
>> This is because the getEmptyCas method in the (shared) UimaContext is checking
>> to see if the pool size is exceeded, and if the pool size is 1 but you have 5
>> pipelines sharing the UimaContext, this test would result in throwing an
>> exception on the 2nd one.
>>
>> So, one approach to "fix" this problem would be to not share UimaContexts, in
>> this case.  The downside of this would be that the contexts were not shared
>> among the pipelines.  This could be a good or bad thing, depending on the use
>> case(s).
>>
>> Good thing: The contexts are very large, but read-only.
>>
>> Bad thing: The contexts are used by the pipeline for things like storing data
>> via the external resource manager, in structures like HashMaps, which are not
>> thread safe.  The user could have designed a pipeline where some upstream
>> annotators wrote some data into a map, and some downstream annotators later
>> accessed that data, presuming what it would see would be just what the upstream
>> annotator put there.  In the case of shared UIMA-Contexts, besides the issue of
>> thread safety for HashMaps, even if the user used a thread-safe version of this,
>> this presumption would not hold.
>>
>> There is a built-in framework method (produceAnalysisEngine with 2 int
>> arguments) that instantiates multiple analysis engines
>> (MultiprocessingAnalysisEngine_impl), used by (for example) the SOAP service
>> adapter.  In light of this, it seems faulty in several ways.
> Vinci services also make use of the MultiprocessingAnalysisEngine class.
>
> -Marshall
>> 1) MultiprocessingAnalysisEngine_impl shares the UimaContext among the pool of
>> resources, and has the above issues including the CasMultiplier / pool size issue.
>>
>> 2) When UIMA-AS was being debugged, one issue that came up was that some
>> annotators had been written with a presumption that the thread used to call the
>> initialize method needed to be the same thread used to call the process method
>> (these annotators made use of ThreadLocal variables, IIRC).
>> See https://issues.apache.org/jira/browse/UIMA-1223 .  UIMA-AS was updated to
>> ensure in its multi-pipeline setup that this presumption was met.
>>
>> Shouldn't this same presumption be met with the base UIMA implementation?
>>
>> -Marshall

Re: Working on shared UIMA Context issues

Posted by Marshall Schor <ms...@schor.com>.
On 3/14/2011 11:02 AM, Marshall Schor wrote:
> Adam noted that the issue https://issues.apache.org/jira/browse/UIMA-2078
> suggests there are other issues around Cas Multipliers in base UIMA, when using
> shared UIMA Contexts.
>
> This is because the getEmptyCas method in the (shared) UimaContext is checking
> to see if the pool size is exceeded, and if the pool size is 1 but you have 5
> pipelines sharing the UimaContext, this test would result in throwing an
> exception on the 2nd one.
>
> So, one approach to "fix" this problem would be to not share UimaContexts, in
> this case.  The downside of this would be that the contexts were not shared
> among the pipelines.  This could be a good or bad thing, depending on the use
> case(s).
>
> Good thing: The contexts are very large, but read-only.
>
> Bad thing: The contexts are used by the pipeline for things like storing data
> via the external resource manager, in structures like HashMaps, which are not
> thread safe.  The user could have designed a pipeline where some upstream
> annotators wrote some data into a map, and some downstream annotators later
> accessed that data, presuming what it would see would be just what the upstream
> annotator put there.  In the case of shared UIMA-Contexts, besides the issue of
> thread safety for HashMaps, even if the user used a thread-safe version of this,
> this presumption would not hold.
>
> There is a built-in framework method (produceAnalysisEngine with 2 int
> arguments) that instantiates multiple analysis engines
> (MultiprocessingAnalysisEngine_impl), used by (for example) the SOAP service
> adapter.  In light of this, it seems faulty in several ways.

Vinci services also make use of the MultiprocessingAnalysisEngine class.

-Marshall
> 1) MultiprocessingAnalysisEngine_impl shares the UimaContext among the pool of
> resources, and has the above issues including the CasMultiplier / pool size issue.
>
> 2) When UIMA-AS was being debugged, one issue that came up was that some
> annotators had been written with a presumption that the thread used to call the
> initialize method needed to be the same thread used to call the process method
> (these annotators made use of ThreadLocal variables, IIRC).
> See https://issues.apache.org/jira/browse/UIMA-1223 .  UIMA-AS was updated to
> ensure in its multi-pipeline setup that this presumption was met.
>
> Shouldn't this same presumption be met with the base UIMA implementation?
>
> -Marshall