Posted to dev@uima.apache.org by "Marshall Schor (JIRA)" <ui...@incubator.apache.org> on 2008/12/10 21:52:44 UTC

[jira] Created: (UIMA-1249) CasManager_impl logic for lazy merge of the type system could lead to excessive work or missed errors

CasManager_impl logic for lazy merge of the type system could lead to excessive work or missed errors
-----------------------------------------------------------------------------------------------------

                 Key: UIMA-1249
                 URL: https://issues.apache.org/jira/browse/UIMA-1249
             Project: UIMA
          Issue Type: Improvement
    Affects Versions: 2.2.2
            Reporter: Marshall Schor
            Priority: Minor


The CasManager_impl class is used in many (perhaps most, but not all) cases when assembling a pipeline of UIMA components.

There is one instance of this class associated with each ResourceManager instance.  (PearWrapper versions of the ResourceManager share this component.)

It operates in two phases.  During initialization, its {{addMetaData}} method is called repeatedly, as the components being assembled to run under one ResourceManager instance are initialized, to collect all the metadata from those components (e.g., their individual type systems, type priorities, and index definitions).

At the first call that requires the merged result, e.g. {{getCasDefinition()}}, the class merges all the collected metadata and uses it to produce the CAS's type system, indexes, etc.  

After this first call, any additional call to {{addMetaData}} that attempts to add something not already in the merge should result in an error.

In the current implementation, such a call to {{addMetaData}} not only raises no error but also resets the class instance, so that the next call to get the CAS definition runs the merge again and returns a new, non-identical merged result.  This could result in CASes in a pool, for instance, having different type systems.

Normally, this sequence never happens.  In the multi-threaded case, however, where initialization and processing can occur at the same time across multiple instances, {{addMetaData}} could be called by a thread that is still initializing after another thread has already obtained the "final" merged CAS definition.  In these cases the {{addMetaData}} call could be "ignored", but in the general case one would need to check whether the metadata being added would change the existing type system, and throw an error if it would.
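
The intended behavior described above might be sketched roughly as follows.  This is only an illustration, not the actual CasManager_impl code: the class name LazyMergeManager is hypothetical, plain sets of type names stand in for the real component metadata, and the choice of IllegalStateException is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for CasManager_impl. A Set<String> of type names
// stands in for the real metadata (type system, priorities, indexes).
class LazyMergeManager {
  private final List<Set<String>> pending = new ArrayList<>();
  private Set<String> merged; // null until the merged result is first requested

  synchronized void addMetaData(Set<String> typeNames) {
    if (merged == null) {
      // Phase 1: just collect; no merging is done yet.
      pending.add(new LinkedHashSet<>(typeNames));
      return;
    }
    // Phase 2: the merged result has already been handed out.
    // Ignore additions that change nothing; reject the rest.
    if (!merged.containsAll(typeNames)) {
      throw new IllegalStateException(
          "metadata added after merge would change the type system: " + typeNames);
    }
  }

  synchronized Set<String> getCasDefinition() {
    if (merged == null) {
      // The lazy merge runs once, on the first request.
      Set<String> result = new LinkedHashSet<>();
      for (Set<String> s : pending) {
        result.addAll(s);
      }
      merged = Collections.unmodifiableSet(result);
    }
    return merged; // the same instance on every later call
  }
}

class LazyMergeDemo {
  public static void main(String[] args) {
    LazyMergeManager m = new LazyMergeManager();
    m.addMetaData(new LinkedHashSet<>(Arrays.asList("Token")));
    m.addMetaData(new LinkedHashSet<>(Arrays.asList("Sentence")));
    Set<String> first = m.getCasDefinition();

    // A late call that adds nothing new is ignored; the result is unchanged.
    m.addMetaData(new LinkedHashSet<>(Arrays.asList("Token")));
    System.out.println(first == m.getCasDefinition()); // prints "true"

    // A late call that would change the type system is rejected.
    try {
      m.addMetaData(new LinkedHashSet<>(Arrays.asList("Entity")));
    } catch (IllegalStateException e) {
      System.out.println("rejected");                  // prints "rejected"
    }
  }
}
```

Synchronizing both methods is one simple way to keep a thread that is still initializing from racing with a thread that is obtaining the merged definition; a real fix might well use finer-grained locking.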

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (UIMA-1249) CasManager_impl logic for lazy merge of the type system could lead to excessive work or missed errors

Posted by "Marshall Schor (JIRA)" <ui...@incubator.apache.org>.
    [ https://issues.apache.org/jira/browse/UIMA-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655376#action_12655376 ] 

Marshall Schor commented on UIMA-1249:
--------------------------------------

If the initialization logic is changed (see UIMA-1223) to serialize calls to ProduceAnalysisEngine, then once the first engine is produced, the subsequent ones (in the case where there are multiple threads / instances) don't have to add their metadata to the CasManager; all of that work has already been done.

One other point: there was at one time a concern that, for large type systems, a large part of the heap was being used up by multiple copies of type system descriptors (UIMA-1230).  The code that accumulates metaData from each component (which includes the type system) *clones* the metaData before calling addMetaData (see AnalysisEngineImplBase: initialize, where it does {{getCasManager().addMetaData((AnalysisEngineMetaData)md.clone());}}).

This information is collected into an ArrayList which is never nulled out, so it is never garbage collected.  In the current implementation, this analysis (not measured) implies that if there were 20 components and the pipeline were scaled out by a factor of 10, 200 cloned copies of the component metadata (type system, type priorities, and index specifications) would be held in the heap.

One fix for this might be to null out this ArrayList after the lazy merge is complete, provided other fixes (see above) were in place to reuse the already merged metaData result without recomputing it.
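
A rough sketch of that idea, using the 20 x 10 example above.  Everything here is illustrative: ReleasingMergeManager and heldCopies are hypothetical names, sets of type names stand in for the cloned metadata, and the sketch simply assumes that anything added after the merge is a replica contributing nothing new (a real fix would still need the check described in the issue).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: drop the collected copies once the merge is done.
class ReleasingMergeManager {
  private List<Set<String>> collected = new ArrayList<>();
  private Set<String> merged; // null until the lazy merge has run

  synchronized void addMetaData(Set<String> typeNames) {
    if (merged != null) {
      // Assumption: a later replica adds nothing new, so reuse the merged
      // result and do not hold on to another copy.
      return;
    }
    collected.add(new TreeSet<>(typeNames)); // copy, like the clone() above
  }

  synchronized Set<String> getCasDefinition() {
    if (merged == null) {
      Set<String> result = new TreeSet<>();
      for (Set<String> s : collected) {
        result.addAll(s);
      }
      merged = result;
      collected = null; // release the copies so they can be garbage collected
    }
    return merged;
  }

  // Number of metadata copies currently retained by this manager.
  synchronized int heldCopies() {
    return collected == null ? 0 : collected.size();
  }
}

class ReleasingMergeDemo {
  public static void main(String[] args) {
    ReleasingMergeManager m = new ReleasingMergeManager();
    // 20 components, scaled out by a factor of 10.
    for (int replica = 0; replica < 10; replica++) {
      for (int component = 0; component < 20; component++) {
        m.addMetaData(new TreeSet<>(Arrays.asList("Type" + component)));
      }
    }
    System.out.println(m.heldCopies());              // prints 200
    System.out.println(m.getCasDefinition().size()); // prints 20
    System.out.println(m.heldCopies());              // prints 0
  }
}
```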


[jira] Updated: (UIMA-1249) CasManager_impl logic for lazy merge of the type system could lead to excessive work or missed errors

Posted by "Marshall Schor (JIRA)" <ui...@incubator.apache.org>.
     [ https://issues.apache.org/jira/browse/UIMA-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marshall Schor updated UIMA-1249:
---------------------------------

    Affects Version/s:     (was: 2.2.2)
                       2.3

Defer until after 2.3.0.
