Posted to dev@uima.apache.org by "Eddie Epstein (JIRA)" <de...@uima.apache.org> on 2010/06/23 23:25:51 UTC

[jira] Created: (UIMA-1818) Provide simple mechanism to capture all CASes input to specified delegate

Provide simple mechanism to capture all CASes input to specified delegate
-------------------------------------------------------------------------

                 Key: UIMA-1818
                 URL: https://issues.apache.org/jira/browse/UIMA-1818
             Project: UIMA
          Issue Type: New Feature
          Components: Async Scaleout
            Reporter: Eddie Epstein
            Assignee: Eddie Epstein


The existing approach to capturing CASes sent to a component is to insert a new CAS-serializer-annotator just before it in the flow, or to modify the component itself to serialize CASes. Both approaches require modifications to existing code and/or component descriptors, and are somewhat time-consuming and error-prone.

A much simpler approach is to just "turn on" CAS logging for a particular component using Java properties before starting the process, or to turn CAS logging on/off for an already running process using JMX operations.

This issue covers using Java properties to turn on CAS logging for any delegate of an asynchronous aggregate.

CAS logging would be controlled by the following properties:

UIMA_CASLOG_BASE_DIRECTORY - optional; this is the directory under which other directories with XmiCas files will be created. If not specified, the process's current directory will be the base.

UIMA_CASLOG_COMPONENT_ARRAY - This is a space-separated list of delegate keys. If a delegate is nested inside a co-located async aggregate, the name would include the key name of the aggregate, e.g. "someAggName/someDelName". The XmiCas files will then be written into $UIMA_CASLOG_BASE_DIRECTORY/someAggName/someDelName/

UIMA_CASLOG_TYPE_NAME - optional; this is the name of a FeatureStructure in the CAS containing a unique string to use to name each XmiCas file. If not specified, the XmiCas file name will be NNN.xmi, where NNN is the time in microseconds since the component was initialized.

UIMA_CASLOG_FEATURE_NAME - optional unless TYPE_NAME is specified; this parameter gives the string feature to use. An example of type and feature names to use would be "org.apache.uima.examples.SourceDocumentInformation" and "uri".
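
To make the directory and file-name rules above concrete, here is a minimal sketch in plain Java. It is not the UIMA AS implementation: the class name CasLogPaths and its methods are hypothetical, and it assumes the properties are read as Java system properties (for example, set with -D on the java command line). Sanitizing a feature value such as a URI into a legal file name is left out.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, for illustration only; not part of UIMA.
final class CasLogPaths {

    /** Delegate keys listed in UIMA_CASLOG_COMPONENT_ARRAY, split on whitespace. */
    public static List<String> components(Properties props) {
        List<String> keys = new ArrayList<>();
        String raw = props.getProperty("UIMA_CASLOG_COMPONENT_ARRAY", "").trim();
        if (!raw.isEmpty()) {
            for (String key : raw.split("\\s+")) {
                keys.add(key);
            }
        }
        return keys;
    }

    /** Directory for one delegate: base directory plus each segment of the key. */
    public static Path outputDir(Properties props, String delegateKey) {
        // Fall back to the process's current directory when no base is given.
        String base = props.getProperty("UIMA_CASLOG_BASE_DIRECTORY",
                System.getProperty("user.dir"));
        Path dir = Paths.get(base);
        for (String segment : delegateKey.split("/")) {
            dir = dir.resolve(segment);
        }
        return dir;
    }

    /** File name: the unique string if one is available, else the microsecond count. */
    public static String fileName(String uniqueName, long microsSinceInit) {
        String stem = (uniqueName == null || uniqueName.isEmpty())
                ? Long.toString(microsSinceInit)
                : uniqueName;
        return stem + ".xmi";
    }

    public static void main(String[] args) {
        long initNanos = System.nanoTime(); // stands in for component initialization
        Properties props = System.getProperties();
        for (String key : components(props)) {
            long micros = (System.nanoTime() - initNanos) / 1000;
            System.out.println(outputDir(props, key).resolve(fileName(null, micros)));
        }
    }
}
```

Run with, for instance, -DUIMA_CASLOG_COMPONENT_ARRAY="someAggName/someDelName" to print the path a CAS for that delegate would be written to under these assumptions.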



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (UIMA-1818) Provide simple mechanism to capture all CASes input to specified delegate

Posted by "Marshall Schor (JIRA)" <de...@uima.apache.org>.
    [ https://issues.apache.org/jira/browse/UIMA-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881925#action_12881925 ] 

Marshall Schor commented on UIMA-1818:
--------------------------------------

Sounds like a valuable debugging aid.

Is the idea that *every* CAS that comes thru a particular specified annotator would be saved to the file system?
* if so - maybe some parameter to control how many, or how frequently to sample, etc.?

The "COMPONENT_ARRAY" delegate keys need the x/y/z syntax for non UIMA-AS cases - where an aggregate contains another aggregate, etc.  This is already a convention in UIMA. So it would be good to just continue using it both for UIMA-AS cases and non-UIMA-AS cases.  

Would it be valuable to have a spec to say if the logging was to be before or after the AnalysisEngine, for each delegate? For instance, the spec could be e.g., someAggName/somePrimName:before:after  (showing both).  "before" could be the default.

Would it be valuable to dump only the changed data (a la "delta cas")?  (possible syntax: add modifier :delta)

It would be good if the output was consumable by the CAS Viewer, too :-).
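
The modifier syntax floated in this comment (key:before:after, plus :delta) is only a suggestion, but a small sketch shows how such a spec might be parsed. The class name DelegateLogSpec is hypothetical, and treating any other modifier as an error is an assumption made for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical parser for the suggested "key:modifier:modifier" syntax.
final class DelegateLogSpec {
    private static final List<String> ALLOWED = Arrays.asList("before", "after", "delta");

    public final String delegateKey;
    public final Set<String> modifiers;

    private DelegateLogSpec(String delegateKey, Set<String> modifiers) {
        this.delegateKey = delegateKey;
        this.modifiers = modifiers;
    }

    public static DelegateLogSpec parse(String spec) {
        String[] parts = spec.split(":");
        Set<String> mods = new LinkedHashSet<>();
        for (int i = 1; i < parts.length; i++) {
            // Assumption: unknown modifiers are rejected rather than ignored.
            if (!ALLOWED.contains(parts[i])) {
                throw new IllegalArgumentException("Unknown modifier: " + parts[i]);
            }
            mods.add(parts[i]);
        }
        // "before" is the default when neither position is given.
        if (!mods.contains("before") && !mods.contains("after")) {
            mods.add("before");
        }
        return new DelegateLogSpec(parts[0], mods);
    }

    public static void main(String[] args) {
        DelegateLogSpec spec = parse("someAggName/somePrimName:before:after");
        System.out.println(spec.delegateKey + " " + spec.modifiers);
    }
}
```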



Re: [jira] Commented: (UIMA-1818) Provide simple mechanism to capture all CASes input to specified delegate

Posted by Marshall Schor <ms...@schor.com>.
sounds good. please go for it :-) -Marshall

On 6/24/2010 9:45 AM, Eddie Epstein (JIRA) wrote:
>     [ https://issues.apache.org/jira/browse/UIMA-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882153#action_12882153 ] 
>
> Eddie Epstein commented on UIMA-1818:
> -------------------------------------
>
> > Is the idea that every CAS that comes thru a particular specified annotator would be saved to the file system?
> Yes.
>
> > if so - maybe some parameter to control how many, or how frequently to sample, etc.?
> Implementing JMX control to dynamically turn on/off CAS logging would accomplish this.
>
> > The "COMPONENT_ARRAY" delegate keys need the x/y/z syntax for non UIMA-AS cases - where an aggregate contains another aggregate, etc. This is already a convention in UIMA. So it would be good to just continue using it both for UIMA-AS cases and non-UIMA-AS cases.
> Right. The same syntax should work for core UIMA cases. To clarify, the code to implement this is in the aggregate controller, of which there is one for UIMA AS and another for core UIMA. The UIMA AS controller only sees asynchronous delegates and vice versa for the core UIMA controller. This issue is only covering implementation for asynchronous delegates.
>
> > Would it be valuable to have a spec to say if the logging was to be before or after the AnalysisEngine, for each delegate? For instance, the spec could be e.g., someAggName/somePrimName:before:after (showing both). "before" could be the default.
> To me, much less valuable to capture output CASes, and more complicated to implement. The main use of capturing CASes going into a delegate is to be able to later run the delegate stand-alone in a debug environment. In my case, a scaled out delegate is hanging on one or more CASes and timing out. This utility will allow one to easily capture all the CASes sent to the queue, find the problem CAS and ultimately the cause.
>
> > Would it be valuable to dump only the changed data (a la "delta cas")? (possible syntax: add modifier :delta)
> This sounds more appropriately handled by CAS journaling, where all CAS modifications can be attributed to specific annotators.
>
> > It would be good if the output was consumable by the CAS Viewer, too
> Interesting. The XmiCASes will be, but only if the CAS typesystem is available. The typesystem description should be written into the directory along with the CAS files.
>

[jira] Commented: (UIMA-1818) Provide simple mechanism to capture all CASes input to specified delegate

Posted by "Eddie Epstein (JIRA)" <de...@uima.apache.org>.
    [ https://issues.apache.org/jira/browse/UIMA-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882153#action_12882153 ] 

Eddie Epstein commented on UIMA-1818:
-------------------------------------

> Is the idea that every CAS that comes thru a particular specified annotator would be saved to the file system?
Yes.

> if so - maybe some parameter to control how many, or how frequently to sample, etc.?
Implementing JMX control to dynamically turn on/off CAS logging would accomplish this.

> The "COMPONENT_ARRAY" delegate keys need the x/y/z syntax for non UIMA-AS cases - where an aggregate contains another aggregate, etc. This is already a convention in UIMA. So it would be good to just continue using it both for UIMA-AS cases and non-UIMA-AS cases.
Right. The same syntax should work for core UIMA cases. To clarify, the code to implement this is in the aggregate controller, of which there is one for UIMA AS and another for core UIMA. The UIMA AS controller only sees asynchronous delegates and vice versa for the core UIMA controller. This issue is only covering implementation for asynchronous delegates.

> Would it be valuable to have a spec to say if the logging was to be before or after the AnalysisEngine, for each delegate? For instance, the spec could be e.g., someAggName/somePrimName:before:after (showing both). "before" could be the default.
To me, much less valuable to capture output CASes, and more complicated to implement. The main use of capturing CASes going into a delegate is to be able to later run the delegate stand-alone in a debug environment. In my case, a scaled out delegate is hanging on one or more CASes and timing out. This utility will allow one to easily capture all the CASes sent to the queue, find the problem CAS and ultimately the cause.

> Would it be valuable to dump only the changed data (a la "delta cas")? (possible syntax: add modifier :delta)
This sounds more appropriately handled by CAS journaling, where all CAS modifications can be attributed to specific annotators.

> It would be good if the output was consumable by the CAS Viewer, too
Interesting. The XmiCASes will be, but only if the CAS typesystem is available. The typesystem description should be written into the directory along with the CAS files.




[jira] Closed: (UIMA-1818) Provide simple mechanism to capture all CASes input to specified delegate

Posted by "Eddie Epstein (JIRA)" <de...@uima.apache.org>.
     [ https://issues.apache.org/jira/browse/UIMA-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eddie Epstein closed UIMA-1818.
-------------------------------

    Resolution: Fixed



[jira] Reopened: (UIMA-1818) Provide simple mechanism to capture all CASes input to specified delegate

Posted by "Eddie Epstein (JIRA)" <de...@uima.apache.org>.
     [ https://issues.apache.org/jira/browse/UIMA-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eddie Epstein reopened UIMA-1818:
---------------------------------


When accessing UIMA_CASLOG_TYPE_NAME, the FeatureStructure may be in a named View. An additional optional property is needed to specify the view: UIMA_CASLOG_VIEW_NAME.
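
As a rough illustration of how the naming properties relate once the view is added, the sketch below reads and validates them from Java system properties. The class CasLogNaming is hypothetical, and both the error on a missing FEATURE_NAME and the "null means default view" convention are assumptions for the example, not documented UIMA AS behavior.

```java
import java.util.Properties;

// Hypothetical holder for the naming-related properties; not part of UIMA.
final class CasLogNaming {
    public final String typeName;    // null: name files by microsecond count
    public final String featureName; // must be set whenever typeName is set
    public final String viewName;    // null: assumed to mean the default view

    private CasLogNaming(String typeName, String featureName, String viewName) {
        this.typeName = typeName;
        this.featureName = featureName;
        this.viewName = viewName;
    }

    public static CasLogNaming from(Properties props) {
        String type = props.getProperty("UIMA_CASLOG_TYPE_NAME");
        String feature = props.getProperty("UIMA_CASLOG_FEATURE_NAME");
        String view = props.getProperty("UIMA_CASLOG_VIEW_NAME");
        // FEATURE_NAME is optional only while TYPE_NAME is absent.
        if (type != null && feature == null) {
            throw new IllegalArgumentException(
                    "UIMA_CASLOG_FEATURE_NAME is required when UIMA_CASLOG_TYPE_NAME is set");
        }
        return new CasLogNaming(type, feature, view);
    }

    /** True when file names should come from a feature value in the CAS. */
    public boolean usesFeatureForName() {
        return typeName != null;
    }

    public static void main(String[] args) {
        CasLogNaming naming = from(System.getProperties());
        System.out.println("type=" + naming.typeName
                + " feature=" + naming.featureName
                + " view=" + naming.viewName);
    }
}
```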



[jira] Closed: (UIMA-1818) Provide simple mechanism to capture all CASes input to specified delegate

Posted by "Eddie Epstein (JIRA)" <de...@uima.apache.org>.
     [ https://issues.apache.org/jira/browse/UIMA-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eddie Epstein closed UIMA-1818.
-------------------------------

    Resolution: Fixed



[jira] Updated: (UIMA-1818) Provide simple mechanism to capture all CASes input to specified delegate

Posted by "Marshall Schor (JIRA)" <de...@uima.apache.org>.
     [ https://issues.apache.org/jira/browse/UIMA-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marshall Schor updated UIMA-1818:
---------------------------------

    Fix Version/s: 2.3.1AS
