Posted to dev@uima.apache.org by "Richard Eckart de Castilho (JIRA)" <de...@uima.apache.org> on 2013/11/01 15:33:17 UTC

[jira] [Commented] (UIMA-3399) More consistent handling of multiple add-to-index behavior for same Feature Structure

    [ https://issues.apache.org/jira/browse/UIMA-3399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811284#comment-13811284 ] 

Richard Eckart de Castilho commented on UIMA-3399:
--------------------------------------------------

I seem to remember that one of the reasons to post to the mailing list instead of to Jira was that it was easier to discuss things ;) Nevertheless:

   * +1 for a delete/re-add behavior
   * +0 with a tendency to -1 for reinstating the old behavior - if a flag were introduced, a version should be defined in which the flag as well as the backwards-compatibility code is removed

General questions: 

   * what happens to FSes which are only reachable by other FSes and not indexed at all? Does the remote-case cover that?
   * what happens if an FS is added to some indexes but not to all? Does the remote-case cover that?

> More consistent handling of multiple add-to-index behavior for same Feature Structure
> -------------------------------------------------------------------------------------
>
>                 Key: UIMA-3399
>                 URL: https://issues.apache.org/jira/browse/UIMA-3399
>             Project: UIMA
>          Issue Type: Brainstorming
>    Affects Versions: 2.4.2SDK
>            Reporter: Marshall Schor
>            Assignee: Marshall Schor
>            Priority: Minor
>
> UIMA has a somewhat unusual indexing architecture.  You can define indexes (sorted, bag, set), and then add / remove a feature structure (FS) to all of the defined indexes.
> The design intention (I think) was to support the concept of a FS being indexed, or not.  However, the current design allows some anomalies that behave inconsistently between code being run "locally", versus as remote services (due to how serialization handles this).  Serialization encodes only the concept of a FS being either in an index or not. 
> The problem arises in the edge case where the same FS is added to the indexes multiple times.  For local (non-remote) cases, for bag and sorted indexes, the very same FS would be added multiple times.  This has the following consequences:
> -  Iterating would return multiple == FSs.
> -  Removing a multiply-added FS from the indexes would reduce the number of entries by 1; the FS would still be in the index.
> For the same code running remotely, serialization would have "collapsed" the multiple additions into one, so the code would behave differently.
> A proposed improvement:  Change the behavior of "add-to-index" so that subsequent add-to-indexes of the same FS would be either a no-op, or a delete / re-add (to cover the case where some feature values of the FS might have changed, making it necessary to re-index the FS).  To cover users who might be exploiting the old behavior, we could have a framework context flag to reinstate the older behavior.
> This would better align how code running locally or remotely works.
> What do people think about this idea?
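
The anomaly described in the issue can be sketched with a small stand-alone model. This is only an illustration under stated assumptions: `ToyBagIndex`, its `allowDuplicates` switch, and `roundTrip()` are hypothetical names invented for this sketch and are not part of the UIMA API. The sketch models a bag index as a list compared by identity, and models serialization as keeping only the fact that an FS is indexed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Toy stand-in for a bag index (hypothetical; NOT the real UIMA API). */
final class ToyBagIndex {
    // Entries are compared by identity (==), mirroring "the very same FS".
    private final List<Object> entries = new ArrayList<>();
    // true  = old behavior: every add creates another entry.
    // false = proposed behavior: a repeated add is a delete / re-add.
    private final boolean allowDuplicates;

    ToyBagIndex(boolean allowDuplicates) {
        this.allowDuplicates = allowDuplicates;
    }

    void add(Object fs) {
        if (!allowDuplicates) {
            remove(fs); // drop an existing entry first, then re-add
        }
        entries.add(fs);
    }

    /** Removes at most one entry for fs; returns true if one was found. */
    boolean remove(Object fs) {
        for (int i = 0; i < entries.size(); i++) {
            if (entries.get(i) == fs) {
                entries.remove(i); // int argument: removes by position
                return true;
            }
        }
        return false;
    }

    int count(Object fs) {
        int n = 0;
        for (Object e : entries) {
            if (e == fs) {
                n++;
            }
        }
        return n;
    }

    /**
     * Models what the issue says serialization preserves: only whether an FS
     * is indexed, so duplicates collapse. (A real round trip would create new
     * objects; the same instances are reused here to keep the sketch short.)
     */
    ToyBagIndex roundTrip() {
        ToyBagIndex copy = new ToyBagIndex(allowDuplicates);
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
        for (Object fs : entries) {
            if (seen.add(fs)) {
                copy.entries.add(fs);
            }
        }
        return copy;
    }
}
```

With `allowDuplicates` set to true, adding the same object twice yields two entries locally but one entry after `roundTrip()`, which is the local/remote mismatch. With it set to false, both paths end up with a single entry, and a single remove leaves the FS unindexed.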



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Re: [jira] [Commented] (UIMA-3399) More consistent handling of multiple add-to-index behavior for same Feature Structure

Posted by Richard Eckart de Castilho <re...@apache.org>.
Thanks. All doubts removed :) No more edge-cases I can think of right now.

-- Richard

On 01.11.2013, at 22:26, Marshall Schor <ms...@schor.com> wrote:

> 
> On 11/1/2013 10:33 AM, Richard Eckart de Castilho (JIRA) wrote:
>>    [ https://issues.apache.org/jira/browse/UIMA-3399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811284#comment-13811284 ] 
>> 
>> Richard Eckart de Castilho commented on UIMA-3399:
>> --------------------------------------------------
>> 
>> I seem to remember that one of the reasons to post to the mailing list instead of to Jira was that it was easier to discuss things ;) 
> 
> I'm not known for being completely consistent, all the time ;-)
> 
>> nevertheless
>> 
>>   * +1 for a delete/re-add behavior
>>   * +0 with a tendency to -1 for reinstating the old behavior - if a flag were introduced, a version should be defined in which the flag as well as the backwards-compatibility code is removed
>> 
>> General questions: 
>> 
>>   * what happens to FSes which are only reachable by other FSes and not indexed at all? Does the remote-case cover that?
> 
> Yes, the serializers do a "trace" of all reachable FSs.
> 
>>   * what happens if an FS is added to some indexes but not to all? Does the remote-case cover that?
> 
> So, to clarify: a FS is added to an IndexRepository.  There is one
> IndexRepository in the CAS, per View.  Each IndexRepository can have many
> defined indexes, and always has one built-in index (the Annotation Index).  Each
> IndexRepository also has bag indexes created if needed for types that are not
> covered by any defined index.
> 
> For a given index repository, there is no way to add an FS to some of the
> defined indexes, and not others.
> 
> Of course, it is possible to add a FS to the indexes for one view, and not have
> it added in a different view.
> 
> Does that answer the question, or did I miss something?
> 
> -Marshall


Re: [jira] [Commented] (UIMA-3399) More consistent handling of multiple add-to-index behavior for same Feature Structure

Posted by Marshall Schor <ms...@schor.com>.
On 11/1/2013 10:33 AM, Richard Eckart de Castilho (JIRA) wrote:
>     [ https://issues.apache.org/jira/browse/UIMA-3399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811284#comment-13811284 ] 
>
> Richard Eckart de Castilho commented on UIMA-3399:
> --------------------------------------------------
>
> I seem to remember that one of the reasons to post to the mailing list instead of to Jira was that it was easier to discuss things ;) 

I'm not known for being completely consistent, all the time ;-)

> nevertheless
>
>    * +1 for a delete/re-add behavior
>    * +0 with a tendency to -1 for reinstating the old behavior - if a flag were introduced, a version should be defined in which the flag as well as the backwards-compatibility code is removed
>
> General questions: 
>
>    * what happens to FSes which are only reachable by other FSes and not indexed at all? Does the remote-case cover that?

Yes, the serializers do a "trace" of all reachable FSs.
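
A reachability trace of this kind can be sketched as a graph walk that starts from the indexed FSs and follows references. The message does not say how the UIMA serializers actually implement it, so the breadth-first order and the names `ToyFs` and `ReachabilityTrace` below are assumptions made for illustration only.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Toy feature structure with reference-valued features (hypothetical). */
final class ToyFs {
    final String name;
    final List<ToyFs> refs = new ArrayList<>();

    ToyFs(String name) {
        this.name = name;
    }
}

final class ReachabilityTrace {
    /** Returns every FS reachable from the indexed roots, each exactly once. */
    static List<ToyFs> trace(List<ToyFs> indexedRoots) {
        // Identity-based "seen" set so cycles and shared references terminate.
        Set<ToyFs> seen = Collections.newSetFromMap(new IdentityHashMap<ToyFs, Boolean>());
        List<ToyFs> reachable = new ArrayList<>();
        Deque<ToyFs> work = new ArrayDeque<>();
        for (ToyFs root : indexedRoots) {
            if (seen.add(root)) {
                work.add(root);
            }
        }
        while (!work.isEmpty()) {
            ToyFs fs = work.poll();
            reachable.add(fs);
            for (ToyFs ref : fs.refs) {
                if (seen.add(ref)) {
                    work.add(ref);
                }
            }
        }
        return reachable;
    }
}
```

In this model an FS that is never indexed but is referenced by an indexed FS is still included, while an FS that is neither indexed nor referenced is not.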

>    * what happens if an FS is added to some indexes but not to all? Does the remote-case cover that?

So, to clarify: a FS is added to an IndexRepository.  There is one
IndexRepository in the CAS, per View.  Each IndexRepository can have many
defined indexes, and always has one built-in index (the Annotation Index).  Each
IndexRepository also has bag indexes created if needed for types that are not
covered by any defined index.

For a given index repository, there is no way to add an FS to some of the
defined indexes, and not others.

Of course, it is possible to add a FS to the indexes for one view, and not have
it added in a different view.

Does that answer the question, or did I miss something?

-Marshall
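
The relationship between views, index repositories, and indexes described above can be sketched as follows. `ToyCas` and `ToyIndexRepository` are hypothetical classes written for this illustration, not UIMA classes, and the sketch deliberately ignores types (and therefore the automatically created bag indexes): it only shows that an add goes to every index of one repository and to no other view.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy CAS holding one index repository per view (hypothetical). */
final class ToyCas {
    private final Map<String, ToyIndexRepository> views = new LinkedHashMap<>();

    /** Returns the repository for a view, creating it on first use. */
    ToyIndexRepository view(String name) {
        return views.computeIfAbsent(name, n -> new ToyIndexRepository());
    }
}

/** Toy index repository: a set of labeled indexes that are always updated together. */
final class ToyIndexRepository {
    private final Map<String, List<Object>> indexes = new LinkedHashMap<>();

    ToyIndexRepository() {
        // Stands in for the one built-in index every repository has.
        indexes.put("AnnotationIndex", new ArrayList<>());
    }

    void defineIndex(String label) {
        indexes.putIfAbsent(label, new ArrayList<>());
    }

    /** The only way to add: the FS goes into every index of this repository. */
    void addFs(Object fs) {
        for (List<Object> index : indexes.values()) {
            index.add(fs);
        }
    }

    boolean isIndexed(String label, Object fs) {
        List<Object> index = indexes.get(label);
        if (index == null) {
            return false;
        }
        for (Object e : index) {
            if (e == fs) {
                return true;
            }
        }
        return false;
    }
}
```

Because `addFs` is the only entry point, a caller cannot add an FS to one defined index of a repository while skipping another, yet adding it in one view leaves every other view untouched.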
