Posted to dev@uima.apache.org by "Marshall Schor (JIRA)" <de...@uima.apache.org> on 2014/11/14 17:27:33 UTC

[jira] [Updated] (UIMA-3399) More consistent handling of multiple add-to-index behavior for same Feature Structure

     [ https://issues.apache.org/jira/browse/UIMA-3399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marshall Schor updated UIMA-3399:
---------------------------------
          Description: 
UIMA has a somewhat unusual indexing architecture.  You can define indexes (sorted, bag, set), and then add / remove a feature structure (FS) to all of the defined indexes.

The design intention (I think) was to support the concept of an FS being either indexed or not.  However, the current design allows some anomalies: the same code can behave inconsistently when run "locally" versus as a remote service, because of how serialization handles this.  Serialization encodes only whether an FS is in an index or not.

The problem arises in the edge case where the identical FS (the same object) is added to the indexes multiple times.  In local (non-remote) cases, for bag and sorted indexes, that exact FS would be stored multiple times.  This would have the following consequences:

-  Iterating would return the same FS multiple times (entries that are == to each other).
-  Removing a multiply-added FS from the indexes would reduce the number of entries by 1; the FS would still be in the index until the last remaining entry was removed.

The same code running remotely would behave differently, because serialization would have "collapsed" the multiple additions into one.
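
The local/remote difference described above can be illustrated with a small toy model. This is only a sketch and does not use UIMA classes: ToyIndex and serializedView are hypothetical names, a plain list stands in for a bag or sorted index under the old behavior, and an identity-based set stands in for what serialization preserves.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Toy model only; none of these classes are part of UIMA.
final class DuplicateAddDemo {

    // Models the old local behavior of a bag or sorted index:
    // every add stores one more reference, even to an FS already present.
    static final class ToyIndex {
        private final List<Object> entries = new ArrayList<>();

        void add(Object fs) {
            entries.add(fs);
        }

        // Removes a single occurrence, matched by identity (==).
        boolean remove(Object fs) {
            for (int i = 0; i < entries.size(); i++) {
                if (entries.get(i) == fs) {
                    entries.remove(i);
                    return true;
                }
            }
            return false;
        }

        int size() {
            return entries.size();
        }

        // Models serialization, which records only whether an FS is indexed,
        // so repeated additions of the same FS collapse into one.
        Set<Object> serializedView() {
            Set<Object> indexed =
                Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
            indexed.addAll(entries);
            return indexed;
        }
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        Object fs = new Object();
        index.add(fs);
        index.add(fs);
        System.out.println("local entries after two adds: " + index.size());
        System.out.println("entries after serialization: " + index.serializedView().size());
        index.remove(fs);
        System.out.println("local entries after one remove: " + index.size());
    }
}
```

In this model a local caller sees the FS twice and needs two removes to get it out of the index, while a remote service that receives the serialized form sees it once.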

This Jira changes the behavior of "add-to-index" so that subsequent add-to-indexes of the identical FS are a no-op. To cover users who might be exploiting the old behavior, the JVM property "uima.allow_duplicate_add_to_indices", read when the UIMA classes are loaded, restores the previous behavior.
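
A minimal sketch of the new rule, again as a toy model rather than the actual UIMA implementation. Only the property name is taken from this description; the class AddOnceIndex, its methods, and the choice to treat the property merely being defined as "enabled" are assumptions made for illustration. How the real framework interprets the property's value is not stated here.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; not the UIMA implementation.
final class AddOnceIndex {
    // Property name as given in the issue description.
    static final String ALLOW_DUPLICATES_PROPERTY = "uima.allow_duplicate_add_to_indices";

    // Read once, when this class is loaded. Assumption: the property being
    // defined at all (with any value) switches the old behavior back on.
    static final boolean ALLOW_DUPLICATES_DEFAULT =
        System.getProperty(ALLOW_DUPLICATES_PROPERTY) != null;

    private final boolean allowDuplicates;
    private final List<Object> entries = new ArrayList<>();

    AddOnceIndex() {
        this(ALLOW_DUPLICATES_DEFAULT);
    }

    // Explicit flag, so both behaviors can be exercised in one JVM.
    AddOnceIndex(boolean allowDuplicates) {
        this.allowDuplicates = allowDuplicates;
    }

    // New behavior: adding an FS that is already present (by identity) is a no-op.
    void add(Object fs) {
        if (!allowDuplicates && containsIdentical(fs)) {
            return;
        }
        entries.add(fs);
    }

    boolean containsIdentical(Object fs) {
        for (Object entry : entries) {
            if (entry == fs) {
                return true;
            }
        }
        return false;
    }

    int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        Object fs = new Object();
        AddOnceIndex index = new AddOnceIndex();
        index.add(fs);
        index.add(fs);
        System.out.println("duplicates allowed: " + ALLOW_DUPLICATES_DEFAULT
            + ", entries: " + index.size());
    }
}
```

With the standard JVM option syntax, the property would be set on the command line as -Duima.allow_duplicate_add_to_indices=true. Because the flag is read at class-load time, setting it later from application code would have no effect in this sketch.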

Note that with this change, the UIMA "Set" index still has a distinct purpose, separate from the "Bag" index, because it defines Feature Structure equivalence based not on identity, but rather on the specified key feature values being equal.
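
The remaining difference between the two index kinds can be sketched as follows. This is a toy model, not UIMA code: Span, IdentityBag and KeyedSet are hypothetical names, and (begin, end) is an assumed pair of key features chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only; none of these classes are part of UIMA.
final class SetVersusBagDemo {

    // Stand-in for a feature structure with two key features and one other feature.
    static final class Span {
        final int begin;
        final int end;
        final String note;

        Span(int begin, int end, String note) {
            this.begin = begin;
            this.end = end;
            this.note = note;
        }
    }

    // Bag under the new rule: rejects only the very same object (identity).
    static final class IdentityBag {
        private final List<Span> entries = new ArrayList<>();

        void add(Span fs) {
            for (Span entry : entries) {
                if (entry == fs) {
                    return;
                }
            }
            entries.add(fs);
        }

        int size() {
            return entries.size();
        }
    }

    // Set index: rejects any FS whose key feature values match an existing entry.
    static final class KeyedSet {
        private final Set<List<Integer>> keys = new HashSet<>();
        private final List<Span> entries = new ArrayList<>();

        void add(Span fs) {
            if (keys.add(Arrays.asList(fs.begin, fs.end))) {
                entries.add(fs);
            }
        }

        int size() {
            return entries.size();
        }
    }

    public static void main(String[] args) {
        Span a = new Span(0, 5, "first");
        Span b = new Span(0, 5, "second"); // different object, same key values
        IdentityBag bag = new IdentityBag();
        KeyedSet set = new KeyedSet();
        for (Span fs : Arrays.asList(a, a, b)) {
            bag.add(fs);
            set.add(fs);
        }
        System.out.println("bag entries: " + bag.size());
        System.out.println("set entries: " + set.size());
    }
}
```

Here the bag drops only the repeated add of the same object and keeps both distinct objects, while the keyed set keeps just one entry because the two objects agree on the key features.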

This change better aligns the behavior of code running locally with that of code running remotely.

  was:
UIMA has a somewhat unusual indexing architecture.  You can define indexes (sorted, bag, set), and then add / remove a feature structure (FS) to all of the defined indexes.

The design intention (I think) was to support the concept of a FS being indexed, or not.  However, the current design allows some anomalies that behave inconsistently between code being run "locally", versus as remote services (due to how serialization handles this).  Serialization encodes only the concept of a FS being either in an index or not. 

The problem arises in the edge case where the same FS is added to the indexes multiple times.  For local (non-remote) cases, for bag and sorted indexes, the same exact FS would be added multiple times.  This would have the consequences:

-  Iterating would return multiple == FSs.
-  Remove from indexes of a multiply-added FS would reduce the number by 1; the FS would still be in the index.

For the same code, running remotely, serialization would have "collapsed" the multiple additions into one, so would behave differently.

A proposed improvement:  Change the behavior of "add-to-index" so that subsequent add-to-indexes of the same FS would be either a no-op, or a delete / re-add (to cover the case where some feature values of the FS might have changed, leading to the need to re-index the FS).  To cover users who might be exploiting the old behavior, we could have a framework context flag to reinstate the older behavior.

This would better align how code running locally or remotely works.

What do people think about this idea?

    Affects Version/s:     (was: 2.4.2SDK)
        Fix Version/s: 2.7.0SDK
           Issue Type: Improvement  (was: Brainstorming)

Changed this to an "improvement" and changed the description to what's being implemented. 

> More consistent handling of multiple add-to-index behavior for same Feature Structure
> -------------------------------------------------------------------------------------
>
>                 Key: UIMA-3399
>                 URL: https://issues.apache.org/jira/browse/UIMA-3399
>             Project: UIMA
>          Issue Type: Improvement
>            Reporter: Marshall Schor
>            Assignee: Marshall Schor
>            Priority: Minor
>             Fix For: 2.7.0SDK


