Posted to dev@uima.apache.org by Marshall Schor <ms...@schor.com> on 2014/11/18 20:49:20 UTC

edge case in indexes and UIMA-AS remote services

While thinking through and looking at code around index management, I've come up
with some edge cases that could be considered "bugs"; they involve cases where
there's (only) Set index(es) for some type.

Consider, for example, a case where you have a client and a (remote) service.
The client defines some indices; the service defines just one index, a Set index.

Now, imagine the client first creates some FSs and adds them to the indices, and
that it has sufficient indices of the non-Set kind to ensure that all FSs are
recorded as belonging to some index.

Next, imagine this CAS is sent to the remote; the sending includes the list of
things that are in the index.  At the remote, these are added to the indexes.
Because the only index at the remote is a Set index, some of these FSs could
compare equal to an FS already in it, and so won't be added to any index at the
remote.

When the CAS is returned to the client, those FSs that were not in the Set will
be lost.  While you might consider this a nifty "filter", it violates the
principle that there should be no logical difference between running an
annotator remotely and running it embedded.
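
To make the scenario concrete, here is a minimal toy model of the round trip.
It does not use the UIMA API: the class Fs, the method remoteRoundTrip, and the
single "key" feature are all made up for illustration, and a TreeSet with a
comparator stands in for a Set index on the assumption that the index keeps at
most one FS per equivalence class.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class SetIndexRoundTrip {
    // Toy stand-in for a feature structure (hypothetical, not a UIMA class):
    // 'id' distinguishes instances, 'key' is the only feature the Set index compares.
    static final class Fs {
        final int id;
        final String key;
        Fs(int id, String key) { this.id = id; this.key = key; }
    }

    // Models the remote service, which defines only a Set index.
    // Returns the FSs that ended up indexed, i.e. what would be sent back.
    static List<Fs> remoteRoundTrip(List<Fs> sentByClient) {
        TreeSet<Fs> setIndex = new TreeSet<>(Comparator.comparing((Fs f) -> f.key));
        for (Fs fs : sentByClient) {
            // add() is a no-op when an FS with an equal key is already present
            setIndex.add(fs);
        }
        return new ArrayList<>(setIndex);
    }

    public static void main(String[] args) {
        // The client had all three FSs indexed (e.g. in a bag or sorted index).
        List<Fs> client = List.of(new Fs(1, "a"), new Fs(2, "a"), new Fs(3, "b"));
        List<Fs> returned = remoteRoundTrip(client);
        System.out.println("sent " + client.size() + ", returned " + returned.size());
        // prints "sent 3, returned 2"
    }
}
```

In this model the FS with id 2 is dropped only because it compares equal to the
FS with id 1 under the remote's Set index; nothing in the annotator asked for it
to be removed.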

Is this a bug?  It seems like it to me.  A possible fix would be to extend the
"auto-create" of default bag indices (whose purpose is to record that things were
added to indices when there is no index for their type), so that one is also
auto-created if an addToIndex found only Set index(es) and the FS was not added
(because every Set index already held another FS in the same equivalence class).
Currently the auto-create operation only happens if there are no indices at all
(for this type).
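
A rough sketch of that rule, again as a toy model rather than UIMA code.  The
names Fs, TypeIndexes, defaultBag, and allIndexed are hypothetical, only Set
indexes are modelled, and it assumes that an FS goes into the default bag only
when no Set index accepted it; how an actual fix would treat later adds once
the bag exists is left open here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class DefaultBagFallback {
    // Toy feature structure (hypothetical): 'key' is what the Set index compares.
    static final class Fs {
        final int id;
        final String key;
        Fs(int id, String key) { this.id = id; this.key = key; }
    }

    // Toy model of the indexes defined for one type.
    static final class TypeIndexes {
        final List<TreeSet<Fs>> setIndexes = new ArrayList<>();
        List<Fs> defaultBag; // null until auto-created

        TypeIndexes(boolean withSetIndex) {
            if (withSetIndex) {
                setIndexes.add(new TreeSet<>(Comparator.comparing((Fs f) -> f.key)));
            }
        }

        void addToIndexes(Fs fs) {
            boolean added = false;
            for (TreeSet<Fs> index : setIndexes) {
                added |= index.add(fs); // false if an equal FS is already held
            }
            // Covers both cases: no indexes at all (the current trigger), and
            // only Set indexes that all rejected the FS (the proposed extension).
            if (!added) {
                if (defaultBag == null) {
                    defaultBag = new ArrayList<>();
                }
                defaultBag.add(fs);
            }
        }

        // Everything recorded as indexed; with a single Set index there are no
        // duplicates, since an FS reaches the bag only if the Set rejected it.
        List<Fs> allIndexed() {
            List<Fs> all = new ArrayList<>();
            for (TreeSet<Fs> index : setIndexes) {
                all.addAll(index);
            }
            if (defaultBag != null) {
                all.addAll(defaultBag);
            }
            return all;
        }
    }

    public static void main(String[] args) {
        TypeIndexes remote = new TypeIndexes(true);
        remote.addToIndexes(new Fs(1, "a"));
        remote.addToIndexes(new Fs(2, "a")); // rejected by the Set, kept in the bag
        remote.addToIndexes(new Fs(3, "b"));
        System.out.println("indexed " + remote.allIndexed().size()
                + " of 3, default bag holds " + remote.defaultBag.size());
        // prints "indexed 3 of 3, default bag holds 1"
    }
}
```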

By contrast, if the remote had no indices defined, then addToIndexes would
automatically create a "Default Bag Index" for the type in question, so that the
effect of the add to indices would not be lost; this would be observable in
subsequent calls to get an iterator over a particular type, in CAS
serialization, and in getting an iterator over all types added to indices.

With the current behavior, merely introducing a Set index (which would block the
auto-create of a default bag index) would change the behavior of these operations.

WDYT? Do you agree this is a bug that should be fixed along the lines suggested
(by auto-creating a default bag index if there are only Set indices and the
add-to-index op doesn't add)?

-Marshall