Posted to dev@uima.apache.org by Neal R Lewis <nr...@us.ibm.com> on 2013/02/09 00:11:52 UTC

Maintaining UIMAj indexing and references while using stable FSIDs in a CAS Store

It was brought up in a recent meeting that we need to consider how a Feature Structure ID (FSID) in a CAS / CAS Store affects deserialization of a CAS into UIMAj and annotation indexing.

E.g., how would adding a stable identifier affect indexing and references within jCAS objects?

I'd like to throw out a few scenarios to the community to see whether they cover all of the possible use cases, describe how I currently handle them, and hopefully get some comments :)

First, I'd like to confirm that I'm thinking of a CAS Store operating between different PEARs or full UIMA applications, not running inside an Aggregate analytic (although that is definitely something to consider). Furthermore, I am assuming that the CAS Store interface retrieves a CAS object that conforms to the OASIS spec, and that the CAS Store is responsible for creating FSIDs.

I can think of four scenarios when deserializing a CAS XMI (I'm not sure about deserializing from binary) to a jCAS object, as it comes from the CAS Store.
 
1:  A minimal CAS that contains only a sofa and view. This is the simplest input to pull from a CAS Store, and doesn't require any modifications to the UIMAj deserialization.

2:  A full CAS with a SOFA and associated annotations in multiple views.

3:  A CAS fragment (or projection) of a single CAS XMI from the store that contains only the information necessary for this particular analytic pipeline (there might or might not be a SOFA and view associated with it).

4:  A CAS created from one or more analytics on different artifacts (zero or more cas:Sofa elements, and zero or more View elements).

Currently, if I use the FSID, I have to either set the deserialization to lenient or preprocess the FSIDs out of the CAS before deserialization. Either way, the unknown attributes are simply dropped.
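
To make the preprocessing option concrete, here is a rough sketch of stripping the identifier from an XMI document before it is handed to the deserializer. It uses only the JDK's XML classes, and it assumes the FSID is carried as a plain, unprefixed attribute called "fsid" on each feature structure element. That attribute name, the FsidStripper class, and the stripAttribute method are all made up for illustration; none of them are part of UIMAj or the CAS Store.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: not part of UIMAj.
final class FsidStripper {

    private FsidStripper() {
    }

    // Returns a copy of the XMI text with every attribute named attrName
    // removed. The attribute name is an assumption (e.g. "fsid").
    static String stripAttribute(String xmi, String attrName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xmi)));

            // "*" matches every element in the document, including the root.
            NodeList all = doc.getElementsByTagName("*");
            for (int i = 0; i < all.getLength(); i++) {
                Element element = (Element) all.item(i);
                if (element.hasAttribute(attrName)) {
                    element.removeAttribute(attrName);
                }
            }

            // Serialize the modified DOM back to a string.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not preprocess XMI", e);
        }
    }
}
```

The cleaned string could then be passed to the normal (strict) XMI deserialization. The obvious downside is that the identifiers are lost on the UIMAj side, so the caller would have to keep its own mapping from xmi:id to FSID if the IDs need to be written back to the store.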

For scenario 1, other than lenient deserialization, nothing else needs to be done.

For scenarios 2 and 3, the associated Type System of the CAS must be registered for deserialization.

For scenario 4, I haven't implemented anything yet in UIMAj, but I will be working on something for this soon.

Now, I haven't dug into the Serialization code yet to see how else this can be accomplished, but will be looking into it soon.  I would just like to begin a discussion on this topic to make sure that we're covering all our bases :) 

Thanks! 

Neal