Posted to dev@ctakes.apache.org by "AndyMC@apache.org (Andy McMurry)" <mc...@gmail.com> on 2014/08/27 08:39:35 UTC

Fwd: [jira] [Created] (UIMA-3969) Add JSON Serialization for CASs and UIMA Descriptors

Interesting thread in UIMA core about JSON serialization of CASs and descriptors.


Begin forwarded message:

> From: Marshall Schor <ms...@schor.com>
> Subject: Re: [jira] [Created] (UIMA-3969) Add JSON Serialization for CASs and UIMA Descriptors
> Date: August 25, 2014 at 8:33:54 PM PDT
> To: dev@uima.apache.org
> Reply-To: dev@uima.apache.org
> 
> 
> On 8/25/2014 6:54 PM, Jens Grivolla wrote:
>> Is the JSON serialization documented somewhere?
> Yes, there's a chapter in the reference book. Until it's released, you can
> build it yourself from uima-docbook-references.
> 
> There are also lots of Javadocs in the main implementing class:
> XmiCasSerializer. (It's in this class because it shares a lot of machinery
> with XMI serialization.)
> 
>> 
>> I saw that there appear to be quite a few alternative serializations. The
>> JSON form seems to include something like a type system definition, but only
>> with a list of feature names, not their types, if I understood the format
>> correctly (@featureRefs has a list of the features that are not of
>> primitive types, it seems).
> @featureRefs lists only those features whose values are references to other
> feature structures.
> 
> You're correct in noticing that the feature "range" types are not present.
> This is because the serialization is to JSON, which natively represents
> collections as JSON arrays (these could be UIMA Arrays or Lists) and boolean
> ranges as JSON true and false values. There is no distinction among
> byte/short/int/long, because those are all represented as a JSON "number".
> And so forth...
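To make the point about lost range types concrete, here is a minimal sketch using only Python's standard `json` module (not the UIMA serializer). The feature names and the UIMA range labels attached to them are hypothetical; they exist only so the script can print what each value "was" before encoding. After a round trip through JSON text, every integer width comes back as the same number type, and booleans and arrays map onto JSON's native `true`/`false` and `[...]`.

```python
import json

# Hypothetical feature values, each paired with the UIMA range type it would
# have in a type system. The range labels are for illustration only; the JSON
# text produced below does not carry them.
uima_values = {
    "aByte": ("uima.cas.Byte", 7),
    "aShort": ("uima.cas.Short", 300),
    "anInt": ("uima.cas.Integer", 70000),
    "aLong": ("uima.cas.Long", 2**40),
    "aFlag": ("uima.cas.Boolean", True),
    "anArray": ("uima.cas.IntegerArray", [1, 2, 3]),
}

# Keep only the values: that is all the JSON text can hold.
json_text = json.dumps({name: value for name, (_, value) in uima_values.items()})

# Decode again: byte/short/int/long all come back as the same JSON number type,
# while booleans and arrays use JSON's native true/false and [...].
decoded = json.loads(json_text)
for name, (uima_range, _) in uima_values.items():
    print(f"{name}: UIMA range {uima_range} -> Python {type(decoded[name]).__name__}")
```

A consumer that needs the original widths would have to get them from the type system descriptor, not from the JSON values themselves.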
> 
> The JSON serialization for a CAS can optionally include parts of the type
> system. It can include the supertypes of the serialized types (to enable
> iterating over a type and all of its subtypes, as CAS iterators normally do).
> It can also identify which slots that appear to have number values are
> actually to be interpreted as references to other feature structures.
> Without this, the serialized form might have a slot "foo" : 111, which is a
> number value, and a slot "bar" : 112, which is a reference to another feature
> structure whose ID is 112. This extra information (in @featureRefs) gives the
> user of the JSON serialized form a way to distinguish these two cases.
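The two optional type-system parts described above can be illustrated with a small consumer-side sketch. It assumes a hypothetical, simplified layout (the keys `@context`, `@types`, `@superType`, `@featureRefs` and `fss` are illustrative assumptions, not the exact UIMA output format, and features such as the annotation's sofa reference are omitted). `fss_of_type` uses the supertype chain to iterate over a type plus its subtypes. `resolve_refs` uses `@featureRefs` to tell the plain number in "foo" apart from the reference in "bar", even though both look like FS IDs.

```python
# Hypothetical, simplified JSON serialized form (already parsed into Python).
# Key names are illustrative assumptions, not the exact UIMA output format.
serialized = {
    "@context": {
        "@types": {
            "Annotation": {"@superType": "TOP", "@featureRefs": []},
            "Token": {"@superType": "Annotation", "@featureRefs": ["bar"]},
        }
    },
    "fss": {
        "111": {"@type": "Annotation", "begin": 0, "end": 5},
        "112": {"@type": "Annotation", "begin": 6, "end": 9},
        # "foo" and "bar" both hold numbers; only @featureRefs says which is a reference.
        "200": {"@type": "Token", "begin": 0, "end": 9, "foo": 111, "bar": 112},
    },
}


def is_subtype(types, type_name, ancestor):
    """Walk the single-inheritance supertype chain from type_name upward."""
    while type_name is not None:
        if type_name == ancestor:
            return True
        type_name = types.get(type_name, {}).get("@superType")
    return False


def fss_of_type(serialized, ancestor):
    """Yield (id, fs) for every FS of type `ancestor` or any of its subtypes,
    mirroring how a CAS iterator over a type also returns its subtypes."""
    types = serialized["@context"]["@types"]
    for fs_id, fs in serialized["fss"].items():
        if is_subtype(types, fs["@type"], ancestor):
            yield fs_id, fs


def resolve_refs(serialized, fs_id):
    """Return a copy of one FS in which features listed in its type's
    @featureRefs are replaced by the referenced FS; other numbers stay numbers."""
    types = serialized["@context"]["@types"]
    fs = serialized["fss"][fs_id]
    ref_features = set(types.get(fs["@type"], {}).get("@featureRefs", []))
    return {
        name: serialized["fss"][str(value)] if name in ref_features else value
        for name, value in fs.items()
    }


token = resolve_refs(serialized, "200")
print(token["foo"])           # -> 111 (a plain number)
print(token["bar"]["begin"])  # -> 6 (a field of the referenced FS 112)
print([fs_id for fs_id, _ in fss_of_type(serialized, "Annotation")])
```

Without the `@featureRefs` list, a consumer would have no reliable way to decide whether 111 in "foo" is a value or a pointer to FS 111.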
> 
>> 
>> It would be very useful if the serialization allowed one to easily pull out
>> a partial CAS with just a subset of the views (by only including some
>> subtrees of the JSON structure), and merge views into it.
> Another optional part of the serialization is a list of views, each paired
> with an array of numbers, each of which identifies a serialized Feature
> Structure that is indexed in that view.
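Jens's idea of pulling out a partial CAS could, in principle, be built on top of that per-view list of IDs. The sketch below is one way to do it under stated assumptions: the layout (`@views` mapping a view name to FS IDs, plus `fss` and `@featureRefs` as before) is hypothetical, and `partial_cas` is an illustrative helper, not part of UIMA. It keeps only the requested views and the FSs they index, then follows `@featureRefs` so that referenced FSs that are not themselves indexed (FS 3 below) are not left dangling. Merging views back in is not shown, since it would also require handling ID collisions.

```python
# Hypothetical serialized form; key names and layout are assumptions.
serialized = {
    "@context": {"@types": {
        "Token": {"@featureRefs": []},
        "Entity": {"@featureRefs": ["head"]},
    }},
    "@views": {"_InitialView": [1, 2], "GoldView": [4]},
    "fss": {
        "1": {"@type": "Token", "begin": 0, "end": 4},
        "2": {"@type": "Token", "begin": 5, "end": 9},
        "3": {"@type": "Token", "begin": 0, "end": 9},  # not indexed in any view
        "4": {"@type": "Entity", "head": 3},            # references FS 3
    },
}


def partial_cas(serialized, keep_views):
    """Return a copy containing only keep_views plus every FS those views
    index, following @featureRefs so referenced FSs are carried along."""
    types = serialized["@context"]["@types"]
    views = {v: ids for v, ids in serialized["@views"].items() if v in keep_views}
    wanted = set()
    pending = [fs_id for ids in views.values() for fs_id in ids]
    while pending:
        fs_id = pending.pop()
        if fs_id in wanted:
            continue
        wanted.add(fs_id)
        fs = serialized["fss"][str(fs_id)]
        # Queue every FS this one points to via a reference-valued feature.
        for feat in types.get(fs["@type"], {}).get("@featureRefs", []):
            pending.append(fs[feat])
    return {
        "@context": serialized["@context"],
        "@views": views,
        "fss": {k: v for k, v in serialized["fss"].items() if int(k) in wanted},
    }


part = partial_cas(serialized, {"GoldView"})
print(sorted(part["@views"]))  # -> ['GoldView']
print(sorted(part["fss"]))     # -> ['3', '4']
```

The reachability walk is what makes the subset usable on its own: copying only the indexed IDs would leave "head" pointing at an FS that is no longer present.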
>> This might be
>> complicated, as I understand that the views define annotation indices, but
>> the same annotation can be indexed in several views, right?
> 
> Feature Structures can be classified into "Annotations" and other types
> (those that are not subtypes of Annotation).
> 
> Annotations are special: they have an implied reference to a particular
> subject of analysis, so they are restricted to being indexed in the view that
> is associated with that subject of analysis.
> 
> Other types (those that are not subtypes of Annotation, or more precisely of
> AnnotationBase) do not have this restriction and can be indexed in multiple
> views.
> 
> See
> http://uima.apache.org/d/uimaj-2.6.0/tutorials_and_users_guides.html#ugr.tug.aas.annotations_associated_sofa.
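The indexing rule Marshall describes can be expressed as a simple consistency check over a per-view ID list. This is a sketch under assumptions: the `sofaView` field is a hypothetical stand-in for the sofa reference that annotations carry through AnnotationBase, and `check_view_indexing` and the `is_annotation` predicate are illustrative helpers, not UIMA API. Annotations must appear only in the view of their own sofa, while other feature structures (here a hypothetical `DocumentMetadata`) may appear in several views.

```python
# Hypothetical per-view index lists and feature structures.
views = {"_InitialView": [1, 3], "Translation": [2, 3, 1]}
fss = {
    "1": {"@type": "Token", "sofaView": "_InitialView"},
    "2": {"@type": "Token", "sofaView": "Translation"},
    "3": {"@type": "DocumentMetadata"},  # not an annotation: allowed in both views
}
ANNOTATION_TYPES = {"Token"}  # assumed set of AnnotationBase subtypes


def check_view_indexing(views, fss, is_annotation):
    """Return (fs_id, view) pairs where an annotation is indexed in a view
    other than the one its sofa belongs to. Non-annotations are never flagged."""
    problems = []
    for view, ids in views.items():
        for fs_id in ids:
            fs = fss[str(fs_id)]
            if is_annotation(fs["@type"]) and fs["sofaView"] != view:
                problems.append((fs_id, view))
    return problems


problems = check_view_indexing(views, fss, lambda t: t in ANNOTATION_TYPES)
print(problems)  # -> [(1, 'Translation')]
```

FS 3 appears in both views without being flagged, while annotation 1 is reported because it is indexed in a view other than its own sofa's.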
> 
> Let me know where the documentation might be improved :-)
> 
> -Marshall
>> 
>> -- Jens


Re: [jira] [Created] (UIMA-3969) Add JSON Serialization for CASs and UIMA Descriptors

Posted by John Green <jo...@gmail.com>.
Very good

