Posted to dev@ctakes.apache.org by "Chen, Pei" <Pe...@childrens.harvard.edu> on 2013/01/03 22:25:34 UTC

UIMA AS Binary vs XMI serialization

Hi,
I was just curious about others' experience with the binary serialization.

My original issue was documents that contained invalid XML characters, so I decided to try the binary serialization option within AS instead of replacing or modifying the special characters in the original docs.  As a side effect, I noticed that it's orders of magnitude faster.
Just curious whether there is any reason not to make this the recommended/default option when sending CASes around within AS.  Are there any downsides to be aware of (assuming that UIMA will have wrappers to abstract this from users for all of their implementations)?

Caused by: org.xml.sax.SAXParseException; Trying to serialize non-XML 1.0 character: , 0x0
        at org.apache.uima.util.XMLSerializer$CharacterValidatingContentHandler.checkForInvalidXmlChars(XMLSerializer.java:254)
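
For comparison, the alternative mentioned above (replacing the characters that XML 1.0 cannot represent before the CAS is serialized) could look roughly like the sketch below. This is not UIMA code: the class and method names are hypothetical, and the valid ranges are the ones from the Char production of the XML 1.0 specification.

```java
// Hypothetical helper, not part of UIMA: cleans document text so that it
// can be written as XML 1.0.
final class XmlCharFilter {

    /** True if the code point is allowed by the XML 1.0 Char production. */
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Replaces each code point that XML 1.0 cannot represent with one char. */
    static String replaceInvalid(String text, char replacement) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (isValidXml10(cp)) {
                out.appendCodePoint(cp);
            } else {
                // Every invalid code point occupies exactly one Java char,
                // so a one-char replacement keeps the text length unchanged.
                out.append(replacement);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

Calling XmlCharFilter.replaceInvalid(text, ' ') on the document text before it is set on the CAS would avoid the exception above. Because the replacement is one char for one char, annotation offsets computed on the cleaned text still line up with the original document.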

--Pei

Re: UIMA AS Binary vs XMI serialization

Posted by Eddie Epstein <ea...@gmail.com>.
Hi Pei,

The binary serialization option requires that client and service have
identical CAS TypeSystem definitions. With XMI it is only required
that the service's definition be a proper subset of the client's. Note
that the client's TypeSystem will automatically integrate that of all
delegate services, so the potential problem here is for the service to
have a type with a feature that is incompatible with the definition
for that type on the client. An example would be for a feature named
"foo" to be a float on the client and an integer on the server.
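
To make the two rules concrete, here is a minimal sketch that models a TypeSystem as a map from type name to (feature name -> range type name). It is a toy model of the rules as stated above, not the check UIMA itself performs; the class, the method names and the example type "ex.Token" are made up for illustration.

```java
import java.util.Map;

// Toy model only: a TypeSystem is Map<typeName, Map<featureName, rangeType>>.
final class TypeSystemCheck {

    /** Binary serialization: both sides must have identical definitions. */
    static boolean binaryCompatible(Map<String, Map<String, String>> client,
                                    Map<String, Map<String, String>> service) {
        return client.equals(service);
    }

    /**
     * XMI: every type and feature the service defines must also exist on
     * the client with the same range type. The client may define more.
     */
    static boolean xmiCompatible(Map<String, Map<String, String>> client,
                                 Map<String, Map<String, String>> service) {
        for (Map.Entry<String, Map<String, String>> type : service.entrySet()) {
            Map<String, String> clientFeatures = client.get(type.getKey());
            if (clientFeatures == null) {
                return false; // the service has a type the client lacks
            }
            for (Map.Entry<String, String> f : type.getValue().entrySet()) {
                // A missing feature or a different range (the float vs
                // integer "foo" case) makes the definitions incompatible.
                if (!f.getValue().equals(clientFeatures.get(f.getKey()))) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

In this model, a client that defines "foo" as uima.cas.Float on ex.Token and a service that defines "foo" as uima.cas.Integer on the same type fail both checks, while a service that simply omits "foo" passes the XMI check and fails only the binary one.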

The best approach to be safe for binary serialization would be for all
analytic components to import their TypeSystem definitions from a
common place.

Eddie
