Posted to commits@uima.apache.org by sc...@apache.org on 2016/05/20 15:14:26 UTC

svn commit: r1744753 - /uima/uimaj/trunk/uima-docbook-tutorials-and-users-guides/src/docbook/tug.application.xml

Author: schor
Date: Fri May 20 15:14:26 2016
New Revision: 1744753

URL: http://svn.apache.org/viewvc?rev=1744753&view=rev
Log:
no Jira - add table consolidating useful comparative information about the alternative CAS Serialization capabilities

Modified:
    uima/uimaj/trunk/uima-docbook-tutorials-and-users-guides/src/docbook/tug.application.xml

Modified: uima/uimaj/trunk/uima-docbook-tutorials-and-users-guides/src/docbook/tug.application.xml
URL: http://svn.apache.org/viewvc/uima/uimaj/trunk/uima-docbook-tutorials-and-users-guides/src/docbook/tug.application.xml?rev=1744753&r1=1744752&r2=1744753&view=diff
==============================================================================
--- uima/uimaj/trunk/uima-docbook-tutorials-and-users-guides/src/docbook/tug.application.xml (original)
+++ uima/uimaj/trunk/uima-docbook-tutorials-and-users-guides/src/docbook/tug.application.xml Fri May 20 15:14:26 2016
@@ -485,17 +485,21 @@ ae.destroy();</programlisting></para>
       <title>Saving CASes to file systems or general Streams</title>
       
       <para>The UIMA framework provides multiple APIs to save and restore the contents of a CAS to streams. 
+      Two common uses are saving CASes to the file system and sending CASes to other processes running
+      on remote systems.</para>
+      
+      <para>
         The CASes can be serialized in multiple formats:
         <itemizedlist>
           <listitem>
             <para>Binary formats:
               <itemizedlist>
                 <listitem>
-                  <para>plain binary:  This is used to communicate with remote services, and also for interfacing with
+                  <para>plain binary: This is used to communicate with remote services, and also for interfacing with
                   annotators written in C/C++ or related languages via the JNI Java interface, from Java</para>
                 </listitem>
                 <listitem>
-                  <para>Two forms of compressed binary.  The recommend one is form 6, which also allows
+                  <para>Compressed binary: There are two forms of compressed binary.  The recommended one is form 6, which also allows
                   type filtering. See <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.compress.overview"/>.</para>
                 </listitem>
               </itemizedlist>
@@ -515,6 +519,141 @@ ae.destroy();</programlisting></para>
         </itemizedlist>
       </para>
       
+      <para>Each of these serializations has different capabilities, summarized in the table below.
+       <table frame="all" id="ugr.tug.tbl.serialization_capabilities">
+          <title>Serialization Capabilities</title>
+          <tgroup cols="7" rowsep="1" colsep="1">
+            <colspec colname="c1"/>
+            <colspec colname="c2"/>
+            <colspec colname="c3"/>
+            <colspec colname="c4"/>
+            <colspec colname="c5"/>
+            <colspec colname="c6"/>
+            <colspec colname="c7"/>
+            <thead>
+              <row>
+                <entry align="center"></entry>
+                <entry align="center">XCAS</entry>
+                <entry align="center">XMI</entry>
+                <entry align="center">JSON</entry>
+                <entry align="center">Binary</entry>
+                <entry align="center">Cmpr 4</entry>
+                <entry align="center">Cmpr 6</entry>
+              </row>
+            </thead>
+            <tbody>
+              <row>
+                <entry>Output</entry>
+                <entry>Output Stream</entry>
+                <entry>Output Stream</entry>
+                <entry>Output Stream, File, Writer</entry>
+                <entry>Output Stream</entry>
+                <entry>Output Stream, Data Output Stream, File</entry>
+                <entry>Output Stream, Data Output Stream, File</entry>
+              </row>
+              <row>
+                <entry>Lists/Arrays inline formatting?</entry>
+                <entry>-</entry>
+                <entry>Yes</entry>
+                <entry>Yes</entry>
+                <entry>-</entry>
+                <entry>-</entry>
+                <entry>-</entry>
+              </row>
+              <row>
+                <entry>Formatted?</entry>
+                <entry>-</entry>
+                <entry>Yes</entry>
+                <entry>Yes</entry>
+                <entry>-</entry>
+                <entry>-</entry>
+                <entry>-</entry>
+              </row>
+              <row>
+                <entry>Type Filtering?</entry>
+                <entry>-</entry>
+                <entry>Yes</entry>
+                <entry>Yes</entry>
+                <entry>-</entry>
+                <entry>-</entry>
+                <entry>Yes</entry>
+              </row>
+              <row>
+                <entry>Delta CAS?</entry>
+                <entry>-</entry>
+                <entry>Yes</entry>
+                <entry>-</entry>
+                <entry>Yes</entry>
+                <entry>Yes</entry>
+                <entry>Yes</entry>
+              </row>
+              <row>
+                <entry>OOTS?</entry>
+                <entry>Yes</entry>
+                <entry>Yes</entry>
+                <entry>-</entry>
+                <entry>-</entry>
+                <entry>-</entry>
+                <entry>-</entry>
+              </row>
+              <row>
+                <entry>Only send indexed + reachable FSs?</entry>
+                <entry>Yes</entry>
+                <entry>Yes</entry>
+                <entry>Yes</entry>
+                <entry>send all</entry>
+                <entry>send all</entry>
+                <entry>Yes</entry>
+              </row>
+              <row>
+                <entry>NameSpace/Schemas?</entry>
+                <entry>-</entry>
+                <entry>Yes</entry>
+                <entry>-</entry>
+                <entry>-</entry>
+                <entry>-</entry>
+                <entry>-</entry>
+              </row>
+            </tbody>
+          </tgroup>
+          
+        </table>
+      </para>
+      
+      <para>In the table above, Cmpr 4 and Cmpr 6 refer to compressed binary serialization forms 4 and 6.</para>
+      
+      <para>For the XMI and JSON formats, lists and arrays can sometimes be formatted "inline".
+      In this representation, the elements are formatted directly as the value of a particular
+      feature.  This is only done if the arrays and lists are not multiply-referenced.</para>
+      
+      <para>Type Filtering support enables only a subset of the types and/or features to be
+      serialized. An additional type system object is used to specify the types to be included
+      in the serialization.  This can reduce the size of the serialized CAS, for instance,
+      when sending a CAS to a remote service that uses only a small number of the types
+      and features.</para>
+      
+      <para>Delta CAS support makes use of a "mark" set in the CAS, and serializes only the Feature Structures
+      that were added or modified after the mark was set.
+      This is useful for remote services, supporting the use case where a large CAS is sent to the service,
+      which sets the mark in the received CAS and then adds a small amount of information;
+      the Delta CAS serialization then contains only that small amount, sent back to the sender as the "reply".</para>
+      
+      <para>OOTS means "Out of Type System" support, intended for the use case where a CAS is sent
+      to a remote application.  It allows deserializing an incoming CAS in which
+      some of the types and/or features are not present in the receiving CAS's type system.  A "lenient"
+      option on the deserialization permits the deserialization to proceed, with the out-of-type-system
+      information preserved so that when the CAS is subsequently reserialized (in this use case, to be
+      returned to the sender), the out-of-type-system information is merged back into the output stream.
+      </para>
+      
+      <para>The Binary and Compressed Form 4 serializations send all the Feature Structures in the CAS,
+      in the order they were created in the CAS.  The other methods only 
+      send Feature Structures that are reachable, either by 
+      their being in some CAS index, or being referenced 
+      as a feature of another Feature Structure which is reachable.</para>
+      
+      <para>The NameSpace/Schema support allows specifying a set of schemas, each one corresponding to a particular
+      namespace, used in XMI serialization.</para>
       <para>To save an XMI representation of a CAS, use the <literal>serialize</literal> method of the class
         <literal>org.apache.uima.util.XmlCasSerializer</literal>. To save an XCAS representation of a CAS,
         use the class <literal>org.apache.uima.cas.impl.XCASSerializer</literal> instead; see the Javadocs