Posted to commits@uima.apache.org by sc...@apache.org on 2013/03/27 19:52:25 UTC

svn commit: r1461791 - in /uima/uimaj/branches/filteredCompress-uima-2498/uima-docbook-tutorials-and-users-guides/src/docbook: tug.type_mapping.xml tutorials_and_users_guides.xml

Author: schor
Date: Wed Mar 27 18:52:24 2013
New Revision: 1461791

URL: http://svn.apache.org/r1461791
Log:
[UIMA-2498] add some documentation on type mapping in compressed serialization/ deserialization

Added:
    uima/uimaj/branches/filteredCompress-uima-2498/uima-docbook-tutorials-and-users-guides/src/docbook/tug.type_mapping.xml   (with props)
Modified:
    uima/uimaj/branches/filteredCompress-uima-2498/uima-docbook-tutorials-and-users-guides/src/docbook/tutorials_and_users_guides.xml

Added: uima/uimaj/branches/filteredCompress-uima-2498/uima-docbook-tutorials-and-users-guides/src/docbook/tug.type_mapping.xml
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/filteredCompress-uima-2498/uima-docbook-tutorials-and-users-guides/src/docbook/tug.type_mapping.xml?rev=1461791&view=auto
==============================================================================
--- uima/uimaj/branches/filteredCompress-uima-2498/uima-docbook-tutorials-and-users-guides/src/docbook/tug.type_mapping.xml (added)
+++ uima/uimaj/branches/filteredCompress-uima-2498/uima-docbook-tutorials-and-users-guides/src/docbook/tug.type_mapping.xml Wed Mar 27 18:52:24 2013
@@ -0,0 +1,142 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
+<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >  
+%uimaents;
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<chapter id="ugr.tug.type_mapping">
+  <title>Managing different Type Systems</title>
+  <titleabbrev>Managing different TypeSystems</titleabbrev>
+  
+  <section id="ugr.tug.type_mapping.type_merging">
+    <title>Annotators, Type Merging, and Remotes</title>
+    
+	  <para>UIMA supports combining Annotators that have different type systems.
+	  This is normally done by "merging" the two type systems when the Annotators
+	  are first loaded and instantiated. The merge process produces a logical
+	  Union of the two; types having the same name have their feature sets combined.
+	  The combining rules say that the range of same-named feature slots must be the same.
+	  This combined type system is then used for the CAS that will be passed to
+	  all of the annotators.   Details of type merging are described in
+	  <olink targetdoc="%uima_docs_ref;" targetptr="ugr.ref.cas.typemerging"/>.
+	  </para>
+	  
+	  <para>This approach (of merging the type systems together) works well for
+	  annotators that are run together in one UIMA pipeline instantiation in one
+	  machine.  Extensions are needed when UIMA is scaled out where the pipeline
+	  includes remote annotators, acting as servers, serving
+	  potentially multiple clients, each of which might have a different type system.
+	  A Client, when initializing, queries all of its remote server parts to get their
+	  type system definitions, and merges them together with its own
+	  to make the type system for the CAS that will be sent among all of those
+	  annotators. The Client's TypeSystem is the union of the type systems of
+	  all of its annotators, even when some of them are remote.
+	  </para>
+  </section>
+  
+  <section id="ugr.tug.type_mapping.remote_support">
+    <title>Supporting Remote Annotators</title>
+  
+	  <para>Servers, in providing service to multiple clients, may receive CASes from
+	  different Clients having different type systems.  UIMA has implemented several
+	  different approaches to support this.</para>
+	  
+	  <para>
+	  Base UIMA includes support for SOAP and VINCI
+	  protocols.  These send the Client's type system definition (which is 
+	  guaranteed to be a superset of the Server's), along with the CAS.  The Server
+	  Annotators will get a "typeSystemInit" call to let them reinitialize their type
+	  system information to correspond to the new CAS coming in.  
+	  </para>
+	  
+	  <para>When a server is a UIMA-AS server, the communication sends CASes without
+	  type system information.  Several protocols and variations are possible in this case.
+	  </para>
+	  <para>
+	  When using XMI serialization for sending/receiving CASes, the Client
+	  sends all reachable Feature Structures to the server.  The Server can receive a CAS 
+	  having instances of types it doesn't know about, or perhaps feature-slots it doesn't
+	  know about within a type it does know about.  In these cases, the Server, while
+	  deserializing, holds aside those type instances and/or feature instances that 
+	  are not defined in the Server's type system.  When the Server returns the CAS
+	  to the Client, it combines
+	  those held-out types and/or features with the serialized FeatureStructures it sends back.
+	  </para>
+	  <para>
+    This approach avoids the need to send the type system along with the CAS on every
+    invocation of a remote part of a UIMA Pipeline.
+	  </para>
+  </section>
+  
+  <section id="ugr.tug.type_mapping.allowed_differences">
+    <title>Type filtering support in Binary Compressed Serialization/Deserialization</title>
+    
+    <para>The built-in support for Binary Compressed Serialization/Deserialization
+    supports filtering between non-identical type systems.  The filtering is designed
+    so that types and/or features that are defined in one type system
+    but not in the other are neither sent (when serializing) nor received
+    (when deserializing).  When deserializing, non-received features receive 0
+    as their value.  For built-in types, like integer, float, etc., this is the 
+    number 0. </para>
+    
+    <para>Some kinds of type mappings cannot be supported, and will signal errors.
+    The two types being mapped between must be "mergeable" according to the normal
+    type merger rules (see above); otherwise, errors are signaled.</para>
+  </section>
+  
+  <section id="ugr.tug.type_mapping.compressed">
+    <title>Remote Services support with Compressed Binary Serialization</title>
+    
+    <para>Using uncompressed Binary Serialization protocols to communicate with
+    remote UIMA-AS services requires that the Client's and Server's type systems
+    be identical.  Compressed Binary Serialization protocols support
+    Server type systems which are a subset of the Client's.  Types and/or features
+    not in the Server's type system are not sent to the Server.  Because of this, there's
+    no need to hold aside types and features at the Server, as is the case with XMI
+    transports (see above).
+    </para>
+    
+    <para>Typically, for efficiency reasons, services use the Delta-CAS protocol to return the
+    CAS back to the Client.  Delta protocols send only newly created Feature Structures, 
+    along with modifications made to existing Feature Structures.
+    </para>
+  </section>
+  
+  <section id="ugr.tug.type_filtering.compressed_file">
+    <title>Compressed Binary serialization to/from files</title>
+    
+    <para>When invoking compressed binary serialization to a file, you can specify
+    a target type system which is a subset of the original type system.  The
+    serialization will exclude types and features not in the target.
+    You can use this to filter the CAS, serializing out just the parts
+    you want.
+    </para>
+    
+    <para>When using binary compressed deserialization from a file, the target type system
+    must be the one that was used as the target when the file was serialized.  The source
+    type system can be different; if it is missing types/features, these will be 
+    filtered during deserialization.  If it has additional features, these will be 
+    set to 0 (the default value) in the CAS heap.  For numeric features, this means
+    the value will be 0 (including floating point 0); for feature structure references
+    and strings, the value will be null.
+    </para>
+  </section>
+</chapter>

Propchange: uima/uimaj/branches/filteredCompress-uima-2498/uima-docbook-tutorials-and-users-guides/src/docbook/tug.type_mapping.xml
------------------------------------------------------------------------------
    svn:eol-style = native

Modified: uima/uimaj/branches/filteredCompress-uima-2498/uima-docbook-tutorials-and-users-guides/src/docbook/tutorials_and_users_guides.xml
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/filteredCompress-uima-2498/uima-docbook-tutorials-and-users-guides/src/docbook/tutorials_and_users_guides.xml?rev=1461791&r1=1461790&r2=1461791&view=diff
==============================================================================
--- uima/uimaj/branches/filteredCompress-uima-2498/uima-docbook-tutorials-and-users-guides/src/docbook/tutorials_and_users_guides.xml (original)
+++ uima/uimaj/branches/filteredCompress-uima-2498/uima-docbook-tutorials-and-users-guides/src/docbook/tutorials_and_users_guides.xml Wed Mar 27 18:52:24 2013
@@ -33,5 +33,6 @@ under the License.
   <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tug.multi_views.xml"/>
   <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tug.cas_multiplier.xml"/>
   <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tug.xmi_emf.xml"/>
-  <!-- xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tug.configuration.xml"/-->  
+  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tug.configuration.xml"/>  
+  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tug.type_mapping.xml"/>
 </book>