Posted to commits@uima.apache.org by sc...@apache.org on 2013/03/27 19:52:25 UTC
svn commit: r1461791 - in
/uima/uimaj/branches/filteredCompress-uima-2498/uima-docbook-tutorials-and-users-guides/src/docbook:
tug.type_mapping.xml tutorials_and_users_guides.xml
Author: schor
Date: Wed Mar 27 18:52:24 2013
New Revision: 1461791
URL: http://svn.apache.org/r1461791
Log:
[UIMA-2498] add some documentation on type mapping in compressed serialization/ deserialization
Added:
uima/uimaj/branches/filteredCompress-uima-2498/uima-docbook-tutorials-and-users-guides/src/docbook/tug.type_mapping.xml (with props)
Modified:
uima/uimaj/branches/filteredCompress-uima-2498/uima-docbook-tutorials-and-users-guides/src/docbook/tutorials_and_users_guides.xml
Added: uima/uimaj/branches/filteredCompress-uima-2498/uima-docbook-tutorials-and-users-guides/src/docbook/tug.type_mapping.xml
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/filteredCompress-uima-2498/uima-docbook-tutorials-and-users-guides/src/docbook/tug.type_mapping.xml?rev=1461791&view=auto
==============================================================================
--- uima/uimaj/branches/filteredCompress-uima-2498/uima-docbook-tutorials-and-users-guides/src/docbook/tug.type_mapping.xml (added)
+++ uima/uimaj/branches/filteredCompress-uima-2498/uima-docbook-tutorials-and-users-guides/src/docbook/tug.type_mapping.xml Wed Mar 27 18:52:24 2013
@@ -0,0 +1,142 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
+<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >
+%uimaents;
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<chapter id="ugr.tug.type_mapping">
+ <title>Managing different Type Systems</title>
+ <titleabbrev>Managing different TypeSystems</titleabbrev>
+
+ <section id="ugr.tug.type_mapping.type_merging">
+ <title>Annotators, Type Merging, and Remotes</title>
+
+ <para>UIMA supports combining Annotators that have different type systems.
+ This is normally done by "merging" the two type systems when the Annotators
+ are first loaded and instantiated. The merge process produces the logical
+ union of the two; types having the same name have their feature sets combined.
+ The merge rules require same-named features of a type to have the same range.
+ This combined type system is then used for the CAS that will be passed to
+ all of the annotators. Details of type merging are described in
+ <olink targetdoc="%uima_docs_ref;" targetptr="ugr.ref.cas.typemerging"/>.
+ </para>
+
+ <para>This approach (of merging the type systems together) works well for
+ annotators that run together in one UIMA pipeline instance on one
+ machine. Extensions are needed when UIMA is scaled out and the pipeline
+ includes remote annotators that act as servers, potentially serving
+ multiple clients, each of which might have a different type system.
+ Each Client, when initializing, queries all of its remote servers to get their
+ type system definitions, and merges them with its own
+ to make the type system for the CAS that will be sent among all of those
+ annotators. The Client's type system is thus the union of the type systems of
+ all of its annotators, even when some of them are remote.
+ </para>
+ </section>
+
+ <section id="ugr.tug.type_mapping.remote_support">
+ <title>Supporting Remote Annotators</title>
+
+ <para>Servers, in providing service to multiple clients, may receive CASes from
+ different Clients having different type systems. UIMA has implemented several
+ different approaches to support this.</para>
+
+ <para>
+ Base UIMA includes support for SOAP and VINCI
+ protocols. These send the Client's type system definition (which is
+ guaranteed to be a superset of the Server's), along with the CAS. The Server
+ Annotators will get a "typeSystemInit" call to let them reinitialize their type
+ system information to correspond to the new CAS coming in.
+ </para>
+
+ <para>When a server is a UIMA-AS server, the communication sends CASes without
+ type system information. Several protocols and variations are possible in this case.
+ </para>
+ <para>
+ When using XMI serialization for sending/receiving CASes, the Client
+ sends all reachable Feature Structures to the server. The Server can receive a CAS
+ having instances of types it doesn't know about, or perhaps feature-slots it doesn't
+ know about within a type it does know about. In these cases, the Server, while
+ deserializing, holds aside those type instances and/or feature instances that
+ are not defined in the Server's type system. When the Server returns the CAS
+ to the Client, it combines
+ those held-aside types and/or features with the serialized Feature Structures it sends back.
+ </para>
+ <para>
+ This approach avoids the need to send the type system along with the CAS on every
+ invocation of a remote part of a UIMA Pipeline.
+ </para>
+ </section>
+
+ <section id="ugr.tug.type_mapping.allowed_differences">
+ <title>Type filtering support in Binary Compressed Serialization/Deserialization</title>
+
+ <para>The built-in support for Binary Compressed Serialization/Deserialization
+ supports filtering between non-identical type systems. The filtering is designed
+ so that things (types and/or features) that are defined in one type system
+ but not in another are not sent (when serializing) nor received
+ (when deserializing). When deserializing, features that are not received are set to 0,
+ the default value. For numeric built-in types, such as integer and float, this is the
+ number 0; for strings and feature structure references, it is null. </para>
+
+ <para>Some kinds of type mappings cannot be supported, and will signal errors.
+ The two types being mapped between must be "mergeable" according to the normal
+ type merger rules (see above); otherwise, errors are signaled.</para>
+ </section>
+
+ <section id="ugr.tug.type_mapping.compressed">
+ <title>Remote Services support with Compressed Binary Serialization</title>
+
+ <para>Using uncompressed Binary Serialization protocols to communicate with
+ remote UIMA-AS services requires that the Client's and Server's type systems
+ be identical. Compressed Binary Serialization protocols support
+ Server type systems that are a subset of the Client's. Types and/or features
+ not in the Server's type system are not sent to the Server. Because of this, there is
+ no need to hold aside types and features at the Server, as is the case with XMI
+ transports (see above).
+ </para>
+
+ <para>Typically, for efficiency reasons, services use the Delta-CAS protocol to return the
+ CAS back to the Client. Delta protocols send only newly created Feature Structures,
+ along with modifications made to existing Feature Structures.
+ </para>
+ </section>
+
+ <section id="ugr.tug.type_filtering.compressed_file">
+ <title>Compressed Binary serialization to/from files</title>
+
+ <para>When invoking compressed binary serialization to a file, you can specify
+ a target type system which is a subset of the original type system. The
+ serialization will exclude types and features that are not in the target.
+ You can use this to filter the CAS so that only the parts you want
+ are serialized.
+ </para>
+
+ <para>When using binary compressed deserialization from a file, the target type system
+ must be the one that was used as the target when the file was serialized. The source
+ type system (that of the CAS being deserialized into) can be different; if it is missing
+ types/features, these will be filtered out during deserialization. If it has additional
+ features, these will be set to 0 (the default value) in the CAS heap. For numeric features,
+ this means the value will be 0 (including floating point 0); for feature structure references
+ and strings, the value will be null.
+ </para>
+ </section>
+</chapter>
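The merge and filtering rules described in tug.type_mapping.xml above can be modeled in a few lines of plain Java. This is only a minimal sketch under stated assumptions, not UIMA's implementation and not a UIMA API: the class TypeFilterSketch and its methods (mergeFeatures, filterForTarget, fillDefaults, defaultFor) are hypothetical names introduced for illustration, and a type is reduced to a map from feature name to range type name. It shows the three behaviors the chapter describes: same-named features must agree on their range when merging, features absent from the target are not sent, and features that were not received get 0 or null.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative model only; none of these names are UIMA APIs. */
class TypeFilterSketch {

    /** Merge the feature maps (feature name -> range type name) of two same-named types. */
    static Map<String, String> mergeFeatures(Map<String, String> a, Map<String, String> b) {
        Map<String, String> merged = new LinkedHashMap<>(a);
        for (Map.Entry<String, String> e : b.entrySet()) {
            String existing = merged.putIfAbsent(e.getKey(), e.getValue());
            // Same-named features must have the same range, otherwise the types are not mergeable.
            if (existing != null && !existing.equals(e.getValue())) {
                throw new IllegalArgumentException("Feature '" + e.getKey()
                        + "' has conflicting ranges: " + existing + " vs " + e.getValue());
            }
        }
        return merged;
    }

    /** Serializing side: keep only the feature values that the target type also defines. */
    static Map<String, Object> filterForTarget(Map<String, Object> values,
                                               Map<String, String> targetFeatures) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : values.entrySet()) {
            if (targetFeatures.containsKey(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    /** Deserializing side: drop unknown features, default the ones that were not received. */
    static Map<String, Object> fillDefaults(Map<String, Object> received,
                                            Map<String, String> receiverFeatures) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> f : receiverFeatures.entrySet()) {
            Object value = received.containsKey(f.getKey())
                    ? received.get(f.getKey())
                    : defaultFor(f.getValue());
            out.put(f.getKey(), value);
        }
        return out;
    }

    /** Default for a non-received feature: numeric 0, or null for strings and references. */
    static Object defaultFor(String range) {
        switch (range) {
            case "uima.cas.Integer": return 0;
            case "uima.cas.Float":   return 0.0f;
            default:                 return null;
        }
    }

    public static void main(String[] args) {
        Map<String, String> client = new LinkedHashMap<>();
        client.put("begin", "uima.cas.Integer");
        client.put("score", "uima.cas.Float");
        Map<String, String> server = new LinkedHashMap<>();
        server.put("begin", "uima.cas.Integer");
        server.put("label", "uima.cas.String");

        Map<String, Object> values = new LinkedHashMap<>();
        values.put("begin", 5);
        values.put("score", 0.9f);

        System.out.println("merged:   " + mergeFeatures(client, server));
        Map<String, Object> sent = filterForTarget(values, server);
        System.out.println("sent:     " + sent);
        System.out.println("received: " + fillDefaults(sent, server));
    }
}
```

In this model the Client's "score" feature is not sent because the Server's type does not define it, and the Server's "label" feature, which the Client never supplied, comes out as null.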
Propchange: uima/uimaj/branches/filteredCompress-uima-2498/uima-docbook-tutorials-and-users-guides/src/docbook/tug.type_mapping.xml
------------------------------------------------------------------------------
svn:eol-style = native
Modified: uima/uimaj/branches/filteredCompress-uima-2498/uima-docbook-tutorials-and-users-guides/src/docbook/tutorials_and_users_guides.xml
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/filteredCompress-uima-2498/uima-docbook-tutorials-and-users-guides/src/docbook/tutorials_and_users_guides.xml?rev=1461791&r1=1461790&r2=1461791&view=diff
==============================================================================
--- uima/uimaj/branches/filteredCompress-uima-2498/uima-docbook-tutorials-and-users-guides/src/docbook/tutorials_and_users_guides.xml (original)
+++ uima/uimaj/branches/filteredCompress-uima-2498/uima-docbook-tutorials-and-users-guides/src/docbook/tutorials_and_users_guides.xml Wed Mar 27 18:52:24 2013
@@ -33,5 +33,6 @@ under the License.
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tug.multi_views.xml"/>
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tug.cas_multiplier.xml"/>
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tug.xmi_emf.xml"/>
- <!-- xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tug.configuration.xml"/-->
+ <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tug.configuration.xml"/>
+ <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tug.type_mapping.xml"/>
</book>