Posted to commits@uima.apache.org by sc...@apache.org on 2016/08/17 15:37:45 UTC
svn commit: r1756643 -
/uima/uimaj/trunk/uimaj-core/src/main/java/org/apache/uima/util/CasIOUtils.java
Author: schor
Date: Wed Aug 17 15:37:45 2016
New Revision: 1756643
URL: http://svn.apache.org/viewvc?rev=1756643&view=rev
Log:
[UIMA-4685] update class javadoc
Modified:
uima/uimaj/trunk/uimaj-core/src/main/java/org/apache/uima/util/CasIOUtils.java
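The class Javadoc in the diff below notes that Files and Paths can be handed to the load APIs by first converting them to URLs. A minimal sketch of that conversion, using only the JDK, might look like the following. The class name ToUrlSketch and the file name example.xmi are made up for illustration and are not part of the commit. Note that java.io.File spells its method toURI(), while java.nio.file.Path spells it toUri().

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, for illustration only; not part of CasIOUtils.
final class ToUrlSketch {

    // java.nio.file.Path exposes toUri() (lower-case "ri").
    static URL fromPath(Path path) throws MalformedURLException {
        return path.toUri().toURL();
    }

    // java.io.File exposes toURI() (upper-case "URI"); File has no toUri() method.
    static URL fromFile(File file) throws MalformedURLException {
        return file.toURI().toURL();
    }

    public static void main(String[] args) throws MalformedURLException {
        // "example.xmi" is a placeholder name; the file does not need to exist
        // for the conversion itself to succeed.
        URL fromPath = fromPath(Paths.get("example.xmi"));
        URL fromFile = fromFile(new File("example.xmi"));
        System.out.println(fromPath);
        System.out.println(fromFile);
    }
}
```

Either resulting URL could then be passed as the first argument of the load(URL, CAS) form summarized in the Javadoc.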
Modified: uima/uimaj/trunk/uimaj-core/src/main/java/org/apache/uima/util/CasIOUtils.java
URL: http://svn.apache.org/viewvc/uima/uimaj/trunk/uimaj-core/src/main/java/org/apache/uima/util/CasIOUtils.java?rev=1756643&r1=1756642&r2=1756643&view=diff
==============================================================================
--- uima/uimaj/trunk/uimaj-core/src/main/java/org/apache/uima/util/CasIOUtils.java (original)
+++ uima/uimaj/trunk/uimaj-core/src/main/java/org/apache/uima/util/CasIOUtils.java Wed Aug 17 15:37:45 2016
@@ -50,66 +50,107 @@ import org.apache.uima.cas.impl.XmiCasSe
import org.xml.sax.SAXException;
/**
- * a collection of static methods aimed at making it easy to
- * - save and load CASes, and to
- * - optionally include their Type Systems (abbreviated TS)
- * and perhaps also their index definitions based on those type systems (abbreviated TSI).
- *
- * * The TSI's purpose: these are used to replace a CAS's existing type system and index definition.
- *
- * * The TS's purpose: these are only used with Compressed Form 6 to specify the type system used in the serialized data,
- * in order to allow deserializing into some other type system in the CAS, leniently.
- *
- * ** Note: this use does **not** replace the CAS's type system, like the TSI use does.
- *
- * There are several serialization formats supported; these are listed in the Java enum SerialFormat,
- * together with their preferred file extension name.
- *
- * The APIs for loading attempt to automatically use the appropriate deserializers, based on the input data format.
- * To select the right deserializer, first, the file extension name (if available) is used:
- * - xmi: XMI format
- * - xcas: XCAS format
- * - xml: XCAS format
- *
- * If none of these apply, then the first few bytes of the input are examined to determine the format.
- *
- * For loading, the inputs may be supplied as URLs or as InputStream.
- * Note: You can use Files or Paths by converting these to URLs:
- * URL url = a_path.toUri().toURL();
- * URL url = a_file.toUri().toURL();
- *
- * When loading, an optional CasLoadMode enum value may be specified. See the Javadocs for that for details.
- *
- * When TS or TSI information is saved, it is either saved in the same destination (e.g. file or stream), or in a separate one.
- * - The serialization formats ending in _TSI support saving the TSI in the same destination.
- * The save APIs for other formats can optionally also save the TSI into a separate (second) OutputStream.
- *
- * Summary of the APIs for saving:
- * save(CAS, OutputStream, SerialFormat)
- * save(CAS, OutputStream, OutputStream, SerialFormat) - extra outputStream for saving the TSI
- *
- * Note: there is no API for saving in COMPRESSED_FILTERED with a filtering type system; to do that, use the
- * methods in Serialization.serializeWithCompression
- *
- * Summary of APIs for loading:
- * load(URL , CAS)
- * load(InputStream, CAS)
- *
- * load(URL , URL , CAS, CasLoadMode) - the second URL is for loading a separately-stored TSI
- * load(InputStream, InputStream, CAS, CasLoadMode)
- *
- * When loading, a TSI or a TS may be available. If available, it is used for one of two purposes:
- * - except for Compressed Form 6, it must be a TSI and
- * it is used to reset the CAS to have the specified type system and index specification
- * - for Compressed Form 6 (only) it is used to specify the type system of the serialized data, to enable
- * form 6's lenient deserialization.
+ * <p>A collection of static methods aimed at making it easy to</p>
+ * <ul>
+ * <li>save and load CASes, and to</li>
+ * <li>optionally include the CAS's Type System (abbreviated TS (only available for Compressed Form 6)) and optionally also include the CAS's indexes definition.</li>
+ * <li>The combination of Type System and Indexes definition is called TSI.
+ * <ul>
+ * <li>The TSI's purpose: to replace the CAS's existing type system and index definition.</li>
+ * <li>The TS's purpose: to specify the type system used in the serialized data for format Compressed Form 6, in order to allow deserializing into some other type system in the CAS, leniently.</li>
+ * </ul>
+ * </li>
+ * </ul>
+ *
+ * <p>TSI information can be</p>
+ * <ul>
+ * <li>embedded</li>
+ * <li>externally supplied (via another input source to the load)</li>
+ * <li>both embedded and externally supplied. In this case the embedded takes precedence.</li>
+ * </ul>
+ *
+ * <p>TS information is available embedded, for COMPRESSED_FILTERED_TS format,
+ * and also from embedded or external TSI information (since it also contains the type system information).</p>
+ *
+ * <p>When an external TSI is supplied while loading Compressed Form 6,</p>
+ * <ul>
+ * <li>for COMPRESSED_FILTERED_TS
+ * <ul>
+ * <li>it uses the embedded TS for decoding</li>
+ * <li>it uses the external TSI to replace the CAS's existing type system and index definition.</li>
+ * </ul>
+ * </li>
+ * <li>for COMPRESSED_FILTERED_TSI
+ * <ul>
+ * <li>the external TSI is ignored; the embedded one overrides it, both for replacing the CAS's existing type system and index definition and for decoding</li>
+ * </ul>
+ * </li>
+ * <li>for COMPRESSED_FILTERED
+ * <ul>
+ * <li>the external TSI's type system part is used for decoding</li>
+ * <li>if CasLoadMode == REINIT, the external TSI is also used to replace the CAS's existing type system and index definition.
+ * <ul>
+ * <li>Note: this is the only use for CasLoadMode.REINIT</li>
+ * </ul>
+ * </li>
+ * </ul>
+ * </li>
+ * </ul>
+ *
+ * <p>When loading Compressed Form 6, the decoding type system is picked from these sources, in this order:</p>
+ * <ul>
+ * <li>an embedded TS or TSI</li>
+ * <li>an external TSI</li>
+ * <li>the CAS's type system</li>
+ * </ul>
+ *
+ * <p>The serialization formats supported here are specified in the SerialFormat enum.</p>
+ *
+ * <p>The <code>load</code> APIs automatically use the appropriate deserializers, based on the input data format.</p>
+ *
+ * <p>Loading inputs may be supplied as URLs or as an appropriately buffered InputStream.</p>
+ *
+ * <p>Note: you can use Files or Paths by converting these to URLs:</p>
+ * <ul>
+ * <li><code>URL url = a_path.toUri().toURL();</code></li>
+ * <li><code>URL url = a_file.toURI().toURL();</code></li>
+ * </ul>
+ *
+ * <p>When loading, an optional CasLoadMode enum value may be specified to indicate</p>
+ * <ul>
+ * <li>LENIENT loading - used with XCAS and XMI data sources to silently ignore types and features present in the serialized form, but not in the receiving type system.</li>
+ * <li>REINIT - used with Compressed Form 6 loading to indicate that if no embedded TSI information is available, the external TSI is to be used to replace the CAS's existing type system and index definition.</li>
+ * </ul>
+ *
+ * <p style="padding-left: 30px;">For more details, see the Javadocs for CasLoadMode.</p>
+ *
+ * <p>When TS or TSI information is saved, it is either saved in the same destination (e.g. file or stream), or in a separate one.</p>
+ * <ul>
+ * <li>The serialization formats ending in _TSI and _TS support saving the TSI (or TS) in the same destination.</li>
+ * <li>The save APIs for other formats can optionally also save the TSI into a separate (second) OutputStream.</li>
+ * </ul>
+ *
+ * <p>Summary of APIs for saving:</p>
+ * <p style="padding-left: 30px;">
+ * <code>save(aCAS, outputStream, aSerialFormat)</code><br />
+ * <code>save(aCAS, outputStream, tsiOutputStream, aSerialFormat)</code></p>
+ *
+ * <p>Summary of APIs for loading:</p>
+ * <p style="padding-left: 30px;">
+ * <code>load(URL , CAS)</code><br />
+ * <code>load(InputStream, CAS)</code></p>
+ * <p style="padding-left: 30px;">
+ * <code>load(URL , URL , CAS, CasLoadMode) - the second URL is for loading a separately-stored TSI</code><br />
+ * <code>load(InputStream, InputStream, CAS, CasLoadMode)</code><br />
+ * <code>load(URL , URL , CAS, lenient) - lenient is used to set the CasLoadMode to LENIENT or DEFAULT</code><br />
+ * <code>load(InputStream, InputStream, CAS, lenient)</code></p>
*/
public class CasIOUtils {
/**
- * Loads a Cas from a URL source. The format is determined from the file extension name and the content.
- * For formats ending with _TSI, the type system and index definitions are reset.
+ * Loads a Cas from a URL source.
+ * For SerialFormats ending with _TSI the type system and index definitions are reset.
* CasLoadMode is DEFAULT. To specify this explicitly, use the 4 argument form.
*
* @param casUrl