Posted to dev@uima.apache.org by Peter Klügl <pe...@averbis.com> on 2016/07/18 11:43:01 UTC
Re: svn commit: r1753208 - in /uima/uimaj/trunk/uimaj-core/src:
main/java/org/apache/uima/util/CasIOUtils.java
main/java/org/apache/uima/util/SerializationFormat.java
test/java/org/apache/uima/util/CasIOUtilsTest.java
Hi,
I added a first prototype of the utils class. Suggestions and comments
are welcome. I'll proceed with the CAS Editor for now...
Best,
Peter
On 18.07.2016 at 13:40, pkluegl@apache.org wrote:
> Author: pkluegl
> Date: Mon Jul 18 11:40:47 2016
> New Revision: 1753208
>
> URL: http://svn.apache.org/viewvc?rev=1753208&view=rev
> Log:
> UIMA-4685
> - added SerializationFormat enum
> - added CasIOUtils for loading/storing with different formats
> - added test
>
> Added:
> uima/uimaj/trunk/uimaj-core/src/main/java/org/apache/uima/util/CasIOUtils.java (with props)
> uima/uimaj/trunk/uimaj-core/src/main/java/org/apache/uima/util/SerializationFormat.java (with props)
> uima/uimaj/trunk/uimaj-core/src/test/java/org/apache/uima/util/CasIOUtilsTest.java (with props)
>
> Added: uima/uimaj/trunk/uimaj-core/src/main/java/org/apache/uima/util/CasIOUtils.java
> URL: http://svn.apache.org/viewvc/uima/uimaj/trunk/uimaj-core/src/main/java/org/apache/uima/util/CasIOUtils.java?rev=1753208&view=auto
> ==============================================================================
> --- uima/uimaj/trunk/uimaj-core/src/main/java/org/apache/uima/util/CasIOUtils.java (added)
> +++ uima/uimaj/trunk/uimaj-core/src/main/java/org/apache/uima/util/CasIOUtils.java Mon Jul 18 11:40:47 2016
> @@ -0,0 +1,507 @@
> +/*
> + * Licensed to the Apache Software Foundation (ASF) under one
> + * or more contributor license agreements. See the NOTICE file
> + * distributed with this work for additional information
> + * regarding copyright ownership. The ASF licenses this file
> + * to you under the Apache License, Version 2.0 (the
> + * "License"); you may not use this file except in compliance
> + * with the License. You may obtain a copy of the License at
> + *
> + * http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing,
> + * software distributed under the License is distributed on an
> + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
> + * KIND, either express or implied. See the License for the
> + * specific language governing permissions and limitations
> + * under the License.
> + */
> +package org.apache.uima.util;
> +
> +import static org.apache.uima.cas.impl.Serialization.deserializeCAS;
> +import static org.apache.uima.cas.impl.Serialization.deserializeCASComplete;
> +import static org.apache.uima.cas.impl.Serialization.serializeCAS;
> +import static org.apache.uima.cas.impl.Serialization.serializeCASComplete;
> +import static org.apache.uima.cas.impl.Serialization.serializeCASMgr;
> +import static org.apache.uima.cas.impl.Serialization.serializeWithCompression;
> +
> +import java.io.BufferedInputStream;
> +import java.io.DataInputStream;
> +import java.io.DataOutputStream;
> +import java.io.File;
> +import java.io.IOException;
> +import java.io.InputStream;
> +import java.io.ObjectInputStream;
> +import java.io.ObjectOutputStream;
> +import java.io.OutputStream;
> +import java.net.URL;
> +import java.nio.file.Path;
> +import java.util.Arrays;
> +
> +import org.apache.uima.cas.CAS;
> +import org.apache.uima.cas.impl.CASCompleteSerializer;
> +import org.apache.uima.cas.impl.CASImpl;
> +import org.apache.uima.cas.impl.CASMgrSerializer;
> +import org.apache.uima.cas.impl.CASSerializer;
> +import org.apache.uima.cas.impl.TypeSystemImpl;
> +import org.apache.uima.cas.impl.XCASDeserializer;
> +import org.apache.uima.cas.impl.XCASSerializer;
> +import org.apache.uima.cas.impl.XmiCasDeserializer;
> +import org.apache.uima.cas.impl.XmiCasSerializer;
> +import org.apache.uima.resource.ResourceInitializationException;
> +import org.xml.sax.SAXException;
> +
> +public class CasIOUtils {
> +
> + public static final byte[] UIMA_TS_HEADER = new byte[] { 'U', 'I', 'M', 'A', 'T', 'S' };
> +
> + public static final byte[] UIMA_HEADER = new byte[] { 'U', 'I', 'M', 'A' };
> +
> + /**
> + *
> + * @param casPath
> + * The path containing the CAS
> + * @param aCAS
> + * The CAS that should be filled
> + * @throws IOException
> + */
> + public static SerializationFormat load(Path casPath, CAS aCAS) throws IOException {
> +
> + return load(casPath, null, aCAS);
> + }
> +
> + /**
> + *
> + * @param casPath
> + * The path containing the CAS
> + * @param tsPath
> + * The optional path containing the type system
> + * @param aCAS
> + * The CAS that should be filled
> + * @throws IOException
> + */
> + public static SerializationFormat load(Path casPath, Path tsPath, CAS aCAS) throws IOException {
> +
> + URL casUrl = casPath.toUri().toURL();
> + URL tsUrl = tsPath == null ? null : tsPath.toUri().toURL();
> + return load(casUrl, tsUrl, aCAS);
> + }
> +
> + /**
> + *
> + * @param casFile
> + * The file containing the CAS
> + * @param aCAS
> + * The CAS that should be filled
> + * @throws IOException
> + */
> + public static SerializationFormat load(File casFile, CAS aCAS) throws IOException {
> +
> + return load(casFile, null, aCAS);
> + }
> +
> + /**
> + *
> + * @param casFile
> + * The file containing the CAS
> + * @param tsFile
> + * The optional file containing the type system
> + * @param aCAS
> + * The CAS that should be filled
> + * @throws IOException
> + */
> + public static SerializationFormat load(File casFile, File tsFile, CAS aCAS) throws IOException {
> +
> + URL casUrl = casFile.toURI().toURL();
> + URL tsUrl = tsFile == null ? null : tsFile.toURI().toURL();
> + return load(casUrl, tsUrl, aCAS);
> + }
> +
> + /**
> + *
> + * @param casUrl
> + * The url containing the CAS
> + * @param aCAS
> + * The CAS that should be filled
> + * @throws IOException
> + */
> + public static SerializationFormat load(URL casUrl, CAS aCAS) throws IOException {
> +
> + return load(casUrl, null, aCAS);
> + }
> +
> + /**
> + *
> + * @param casUrl
> + * The url containing the CAS
> + * @param tsUrl
> + * The optional url containing the type system
> + * @param aCAS
> + * The CAS that should be filled
> + * @throws IOException
> + */
> + public static SerializationFormat load(URL casUrl, URL tsUrl, CAS aCAS) throws IOException {
> + String path = casUrl.getPath().toLowerCase();
> + if (path.endsWith(".xmi")) {
> + try {
> + XmiCasDeserializer.deserialize(casUrl.openStream(), aCAS, true);
> + return SerializationFormat.XMI;
> + } catch (SAXException e) {
> + throw new IOException(e);
> + }
> + } else if (path.endsWith(".xcas") || path.endsWith(".xml")) {
> + try {
> + XCASDeserializer.deserialize(casUrl.openStream(), aCAS, true);
> + return SerializationFormat.XCAS;
> + } catch (SAXException e) {
> + throw new IOException(e);
> + }
> + }
> + return loadBinary(casUrl.openStream(), tsUrl == null ? null : tsUrl.openStream(), aCAS);
> + }
> +
> + /**
> + * This method tries to guess the format of the input stream. It supports binary format and XMI
> + * but not XCAS
> + *
> + * @param casInputStream
> + * The input stream containing the CAS
> + * @param aCAS
> + * The CAS that should be filled
> + * @throws IOException
> + */
> + public static SerializationFormat load(InputStream casInputStream, CAS aCAS) throws IOException {
> + return load(casInputStream, null, aCAS);
> + }
> +
> + /**
> + * This method tries to guess the format of the input stream. It supports binary format and XMI
> + * but not XCAS
> + *
> + * @param casInputStream
> + * The input stream containing the CAS
> + * @param tsInputStream
> + * The optional input stream containing the type system
> + * @param aCAS
> + * The CAS that should be filled
> + * @throws IOException
> + */
> + public static SerializationFormat load(InputStream casInputStream, InputStream tsInputStream, CAS aCAS)
> + throws IOException {
> + BufferedInputStream bis = new BufferedInputStream(casInputStream);
> + bis.mark(32);
> + byte[] headerXml = new byte[16];
> + bis.read(headerXml);
> + bis.reset();
> + String start = new String(headerXml);
> + if (start.startsWith("<?xml ")) {
> + try {
> + XmiCasDeserializer.deserialize(bis, aCAS, true);
> + return SerializationFormat.XMI;
> + } catch (SAXException e) {
> + throw new IOException(e);
> + }
> + }
> + return loadBinary(bis, tsInputStream, aCAS);
> + }
> +
> + /**
> + * Read CAS from the specified stream.
> + *
> + * @param is
> + * The input stream of the CAS
> + * @param aCAS
> + * the CAS in which the input stream will be deserialized
> + * @throws IOException
> + */
> + public static SerializationFormat loadBinary(InputStream is, CAS aCAS) throws IOException {
> + return loadBinary(is, (CASMgrSerializer) null, aCAS);
> + }
> +
> + /**
> + * Read CAS from the specified stream.
> + *
> + * @param is
> + * The input stream of the CAS
> + * @param typeIS
> + * Optional stream from which typesystem information may be read. This is only used if
> + * the binary format read from the primary input stream does not already contain
> + * typesystem information.
> + * @param aCAS
> + * the CAS in which the input stream will be deserialized
> + * @throws IOException
> + */
> + public static SerializationFormat loadBinary(InputStream is, InputStream typeIS, CAS aCAS) throws IOException {
> + CASMgrSerializer casMgr = null;
> + if (typeIS != null) {
> + casMgr = readCasManager(typeIS);
> + }
> +
> + return loadBinary(is, casMgr, aCAS);
> + }
> +
> + /**
> + * Read CAS from the specified stream.
> + *
> + * @param is
> + * The input stream of the CAS
> + * @param casMgr
> + * Optional CASMgrSerializer. This is only used if the binary format read from the
> + * primary input stream does not already contain typesystem information.
> + * @param aCAS
> + * the CAS in which the input stream will be deserialized
> + * @throws IOException
> + */
> + public static SerializationFormat loadBinary(InputStream is, CASMgrSerializer casMgr, CAS aCAS)
> + throws IOException {
> + try {
> + BufferedInputStream bis = new BufferedInputStream(is);
> + TypeSystemImpl ts = null;
> +
> + // Check if this is original UIMA CAS format or an extended format with type system
> + bis.mark(32);
> + DataInputStream dis = new DataInputStream(bis);
> +
> + byte[] header = new byte[UIMA_TS_HEADER.length];
> + dis.read(header);
> +
> + // If it is UIMA with type system format, read the type system
> + if (Arrays.equals(header, UIMA_TS_HEADER)) {
> + ObjectInputStream ois = new ObjectInputStream(bis);
> + CASMgrSerializer casMgrSerializer = (CASMgrSerializer) ois.readObject();
> + ts = casMgrSerializer.getTypeSystem();
> + ts.commit();
> + } else {
> + bis.reset();
> + }
> +
> + if (ts != null) {
> + // Only format 6 can have type system information
> + deserializeCAS(aCAS, bis, ts, null);
> + return SerializationFormat.S6p;
> + } else {
> +
> + // Check if this is a UIMA binary CAS stream
> + byte[] header4 = new byte[UIMA_HEADER.length];
> + dis.read(header4);
> +
> + if (header4[0] != 'U') {
> + // ArrayUtils.reverse(header4);
> + for (int i = 0; i < header4.length / 2; i++) {
> + byte temp = header4[i];
> + header4[i] = header4[header4.length - i - 1];
> + header4[header4.length - i - 1] = temp;
> + }
> + }
> +
> + // Peek into the version
> + int version = dis.readInt();
> + int version1 = dis.readInt();
> + bis.reset();
> +
> + if (Arrays.equals(header4, UIMA_HEADER)) {
> + // It is a binary CAS stream
> +
> + if ((version & 4) == 4 && (version1 != 0)) {
> + // This is a form 6
> + if (ts == null && casMgr != null) {
> + // If there was no type system in the file but one is set, then load it
> + ts = casMgr.getTypeSystem();
> + ts.commit();
> + }
> + deserializeCAS(aCAS, bis, ts, null);
> + return SerializationFormat.S6;
> + } else {
> + // This is a form 0 or 4
> + deserializeCAS(aCAS, bis);
> + if(version == 4) {
> + return SerializationFormat.S4;
> + }
> + return SerializationFormat.S0;
> + }
> + } else {
> + // If it is not a UIMA binary CAS stream and not xml, assume it is output from
> + // SerializedCasWriter
> + ObjectInputStream ois = new ObjectInputStream(bis);
> + Object object = ois.readObject();
> + if (object instanceof CASCompleteSerializer) {
> + CASCompleteSerializer serializer = (CASCompleteSerializer) object;
> + deserializeCASComplete(serializer, (CASImpl) aCAS);
> + return SerializationFormat.Sp;
> + } else if (object instanceof CASSerializer) {
> + CASCompleteSerializer serializer;
> + if (casMgr != null) {
> + // Annotations and CAS metadata saved separately
> + serializer = new CASCompleteSerializer();
> + serializer.setCasMgrSerializer(casMgr);
> + serializer.setCasSerializer((CASSerializer) object);
> + } else {
> + // Expecting that CAS is already initialized as required
> + serializer = serializeCASComplete((CASImpl) aCAS);
> + serializer.setCasSerializer((CASSerializer) object);
> + }
> + deserializeCASComplete(serializer, (CASImpl) aCAS);
> + return SerializationFormat.S;
> + } else {
> + throw new IOException("Unknown serialized object found with type ["
> + + object.getClass().getName() + "]");
> + }
> + }
> + }
> + } catch (ResourceInitializationException e) {
> + throw new IOException(e);
> + } catch (ClassNotFoundException e) {
> + throw new IOException(e);
> + } finally {
> + if (is != null) {
> + is.close();
> + }
> + }
> +
> + }
> +
> + /**
> + * Write the CAS in the specified format.
> + *
> + * @param aCas
> + * The CAS that should be serialized and stored
> + * @param docOS
> + * The output stream for the CAS
> + * @param formatName
> + * The format string in which the CAS should be stored.
> + * @throws IOException
> + */
> + public static void save(CAS aCas, OutputStream docOS, String formatName) throws IOException {
> + SerializationFormat format = SerializationFormat.valueOf(formatName);
> + save(aCas, docOS, null, format);
> + }
> +
> + /**
> + * Write the CAS in the specified format.
> + *
> + * @param aCas
> + * The CAS that should be serialized and stored
> + * @param docOS
> + * The output stream for the CAS
> + * @param format
> + * The SerializationFormat in which the CAS should be stored.
> + * @throws IOException
> + */
> + public static void save(CAS aCas, OutputStream docOS, SerializationFormat format)
> + throws IOException {
> + save(aCas, docOS, null, format);
> + }
> +
> + /**
> + * Write the CAS in the specified format. If the format does not include typesystem information
> + * and the optional output stream of the typesystem is specified, then the typesystem information
> + * is written there.
> + *
> + * @param aCas
> + * The CAS that should be serialized and stored
> + * @param docOS
> + * The output stream for the CAS
> + * @param typeOS
> + * Optional output stream for type system information. Only used if the format does not
> + * support storing typesystem information directly in the main output file.
> + * @param format
> + * The SerializationFormat in which the CAS should be stored.
> + * @throws IOException
> + */
> + public static void save(CAS aCas, OutputStream docOS, OutputStream typeOS,
> + SerializationFormat format) throws IOException {
> + boolean typeSystemWritten = false;
> + try {
> + switch (format) {
> + case XMI:
> + XmiCasSerializer.serialize(aCas, docOS);
> + break;
> + case XCAS:
> + XCASSerializer xcasSerializer = new XCASSerializer(aCas.getTypeSystem());
> + XMLSerializer xmlSerializer = new XMLSerializer(docOS, true);
> + xcasSerializer.serialize(aCas, xmlSerializer.getContentHandler());
> + break;
> + case S:
> + // Java-serialized CAS without type system
> + {
> + CASSerializer serializer = new CASSerializer();
> + serializer.addCAS((CASImpl) aCas);
> + ObjectOutputStream objOS = new ObjectOutputStream(docOS);
> + objOS.writeObject(serializer);
> + objOS.flush();
> + }
> + break;
> + case Sp:
> + // Java-serialized CAS with type system
> + {
> + ObjectOutputStream objOS = new ObjectOutputStream(docOS);
> + CASCompleteSerializer serializer = serializeCASComplete((CASImpl) aCas);
> + objOS.writeObject(serializer);
> + objOS.flush();
> + typeSystemWritten = true; // Embedded type system
> + }
> + break;
> + case S0:
> + // Binary CAS without type system (form 0)
> + serializeCAS(aCas, docOS);
> + break;
> + case S4:
> + // Binary compressed CAS without type system (form 4)
> + serializeWithCompression(aCas, docOS);
> + break;
> +
> + case S6:
> + // Binary compressed CAS (form 6)
> + serializeWithCompression(aCas, docOS, aCas.getTypeSystem());
> + break;
> + case S6p:
> + // Binary compressed CAS (form 6)
> + // ... with embedded Java-serialized type system
> + writeHeader(docOS);
> + writeTypeSystem(aCas, docOS);
> + typeSystemWritten = true; // Embedded type system
> + serializeWithCompression(aCas, docOS, aCas.getTypeSystem());
> + break;
> + default:
> + throw new IllegalArgumentException("Unknown format [" + format.name()
> + + "]. Must be one of: " + Arrays.toString(SerializationFormat.values()));
> + }
> + } catch (IOException e) {
> + throw e;
> + } catch (Exception e) {
> + throw new IOException(e);
> + }
> +
> + // To support writing to ZIPs, the type system must be written separately from the CAS data
> + if (typeOS != null && !typeSystemWritten) {
> + writeTypeSystem(aCas, typeOS);
> + typeSystemWritten = true;
> + }
> + }
> +
> + private static CASMgrSerializer readCasManager(InputStream aIs) throws IOException {
> + CASMgrSerializer casMgrSerializer;
> +
> + try {
> + ObjectInputStream is = new ObjectInputStream(aIs);
> + casMgrSerializer = (CASMgrSerializer) is.readObject();
> + } catch (ClassNotFoundException e) {
> + throw new IOException(e);
> + }
> +
> + return casMgrSerializer;
> + }
> +
> + private static void writeHeader(OutputStream aOS) throws IOException {
> + DataOutputStream dataOS = new DataOutputStream(aOS);
> + dataOS.write(UIMA_TS_HEADER);
> + dataOS.flush();
> + }
> +
> + private static void writeTypeSystem(CAS aCas, OutputStream aOS) throws IOException {
> + ObjectOutputStream typeOS = new ObjectOutputStream(aOS);
> + CASMgrSerializer casMgrSerializer = serializeCASMgr((CASImpl) aCas);
> + typeOS.writeObject(casMgrSerializer);
> + typeOS.flush();
> + }
> +}
>
> Propchange: uima/uimaj/trunk/uimaj-core/src/main/java/org/apache/uima/util/CasIOUtils.java
> ------------------------------------------------------------------------------
> svn:eol-style = native
>
> Added: uima/uimaj/trunk/uimaj-core/src/main/java/org/apache/uima/util/SerializationFormat.java
> URL: http://svn.apache.org/viewvc/uima/uimaj/trunk/uimaj-core/src/main/java/org/apache/uima/util/SerializationFormat.java?rev=1753208&view=auto
> ==============================================================================
> --- uima/uimaj/trunk/uimaj-core/src/main/java/org/apache/uima/util/SerializationFormat.java (added)
> +++ uima/uimaj/trunk/uimaj-core/src/main/java/org/apache/uima/util/SerializationFormat.java Mon Jul 18 11:40:47 2016
> @@ -0,0 +1,66 @@
> +/*
> + * Licensed to the Apache Software Foundation (ASF) under one
> + * or more contributor license agreements. See the NOTICE file
> + * distributed with this work for additional information
> + * regarding copyright ownership. The ASF licenses this file
> + * to you under the Apache License, Version 2.0 (the
> + * "License"); you may not use this file except in compliance
> + * with the License. You may obtain a copy of the License at
> + *
> + * http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing,
> + * software distributed under the License is distributed on an
> + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
> + * KIND, either express or implied. See the License for the
> + * specific language governing permissions and limitations
> + * under the License.
> + */
> +package org.apache.uima.util;
> +
> +/**
> + * The available serialization formats in uimaj-core. Additional serializers like JSON are not included.
> + *
> + */
> +public enum SerializationFormat {
> +
> + /**
> + * XML-serialized CAS (XMI format)
> + */
> + XMI,
> +
> + /**
> + * XML-serialized CAS (XCAS format)
> + */
> + XCAS,
> +
> + /**
> + * Java-serialized CAS without type system
> + */
> + S,
> +
> + /**
> + * Java-serialized CAS with type system
> + */
> + Sp,
> +
> + /**
> + * Binary CAS without type system (form 0)
> + */
> + S0,
> +
> + /**
> + * Binary compressed CAS without type system (form 4)
> + */
> + S4,
> +
> + /**
> + * Binary compressed CAS (form 6)
> + */
> + S6,
> +
> + /**
> + * Binary compressed CAS (form 6) with embedded Java-serialized type system
> + */
> + S6p;
> +}
>
> Propchange: uima/uimaj/trunk/uimaj-core/src/main/java/org/apache/uima/util/SerializationFormat.java
> ------------------------------------------------------------------------------
> svn:eol-style = native
>
> Added: uima/uimaj/trunk/uimaj-core/src/test/java/org/apache/uima/util/CasIOUtilsTest.java
> URL: http://svn.apache.org/viewvc/uima/uimaj/trunk/uimaj-core/src/test/java/org/apache/uima/util/CasIOUtilsTest.java?rev=1753208&view=auto
> ==============================================================================
> --- uima/uimaj/trunk/uimaj-core/src/test/java/org/apache/uima/util/CasIOUtilsTest.java (added)
> +++ uima/uimaj/trunk/uimaj-core/src/test/java/org/apache/uima/util/CasIOUtilsTest.java Mon Jul 18 11:40:47 2016
> @@ -0,0 +1,144 @@
> +/*
> + * Licensed to the Apache Software Foundation (ASF) under one
> + * or more contributor license agreements. See the NOTICE file
> + * distributed with this work for additional information
> + * regarding copyright ownership. The ASF licenses this file
> + * to you under the Apache License, Version 2.0 (the
> + * "License"); you may not use this file except in compliance
> + * with the License. You may obtain a copy of the License at
> + *
> + * http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing,
> + * software distributed under the License is distributed on an
> + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
> + * KIND, either express or implied. See the License for the
> + * specific language governing permissions and limitations
> + * under the License.
> + */
> +package org.apache.uima.util;
> +
> +import java.io.ByteArrayInputStream;
> +import java.io.ByteArrayOutputStream;
> +import java.io.File;
> +import java.io.FileInputStream;
> +import java.io.FileOutputStream;
> +import java.io.IOException;
> +import java.io.ObjectOutput;
> +import java.io.ObjectOutputStream;
> +
> +import org.apache.uima.UIMAFramework;
> +import org.apache.uima.cas.CAS;
> +import org.apache.uima.resource.metadata.FsIndexDescription;
> +import org.apache.uima.resource.metadata.TypeSystemDescription;
> +import org.apache.uima.resource.metadata.impl.TypePriorities_impl;
> +import org.apache.uima.test.junit_extension.JUnitExtension;
> +
> +import junit.framework.Assert;
> +import junit.framework.TestCase;
> +
> +public class CasIOUtilsTest extends TestCase {
> +
> + private static final int SIMPLE_CAS_DEFAULT_INDEX_SIZE = 7;
> +
> + private CAS cas;
> +
> + public CasIOUtilsTest(String arg0) {
> + super(arg0);
> + }
> +
> + protected void setUp() throws Exception {
> + File typeSystemFile = JUnitExtension.getFile("ExampleCas/testTypeSystem.xml");
> + File indexesFile = JUnitExtension.getFile("ExampleCas/testIndexes.xml");
> +
> + TypeSystemDescription typeSystem = UIMAFramework.getXMLParser().parseTypeSystemDescription(
> + new XMLInputSource(typeSystemFile));
> + FsIndexDescription[] indexes = UIMAFramework.getXMLParser().parseFsIndexCollection(new XMLInputSource(indexesFile))
> + .getFsIndexes();
> + cas = CasCreationUtils.createCas(typeSystem, new TypePriorities_impl(), indexes);
> + CasIOUtils.load(JUnitExtension.getFile("ExampleCas/simpleCas.xmi"), cas);
> + }
> +
> + public void testXMI() throws Exception {
> + File casFile = new File("target/temp-test-output/simpleCas.xmi");
> + casFile.getParentFile().mkdirs();
> + CasIOUtils.save(cas, new FileOutputStream(casFile), SerializationFormat.XMI);
> + cas.reset();
> + CasIOUtils.load(casFile, cas);
> + Assert.assertEquals(SIMPLE_CAS_DEFAULT_INDEX_SIZE, cas.getAnnotationIndex().size());
> + cas.reset();
> + CasIOUtils.load(new FileInputStream(casFile), cas);
> + Assert.assertEquals(SIMPLE_CAS_DEFAULT_INDEX_SIZE, cas.getAnnotationIndex().size());
> + cas.reset();
> + CasIOUtils.load(casFile.toURI().toURL(), cas);
> + Assert.assertEquals(SIMPLE_CAS_DEFAULT_INDEX_SIZE, cas.getAnnotationIndex().size());
> + }
> +
> + public void testXCAS() throws Exception {
> + File casFile = new File("target/temp-test-output/simpleCas.xcas");
> + casFile.getParentFile().mkdirs();
> + CasIOUtils.save(cas, new FileOutputStream(casFile), SerializationFormat.XCAS);
> + cas.reset();
> + CasIOUtils.load(casFile, cas);
> + Assert.assertEquals(SIMPLE_CAS_DEFAULT_INDEX_SIZE, cas.getAnnotationIndex().size());
> + cas.reset();
> + CasIOUtils.load(casFile.toURI().toURL(), cas);
> + Assert.assertEquals(SIMPLE_CAS_DEFAULT_INDEX_SIZE, cas.getAnnotationIndex().size());
> + }
> +
> + public void testS() throws Exception {
> + testFormat(SerializationFormat.S, "bins");
> + }
> +
> + public void testSp() throws Exception {
> + testFormat(SerializationFormat.Sp, "binsp");
> + }
> +
> + public void testS0() throws Exception {
> + testFormat(SerializationFormat.S0, "bins0");
> + }
> +
> + public void testS4() throws Exception {
> + testFormat(SerializationFormat.S4, "bins4");
> + }
> +
> + public void testS6() throws Exception {
> + testFormat(SerializationFormat.S6, "bins6");
> + }
> +
> + public void testS6p() throws Exception {
> + testFormat(SerializationFormat.S6p, "bins6p");
> + }
> +
> + private void testFormat(SerializationFormat format, String fileEnding) throws Exception {
> + File casFile = new File("target/temp-test-output/simpleCas."+ fileEnding);
> + casFile.getParentFile().mkdirs();
> + CasIOUtils.save(cas, new FileOutputStream(casFile), format);
> + cas.reset();
> + SerializationFormat loadedFormat = CasIOUtils.load(new FileInputStream(casFile), cas);
> + Assert.assertEquals(format, loadedFormat);
> + Assert.assertEquals(SIMPLE_CAS_DEFAULT_INDEX_SIZE, cas.getAnnotationIndex().size());
> + }
> +
> + public void testWrongInputStream() throws Exception {
> + ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
> + ObjectOutput out = null;
> +
> + out = new ObjectOutputStream(byteArrayOutputStream);
> + out.writeObject(new String("WRONG OBJECT"));
> +
> + byte[] casBytes = byteArrayOutputStream.toByteArray();
> + try {
> + CasIOUtils.load(new ByteArrayInputStream(casBytes), cas);
> + } catch (Exception e) {
> + Assert.assertTrue(e instanceof IOException);
> + return;
> + }
> + Assert.fail("An exception should have been thrown for wrong input.");
> + }
> +
> +
> + protected void tearDown() throws Exception {
> + cas.release();
> + }
> +}
>
> Propchange: uima/uimaj/trunk/uimaj-core/src/test/java/org/apache/uima/util/CasIOUtilsTest.java
> ------------------------------------------------------------------------------
> svn:eol-style = native
>
>
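For readers following the binary branch of loadBinary() above, its header checks can be pulled out into a stand-alone sketch. BinaryHeaderSketch, classify() and sample() are hypothetical names, not part of the commit. The sketch only mirrors the decisions the committed code makes (the "UIMATS" prefix for S6p, the possibly reversed "UIMA" key, and the two version ints) and does not deserialize anything. Like the commit, it reads the version ints big-endian through DataInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical stand-alone mirror of the header checks in CasIOUtils.loadBinary (r1753208). */
class BinaryHeaderSketch {
  static final byte[] UIMA_TS_HEADER = { 'U', 'I', 'M', 'A', 'T', 'S' };
  static final byte[] UIMA_HEADER = { 'U', 'I', 'M', 'A' };

  /** Returns the format name loadBinary would pick, without deserializing anything. */
  static String classify(byte[] data) {
    // S6p: the "UIMATS" marker written by writeHeader() precedes the type system
    if (data.length >= UIMA_TS_HEADER.length
        && Arrays.equals(Arrays.copyOf(data, UIMA_TS_HEADER.length), UIMA_TS_HEADER)) {
      return "S6p";
    }
    if (data.length < 12) {
      return "unknown"; // too short for the 4-byte key plus two version ints
    }
    try (DataInputStream dis = new DataInputStream(new ByteArrayInputStream(data))) {
      byte[] key = new byte[UIMA_HEADER.length];
      dis.readFully(key);
      if (key[0] != 'U') {
        // Same trick as the commit: accept the key when its bytes are in reverse order
        for (int i = 0; i < key.length / 2; i++) {
          byte tmp = key[i];
          key[i] = key[key.length - i - 1];
          key[key.length - i - 1] = tmp;
        }
      }
      int version = dis.readInt(); // DataInputStream always reads big-endian
      int version1 = dis.readInt();
      if (!Arrays.equals(key, UIMA_HEADER)) {
        return "S or Sp"; // the commit falls back to Java deserialization here
      }
      if ((version & 4) == 4 && version1 != 0) {
        return "S6";
      }
      return version == 4 ? "S4" : "S0";
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Test-data helper: an ASCII key followed by two big-endian ints. */
  static byte[] sample(String key, int version, int version1) {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (DataOutputStream dos = new DataOutputStream(bos)) {
      dos.write(key.getBytes(StandardCharsets.US_ASCII));
      dos.writeInt(version);
      dos.writeInt(version1);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bos.toByteArray();
  }

  public static void main(String[] args) {
    System.out.println(classify(sample("UIMA", 4, 1))); // S6
    System.out.println(classify(sample("AMIU", 4, 0))); // S4: key accepted after reversal
    System.out.println(classify(sample("UIMA", 1, 0))); // S0
  }
}
```

One design difference: the committed code reads the header bytes with read() and ignores how many bytes came back. This sketch checks the length up front and reports "unknown" for inputs shorter than 12 bytes, where the commit's readInt() calls would throw an EOFException.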
Re: svn commit: r1753208 - in /uima/uimaj/trunk/uimaj-core/src: main/java/org/apache/uima/util/CasIOUtils.java main/java/org/apache/uima/util/SerializationFormat.java test/java/org/apache/uima/util/CasIOUtilsTest.java
Posted by Richard Eckart de Castilho <re...@apache.org>.
On 18.07.2016, at 14:14, Peter Klügl <pe...@averbis.com> wrote:
>
> I would prefer the .Xcas variants. Why distinguish between serialized
> and binary?
We don't necessarily have to distinguish, but we may want to, because
"normal" UIMA binary formats identify themselves as such, while the
serialized CAS afaik doesn't.
> This is rather a convention concerning the CasIOUtils, right? The code
> works right now with any file extension (exception: xcas). The output is
> specified by the given output stream.
>
> Well, for the CAS Editor, I could really use that convention in order to
> link the editor.
I was thinking about the CAS Editor (because I saw extensions being used
there when I checked the code), but also simply in general as a best practice.
Cheers,
-- Richard
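Richard's point can be made concrete. A Java-serialized stream does start with a generic header (ObjectStreamConstants.STREAM_MAGIC, 0xACED, followed by STREAM_VERSION, 5), but that header only says "Java serialization", not "CAS". The sketch below (JavaSerializationMagic and its methods are hypothetical names) checks for that header. Note that the "WRONG OBJECT" string from testWrongInputStream passes the check too, which is why the committed fallback still has to call readObject() and test the result with instanceof.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: every ObjectOutputStream output starts with the same 4-byte header. */
class JavaSerializationMagic {

  /** True if the bytes start with STREAM_MAGIC followed by STREAM_VERSION (big-endian shorts). */
  static boolean looksJavaSerialized(byte[] data) {
    if (data.length < 4) {
      return false;
    }
    int magic = ((data[0] & 0xFF) << 8) | (data[1] & 0xFF);
    int version = ((data[2] & 0xFF) << 8) | (data[3] & 0xFF);
    // STREAM_MAGIC is declared as a (negative) short, so mask it to compare as unsigned
    return magic == (ObjectStreamConstants.STREAM_MAGIC & 0xFFFF)
        && version == ObjectStreamConstants.STREAM_VERSION;
  }

  /** Serializes any object with plain Java serialization. */
  static byte[] serialize(Object o) {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
      oos.writeObject(o);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bos.toByteArray();
  }

  public static void main(String[] args) {
    System.out.println(looksJavaSerialized(serialize("WRONG OBJECT"))); // true
    System.out.println(looksJavaSerialized("UIMA".getBytes(StandardCharsets.US_ASCII))); // false
  }
}
```

So a sniffer can tell "some Java-serialized object" apart from a UIMA binary stream cheaply, but deciding between S and Sp (or rejecting a foreign object) still requires deserializing.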
Re: svn commit: r1753208 - in /uima/uimaj/trunk/uimaj-core/src:
main/java/org/apache/uima/util/CasIOUtils.java
main/java/org/apache/uima/util/SerializationFormat.java
test/java/org/apache/uima/util/CasIOUtilsTest.java
Posted by Peter Klügl <pe...@averbis.com>.
I would prefer the .Xcas variants. Why distinguish between serialized
and binary?
This is rather a convention concerning the CasIOUtils, right? The code
works right now with any file extension (exception: xcas). The output is
specified by the given output stream.
Well, for the CAS Editor, I could really use that convention in order to
link the editor.
Peter
On 18.07.2016 at 13:48, Richard Eckart de Castilho wrote:
> On 18.07.2016, at 13:43, Peter Klügl <pe...@averbis.com> wrote:
>> I added a first prototype of the utils class. Suggestions and comments
>> are welcome. I'll proceed with the CAS Editor for now...
> Suggestion for the default file extension for
>
> - binary CAS files: ".bin" or ".bcas"
>
> - "serialized" CASes: ".ser" or ".scas"
>
> Other suggestions?
>
> Cheers,
>
> -- Richard
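Peter's remark that load(InputStream, ...) works with any file extension rests on peeking at the stream: mark(), read a few bytes, reset(). A minimal stand-alone sketch of that peek follows; XmlPeekSketch and its methods are hypothetical names, and readAll() merely stands in for the deserializer that consumes the stream afterwards. Since XMI and XCAS files are both XML and will usually begin with an XML declaration, the peek alone cannot tell them apart, which is consistent with the XCAS exception mentioned above.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of the mark/read/reset peek in CasIOUtils.load(InputStream, ...). */
class XmlPeekSketch {

  /** True if the stream starts with an XML declaration; the stream position is left unchanged. */
  static boolean startsWithXmlDecl(BufferedInputStream bis) {
    try {
      bis.mark(32);               // allow up to 32 bytes to be read before reset()
      byte[] head = new byte[16];
      int n = bis.read(head);     // may return fewer than 16 bytes, or -1 on empty input
      bis.reset();                // rewind so the deserializer sees the complete stream
      return n > 0
          && new String(head, 0, n, StandardCharsets.US_ASCII).startsWith("<?xml ");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Stands in for the deserializer that consumes the stream after the peek. */
  static byte[] readAll(InputStream in) {
    try {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[4096];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] xml = "<?xml version=\"1.0\"?><a/>".getBytes(StandardCharsets.US_ASCII);
    BufferedInputStream bis = new BufferedInputStream(new ByteArrayInputStream(xml));
    System.out.println(startsWithXmlDecl(bis));            // true
    System.out.println(readAll(bis).length == xml.length); // true: the peek consumed nothing
  }
}
```

Unlike the committed code, the sketch decodes only the bytes actually returned by read() and uses an explicit charset rather than the platform default.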
Re: svn commit: r1753208 - in /uima/uimaj/trunk/uimaj-core/src: main/java/org/apache/uima/util/CasIOUtils.java main/java/org/apache/uima/util/SerializationFormat.java test/java/org/apache/uima/util/CasIOUtilsTest.java
Posted by Richard Eckart de Castilho <re...@apache.org>.
On 18.07.2016, at 13:43, Peter Klügl <pe...@averbis.com> wrote:
>
> I added a first prototype of the utils class. Suggestions and comments
> are welcome. I'll proceed with the CAS Editor for now...
Suggestion for the default file extension for
- binary CAS files: ".bin" or ".bcas"
- "serialized" CASes: ".ser" or ".scas"
Other suggestions?
Cheers,
-- Richard
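If one of the proposed conventions were adopted, the CAS Editor (or load(URL, ...)) could map extensions to format families roughly as below. This is a hypothetical sketch: the thread had not settled on .bin/.bcas or .ser/.scas, so both variants are accepted, while the .xmi and .xcas/.xml cases mirror what load(URL, ...) in r1753208 already does. FormatFamily and fromFileName() are illustrative names only.

```java
import java.util.Locale;
import java.util.Optional;

/** Hypothetical mapping from file extension to format family, following the thread's proposals. */
class ExtensionConventionSketch {

  enum FormatFamily { XMI, XCAS, BINARY, JAVA_SERIALIZED }

  static Optional<FormatFamily> fromFileName(String fileName) {
    String name = fileName.toLowerCase(Locale.ROOT); // locale-independent lowercasing
    int dot = name.lastIndexOf('.');
    if (dot < 0) {
      return Optional.empty(); // no extension: fall back to content sniffing
    }
    switch (name.substring(dot)) {
      case ".xmi":
        return Optional.of(FormatFamily.XMI);
      case ".xcas":
      case ".xml": // load(URL, ...) in r1753208 treats .xml as XCAS
        return Optional.of(FormatFamily.XCAS);
      case ".bcas":
      case ".bin": // proposed for binary CAS files (S0, S4, S6, S6p)
        return Optional.of(FormatFamily.BINARY);
      case ".scas":
      case ".ser": // proposed for Java-serialized CASes (S, Sp)
        return Optional.of(FormatFamily.JAVA_SERIALIZED);
      default:
        return Optional.empty();
    }
  }

  public static void main(String[] args) {
    System.out.println(fromFileName("simpleCas.bcas")); // Optional[BINARY]
    System.out.println(fromFileName("README"));         // Optional.empty
  }
}
```

An extension only narrows the family; within the binary family, the header checks in loadBinary() would still decide between S0, S4, S6 and S6p. Locale.ROOT is used because the committed code calls toLowerCase() with the default locale, which can map "I" to a dotless "ı" under a Turkish locale.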