Posted to dev@uima.apache.org by Peter Klügl <pe...@averbis.com> on 2016/07/18 11:43:01 UTC

Re: svn commit: r1753208 - in /uima/uimaj/trunk/uimaj-core/src: main/java/org/apache/uima/util/CasIOUtils.java main/java/org/apache/uima/util/SerializationFormat.java test/java/org/apache/uima/util/CasIOUtilsTest.java

Hi,


I added a first prototype of the utils class. Suggestions and comments
are welcome. I'll proceed with the CAS Editor for now...


Best,


Peter


Am 18.07.2016 um 13:40 schrieb pkluegl@apache.org:
> Author: pkluegl
> Date: Mon Jul 18 11:40:47 2016
> New Revision: 1753208
>
> URL: http://svn.apache.org/viewvc?rev=1753208&view=rev
> Log:
> UIMA-4685
> - added SerializationFormat enum
> - added CasIOUtils for loading/storing with different formats
> - added test
>
> Added:
>     uima/uimaj/trunk/uimaj-core/src/main/java/org/apache/uima/util/CasIOUtils.java   (with props)
>     uima/uimaj/trunk/uimaj-core/src/main/java/org/apache/uima/util/SerializationFormat.java   (with props)
>     uima/uimaj/trunk/uimaj-core/src/test/java/org/apache/uima/util/CasIOUtilsTest.java   (with props)
>
> Added: uima/uimaj/trunk/uimaj-core/src/main/java/org/apache/uima/util/CasIOUtils.java
> URL: http://svn.apache.org/viewvc/uima/uimaj/trunk/uimaj-core/src/main/java/org/apache/uima/util/CasIOUtils.java?rev=1753208&view=auto
> ==============================================================================
> --- uima/uimaj/trunk/uimaj-core/src/main/java/org/apache/uima/util/CasIOUtils.java (added)
> +++ uima/uimaj/trunk/uimaj-core/src/main/java/org/apache/uima/util/CasIOUtils.java Mon Jul 18 11:40:47 2016
> @@ -0,0 +1,507 @@
> +/*
> + * Licensed to the Apache Software Foundation (ASF) under one
> + * or more contributor license agreements.  See the NOTICE file
> + * distributed with this work for additional information
> + * regarding copyright ownership.  The ASF licenses this file
> + * to you under the Apache License, Version 2.0 (the
> + * "License"); you may not use this file except in compliance
> + * with the License.  You may obtain a copy of the License at
> + * 
> + *   http://www.apache.org/licenses/LICENSE-2.0
> + * 
> + * Unless required by applicable law or agreed to in writing,
> + * software distributed under the License is distributed on an
> + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
> + * KIND, either express or implied.  See the License for the
> + * specific language governing permissions and limitations
> + * under the License.
> + */
> +package org.apache.uima.util;
> +
> +import static org.apache.uima.cas.impl.Serialization.deserializeCAS;
> +import static org.apache.uima.cas.impl.Serialization.deserializeCASComplete;
> +import static org.apache.uima.cas.impl.Serialization.serializeCAS;
> +import static org.apache.uima.cas.impl.Serialization.serializeCASComplete;
> +import static org.apache.uima.cas.impl.Serialization.serializeCASMgr;
> +import static org.apache.uima.cas.impl.Serialization.serializeWithCompression;
> +
> +import java.io.BufferedInputStream;
> +import java.io.DataInputStream;
> +import java.io.DataOutputStream;
> +import java.io.File;
> +import java.io.IOException;
> +import java.io.InputStream;
> +import java.io.ObjectInputStream;
> +import java.io.ObjectOutputStream;
> +import java.io.OutputStream;
> +import java.net.URL;
> +import java.nio.file.Path;
> +import java.util.Arrays;
> +
> +import org.apache.uima.cas.CAS;
> +import org.apache.uima.cas.impl.CASCompleteSerializer;
> +import org.apache.uima.cas.impl.CASImpl;
> +import org.apache.uima.cas.impl.CASMgrSerializer;
> +import org.apache.uima.cas.impl.CASSerializer;
> +import org.apache.uima.cas.impl.TypeSystemImpl;
> +import org.apache.uima.cas.impl.XCASDeserializer;
> +import org.apache.uima.cas.impl.XCASSerializer;
> +import org.apache.uima.cas.impl.XmiCasDeserializer;
> +import org.apache.uima.cas.impl.XmiCasSerializer;
> +import org.apache.uima.resource.ResourceInitializationException;
> +import org.xml.sax.SAXException;
> +
> +public class CasIOUtils {
> +
> +  public static final byte[] UIMA_TS_HEADER = new byte[] { 'U', 'I', 'M', 'A', 'T', 'S' };
> +
> +  public static final byte[] UIMA_HEADER = new byte[] { 'U', 'I', 'M', 'A' };
> +
> +  /**
> +   * 
> +   * @param casPath
> +   *          The path containing the CAS
> +   * @param aCAS
> +   *          The CAS that should be filled
> +   * @throws IOException
> +   */
> +  public static SerializationFormat load(Path casPath, CAS aCAS) throws IOException {
> +
> +    return load(casPath, null, aCAS);
> +  }
> +
> +  /**
> +   * 
> +   * @param casPath
> +   *          The path containing the CAS
> +   * @param tsPath
> +   *          The optional path containing the type system
> +   * @param aCAS
> +   *          The CAS that should be filled
> +   * @throws IOException
> +   */
> +  public static SerializationFormat load(Path casPath, Path tsPath, CAS aCAS) throws IOException {
> +
> +    URL casUrl = casPath.toUri().toURL();
> +    URL tsUrl = tsPath == null ? null : tsPath.toUri().toURL();
> +    return load(casUrl, tsUrl, aCAS);
> +  }
> +
> +  /**
> +   * 
> +   * @param casFile
> +   *          The file containing the CAS
> +   * @param aCAS
> +   *          The CAS that should be filled
> +   * @throws IOException
> +   */
> +  public static SerializationFormat load(File casFile, CAS aCAS) throws IOException {
> +
> +    return load(casFile, null, aCAS);
> +  }
> +
> +  /**
> +   * 
> +   * @param casFile
> +   *          The file containing the CAS
> +   * @param tsFile
> +   *          The optional file containing the type system
> +   * @param aCAS
> +   *          The CAS that should be filled
> +   * @throws IOException
> +   */
> +  public static SerializationFormat load(File casFile, File tsFile, CAS aCAS) throws IOException {
> +
> +    URL casUrl = casFile.toURI().toURL();
> +    URL tsUrl = tsFile == null ? null : tsFile.toURI().toURL();
> +    return load(casUrl, tsUrl, aCAS);
> +  }
> +
> +  /**
> +   * 
> +   * @param casUrl
> +   *          The url containing the CAS
> +   * @param aCAS
> +   *          The CAS that should be filled
> +   * @throws IOException
> +   */
> +  public static SerializationFormat load(URL casUrl, CAS aCAS) throws IOException {
> +
> +    return load(casUrl, null, aCAS);
> +  }
> +
> +  /**
> +   * 
> +   * @param casUrl
> +   *          The url containing the CAS
> +   * @param tsUrl
> +   *          The optional url containing the type system
> +   * @param aCAS
> +   *          The CAS that should be filled
> +   * @throws IOException
> +   */
> +  public static SerializationFormat load(URL casUrl, URL tsUrl, CAS aCAS) throws IOException {
> +    String path = casUrl.getPath().toLowerCase();
> +    if (path.endsWith(".xmi")) {
> +      try {
> +        XmiCasDeserializer.deserialize(casUrl.openStream(), aCAS, true);
> +        return SerializationFormat.XMI;
> +      } catch (SAXException e) {
> +        throw new IOException(e);
> +      }
> +    } else if (path.endsWith(".xcas") || path.endsWith(".xml")) {
> +      try {
> +        XCASDeserializer.deserialize(casUrl.openStream(), aCAS, true);
> +        return SerializationFormat.XCAS;
> +      } catch (SAXException e) {
> +        throw new IOException(e);
> +      }
> +    }
> +    return loadBinary(casUrl.openStream(), tsUrl == null ? null : tsUrl.openStream(), aCAS);
> +  }
> +
> +  /**
> +   * This method tries to guess the format of the input stream. It supports binary format and XMI
> +   * but not XCAS
> +   * 
> +   * @param casInputStream
> +   *          The input stream containing the CAS
> +   * @param aCAS
> +   *          The CAS that should be filled
> +   * @throws IOException
> +   */
> +  public static SerializationFormat load(InputStream casInputStream, CAS aCAS) throws IOException {
> +    return load(casInputStream, null, aCAS);
> +  }
> +
> +  /**
> +   * This method tries to guess the format of the input stream. It supports binary format and XMI
> +   * but not XCAS
> +   * 
> +   * @param casInputStream
> +   *          The input stream containing the CAS
> +   * @param tsInputStream
> +   *          The optional input stream containing the type system
> +   * @param aCAS
> +   *          The CAS that should be filled
> +   * @throws IOException
> +   */
> +  public static SerializationFormat load(InputStream casInputStream, InputStream tsInputStream, CAS aCAS)
> +          throws IOException {
> +    BufferedInputStream bis = new BufferedInputStream(casInputStream);
> +    bis.mark(32);
> +    byte[] headerXml = new byte[16];
> +    bis.read(headerXml);
> +    bis.reset();
> +    String start = new String(headerXml);
> +    if (start.startsWith("<?xml ")) {
> +      try {
> +        XmiCasDeserializer.deserialize(bis, aCAS, true);
> +        return SerializationFormat.XMI;
> +      } catch (SAXException e) {
> +        throw new IOException(e);
> +      }
> +    }
> +    return loadBinary(bis, tsInputStream, aCAS);
> +  }
> +
> +  /**
> +   * Read CAS from the specified stream.
> +   * 
> +   * @param is
> +   *          The input stream of the CAS
> +   * @param aCAS
> +   *          the CAS in which the input stream will be deserialized
> +   * @throws IOException
> +   */
> +  public static SerializationFormat loadBinary(InputStream is, CAS aCAS) throws IOException {
> +    return loadBinary(is, (CASMgrSerializer) null, aCAS);
> +  }
> +
> +  /**
> +   * Read CAS from the specified stream.
> +   * 
> +   * @param is
> +   *          The input stream of the CAS
> +   * @param typeIS
> +   *          Optional stream from which typesystem information may be read. This is only used if
> +   *          the binary format read from the primary input stream does not already contain
> +   *          typesystem information.
> +   * @param aCAS
> +   *          the CAS in which the input stream will be deserialized
> +   * @throws IOException
> +   */
> +  public static SerializationFormat loadBinary(InputStream is, InputStream typeIS, CAS aCAS) throws IOException {
> +    CASMgrSerializer casMgr = null;
> +    if (typeIS != null) {
> +      casMgr = readCasManager(typeIS);
> +    }
> +
> +    return loadBinary(is, casMgr, aCAS);
> +  }
> +
> +  /**
> +   * Read CAS from the specified stream.
> +   * 
> +   * @param is
> +   *          The input stream of the CAS
> +   * @param casMgr
> +   *          Optional CASMgrSerializer. This is only used if the binary format read from the
> +   *          primary input stream does not already contain typesystem information.
> +   * @param aCAS
> +   *          the CAS in which the input stream will be deserialized
> +   * @throws IOException
> +   */
> +  public static SerializationFormat loadBinary(InputStream is, CASMgrSerializer casMgr, CAS aCAS)
> +          throws IOException {
> +    try {
> +      BufferedInputStream bis = new BufferedInputStream(is);
> +      TypeSystemImpl ts = null;
> +
> +      // Check if this is original UIMA CAS format or an extended format with type system
> +      bis.mark(32);
> +      DataInputStream dis = new DataInputStream(bis);
> +
> +      byte[] header = new byte[UIMA_TS_HEADER.length];
> +      dis.read(header);
> +
> +      // If it is UIMA with type system format, read the type system
> +      if (Arrays.equals(header, UIMA_TS_HEADER)) {
> +        ObjectInputStream ois = new ObjectInputStream(bis);
> +        CASMgrSerializer casMgrSerializer = (CASMgrSerializer) ois.readObject();
> +        ts = casMgrSerializer.getTypeSystem();
> +        ts.commit();
> +      } else {
> +        bis.reset();
> +      }
> +
> +      if (ts != null) {
> +        // Only format 6 can have type system information
> +        deserializeCAS(aCAS, bis, ts, null);
> +        return SerializationFormat.S6p;
> +      } else {
> +
> +        // Check if this is a UIMA binary CAS stream
> +        byte[] header4 = new byte[UIMA_HEADER.length];
> +        dis.read(header4);
> +
> +        if (header4[0] != 'U') {
> +          // ArrayUtils.reverse(header4);
> +          for (int i = 0; i < header4.length / 2; i++) {
> +            byte temp = header4[i];
> +            header4[i] = header4[header4.length - i - 1];
> +            header4[header4.length - i - 1] = temp;
> +          }
> +        }
> +
> +        // Peek into the version
> +        int version = dis.readInt();
> +        int version1 = dis.readInt();
> +        bis.reset();
> +
> +        if (Arrays.equals(header4, UIMA_HEADER)) {
> +          // It is a binary CAS stream
> +
> +          if ((version & 4) == 4 && (version1 != 0)) {
> +            // This is a form 6
> +            if (ts == null && casMgr != null) {
> +              // If there was no type system in the file but one was supplied, use it
> +              ts = casMgr.getTypeSystem();
> +              ts.commit();
> +            }
> +            deserializeCAS(aCAS, bis, ts, null);
> +            return SerializationFormat.S6;
> +          } else {
> +            // This is a form 0 or 4
> +            deserializeCAS(aCAS, bis);
> +            if(version == 4) {
> +              return SerializationFormat.S4;
> +            }
> +            return SerializationFormat.S0;
> +          }
> +        } else {
> +          // If it is not a UIMA binary CAS stream and not xml, assume it is output from
> +          // SerializedCasWriter
> +          ObjectInputStream ois = new ObjectInputStream(bis);
> +          Object object = ois.readObject();
> +          if (object instanceof CASCompleteSerializer) {
> +            CASCompleteSerializer serializer = (CASCompleteSerializer) object;
> +            deserializeCASComplete(serializer, (CASImpl) aCAS);
> +            return SerializationFormat.Sp;
> +          } else if (object instanceof CASSerializer) {
> +            CASCompleteSerializer serializer;
> +            if (casMgr != null) {
> +              // Annotations and CAS metadata saved separately
> +              serializer = new CASCompleteSerializer();
> +              serializer.setCasMgrSerializer(casMgr);
> +              serializer.setCasSerializer((CASSerializer) object);
> +            } else {
> +              // Expecting that CAS is already initialized as required
> +              serializer = serializeCASComplete((CASImpl) aCAS);
> +              serializer.setCasSerializer((CASSerializer) object);
> +            }
> +            deserializeCASComplete(serializer, (CASImpl) aCAS);
> +            return SerializationFormat.S;
> +          } else {
> +            throw new IOException("Unknown serialized object found with type ["
> +                    + object.getClass().getName() + "]");
> +          }
> +        }
> +      }
> +    } catch (ResourceInitializationException e) {
> +      throw new IOException(e);
> +    } catch (ClassNotFoundException e) {
> +      throw new IOException(e);
> +    } finally {
> +      if (is != null) {
> +        is.close();
> +      }
> +    }
> +
> +  }
> +
> +  /**
> +   * Write the CAS in the specified format.
> +   * 
> +   * @param aCas
> +   *          The CAS that should be serialized and stored
> +   * @param docOS
> +   *          The output stream for the CAS
> +   * @param formatName
> +   *          The format string in which the CAS should be stored.
> +   * @throws IOException
> +   */
> +  public static void save(CAS aCas, OutputStream docOS, String formatName) throws IOException {
> +    SerializationFormat format = SerializationFormat.valueOf(formatName);
> +    save(aCas, docOS, null, format);
> +  }
> +
> +  /**
> +   * Write the CAS in the specified format.
> +   * 
> +   * @param aCas
> +   *          The CAS that should be serialized and stored
> +   * @param docOS
> +   *          The output stream for the CAS
> +   * @param format
> +   *          The SerializationFormat in which the CAS should be stored.
> +   * @throws IOException
> +   */
> +  public static void save(CAS aCas, OutputStream docOS, SerializationFormat format)
> +          throws IOException {
> +    save(aCas, docOS, null, format);
> +  }
> +
> +  /**
> +   * Write the CAS in the specified format. If the format does not include typesystem information
> +   * and the optional output stream of the typesystem is specified, then the typesystem information
> +   * is written there.
> +   * 
> +   * @param aCas
> +   *          The CAS that should be serialized and stored
> +   * @param docOS
> +   *          The output stream for the CAS
> +   * @param typeOS
> +   *          Optional output stream for type system information. Only used if the format does not
> +   *          support storing typesystem information directly in the main output file.
> +   * @param format
> +   *          The SerializationFormat in which the CAS should be stored.
> +   * @throws IOException
> +   */
> +  public static void save(CAS aCas, OutputStream docOS, OutputStream typeOS,
> +          SerializationFormat format) throws IOException {
> +    boolean typeSystemWritten = false;
> +    try {
> +      switch (format) {
> +        case XMI:
> +          XmiCasSerializer.serialize(aCas, docOS);
> +          break;
> +        case XCAS:
> +          XCASSerializer xcasSerializer = new XCASSerializer(aCas.getTypeSystem());
> +          XMLSerializer xmlSerializer = new XMLSerializer(docOS, true);
> +          xcasSerializer.serialize(aCas, xmlSerializer.getContentHandler());
> +          break;
> +        case S:
> +        // Java-serialized CAS without type system
> +        {
> +          CASSerializer serializer = new CASSerializer();
> +          serializer.addCAS((CASImpl) aCas);
> +          ObjectOutputStream objOS = new ObjectOutputStream(docOS);
> +          objOS.writeObject(serializer);
> +          objOS.flush();
> +        }
> +          break;
> +        case Sp:
> +        // Java-serialized CAS with type system
> +        {
> +          ObjectOutputStream objOS = new ObjectOutputStream(docOS);
> +          CASCompleteSerializer serializer = serializeCASComplete((CASImpl) aCas);
> +          objOS.writeObject(serializer);
> +          objOS.flush();
> +          typeSystemWritten = true; // Embedded type system
> +        }
> +          break;
> +        case S0:
> +          // Binary uncompressed CAS without type system (form 0)
> +          serializeCAS(aCas, docOS);
> +          break;
> +        case S4:
> +          // Binary compressed CAS without type system (form 4)
> +          serializeWithCompression(aCas, docOS);
> +          break;
> +
> +        case S6:
> +          // Binary compressed CAS (form 6)
> +          serializeWithCompression(aCas, docOS, aCas.getTypeSystem());
> +          break;
> +        case S6p:
> +          // Binary compressed CAS (form 6)
> +          // ... with embedded Java-serialized type system
> +          writeHeader(docOS);
> +          writeTypeSystem(aCas, docOS);
> +          typeSystemWritten = true; // Embedded type system
> +          serializeWithCompression(aCas, docOS, aCas.getTypeSystem());
> +          break;
> +        default:
> +          throw new IllegalArgumentException("Unknown format [" + format.name()
> +                  + "]. Must be one of: " + Arrays.toString(SerializationFormat.values()));
> +      }
> +    } catch (IOException e) {
> +      throw e;
> +    } catch (Exception e) {
> +      throw new IOException(e);
> +    }
> +
> +    // To support writing to ZIPs, the type system must be written separately from the CAS data
> +    if (typeOS != null && !typeSystemWritten) {
> +      writeTypeSystem(aCas, typeOS);
> +      typeSystemWritten = true;
> +    }
> +  }
> +
> +  private static CASMgrSerializer readCasManager(InputStream aIs) throws IOException {
> +    CASMgrSerializer casMgrSerializer;
> +
> +    try {
> +      ObjectInputStream is = new ObjectInputStream(aIs);
> +      casMgrSerializer = (CASMgrSerializer) is.readObject();
> +    } catch (ClassNotFoundException e) {
> +      throw new IOException(e);
> +    }
> +
> +    return casMgrSerializer;
> +  }
> +
> +  private static void writeHeader(OutputStream aOS) throws IOException {
> +    DataOutputStream dataOS = new DataOutputStream(aOS);
> +    dataOS.write(UIMA_TS_HEADER);
> +    dataOS.flush();
> +  }
> +
> +  private static void writeTypeSystem(CAS aCas, OutputStream aOS) throws IOException {
> +    ObjectOutputStream typeOS = new ObjectOutputStream(aOS);
> +    CASMgrSerializer casMgrSerializer = serializeCASMgr((CASImpl) aCas);
> +    typeOS.writeObject(casMgrSerializer);
> +    typeOS.flush();
> +  }
> +}
>
> Propchange: uima/uimaj/trunk/uimaj-core/src/main/java/org/apache/uima/util/CasIOUtils.java
> ------------------------------------------------------------------------------
>     svn:eol-style = native
>
> Added: uima/uimaj/trunk/uimaj-core/src/main/java/org/apache/uima/util/SerializationFormat.java
> URL: http://svn.apache.org/viewvc/uima/uimaj/trunk/uimaj-core/src/main/java/org/apache/uima/util/SerializationFormat.java?rev=1753208&view=auto
> ==============================================================================
> --- uima/uimaj/trunk/uimaj-core/src/main/java/org/apache/uima/util/SerializationFormat.java (added)
> +++ uima/uimaj/trunk/uimaj-core/src/main/java/org/apache/uima/util/SerializationFormat.java Mon Jul 18 11:40:47 2016
> @@ -0,0 +1,66 @@
> +/*
> + * Licensed to the Apache Software Foundation (ASF) under one
> + * or more contributor license agreements.  See the NOTICE file
> + * distributed with this work for additional information
> + * regarding copyright ownership.  The ASF licenses this file
> + * to you under the Apache License, Version 2.0 (the
> + * "License"); you may not use this file except in compliance
> + * with the License.  You may obtain a copy of the License at
> + * 
> + *   http://www.apache.org/licenses/LICENSE-2.0
> + * 
> + * Unless required by applicable law or agreed to in writing,
> + * software distributed under the License is distributed on an
> + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
> + * KIND, either express or implied.  See the License for the
> + * specific language governing permissions and limitations
> + * under the License.
> + */
> +package org.apache.uima.util;
> +
> +/**
> + * The available serialization formats in uimaj-core. Additional serializers like json are not included.
> + *
> + */
> +public enum SerializationFormat {
> +  
> +  /**
> +   * XML-serialized CAS (XMI format)
> +   */
> +  XMI, 
> +  
> +  /**
> +   * XML-serialized CAS (XCAS format)
> +   */
> +  XCAS, 
> +  
> +  /**
> +   * Java-serialized CAS without type system
> +   */
> +  S, 
> +  
> +  /**
> +   * Java-serialized CAS with type system
> +   */
> +  Sp, 
> +  
> +  /**
> +   * Binary uncompressed CAS without type system (form 0)
> +   */
> +  S0, 
> +  
> +  /**
> +   * Binary compressed CAS without type system (form 4)
> +   */
> +  S4, 
> +  
> +  /**
> +   * Binary compressed CAS (form 6)
> +   */
> +  S6, 
> +  
> +  /**
> +   * Binary compressed CAS (form 6) with embedded Java-serialized type system
> +   */
> +  S6p;
> +}
>
> Propchange: uima/uimaj/trunk/uimaj-core/src/main/java/org/apache/uima/util/SerializationFormat.java
> ------------------------------------------------------------------------------
>     svn:eol-style = native
>
> Added: uima/uimaj/trunk/uimaj-core/src/test/java/org/apache/uima/util/CasIOUtilsTest.java
> URL: http://svn.apache.org/viewvc/uima/uimaj/trunk/uimaj-core/src/test/java/org/apache/uima/util/CasIOUtilsTest.java?rev=1753208&view=auto
> ==============================================================================
> --- uima/uimaj/trunk/uimaj-core/src/test/java/org/apache/uima/util/CasIOUtilsTest.java (added)
> +++ uima/uimaj/trunk/uimaj-core/src/test/java/org/apache/uima/util/CasIOUtilsTest.java Mon Jul 18 11:40:47 2016
> @@ -0,0 +1,144 @@
> +/*
> + * Licensed to the Apache Software Foundation (ASF) under one
> + * or more contributor license agreements.  See the NOTICE file
> + * distributed with this work for additional information
> + * regarding copyright ownership.  The ASF licenses this file
> + * to you under the Apache License, Version 2.0 (the
> + * "License"); you may not use this file except in compliance
> + * with the License.  You may obtain a copy of the License at
> + * 
> + *   http://www.apache.org/licenses/LICENSE-2.0
> + * 
> + * Unless required by applicable law or agreed to in writing,
> + * software distributed under the License is distributed on an
> + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
> + * KIND, either express or implied.  See the License for the
> + * specific language governing permissions and limitations
> + * under the License.
> + */
> +package org.apache.uima.util;
> +
> +import java.io.ByteArrayInputStream;
> +import java.io.ByteArrayOutputStream;
> +import java.io.File;
> +import java.io.FileInputStream;
> +import java.io.FileOutputStream;
> +import java.io.IOException;
> +import java.io.ObjectOutput;
> +import java.io.ObjectOutputStream;
> +
> +import org.apache.uima.UIMAFramework;
> +import org.apache.uima.cas.CAS;
> +import org.apache.uima.resource.metadata.FsIndexDescription;
> +import org.apache.uima.resource.metadata.TypeSystemDescription;
> +import org.apache.uima.resource.metadata.impl.TypePriorities_impl;
> +import org.apache.uima.test.junit_extension.JUnitExtension;
> +
> +import junit.framework.Assert;
> +import junit.framework.TestCase;
> +
> +public class CasIOUtilsTest extends TestCase{
> +
> +  private static final int SIMPLE_CAS_DEFAULT_INDEX_SIZE = 7;
> +  
> +  private CAS cas;
> +
> +  public CasIOUtilsTest(String arg0) {
> +    super(arg0);
> +  }
> +  
> +  protected void setUp() throws Exception {
> +    File typeSystemFile = JUnitExtension.getFile("ExampleCas/testTypeSystem.xml");
> +    File indexesFile = JUnitExtension.getFile("ExampleCas/testIndexes.xml");
> +
> +    TypeSystemDescription typeSystem = UIMAFramework.getXMLParser().parseTypeSystemDescription(
> +            new XMLInputSource(typeSystemFile));
> +    FsIndexDescription[] indexes = UIMAFramework.getXMLParser().parseFsIndexCollection(new XMLInputSource(indexesFile))
> +            .getFsIndexes();
> +    cas = CasCreationUtils.createCas(typeSystem, new TypePriorities_impl(), indexes);
> +    CasIOUtils.load(JUnitExtension.getFile("ExampleCas/simpleCas.xmi"), cas);
> +  }
> +  
> +  public void testXMI() throws Exception {
> +    File casFile = new File("target/temp-test-output/simpleCas.xmi");
> +    casFile.getParentFile().mkdirs();
> +    CasIOUtils.save(cas, new FileOutputStream(casFile), SerializationFormat.XMI);
> +    cas.reset();
> +    CasIOUtils.load(casFile, cas);
> +    Assert.assertEquals(SIMPLE_CAS_DEFAULT_INDEX_SIZE, cas.getAnnotationIndex().size());
> +    cas.reset();
> +    CasIOUtils.load(new FileInputStream(casFile), cas);
> +    Assert.assertEquals(SIMPLE_CAS_DEFAULT_INDEX_SIZE, cas.getAnnotationIndex().size());
> +    cas.reset();
> +    CasIOUtils.load(casFile.toURI().toURL(), cas);
> +    Assert.assertEquals(SIMPLE_CAS_DEFAULT_INDEX_SIZE, cas.getAnnotationIndex().size());
> +  }
> +  
> +  public void testXCAS() throws Exception {
> +    File casFile = new File("target/temp-test-output/simpleCas.xcas");
> +    casFile.getParentFile().mkdirs();
> +    CasIOUtils.save(cas, new FileOutputStream(casFile), SerializationFormat.XCAS);
> +    cas.reset();
> +    CasIOUtils.load(casFile, cas);
> +    Assert.assertEquals(SIMPLE_CAS_DEFAULT_INDEX_SIZE, cas.getAnnotationIndex().size());
> +    cas.reset();
> +    CasIOUtils.load(casFile.toURI().toURL(), cas);
> +    Assert.assertEquals(SIMPLE_CAS_DEFAULT_INDEX_SIZE, cas.getAnnotationIndex().size());
> +  }
> +
> +  public void testS() throws Exception {
> +    testFormat(SerializationFormat.S, "bins");
> +  }
> +  
> +  public void testSp() throws Exception {
> +    testFormat(SerializationFormat.Sp, "binsp");
> +  }
> +  
> +  public void testS0() throws Exception {
> +    testFormat(SerializationFormat.S0, "bins0");
> +  }
> +  
> +  public void testS4() throws Exception {
> +    testFormat(SerializationFormat.S4, "bins4");
> +  }
> +  
> +  public void testS6() throws Exception {
> +    testFormat(SerializationFormat.S6, "bins6");
> +  }
> +  
> +  public void testS6p() throws Exception {
> +    testFormat(SerializationFormat.S6p, "bins6p");
> +  }
> +  
> +  private void testFormat(SerializationFormat format, String fileEnding) throws Exception {
> +    File casFile = new File("target/temp-test-output/simpleCas."+ fileEnding);
> +    casFile.getParentFile().mkdirs();
> +    CasIOUtils.save(cas, new FileOutputStream(casFile), format);
> +    cas.reset();
> +    SerializationFormat loadedFormat = CasIOUtils.load(new FileInputStream(casFile), cas);
> +    Assert.assertEquals(format, loadedFormat);
> +    Assert.assertEquals(SIMPLE_CAS_DEFAULT_INDEX_SIZE, cas.getAnnotationIndex().size());
> +  }
> +  
> +  public void testWrongInputStream() throws Exception {
> +    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
> +    ObjectOutput out = null;
> +
> +    out = new ObjectOutputStream(byteArrayOutputStream);
> +    out.writeObject(new String("WRONG OBJECT"));
> +
> +    byte[] casBytes = byteArrayOutputStream.toByteArray();
> +    try {
> +      CasIOUtils.load(new ByteArrayInputStream(casBytes), cas);
> +    } catch (Exception e) {
> +      Assert.assertTrue(e instanceof IOException);
> +      return;
> +    }
> +    Assert.fail("An exception should have been thrown for wrong input.");
> +  }
> +  
> +  
> +  protected void tearDown() throws Exception {
> +    cas.release();
> +  }
> +}
>
> Propchange: uima/uimaj/trunk/uimaj-core/src/test/java/org/apache/uima/util/CasIOUtilsTest.java
> ------------------------------------------------------------------------------
>     svn:eol-style = native
>
>


Re: svn commit: r1753208 - in /uima/uimaj/trunk/uimaj-core/src: main/java/org/apache/uima/util/CasIOUtils.java main/java/org/apache/uima/util/SerializationFormat.java test/java/org/apache/uima/util/CasIOUtilsTest.java

Posted by Richard Eckart de Castilho <re...@apache.org>.
On 18.07.2016, at 14:14, Peter Klügl <pe...@averbis.com> wrote:
> 
> I would prefer the .Xcas variants. Why distinguish between serialized
> and binary?

We don't necessarily have to distinguish. But we may want to because
"normal" UIMA binary formats identify themselves as such, but the
serialized CAS afaik doesn't.
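
On self-identification: a stream written by java.io.ObjectOutputStream does begin with the magic bytes 0xACED, so a Java-serialized CAS could in principle be told apart by its first bytes as well. The following is only a minimal sketch of such a check using the JDK alone. The class CasFormatSniffer and its Kind enum are hypothetical and not part of the commit. The "<?xml " and "UIMA" checks mirror what the committed load methods look for; note that the committed stream-based load treats any XML input as XMI, and that the "UIMATS" header of S6p also starts with "UIMA".

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper (not in the commit): classifies a stream by its leading bytes. */
final class CasFormatSniffer {

  enum Kind { XML, UIMA_BINARY, JAVA_SERIALIZED, UNKNOWN }

  /** Classifies a copy of the leading bytes; does not touch any stream. */
  static Kind classify(byte[] head) {
    if (startsWith(head, "<?xml ".getBytes(StandardCharsets.US_ASCII))) {
      return Kind.XML;
    }
    // The commit accepts the four header bytes in either byte order.
    if (startsWith(head, new byte[] { 'U', 'I', 'M', 'A' })
            || startsWith(head, new byte[] { 'A', 'M', 'I', 'U' })) {
      return Kind.UIMA_BINARY;
    }
    // ObjectOutputStream starts every stream with STREAM_MAGIC (0xACED).
    if (head.length >= 2 && (head[0] & 0xFF) == 0xAC && (head[1] & 0xFF) == 0xED) {
      return Kind.JAVA_SERIALIZED;
    }
    return Kind.UNKNOWN;
  }

  /** Peeks at up to 8 bytes and rewinds, so a deserializer can still read from the start. */
  static Kind peek(BufferedInputStream in) throws IOException {
    byte[] head = new byte[8];
    in.mark(head.length);
    int n = 0;
    while (n < head.length) {
      int r = in.read(head, n, head.length - n); // read() may return fewer bytes than asked
      if (r < 0) {
        break;
      }
      n += r;
    }
    in.reset();
    return classify(Arrays.copyOf(head, n));
  }

  private static boolean startsWith(byte[] data, byte[] prefix) {
    if (data.length < prefix.length) {
      return false;
    }
    for (int i = 0; i < prefix.length; i++) {
      if (data[i] != prefix[i]) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) throws IOException {
    byte[] xmi = "<?xml version=\"1.0\"?><xmi:XMI/>".getBytes(StandardCharsets.US_ASCII);
    BufferedInputStream in = new BufferedInputStream(new ByteArrayInputStream(xmi));
    System.out.println(peek(in));
  }
}
```

The loop around read() is deliberate: a single read(byte[]) call is allowed to return fewer bytes than requested, which would make a header comparison fail on slow streams.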

> This is rather a convention concerning the CasIOUtils, right? The code
> works right now with any file extension (exception: xcas). The output is
> specified by the given outputstream.
> 
> Well, for the CAS Editor, I could really use that convention in order to
> link the editor.

I was thinking about the CAS Editor (because I saw extensions being used
there when I checked the code), but also in general as a best practice.
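
To make the convention concrete, here is a rough sketch of how an extension lookup might look if the extensions proposed in this thread were adopted. Nothing here is decided: the class CasFileExtensions and its Family enum are hypothetical, and ".bin"/".bcas" and ".ser"/".scas" are only the candidates under discussion. The ".xmi" and ".xcas"/".xml" cases follow the dispatch already in CasIOUtils.load(URL, URL, CAS).

```java
import java.util.Locale;

/** Hypothetical lookup (not in the commit): maps a file name to a format family. */
final class CasFileExtensions {

  enum Family { XMI, XCAS, BINARY, SERIALIZED, UNKNOWN }

  static Family familyOf(String fileName) {
    // Locale.ROOT avoids locale-specific case mapping surprises.
    String name = fileName.toLowerCase(Locale.ROOT);
    if (name.endsWith(".xmi")) {
      return Family.XMI;
    }
    if (name.endsWith(".xcas") || name.endsWith(".xml")) {
      return Family.XCAS; // same rule as the committed URL-based load
    }
    if (name.endsWith(".bcas") || name.endsWith(".bin")) {
      return Family.BINARY; // proposed in this thread, not yet agreed
    }
    if (name.endsWith(".scas") || name.endsWith(".ser")) {
      return Family.SERIALIZED; // proposed in this thread, not yet agreed
    }
    return Family.UNKNOWN;
  }

  public static void main(String[] args) {
    System.out.println(familyOf("simpleCas.xmi"));
    System.out.println(familyOf("simpleCas.bcas"));
  }
}
```

A tool like the CAS Editor could use such a lookup only to decide which files to offer for opening, and still leave the actual format detection to the header check when loading.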

Cheers,

-- Richard


Re: svn commit: r1753208 - in /uima/uimaj/trunk/uimaj-core/src: main/java/org/apache/uima/util/CasIOUtils.java main/java/org/apache/uima/util/SerializationFormat.java test/java/org/apache/uima/util/CasIOUtilsTest.java

Posted by Peter Klügl <pe...@averbis.com>.
I would prefer the .Xcas variants. Why distinguish between serialized
and binary?


This is rather a convention concerning the CasIOUtils, right? The code
works right now with any file extension (exception: xcas). The output is
specified by the given outputstream.

Well, for the CAS Editor, I could really use that convention in order to
link the editor.


Peter


Am 18.07.2016 um 13:48 schrieb Richard Eckart de Castilho:
> On 18.07.2016, at 13:43, Peter Klügl <pe...@averbis.com> wrote:
>> I added a first prototype of the utils class. Suggestions and comments
>> are welcome. I'll proceed with the CAS Editor for now...
> Suggestion for the default file extension for 
>
> - binary CAS files: ".bin" or ".bcas" 
>
> - "serialized" CASes: ".ser" or ".scas" 
>
> Other suggestions?
>
> Cheers,
>
> -- Richard


Re: svn commit: r1753208 - in /uima/uimaj/trunk/uimaj-core/src: main/java/org/apache/uima/util/CasIOUtils.java main/java/org/apache/uima/util/SerializationFormat.java test/java/org/apache/uima/util/CasIOUtilsTest.java

Posted by Richard Eckart de Castilho <re...@apache.org>.
On 18.07.2016, at 13:43, Peter Klügl <pe...@averbis.com> wrote:
> 
> I added a first prototype of the utils class. Suggestions and comments
> are welcome. I'll proceed with the CAS Editor for now...

Suggestion for the default file extension for 

- binary CAS files: ".bin" or ".bcas" 

- "serialized" CASes: ".ser" or ".scas" 

Other suggestions?

Cheers,

-- Richard