Posted to dev@poi.apache.org by kl...@apache.org on 2003/12/02 18:46:01 UTC

cvs commit: jakarta-poi/src/testcases/org/apache/poi/hpsf/data TestChineseProperties.doc

klute       2003/12/02 09:46:01

  Modified:    src/documentation/content/xdocs changes.xml
               src/documentation/content/xdocs/hpsf how-to.xml
                        internals.xml todo.xml
               src/examples/src/org/apache/poi/hpsf/examples
                        CopyCompare.java WriteAuthorAndTitle.java
               src/java/org/apache/poi/hpsf MutableProperty.java
                        MutableSection.java Property.java PropertySet.java
                        Section.java TypeWriter.java VariantSupport.java
               src/testcases/org/apache/poi/hpsf/basic TestWrite.java
  Added:       src/testcases/org/apache/poi/hpsf/data
                        TestChineseProperties.doc
  Log:
  HPSF: codepage support added
  
  Revision  Changes    Path
  1.7       +4 -0      jakarta-poi/src/documentation/content/xdocs/changes.xml
  
  Index: changes.xml
  ===================================================================
  RCS file: /home/cvs/jakarta-poi/src/documentation/content/xdocs/changes.xml,v
  retrieving revision 1.6
  retrieving revision 1.7
  diff -u -r1.6 -r1.7
  --- changes.xml	5 Aug 2003 04:00:13 -0000	1.6
  +++ changes.xml	2 Dec 2003 17:46:00 -0000	1.7
  @@ -12,7 +12,11 @@
           <person id="MJ" name="Marc Johnson" email="mjohnson@apache.org"/>
           <person id="NKB" name="Nicola Ken Barozzi" email="barozzi@nicolaken.com"/>
           <person id="POI-DEVELOPERS" name="POI Developers" email="poi-dev@jakarta.apache.org"/>
  +        <person id="RK" name="Rainer Klute" email="klute@apache.org"/>
       </devs>
  +    <release version="2.0-pre3" date="unreleased">
  +        <action dev="RK" type="add">HPSF: Much better codepage support</action>
  +    </release>
       <release version="2.0-pre1" date="unreleased">
           <action dev="POI-DEVELOPERS" type="add">Patch applied for deep cloning of worksheets was provided</action>
           <action dev="POI-DEVELOPERS" type="add">Patch applied to allow sheet reordering</action>
  
  
  
  1.10      +30 -13    jakarta-poi/src/documentation/content/xdocs/hpsf/how-to.xml
  
  Index: how-to.xml
  ===================================================================
  RCS file: /home/cvs/jakarta-poi/src/documentation/content/xdocs/hpsf/how-to.xml,v
  retrieving revision 1.9
  retrieving revision 1.10
  diff -u -r1.9 -r1.10
  --- how-to.xml	20 Sep 2003 15:43:07 -0000	1.9
  +++ how-to.xml	2 Dec 2003 17:46:00 -0000	1.10
  @@ -708,8 +708,9 @@
          <td>The property's value is the number of a <strong>codepage</strong>,
           i.e. a mapping from character codes to characters. All strings in the
           section containing this property must be interpreted using this
  -        codepage. Typical property values are 1252 (8-bit "western" characters)
  -        or 1200 (16-bit Unicode characters).</td>
  +        codepage. Typical property values are 1252 (8-bit "western" characters,
  +	ISO-8859-1), 1200 (16-bit Unicode characters, UTF-16), or 65001 (8-bit
  +	Unicode characters, UTF-8).</td>
         </tr>
        </table>
       </section>
  @@ -833,18 +834,34 @@
       </section>
   
       <section><title>Codepage support</title>
  -     <fixme author="Rainer Klute">Improve codepage support!</fixme>
   
        <p>The property with ID 1 holds the number of the codepage which was used
  -      to encode the strings in this section. The present HPSF codepage support
  -      is still very limited: When reading property value strings, HPSF
  -      distinguishes between 16-bit characters and 8-bit characters. 16-bit
  -      characters should be Unicode characters and thus be okay. 8-bit
  -      characters are interpreted according to the platform's default character
  -      set. This is fine as long as the document being read has been written on
  -      a platform with the same default character set. However, if you receive a
  -      document from another region of the world and want to process it with
  -      HPSF you are in trouble - unless the creator used Unicode, of course.</p>
  +      to encode the strings in this section. If this property is not available
  +      in a section, the platform's default character encoding will be
  +      used. This works fine as long as the document being read has been written
  +      on a platform with the same default character encoding. However, if you
  +      receive a document from another region of the world and the codepage is
  +      undefined, you are in trouble.</p>
  +
  +     <p>HPSF's codepage support is as good as the character encoding support of
  +      the Java Virtual Machine (JVM) the application runs on. If HPSF
  +      encounters a codepage number it assumes that the JVM has a character
  +      encoding with a corresponding name. For example, if the codepage is 1252,
  +      HPSF uses the character encoding "cp1252" to read or write strings. If
  +      the JVM does not have that character encoding installed or if the
  +      codepage number is illegal, an UnsupportedEncodingException will be
  +      thrown.</p>
  +
  +     <p>There are two exceptions to the rule that a character encoding's name
  +      is derived from the codepage number by prepending the string "cp" to
  +      it:</p>
  +
  +     <dl>
  +      <dt>Codepage 1200</dt>
  +      <dd>is mapped to the character encoding "UTF-16".</dd>
  +      <dt>Codepage 65001</dt>
  +      <dd>is mapped to the character encoding "UTF-8".</dd>
  +     </dl>
       </section>
      </section>
   
  
  
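The codepage rule described in the how-to text above can be sketched as follows. This is a minimal illustration, not the HPSF implementation: the class and method names (CodepageSketch, encodingFor, decode) are hypothetical, and the sketch assumes the running JVM provides the "cp1252" encoding. A codepage of -1 stands for "undefined" and falls back to the platform default, as described above.

```java
import java.io.UnsupportedEncodingException;

/* Hypothetical helper class, for illustration only. */
class CodepageSketch
{
    /* Maps a codepage number to a Java character encoding name:
     * 1200 and 65001 are special cases, everything else is "cp" + number. */
    static String encodingFor(final int codepage)
        throws UnsupportedEncodingException
    {
        if (codepage <= 0)
            throw new UnsupportedEncodingException
                ("Codepage number may not be " + codepage);
        switch (codepage)
        {
            case 1200:
                return "UTF-16";
            case 65001:
                return "UTF-8";
            default:
                return "cp" + codepage;
        }
    }

    /* Decodes a byte string. A codepage of -1 means "no codepage defined",
     * in which case the platform's default encoding is used. */
    static String decode(final byte[] bytes, final int codepage)
        throws UnsupportedEncodingException
    {
        return codepage == -1
            ? new String(bytes)
            : new String(bytes, encodingFor(codepage));
    }
}
```

With this sketch the single byte 0xE4 under codepage 1252 and the two bytes 0xC3 0xA4 under codepage 65001 both decode to the same character, which is the point of honouring the codepage instead of the platform default.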
  
  1.9       +55 -1     jakarta-poi/src/documentation/content/xdocs/hpsf/internals.xml
  
  Index: internals.xml
  ===================================================================
  RCS file: /home/cvs/jakarta-poi/src/documentation/content/xdocs/hpsf/internals.xml,v
  retrieving revision 1.8
  retrieving revision 1.9
  diff -u -r1.8 -r1.9
  --- internals.xml	11 Sep 2003 21:48:47 -0000	1.8
  +++ internals.xml	2 Dec 2003 17:46:00 -0000	1.9
  @@ -944,6 +944,60 @@
   
   
   
  +   <section>
  +    <title>The Dictionary</title>
  +
  +    <p>What a dictionary is good for is explained in the <link
  +      href="how-to.html">HPSF HOW-TO</link>. This chapter explains how it is
  +     organized internally.</p>
  +
  +    <p>The dictionary has a simple header consisting of a single UInt value. It
  +    tells how many entries the dictionary comprises:</p>
  +
  +    <table>
  +     <tr>
  +      <th>Name</th>
  +      <th>Data type</th>
  +      <th>Description</th>
  +     </tr>
  +     <tr>
  +      <td>nrEntries</td>
  +      <td>UInt</td>
  +      <td>Number of dictionary entries</td>
  +     </tr>
  +    </table>
  +
  +    <p>The dictionary entries follow the header. Each one looks like this:</p>
  +
  +    <table>
  +     <tr>
  +      <th>Name</th>
  +      <th>Data type</th>
  +      <th>Description</th>
  +     </tr>
  +     <tr>
  +      <td>key</td>
  +      <td>UInt</td>
  +      <td>The unique number of this property, i.e. the PID</td>
  +     </tr>
  +     <tr>
  +      <td>length</td>
  +      <td>UInt</td>
  +      <td>The length of the property name associated with the key</td>
  +     </tr>
  +     <tr>
  +      <td>value</td>
  +      <td>String</td>
  +      <td>The property's name, terminated with a 0x00 character</td>
  +     </tr>
  +    </table>
  +
  +    <p>The entries are not aligned, i.e. each one follows its predecessor
  +     without any gap or fill characters.</p>
  +   </section>
  +
  +
  +
      <section><title>References</title>
   
       <p>In order to assemble the HPSF description I used information publicly
  
  
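The dictionary layout documented in internals.xml above could be read along these lines. This is only a sketch under stated assumptions, not HPSF's reader: it assumes a UInt is a 4-byte little-endian value, that the "length" field counts bytes and includes the terminating 0x00, and that the name is stored in an 8-bit encoding passed in by the caller. The names DictionarySketch and readDictionary are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.LinkedHashMap;
import java.util.Map;

/* Hypothetical reader for the dictionary layout, for illustration only. */
class DictionarySketch
{
    static Map<Long, String> readDictionary(final byte[] src,
                                            final int offset,
                                            final String encoding)
        throws UnsupportedEncodingException
    {
        /* Assumption: all UInt fields are 4 bytes, little-endian. */
        final ByteBuffer buf =
            ByteBuffer.wrap(src).order(ByteOrder.LITTLE_ENDIAN);
        buf.position(offset);

        /* Header: the number of dictionary entries. */
        final long nrEntries = buf.getInt() & 0xFFFFFFFFL;

        final Map<Long, String> dictionary = new LinkedHashMap<Long, String>();
        for (long i = 0; i < nrEntries; i++)
        {
            final long key = buf.getInt() & 0xFFFFFFFFL;  /* the PID */
            final int length = buf.getInt();              /* name length */
            final byte[] raw = new byte[length];
            buf.get(raw);

            /* Strip the terminating 0x00 byte(s) before decoding. */
            int end = length;
            while (end > 0 && raw[end - 1] == 0)
                end--;
            dictionary.put(Long.valueOf(key),
                           new String(raw, 0, end, encoding));
            /* Entries are not aligned: the next one starts right here. */
        }
        return dictionary;
    }
}
```

Because the entries are unaligned, the reader never skips padding between them; it simply continues at the byte following the previous name.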
  
  1.4       +10 -15    jakarta-poi/src/documentation/content/xdocs/hpsf/todo.xml
  
  Index: todo.xml
  ===================================================================
  RCS file: /home/cvs/jakarta-poi/src/documentation/content/xdocs/hpsf/todo.xml,v
  retrieving revision 1.3
  retrieving revision 1.4
  diff -u -r1.3 -r1.4
  --- todo.xml	30 Aug 2003 09:19:04 -0000	1.3
  +++ todo.xml	2 Dec 2003 17:46:00 -0000	1.4
  @@ -21,25 +21,20 @@
        information streams.
       </li>
       <li>
  -     Add codepage support: Presently the bytes making out the string in a
  -      property's value are interpreted using the platform's default character
  -      set.
  -    </li>
  -    <li>
  -      Add resource bundles to
  -      <code>org.apache.poi.hpsf.wellknown</code> to ease
  -      localizations. This would be useful for mapping standard property IDs to
  -      localized strings. Example: The property ID 4 could be mapped to "Author"
  -      in English or "Verfasser" in German.
  +     Add resource bundles to
  +     <code>org.apache.poi.hpsf.wellknown</code> to ease
  +     localizations. This would be useful for mapping standard property IDs to
  +     localized strings. Example: The property ID 4 could be mapped to "Author"
  +     in English or "Verfasser" in German.
       </li>
       <li>
        Implement reading functionality for those property types that are not
  -      yet supported. HPSF should return proper Java types instead of just byte
  -      arrays.
  +     yet supported. HPSF should return proper Java types instead of just byte
  +     arrays.
       </li>
       <li>
  -     Add WMF to <code>java.awt.Image</code> example code in <link
  -     href="thumbnails.html">Thumbnail HOW TO</link>.
  +     Add WMF to <code>java.awt.Image</code> example code in the <link
  +      href="thumbnails.html">Thumbnail HOW-TO</link>.
       </li>
      </ol>
     </section>
  
  
  
  1.2       +5 -2      jakarta-poi/src/examples/src/org/apache/poi/hpsf/examples/CopyCompare.java
  
  Index: CopyCompare.java
  ===================================================================
  RCS file: /home/cvs/jakarta-poi/src/examples/src/org/apache/poi/hpsf/examples/CopyCompare.java,v
  retrieving revision 1.1
  retrieving revision 1.2
  diff -u -r1.1 -r1.2
  --- CopyCompare.java	20 Sep 2003 15:43:08 -0000	1.1
  +++ CopyCompare.java	2 Dec 2003 17:46:01 -0000	1.2
  @@ -558,7 +558,10 @@
                    * exists. However, since we have full control about directory
                    * creation we can ensure that this will never happen. */
                   ex.printStackTrace(System.err);
  -                throw new RuntimeException(ex);
  +                throw new RuntimeException(ex.toString());
  +                /* FIXME (2): Replace the previous line by the following once we
  +                 * no longer need JDK 1.3 compatibility. */
  +                // throw new RuntimeException(ex);
               }
           }
       }
  
  
  
  1.4       +5 -2      jakarta-poi/src/examples/src/org/apache/poi/hpsf/examples/WriteAuthorAndTitle.java
  
  Index: WriteAuthorAndTitle.java
  ===================================================================
  RCS file: /home/cvs/jakarta-poi/src/examples/src/org/apache/poi/hpsf/examples/WriteAuthorAndTitle.java,v
  retrieving revision 1.3
  retrieving revision 1.4
  diff -u -r1.3 -r1.4
  --- WriteAuthorAndTitle.java	20 Sep 2003 15:43:08 -0000	1.3
  +++ WriteAuthorAndTitle.java	2 Dec 2003 17:46:01 -0000	1.4
  @@ -444,7 +444,10 @@
                    * exists. However, since we have full control about directory
                    * creation we can ensure that this will never happen. */
                   ex.printStackTrace(System.err);
  -                throw new RuntimeException(ex);
  +                throw new RuntimeException(ex.toString());
  +                /* FIXME (2): Replace the previous line by the following once we
  +                 * no longer need JDK 1.3 compatibility. */
  +                // throw new RuntimeException(ex);
               }
           }
       }
  
  
  
  1.3       +4 -3      jakarta-poi/src/java/org/apache/poi/hpsf/MutableProperty.java
  
  Index: MutableProperty.java
  ===================================================================
  RCS file: /home/cvs/jakarta-poi/src/java/org/apache/poi/hpsf/MutableProperty.java,v
  retrieving revision 1.2
  retrieving revision 1.3
  diff -u -r1.2 -r1.3
  --- MutableProperty.java	4 Sep 2003 20:15:24 -0000	1.2
  +++ MutableProperty.java	2 Dec 2003 17:46:01 -0000	1.3
  @@ -80,19 +80,20 @@
        * <p>Writes the property to an output stream.</p>
        * 
        * @param out The output stream to write to.
  +     * @param codepage The codepage to use for writing non-wide strings
        * @return the number of bytes written to the stream
        * 
        * @exception IOException if an I/O error occurs
        * @exception WritingNotSupportedException if a variant type is to be
        * written that is not yet supported
        */
  -    public int write(final OutputStream out)
  +    public int write(final OutputStream out, final int codepage)
           throws IOException, WritingNotSupportedException
       {
           int length = 0;
           long variantType = getType();
           length += TypeWriter.writeUIntToStream(out, variantType);
  -        length += VariantSupport.write(out, variantType, getValue());
  +        length += VariantSupport.write(out, variantType, getValue(), codepage);
           return length;
       }
   
  
  
  
  1.7       +6 -6      jakarta-poi/src/java/org/apache/poi/hpsf/MutableSection.java
  
  Index: MutableSection.java
  ===================================================================
  RCS file: /home/cvs/jakarta-poi/src/java/org/apache/poi/hpsf/MutableSection.java,v
  retrieving revision 1.6
  retrieving revision 1.7
  diff -u -r1.6 -r1.7
  --- MutableSection.java	23 Oct 2003 20:44:24 -0000	1.6
  +++ MutableSection.java	2 Dec 2003 17:46:01 -0000	1.7
  @@ -420,16 +420,16 @@
   
               /* If the property ID is not equal 0 we write the property and all
                * is fine. However, if it equals 0 we have to write the section's
  -             * dictionary which does not have a type but just a value. */
  +             * dictionary which has an implicit type only and an explicit
  +             * value. */
               if (id != 0)
                   /* Write the property and update the position to the next
                    * property. */
  -                position += p.write(propertyStream);
  +                position += p.write(propertyStream, getCodepage());
               else
               {
  -                final Integer codepage =
  -                    (Integer) getProperty(PropertyIDMap.PID_CODEPAGE);
  -                if (codepage == null)
  +                final int codepage = getCodepage();
  +                if (codepage == -1)
                       throw new IllegalPropertySetDataException
                           ("Codepage (property 1) is undefined.");
                   position += writeDictionary(propertyStream, dictionary);
  
  
  
  1.16      +28 -3     jakarta-poi/src/java/org/apache/poi/hpsf/Property.java
  
  Index: Property.java
  ===================================================================
  RCS file: /home/cvs/jakarta-poi/src/java/org/apache/poi/hpsf/Property.java,v
  retrieving revision 1.15
  retrieving revision 1.16
  diff -u -r1.15 -r1.16
  --- Property.java	18 Sep 2003 18:56:35 -0000	1.15
  +++ Property.java	2 Dec 2003 17:46:01 -0000	1.16
  @@ -62,9 +62,11 @@
    */
   package org.apache.poi.hpsf;
   
  +import java.io.UnsupportedEncodingException;
   import java.util.HashMap;
   import java.util.Map;
   
  +import org.apache.poi.util.HexDump;
   import org.apache.poi.util.LittleEndian;
   
   /**
  @@ -161,9 +163,13 @@
        * @param length The property's type/value pair's length in bytes.
        * @param codepage The section's and thus the property's
        * codepage. It is needed only when reading string values.
  +     * 
  +     * @exception UnsupportedEncodingException if the specified codepage is not
  +     * supported
        */
       public Property(final long id, final byte[] src, final long offset,
                       final int length, final int codepage)
  +    throws UnsupportedEncodingException
       {
           this.id = id;
   
  @@ -183,7 +189,7 @@
   
           try
           {
  -            value = VariantSupport.read(src, o, length, (int) type);
  +            value = VariantSupport.read(src, o, length, (int) type, codepage);
           }
           catch (UnsupportedVariantTypeException ex)
           {
  @@ -382,8 +388,27 @@
           b.append(getID());
           b.append(", type: ");
           b.append(getType());
  +        final Object value = getValue();
           b.append(", value: ");
  -        b.append(getValue());
  +        b.append(value.toString());
  +        if (value instanceof String)
  +        {
  +            final String s = (String) value;
  +            final int l = s.length();
  +            final byte[] bytes = new byte[l * 2];
  +            for (int i = 0; i < l; i++)
  +            {
  +                final char c = s.charAt(i);
  +                final byte high = (byte) ((c & 0x00ff00) >> 8);
  +                final byte low  = (byte) ((c & 0x0000ff) >> 0);
  +                bytes[i * 2]     = high;
  +                bytes[i * 2 + 1] = low;
  +            }
  +            final String hex = HexDump.dump(bytes, 0L, 0);
  +            b.append(" [");
  +            b.append(hex);
  +            b.append("]");
  +        }
           b.append(']');
           return b.toString();
       }
  
  
  
  1.15      +12 -5     jakarta-poi/src/java/org/apache/poi/hpsf/PropertySet.java
  
  Index: PropertySet.java
  ===================================================================
  RCS file: /home/cvs/jakarta-poi/src/java/org/apache/poi/hpsf/PropertySet.java,v
  retrieving revision 1.14
  retrieving revision 1.15
  diff -u -r1.14 -r1.15
  --- PropertySet.java	23 Oct 2003 20:44:24 -0000	1.14
  +++ PropertySet.java	2 Dec 2003 17:46:01 -0000	1.15
  @@ -56,6 +56,7 @@
   
   import java.io.IOException;
   import java.io.InputStream;
  +import java.io.UnsupportedEncodingException;
   import java.util.ArrayList;
   import java.util.List;
   
  @@ -300,9 +301,11 @@
        * @param length The length of the stream data.
        * @throws NoPropertySetStreamException if the byte array is not a
        * property set stream.
  +     * 
  +     * @exception UnsupportedEncodingException if the codepage is not supported
        */
       public PropertySet(final byte[] stream, final int offset, final int length)
  -        throws NoPropertySetStreamException
  +        throws NoPropertySetStreamException, UnsupportedEncodingException
       {
           if (isPropertySetStream(stream, offset, length))
               init(stream, offset, length);
  @@ -321,8 +324,11 @@
        * complete byte array contents is the stream data.
        * @throws NoPropertySetStreamException if the byte array is not a
        * property set stream.
  +     * 
  +     * @exception UnsupportedEncodingException if the codepage is not supported
        */
  -    public PropertySet(final byte[] stream) throws NoPropertySetStreamException
  +    public PropertySet(final byte[] stream)
  +    throws NoPropertySetStreamException, UnsupportedEncodingException
       {
           this(stream, 0, stream.length);
       }
  @@ -435,6 +441,7 @@
        * @param length Length of the property set stream.
        */
       private void init(final byte[] src, final int offset, final int length)
  +    throws UnsupportedEncodingException
       {
           /* FIXME (3): Ensure that at most "length" bytes are read. */
           
  @@ -651,7 +658,7 @@
           final PropertySet ps = (PropertySet) o;
           int byteOrder1 = ps.getByteOrder();
           int byteOrder2 = getByteOrder();
  -        ClassID classId1 = ps.getClassID();
  +        ClassID classID1 = ps.getClassID();
           ClassID classID2 = getClassID();
           int format1 = ps.getFormat();
           int format2 = getFormat();
  @@ -660,7 +667,7 @@
           int sectionCount1 = ps.getSectionCount();
           int sectionCount2 = getSectionCount();
           if (byteOrder1 != byteOrder2      ||
  -            !classId1.equals(classID2)    ||
  +            !classID1.equals(classID2)    ||
               format1 != format2            ||
               osVersion1 != osVersion2      ||
               sectionCount1 != sectionCount2)
  
  
  
  1.21      +20 -1     jakarta-poi/src/java/org/apache/poi/hpsf/Section.java
  
  Index: Section.java
  ===================================================================
  RCS file: /home/cvs/jakarta-poi/src/java/org/apache/poi/hpsf/Section.java,v
  retrieving revision 1.20
  retrieving revision 1.21
  diff -u -r1.20 -r1.21
  --- Section.java	23 Oct 2003 20:44:24 -0000	1.20
  +++ Section.java	2 Dec 2003 17:46:01 -0000	1.21
  @@ -54,6 +54,7 @@
    */
   package org.apache.poi.hpsf;
   
  +import java.io.UnsupportedEncodingException;
   import java.util.ArrayList;
   import java.util.Collections;
   import java.util.Iterator;
  @@ -193,8 +194,12 @@
        * @param src Contains the complete property set stream.
        * @param offset The position in the stream that points to the
        * section's format ID.
  +     * 
  +     * @exception UnsupportedEncodingException if the section's codepage is not
  +     * supported.
        */
       public Section(final byte[] src, final int offset)
  +    throws UnsupportedEncodingException
       {
           int o1 = offset;
   
  @@ -636,6 +641,20 @@
       public Map getDictionary()
       {
           return dictionary;
  +    }
  +
  +
  +
  +    /**
  +     * <p>Gets the section's codepage, if any.</p>
  +     *
  +     * @return The section's codepage if one is defined, else -1.
  +     */
  +    public int getCodepage()
  +    {
  +        final Integer codepage =
  +            (Integer) getProperty(PropertyIDMap.PID_CODEPAGE);
  +        return codepage != null ? codepage.intValue() : -1;
       }
   
   }
  
  
  
  1.3       +4 -3      jakarta-poi/src/java/org/apache/poi/hpsf/TypeWriter.java
  
  Index: TypeWriter.java
  ===================================================================
  RCS file: /home/cvs/jakarta-poi/src/java/org/apache/poi/hpsf/TypeWriter.java,v
  retrieving revision 1.2
  retrieving revision 1.3
  diff -u -r1.2 -r1.3
  --- TypeWriter.java	30 Aug 2003 09:13:52 -0000	1.2
  +++ TypeWriter.java	2 Dec 2003 17:46:01 -0000	1.3
  @@ -185,7 +185,8 @@
        * @exception IOException if an I/O error occurs
        */
       public static void writeToStream(final OutputStream out,
  -                                     final Property[] properties)
  +                                     final Property[] properties,
  +                                     final int codepage)
           throws IOException, UnsupportedVariantTypeException
       {
           /* If there are no properties don't write anything. */
  @@ -207,7 +208,7 @@
               final Property p = (Property) properties[i];
               long type = p.getType();
               writeUIntToStream(out, type);
  -            VariantSupport.write(out, (int) type, p.getValue());
  +            VariantSupport.write(out, (int) type, p.getValue(), codepage);
           }
       }
   
  
  
  
  1.6       +62 -26    jakarta-poi/src/java/org/apache/poi/hpsf/VariantSupport.java
  
  Index: VariantSupport.java
  ===================================================================
  RCS file: /home/cvs/jakarta-poi/src/java/org/apache/poi/hpsf/VariantSupport.java,v
  retrieving revision 1.5
  retrieving revision 1.6
  diff -u -r1.5 -r1.6
  --- VariantSupport.java	23 Oct 2003 20:44:24 -0000	1.5
  +++ VariantSupport.java	2 Dec 2003 17:46:01 -0000	1.6
  @@ -64,6 +64,7 @@
   
   import java.io.IOException;
   import java.io.OutputStream;
  +import java.io.UnsupportedEncodingException;
   import java.util.Date;
   import java.util.LinkedList;
   import java.util.List;
  @@ -163,17 +164,21 @@
        * @param length The length of the variant including the variant
        * type field
        * @param type The variant type to read
  +     * @param codepage The codepage to use to read non-wide strings
        * @return A Java object that corresponds best to the variant
        * field. For example, a VT_I4 is returned as a {@link Long}, a
        * VT_LPSTR as a {@link String}.
        * @exception ReadingNotSupportedException if a property is to be read
        * whose variant type HPSF does not yet support
  +     * @exception UnsupportedEncodingException if the specified codepage is not
  +     * supported
        *
        * @see Variant
        */
       public static Object read(final byte[] src, final int offset,
  -                              final int length, final long type)
  -        throws ReadingNotSupportedException
  +                              final int length, final long type,
  +                              final int codepage)
  +        throws ReadingNotSupportedException, UnsupportedEncodingException
       {
           Object value;
           int o1 = offset;
  @@ -221,18 +226,18 @@
                    * Read a byte string. In Java it is represented as a
                    * String object. The 0x00 bytes at the end must be
                    * stripped.
  -                 *
  -                 * FIXME (2): Reading an 8-bit string should pay attention
  -                 * to the codepage. Currently the byte making out the
  -                 * property's value are interpreted according to the
  -                 * platform's default character set.
                    */
                   final int first = o1 + LittleEndian.INT_SIZE;
                   long last = first + LittleEndian.getUInt(src, o1) - 1;
                   o1 += LittleEndian.INT_SIZE;
  +                final int rawLength = (int) (last - first + 1);
                   while (src[(int) last] == 0 && first <= last)
                       last--;
  -                value = new String(src, (int) first, (int) (last - first + 1));
  +                final int l = (int) (last - first + 1);
  +                value = codepage != -1 ?
  +                    new String(src, (int) first, l,
  +                               codepageToEncoding(codepage)) :
  +                    new String(src, (int) first, l);
                   break;
               }
               case Variant.VT_LPWSTR:
  @@ -299,12 +304,45 @@
   
   
       /**
  +     * <p>Turns a codepage number into the equivalent character encoding's 
  +     * name.</p>
  +     *
  +     * @param codepage The codepage number
  +     * 
  +     * @return The character encoding's name. Codepage 1200 is mapped to
  +     * "UTF-16" and codepage 65001 to "UTF-8". All other positive numbers are
  +     * mapped to "cp" followed by the number, e.g. codepage 1252 yields the
  +     * character encoding name "cp1252".
  +     * 
  +     * @exception UnsupportedEncodingException if the specified codepage is
  +     * less than or equal to zero.
  +     */
  +    public static String codepageToEncoding(final int codepage)
  +    throws UnsupportedEncodingException
  +    {
  +        if (codepage <= 0)
  +            throw new UnsupportedEncodingException
  +                ("Codepage number may not be " + codepage);
  +        switch (codepage)
  +        {
  +            case 1200:
  +                return "UTF-16";
  +            case 65001:
  +                return "UTF-8";
  +            default:
  +                return "cp" + codepage;
  +        }
  +    }
  +
  +
  +    /**
        * <p>Writes a variant value to an output stream. This method ensures that
        * always a multiple of 4 bytes is written.</p>
        *
        * @param out The stream to write the value to.
        * @param type The variant's type.
        * @param value The variant's value.
  +     * @param codepage The codepage to use to write non-wide strings
        * @return The number of entities that have been written. In many cases an
        * "entity" is a byte but this is not always the case.
        * @exception IOException if an I/O exception occurs
  @@ -312,7 +350,7 @@
        * whose variant type HPSF does not yet support
        */
       public static int write(final OutputStream out, final long type,
  -                            final Object value)
  +                            final Object value, final int codepage)
           throws IOException, WritingNotSupportedException
       {
           int length = 0;
  @@ -330,16 +368,13 @@
               }
               case Variant.VT_LPSTR:
               {
  -                length = TypeWriter.writeUIntToStream
  -                    (out, ((String) value).length() + 1);
  -                char[] s = Util.pad4((String) value);
  -                /* FIXME (2): The following line forces characters to bytes.
  -                 * This is generally wrong and should only be done according to
  -                 * a codepage. Alternatively Unicode could be written (see 
  -                 * Variant.VT_LPWSTR). */
  -                byte[] b = new byte[s.length + 1];
  -                for (int i = 0; i < s.length; i++)
  -                    b[i] = (byte) s[i];
  +                final byte[] bytes =
  +                    (codepage == -1 ?
  +                    ((String) value).getBytes() :
  +                    ((String) value).getBytes(codepageToEncoding(codepage)));
  +                length = TypeWriter.writeUIntToStream(out, bytes.length + 1);
  +                final byte[] b = new byte[bytes.length + 1];
  +                System.arraycopy(bytes, 0, b, 0, bytes.length);
                   b[b.length - 1] = 0x00;
                   out.write(b);
                   length += b.length;
  @@ -419,12 +454,13 @@
               }
           }
   
  -        /* Add 0x00 character to write a multiple of four bytes: */
  -        while (length % 4 != 0)
  -        {
  -            out.write(0);
  -            length++;
  -        }
  +        /* Add 0x00 characters to write a multiple of four bytes: */
  +        // FIXME (1) Try this!
  +//        while (length % 4 != 0)
  +//        {
  +//            out.write(0);
  +//            length++;
  +//        }
           return length;
       }
   
  
  
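The VT_LPSTR change in VariantSupport.write() above replaces a per-character cast with an encoding-aware conversion, so the length field now counts encoded bytes rather than Java characters. A minimal sketch of that idea follows. It is not the HPSF code: the names LpstrSketch and writeLpstr are hypothetical, the length field is assumed to be a 4-byte little-endian UInt, and no padding to a multiple of four bytes is written.

```java
import java.io.IOException;
import java.io.OutputStream;

/* Hypothetical writer for an 8-bit string value, for illustration only. */
class LpstrSketch
{
    /* Writes length field, encoded bytes and a terminating 0x00.
     * Returns the total number of bytes written. */
    static int writeLpstr(final OutputStream out, final String value,
                          final String encoding)
        throws IOException
    {
        /* Encode first: one character may need more than one byte. */
        final byte[] bytes = value.getBytes(encoding);
        final int n = bytes.length + 1;    /* + 1 for the 0x00 terminator */

        /* Assumption: the length is a 4-byte little-endian UInt. */
        out.write(n & 0xff);
        out.write((n >> 8) & 0xff);
        out.write((n >> 16) & 0xff);
        out.write((n >> 24) & 0xff);

        out.write(bytes);
        out.write(0x00);
        return 4 + n;
    }
}
```

For a one-character string such as "\u00e4" written as UTF-8, the length field holds 3 (two encoded bytes plus the terminator), whereas a count based on String.length() would have produced 2.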
  
  1.8       +44 -36    jakarta-poi/src/testcases/org/apache/poi/hpsf/basic/TestWrite.java
  
  Index: TestWrite.java
  ===================================================================
  RCS file: /home/cvs/jakarta-poi/src/testcases/org/apache/poi/hpsf/basic/TestWrite.java,v
  retrieving revision 1.7
  retrieving revision 1.8
  diff -u -r1.7 -r1.8
  --- TestWrite.java	18 Sep 2003 18:56:35 -0000	1.7
  +++ TestWrite.java	2 Dec 2003 17:46:01 -0000	1.8
  @@ -357,7 +357,10 @@
                       catch (Exception ex)
                       {
                           ex.printStackTrace();
  -                        throw new RuntimeException(ex);
  +                        throw new RuntimeException(ex.toString());
  +                        /* FIXME (2): Replace the previous line by the following
  +                         * one once we no longer need JDK 1.3 compatibility. */
  +                        // throw new RuntimeException(ex);
                       }
                   }
               },
  @@ -398,37 +401,40 @@
       public void testVariantTypes()
       {
           Throwable t = null;
  +        final int codepage = -1;
  +        /* FIXME (2): Add tests for various codepages! */
           try
           {
  -            check(Variant.VT_EMPTY, null);
  -            check(Variant.VT_BOOL, new Boolean(true));
  -            check(Variant.VT_BOOL, new Boolean(false));
  -            check(Variant.VT_CF, new byte[]{0});
  -            check(Variant.VT_CF, new byte[]{0, 1});
  -            check(Variant.VT_CF, new byte[]{0, 1, 2});
  -            check(Variant.VT_CF, new byte[]{0, 1, 2, 3});
  -            check(Variant.VT_CF, new byte[]{0, 1, 2, 3, 4});
  -            check(Variant.VT_CF, new byte[]{0, 1, 2, 3, 4, 5});
  -            check(Variant.VT_CF, new byte[]{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10});
  -            check(Variant.VT_I2, new Integer(27));
  -            check(Variant.VT_I4, new Long(28));
  -            check(Variant.VT_FILETIME, new Date());
  -            check(Variant.VT_LPSTR, "");
  -            check(Variant.VT_LPSTR, "�");
  -            check(Variant.VT_LPSTR, "��");
  -            check(Variant.VT_LPSTR, "���");
  -            check(Variant.VT_LPSTR, "����");
  -            check(Variant.VT_LPSTR, "�����");
  -            check(Variant.VT_LPSTR, "������");
  -            check(Variant.VT_LPSTR, "�������");
  -            check(Variant.VT_LPWSTR, "");
  -            check(Variant.VT_LPWSTR, "�");
  -            check(Variant.VT_LPWSTR, "��");
  -            check(Variant.VT_LPWSTR, "���");
  -            check(Variant.VT_LPWSTR, "����");
  -            check(Variant.VT_LPWSTR, "�����");
  -            check(Variant.VT_LPWSTR, "������");
  -            check(Variant.VT_LPWSTR, "�������");
  +            check(Variant.VT_EMPTY, null, codepage);
  +            check(Variant.VT_BOOL, new Boolean(true), codepage);
  +            check(Variant.VT_BOOL, new Boolean(false), codepage);
  +            check(Variant.VT_CF, new byte[]{0}, codepage);
  +            check(Variant.VT_CF, new byte[]{0, 1}, codepage);
  +            check(Variant.VT_CF, new byte[]{0, 1, 2}, codepage);
  +            check(Variant.VT_CF, new byte[]{0, 1, 2, 3}, codepage);
  +            check(Variant.VT_CF, new byte[]{0, 1, 2, 3, 4}, codepage);
  +            check(Variant.VT_CF, new byte[]{0, 1, 2, 3, 4, 5}, codepage);
  +            check(Variant.VT_CF, new byte[]{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, 
  +                  codepage);
  +            check(Variant.VT_I2, new Integer(27), codepage);
  +            check(Variant.VT_I4, new Long(28), codepage);
  +            check(Variant.VT_FILETIME, new Date(), codepage);
  +            check(Variant.VT_LPSTR, "", codepage);
  +            check(Variant.VT_LPSTR, "�", codepage);
  +            check(Variant.VT_LPSTR, "��", codepage);
  +            check(Variant.VT_LPSTR, "���", codepage);
  +            check(Variant.VT_LPSTR, "����", codepage);
  +            check(Variant.VT_LPSTR, "�����", codepage);
  +            check(Variant.VT_LPSTR, "������", codepage);
  +            check(Variant.VT_LPSTR, "�������", codepage);
  +            check(Variant.VT_LPWSTR, "", codepage);
  +            check(Variant.VT_LPWSTR, "�", codepage);
  +            check(Variant.VT_LPWSTR, "��", codepage);
  +            check(Variant.VT_LPWSTR, "���", codepage);
  +            check(Variant.VT_LPWSTR, "����", codepage);
  +            check(Variant.VT_LPWSTR, "�����", codepage);
  +            check(Variant.VT_LPWSTR, "������", codepage);
  +            check(Variant.VT_LPWSTR, "�������", codepage);
           }
           catch (Exception ex)
           {
  @@ -466,20 +472,22 @@
        * @throws UnsupportedVariantTypeException if the variant is not supported.
        * @throws IOException if an I/O exception occurs.
        */
  -    private void check(final long variantType, final Object value)
  +    private void check(final long variantType, final Object value, 
  +                       final int codepage)
           throws UnsupportedVariantTypeException, IOException
       {
           final ByteArrayOutputStream out = new ByteArrayOutputStream();
  -        VariantSupport.write(out, variantType, value);
  +        VariantSupport.write(out, variantType, value, codepage);
           out.close();
           final byte[] b = out.toByteArray();
           final Object objRead =
               VariantSupport.read(b, 0, b.length + LittleEndian.INT_SIZE,
  -                                variantType);
  +                                variantType, -1);
           if (objRead instanceof byte[])
           {
  -            final int diff = diff(org.apache.poi.hpsf.Util.pad4
  -                ((byte[]) value), (byte[]) objRead);
  +//            final int diff = diff(org.apache.poi.hpsf.Util.pad4
  +//                ((byte[]) value), (byte[]) objRead);
  +            final int diff = diff((byte[]) value, (byte[]) objRead);
               if (diff >= 0)
                   fail("Byte arrays are different. First different byte is at " +
                        "index " + diff + ".");
  
  
  
  1.1                  jakarta-poi/src/testcases/org/apache/poi/hpsf/data/TestChineseProperties.doc
  
  	<<Binary file>>
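
Note on the codepage parameter: the TestWrite.java changes above pass a codepage through check() into VariantSupport.write(), and the first hunk in this section deals with padding written data to a multiple of four bytes. The following is a minimal, self-contained sketch of that idea, not HPSF's actual code. The class CodepageRoundTrip and its methods charsetFor, write and read are hypothetical names, and the layout used (a four-byte little-endian byte count, the encoded bytes, then 0x00 padding) is an assumption chosen for illustration only. It uses only the three charsets every Java platform is required to support, so other codepages are rejected.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/* Hypothetical illustration; not part of org.apache.poi.hpsf. */
final class CodepageRoundTrip
{
    /* Maps a few Windows codepage identifiers to Java charsets.
     * Only charsets guaranteed by the Java platform are used. */
    static Charset charsetFor(final int codepage)
    {
        switch (codepage)
        {
            case 1200:  return StandardCharsets.UTF_16LE;
            case 28591: return StandardCharsets.ISO_8859_1;
            case 65001: return StandardCharsets.UTF_8;
            default:
                throw new IllegalArgumentException
                    ("Unsupported codepage in this sketch: " + codepage);
        }
    }

    /* Writes an assumed layout: little-endian byte count, encoded
     * bytes, then 0x00 bytes up to a multiple of four. */
    static byte[] write(final String value, final int codepage)
    {
        final byte[] encoded = value.getBytes(charsetFor(codepage));
        final ByteArrayOutputStream out = new ByteArrayOutputStream();
        final int n = encoded.length;
        out.write(n & 0xFF);
        out.write((n >>> 8) & 0xFF);
        out.write((n >>> 16) & 0xFF);
        out.write((n >>> 24) & 0xFF);
        out.write(encoded, 0, n);
        int length = 4 + n;
        /* Add 0x00 bytes to write a multiple of four bytes: */
        while (length % 4 != 0)
        {
            out.write(0);
            length++;
        }
        return out.toByteArray();
    }

    /* Reads the byte count and decodes exactly that many bytes, so
     * the trailing padding never becomes part of the string. */
    static String read(final byte[] data, final int codepage)
    {
        final int n = (data[0] & 0xFF)
            | ((data[1] & 0xFF) << 8)
            | ((data[2] & 0xFF) << 16)
            | ((data[3] & 0xFF) << 24);
        return new String(data, 4, n, charsetFor(codepage));
    }

    public static void main(final String[] args)
    {
        final String s = "\u00e4\u00f6\u00fc";
        final int[] codepages = {28591, 1200, 65001};
        for (int i = 0; i < codepages.length; i++)
        {
            final byte[] b = write(s, codepages[i]);
            System.out.println("codepage " + codepages[i] + ": "
                + b.length + " bytes, round trip ok: "
                + s.equals(read(b, codepages[i])));
        }
    }
}
```

The same string occupies a different number of bytes depending on the codepage, which is why a test that only compares raw bytes needs to know which codepage was used when writing.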
  
  
