Posted to dev@commons.apache.org by sebb <se...@gmail.com> on 2009/03/28 16:14:35 UTC

Re: svn commit: r759472 - in /commons/proper/compress/trunk/src/site: site.xml xdoc/examples.xml xdoc/index.xml xdoc/zip.xml

On 28/03/2009, bodewig@apache.org <bo...@apache.org> wrote:
> Author: bodewig
>  Date: Sat Mar 28 14:46:32 2009
>  New Revision: 759472
>
>  URL: http://svn.apache.org/viewvc?rev=759472&view=rev
>  Log:
>  some more in depth documentation

Very useful!

>  Added:
>     commons/proper/compress/trunk/src/site/xdoc/examples.xml   (with props)
>     commons/proper/compress/trunk/src/site/xdoc/zip.xml   (with props)
>  Modified:
>     commons/proper/compress/trunk/src/site/site.xml
>     commons/proper/compress/trunk/src/site/xdoc/index.xml
>
>  Modified: commons/proper/compress/trunk/src/site/site.xml
>  URL: http://svn.apache.org/viewvc/commons/proper/compress/trunk/src/site/site.xml?rev=759472&r1=759471&r2=759472&view=diff
>  ==============================================================================
>  --- commons/proper/compress/trunk/src/site/site.xml (original)
>  +++ commons/proper/compress/trunk/src/site/site.xml Sat Mar 28 14:46:32 2009
>  @@ -28,6 +28,7 @@
>    <body>
>      <menu name="Compress">
>        <item name="Overview"    href="/index.html"/>
>  +      <item name="Examples"    href="/examples.html"/>
>        <item name="Issue Tracking" href="/issue-tracking.html"/>
>        <item name="Download"    href="/downloads.html"/>
>        <item name="Wiki"        href="http://wiki.apache.org/commons/Compress"/>
>
>  Added: commons/proper/compress/trunk/src/site/xdoc/examples.xml
>  URL: http://svn.apache.org/viewvc/commons/proper/compress/trunk/src/site/xdoc/examples.xml?rev=759472&view=auto
>  ==============================================================================
>  --- commons/proper/compress/trunk/src/site/xdoc/examples.xml (added)
>  +++ commons/proper/compress/trunk/src/site/xdoc/examples.xml Sat Mar 28 14:46:32 2009
>  @@ -0,0 +1,279 @@
>  +<?xml version="1.0"?>
>  +<!--
>  +
>  +   Licensed to the Apache Software Foundation (ASF) under one or more
>  +   contributor license agreements.  See the NOTICE file distributed with
>  +   this work for additional information regarding copyright ownership.
>  +   The ASF licenses this file to You under the Apache License, Version 2.0
>  +   (the "License"); you may not use this file except in compliance with
>  +   the License.  You may obtain a copy of the License at
>  +
>  +       http://www.apache.org/licenses/LICENSE-2.0
>  +
>  +   Unless required by applicable law or agreed to in writing, software
>  +   distributed under the License is distributed on an "AS IS" BASIS,
>  +   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>  +   See the License for the specific language governing permissions and
>  +   limitations under the License.
>  +
>  +-->
>  +<document>
>  +  <properties>
>  +    <title>Commons Compress Examples</title>
>  +    <author email="dev@commons.apache.org">Commons Documentation Team</author>
>  +  </properties>
>  +  <body>
>  +    <section name="Examples">
>  +
>  +      <subsection name="Factories">
>  +
>  +        <p>Compress provides factory methods to create input/output
>  +          streams based on the names of the compressor or archiver
>  +          format as well as factory methods that try to guess the
>  +          format of an input stream.</p>
>  +
>  +        <p>To create a compressor writing to a given output by using
>  +          the algorithm name:</p>
>  +        <source><![CDATA[
>  +CompressorOutputStream gzippedOut = new CompressorStreamFactory()
>  +    .createCompressorOutputStream("gz", myOutputStream);
>  +]]></source>
>  +
>  +        <p>Make the factory guess the input format for a given stream:</p>
>  +        <source><![CDATA[
>  +ArchiveInputStream input = new ArchiveStreamFactory()
>  +    .createArchiveInputStream(originalInput);
>  +]]></source>
>  +
>  +      </subsection>
>  +
>  +      <subsection name="ar">
>  +
>  +        <p>In addition to the information stored
>  +          in <code>ArchiveEntry</code> an <code>ArArchiveEntry</code>
>  +          stores information about the owner user and group as well as
>  +          Unix permissions.</p>
>  +
>  +        <p>Adding an entry to an ar archive:</p>
>  +<source><![CDATA[
>  +ArArchiveEntry entry = new ArArchiveEntry(name, size);
>  +arOutput.putNextEntry(entry);
>  +arOutput.write(contentOfEntry);
>  +arOutput.closeArchiveEntry();
>  +]]></source>
>  +
>  +        <p>Reading entries from an ar archive:</p>
>  +<source><![CDATA[
>  +ArArchiveEntry entry = (ArArchiveEntry) arInput.getNextEntry();
>  +byte[] content = new byte[(int) entry.getSize()];
>  +LOOP UNTIL entry.getSize() HAS BEEN READ {

I thought the idea was that the ArchiveInputStreams would not allow
one to read past the end of the entry, so one can just read until
read() returns -1?
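In plain java.io terms, the LOOP UNTIL placeholder above is the standard read-fully idiom. A minimal sketch of what the docs seem to intend (class and method names here are illustrative, not part of the Compress API):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadFullyExample {

    // Read exactly size bytes, looping because a single read() call
    // is allowed to return fewer bytes than requested.
    public static byte[] readFully(InputStream in, int size) throws IOException {
        byte[] content = new byte[size];
        int offset = 0;
        while (offset < size) {
            int n = in.read(content, offset, size - offset);
            if (n == -1) {
                throw new IOException("stream ended after " + offset + " bytes");
            }
            offset += n;
        }
        return content;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "entry content".getBytes("US-ASCII");
        byte[] copy = readFully(new ByteArrayInputStream(data), data.length);
        System.out.println(copy.length); // prints 13
    }
}
```

Whether the Compress streams additionally stop at the entry boundary, so that reading until -1 would also be safe, is exactly the question raised here.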

>  +    arInput.read(content, offset, content.length - offset);
>  +}
>  +]]></source>
>  +
>  +      </subsection>
>  +
>  +      <subsection name="cpio">
>  +
>  +        <p>In addition to the information stored
>  +          in <code>ArchiveEntry</code> a <code>CpioArchiveEntry</code>
>  +          stores various attributes including information about the
>  +          original owner and permissions.</p>
>  +
>  +        <p>The cpio package supports the "new portable" as well as the
>  +          "old" format of CPIO archives in their binary, ASCII and
>  +          "with CRC" variants.</p>
>  +
>  +        <p>Adding an entry to a cpio archive:</p>
>  +<source><![CDATA[
>  +CpioArchiveEntry entry = new CpioArchiveEntry(name, size);
>  +cpioOutput.putNextEntry(entry);
>  +cpioOutput.write(contentOfEntry);
>  +cpioOutput.closeArchiveEntry();
>  +]]></source>
>  +
>  +        <p>Reading entries from a cpio archive:</p>
>  +<source><![CDATA[
>  +CpioArchiveEntry entry = cpioInput.getNextCPIOEntry();
>  +byte[] content = new byte[(int) entry.getSize()];
>  +LOOP UNTIL entry.getSize() HAS BEEN READ {

As above.

>  +    cpioInput.read(content, offset, content.length - offset);
>  +}
>  +]]></source>
>  +
>  +      </subsection>
>  +
>  +      <subsection name="tar">
>  +
>  +        <p>In addition to the information stored
>  +          in <code>ArchiveEntry</code> a <code>TarArchiveEntry</code>
>  +          stores various attributes including information about the
>  +          original owner and permissions.</p>
>  +
>  +        <p>There are several different tar formats and the TAR package
>  +          of Compress 1.0 only provides the common functionality of
>  +          the existing variants.</p>
>  +        <p>The original format didn't support file names longer than
>  +          100 characters and the tar package will fail if you try to
>  +          add an entry longer than that.
>  +          The <code>longFileMode</code> option
>  +          of <code>TarArchiveOutputStream</code> can be used to make
>  +          the archive truncate such names or use the GNU tar variant
>  +          of storing such names.  If you choose the GNU tar option,
>  +          the archive can not be extracted using many other tar
>  +          implementations like the ones of OpenBSD, Solaris or MacOS
>  +          X.</p>
>  +
>  +        <p><code>TarArchiveInputStream</code> will recognize the GNU
>  +          tar extension for long file names and read the longer names
>  +          accordingly.</p>
>  +
>  +        <p>Adding an entry to a tar archive:</p>
>  +<source><![CDATA[
>  +TarArchiveEntry entry = new TarArchiveEntry(name);
>  +entry.setSize(size);
>  +tarOutput.putNextEntry(entry);
>  +tarOutput.write(contentOfEntry);
>  +tarOutput.closeArchiveEntry();
>  +]]></source>
>  +
>  +        <p>Reading entries from a tar archive:</p>
>  +<source><![CDATA[
>  +TarArchiveEntry entry = tarInput.getNextTarEntry();
>  +byte[] content = new byte[(int) entry.getSize()];
>  +LOOP UNTIL entry.getSize() HAS BEEN READ {

As above.

>  +    tarInput.read(content, offset, content.length - offset);
>  +}
>  +]]></source>
>  +      </subsection>
>  +
>  +      <subsection name="zip">
>  +        <p>The ZIP package has a <a href="zip.html">dedicated
>  +            documentation page</a>.</p>
>  +
>  +        <p>Adding an entry to a zip archive:</p>
>  +<source><![CDATA[
>  +ZipArchiveEntry entry = new ZipArchiveEntry(name);
>  +entry.setSize(size);
>  +zipOutput.putNextEntry(entry);
>  +zipOutput.write(contentOfEntry);
>  +zipOutput.closeArchiveEntry();
>  +]]></source>
>  +
>  +        <p>Reading entries from a zip archive:</p>
>  +<source><![CDATA[
>  +ZipArchiveEntry entry = zipInput.getNextZipEntry();
>  +byte[] content = new byte[(int) entry.getSize()];
>  +LOOP UNTIL entry.getSize() HAS BEEN READ {

As above

>  +    zipInput.read(content, offset, content.length - offset);
>  +}
>  +]]></source>
>  +
>  +        <p>Reading entries from a zip archive using the
>  +          recommended <code>ZipFile</code> class:</p>
>  +<source><![CDATA[
>  +ZipArchiveEntry entry = zipFile.getEntry(name);
>  +InputStream content = zipFile.getInputStream(entry);
>  +try {
>  +    READ UNTIL content IS EXHAUSTED
>  +} finally {
>  +    content.close();
>  +}
>  +]]></source>
>  +      </subsection>
>  +
>  +      <subsection name="jar">
>  +        <p>In general, JAR archives are ZIP files, so the JAR package
>  +          supports all options provided by the ZIP package.</p>
>  +
>  +        <p>To be interoperable, JAR archives should always be created
>  +          using the UTF-8 encoding for file names (which is the
>  +          default).</p>
>  +
>  +        <p>Archives created using <code>JarArchiveOutputStream</code>
>  +          will implicitly add a <code>JarMarker</code> extra field to
>  +          the very first archive entry of the archive which will make
>  +          Solaris recognize them as Java archives and allows them to
>  +          be used as executables.</p>
>  +
>  +        <p>Note that <code>ArchiveStreamFactory</code> doesn't
>  +          distinguish ZIP archives from JAR archives, so if you use
>  +          the one-argument <code>createArchiveInputStream</code>
>  +          method on a JAR archive, it will still return the more
>  +          generic <code>ZipArchiveInputStream</code>.</p>
>  +
>  +        <p>The <code>JarArchiveEntry</code> class contains fields for
>  +          certificates and attributes that are planned to be supported
>  +          in the future but are not supported as of Compress 1.0.</p>
>  +
>  +        <p>Adding an entry to a jar archive:</p>
>  +<source><![CDATA[
>  +JarArchiveEntry entry = new JarArchiveEntry(name);
>  +entry.setSize(size);
>  +jarOutput.putNextEntry(entry);
>  +jarOutput.write(contentOfEntry);
>  +jarOutput.closeArchiveEntry();
>  +]]></source>
>  +
>  +        <p>Reading entries from a jar archive:</p>
>  +<source><![CDATA[
>  +JarArchiveEntry entry = jarInput.getNextJarEntry();
>  +byte[] content = new byte[(int) entry.getSize()];
>  +LOOP UNTIL entry.getSize() HAS BEEN READ {

As above

>  +    jarInput.read(content, offset, content.length - offset);
>  +}
>  +]]></source>
>  +      </subsection>
>  +
>  +      <subsection name="bzip2">
>  +
>  +        <p>Note that <code>BZip2CompressorOutputStream</code> keeps
>  +          hold of some big data structures in memory.  While it is
>  +          recommended for any stream that you close it as soon as
>  +          you no longer need it, this is even more important
>  +          for <code>BZip2CompressorOutputStream</code>.</p>
>  +
>  +        <p>Uncompressing a given bzip2 compressed file (you would
>  +          certainly add exception handling and make sure all streams
>  +          get closed properly):</p>
>  +<source><![CDATA[
>  +FileInputStream in = new FileInputStream("archive.tar.bz2");
>  +FileOutputStream out = new FileOutputStream("archive.tar");
>  +BZip2CompressorInputStream bzIn = new BZip2CompressorInputStream(in);
>  +final byte[] buffer = new byte[buffersize];
>  +int n = 0;
>  +while (-1 != (n = bzIn.read(buffer))) {
>  +    out.write(buffer, 0, n);
>  +}
>  +out.close();
>  +bzIn.close();
>  +]]></source>
>  +
>  +      </subsection>
>  +
>  +      <subsection name="gzip">
>  +
>  +        <p>The implementation of this package is provided by
>  +          the <code>java.util.zip</code> package of the Java class
>  +          library.</p>
>  +
>  +        <p>Uncompressing a given gzip compressed file (you would
>  +          certainly add exception handling and make sure all streams
>  +          get closed properly):</p>
>  +<source><![CDATA[
>  +FileInputStream in = new FileInputStream("archive.tar.gz");
>  +FileOutputStream out = new FileOutputStream("archive.tar");
>  +GzipCompressorInputStream gzIn = new GzipCompressorInputStream(in);
>  +final byte[] buffer = new byte[buffersize];
>  +int n = 0;
>  +while (-1 != (n = gzIn.read(buffer))) {
>  +    out.write(buffer, 0, n);
>  +}
>  +out.close();
>  +gzIn.close();
>  +]]></source>
>  +      </subsection>
>  +
>  +    </section>
>  +  </body>
>  +</document>
>
>  Propchange: commons/proper/compress/trunk/src/site/xdoc/examples.xml
>  ------------------------------------------------------------------------------
>     svn:eol-style = native
>
>  Modified: commons/proper/compress/trunk/src/site/xdoc/index.xml
>  URL: http://svn.apache.org/viewvc/commons/proper/compress/trunk/src/site/xdoc/index.xml?rev=759472&r1=759471&r2=759472&view=diff
>  ==============================================================================
>  --- commons/proper/compress/trunk/src/site/xdoc/index.xml (original)
>  +++ commons/proper/compress/trunk/src/site/xdoc/index.xml Sat Mar 28 14:46:32 2009
>  @@ -56,7 +56,34 @@
>              </subsection>
>          </section>
>          <section name="Documentation">
>  +          <p>The compress component is split into <em>compressors</em> and
>  +            <em>archivers</em>.  While <em>compressors</em>
>  +            (un)compress streams that usually store a single
>  +            entry, <em>archivers</em> deal with archives that contain
>  +            structured content represented
>  +            by <code>ArchiveEntry</code> instances which in turn
>  +            usually correspond to single files or directories.</p>
>  +
>  +          <p>Currently the bzip2 and gzip formats are supported as
>  +            compressors where gzip support is provided by
>  +            the <code>java.util.zip</code> package of the Java class
>  +            library.</p>
>  +
>  +          <p>The ar, cpio, tar and zip formats are supported as
>  +            archivers where the <a href="zip.html">zip</a>
>  +            implementation provides capabilities that go beyond the
>  +            features found in java.util.zip.</p>
>  +
>  +          <p>The compress component provides abstract base classes for
>  +            compressors and archivers together with factories that can
>  +            be used to choose implementations by algorithm name.  In
>  +            the case of input streams the factories can also be used
>  +            to guess the format and provide the matching
>  +            implementation.</p>
>  +
>            <ul>
>  +            <li>The <a href="examples.html">examples page</a> contains
>  +            more detailed information and some examples.</li>
>              <li>The <a href="apidocs/index.html">Javadoc</a> of the latest SVN</li>
>              <li>The <a href="http://svn.apache.org/viewvc/commons/proper/compress/">SVN
>                  repository</a> can be browsed.</li>
>
>  Added: commons/proper/compress/trunk/src/site/xdoc/zip.xml
>  URL: http://svn.apache.org/viewvc/commons/proper/compress/trunk/src/site/xdoc/zip.xml?rev=759472&view=auto
>  ==============================================================================
>  --- commons/proper/compress/trunk/src/site/xdoc/zip.xml (added)
>  +++ commons/proper/compress/trunk/src/site/xdoc/zip.xml Sat Mar 28 14:46:32 2009
>  @@ -0,0 +1,226 @@
>  +<?xml version="1.0"?>
>  +<!--
>  +
>  +   Licensed to the Apache Software Foundation (ASF) under one or more
>  +   contributor license agreements.  See the NOTICE file distributed with
>  +   this work for additional information regarding copyright ownership.
>  +   The ASF licenses this file to You under the Apache License, Version 2.0
>  +   (the "License"); you may not use this file except in compliance with
>  +   the License.  You may obtain a copy of the License at
>  +
>  +       http://www.apache.org/licenses/LICENSE-2.0
>  +
>  +   Unless required by applicable law or agreed to in writing, software
>  +   distributed under the License is distributed on an "AS IS" BASIS,
>  +   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>  +   See the License for the specific language governing permissions and
>  +   limitations under the License.
>  +
>  +-->
>  +<document>
>  +  <properties>
>  +    <title>Commons Compress ZIP package</title>
>  +    <author email="dev@commons.apache.org">Commons Documentation Team</author>
>  +  </properties>
>  +  <body>
>  +    <section name="The ZIP package">
>  +
>  +      <p>The ZIP package provides features not found
>  +        in <code>java.util.zip</code>:</p>
>  +
>  +      <ul>
>  +        <li>Support for encodings other than UTF-8 for filenames and
>  +          comments.</li>
>  +        <li>Access to internal and external attributes (which are used
>  +          to store Unix permission by some zip implementations).</li>
>  +        <li>Structured support for extra fields.</li>
>  +      </ul>
>  +
>  +      <p>In addition to the information stored
>  +        in <code>ArchiveEntry</code> a <code>ZipArchiveEntry</code>
>  +        stores internal and external attributes as well as extra
>  +        fields which may contain information like Unix permissions,
>  +        information about the platform they've been created on, their
>  +        last modification time and an optional comment.</p>
>  +
>  +      <subsection name="ZipArchiveInputStream vs ZipFile">
>  +
>  +        <p>ZIP archives store archive entries in sequence and
>  +          contain a registry of all entries at the very end of the
>  +          archive.  It is acceptable for an archive to contain several
>  +          entries of the same name and have the registry (called the
>  +          central directory) decide which entry is actually to be used
>  +          (if any).</p>
>  +
>  +        <p>In addition the ZIP format stores certain information only
>  +          inside the central directory but not together with the entry
>  +          itself, namely:</p>
>  +
>  +        <ul>
>  +          <li>internal and external attributes</li>
>  +          <li>different or additional extra fields</li>
>  +        </ul>
>  +
>  +        <p>This means the ZIP format cannot really be parsed
>  +          correctly while reading a non-seekable stream, which is what
>  +          <code>ZipArchiveInputStream</code> is forced to do.  As a
>  +          result <code>ZipArchiveInputStream</code></p>
>  +        <ul>
>  +          <li>may return entries that are not part of the central
>  +            directory at all and shouldn't be considered part of the
>  +            archive.</li>
>  +          <li>may return several entries with the same name.</li>
>  +          <li>will not return internal or external attributes.</li>
>  +          <li>may return incomplete extra field data.</li>
>  +        </ul>
>  +
>  +        <p><code>ZipArchiveInputStream</code> shares these limitations
>  +          with <code>java.util.zip.ZipInputStream</code>.</p>
>  +
>  +        <p><code>ZipFile</code> is able to read the central directory
>  +          first and provide correct and complete information on any
>  +          ZIP archive.</p>
>  +
>  +        <p>If possible, you should always prefer <code>ZipFile</code>
>  +          over <code>ZipArchiveInputStream</code>.</p>
>  +      </subsection>
>  +
>  +      <subsection name="Extra Fields">
>  +
>  +        <p>Inside a ZIP archive, additional data can be attached to
>  +          each entry.  The <code>java.util.zip.ZipEntry</code> class
>  +          provides access to this via the <code>get/setExtra</code>
>  +          methods as arrays of <code>byte</code>s.</p>
>  +
>  +        <p>Actually the extra data is supposed to be more structured
>  +          than that and Compress' ZIP package provides access to the
>  +          structured data as <code>ExtraField</code> instances.  Only
>  +          a subset of all defined extra field formats is supported by
>  +          the package, any other extra field will be stored
>  +          as <code>UnrecognizedExtraField</code>.</p>
>  +
>  +      </subsection>
>  +
>  +      <subsection name="Encoding" id="encoding">
>  +
>  +        <p>Traditionally the ZIP archive format uses CodePage 437 as
>  +          encoding for file names, which is not sufficient for many
>  +          international character sets.</p>
>  +
>  +        <p>Over time different archivers have chosen different ways to
>  +          work around the limitation - the <code>java.util.zip</code>
>  +          package simply uses UTF-8 as its encoding, for example.</p>
>  +
>  +        <p>Ant has been offering the encoding attribute of the zip and
>  +          unzip task as a way to explicitly specify the encoding to
>  +          use (or expect) since Ant 1.4.  It defaults to the
>  +          platform's default encoding for zip and UTF-8 for jar and
>  +          other jar-like tasks (war, ear, ...) as well as the unzip
>  +          family of tasks.</p>
>  +
>  +        <p>More recent versions of the ZIP specification introduce
>  +          something called the &quot;language encoding flag&quot;
>  +          which can be used to signal that a file name has been
>  +          encoded using UTF-8.  All ZIP archives written by Compress
>  +          will set this flag if the encoding has been set to UTF-8.
>  +          Our interoperability tests with existing archivers didn't
>  +          show any ill effects (in fact, most archivers ignore the
>  +          flag to date), but you can turn off the "language encoding
>  +          flag" by setting the attribute
>  +          <code>useLanguageEncodingFlag</code> to <code>false</code> on the
>  +          <code>ZipArchiveOutputStream</code> if you should encounter
>  +          problems.</p>
>  +
>  +        <p>The <code>ZipFile</code>
>  +          and <code>ZipArchiveInputStream</code> classes will
>  +          recognize the language encoding flag and ignore the encoding
>  +          set in the constructor if it has been found.</p>
>  +
>  +        <p>The InfoZIP developers have introduced new ZIP extra fields
>  +          that can be used to add an additional UTF-8 encoded file
>  +          name to the entry's metadata.  Most archivers ignore these
>  +          extra fields.  <code>ZipArchiveOutputStream</code> supports
>  +          an option <code>createUnicodeExtraFields</code> which makes
>  +          it write these extra fields either for all entries
>  +          ("always") or only those whose name cannot be encoded using
>  +          the specified encoding (not-encodeable).  It defaults to
>  +          "never" since the extra fields create bigger archives.</p>
>  +
>  +        <p>The fallbackToUTF8 attribute
>  +          of <code>ZipArchiveOutputStream</code> can be used to create
>  +          archives that use the specified encoding in the majority of
>  +          cases but UTF-8 and the language encoding flag for filenames
>  +          that cannot be encoded using the specified encoding.</p>
>  +
>  +        <p>The <code>ZipFile</code>
>  +          and <code>ZipArchiveInputStream</code> classes recognize the
>  +          Unicode extra fields by default and read the file name
>  +          information from them, unless you set the constructor parameter
>  +          <code>scanForUnicodeExtraFields</code> to false.</p>
>  +
>  +        <h4>Recommendations for Interoperability</h4>
>  +
>  +        <p>The optimal setting of flags depends on the archivers you
>  +          expect as consumers/producers of the ZIP archives.  Below
>  +          are some test results which may be superseded with later
>  +          versions of each tool.</p>
>  +
>  +        <ul>
>  +          <li>The java.util.zip package used by the jar executable or
>  +            to read jars from your CLASSPATH reads and writes UTF-8
>  +            names, it doesn't set or recognize any flags or Unicode
>  +            extra fields.</li>
>  +
>  +          <li>7Zip writes CodePage 437 by default but uses UTF-8 and
>  +            the language encoding flag when writing entries that
>  +            cannot be encoded as CodePage 437 (similar to the zip task
>  +            with fallbackToUTF8 set to true).  It recognizes the
>  +            language encoding flag when reading and ignores the
>  +            Unicode extra fields.</li>
>  +
>  +          <li>WinZIP writes CodePage 437 and uses Unicode extra fields
>  +            by default.  It recognizes the Unicode extra field and the
>  +            language encoding flag when reading.</li>
>  +
>  +          <li>Windows' "compressed folder" feature doesn't recognize
>  +            any flag or extra field and creates archives using the
>  +            platform's default encoding - and expects archives to be in
>  +            that encoding when reading them.</li>
>  +
>  +          <li>InfoZIP based tools can recognize and write both; it is
>  +            a compile-time option and depends on the platform, so your
>  +            mileage may vary.</li>
>  +
>  +          <li>PKWARE zip tools recognize both and prefer the language
>  +            encoding flag.  They create archives using CodePage 437 if
>  +            possible and UTF-8 plus the language encoding flag for
>  +            file names that cannot be encoded as CodePage 437.</li>
>  +        </ul>
>  +
>  +        <p>So, what to do?</p>
>  +
>  +        <p>If you are creating jars, then java.util.zip is your main
>  +          consumer.  We recommend you set the encoding to UTF-8 and
>  +          keep the language encoding flag enabled.  The flag won't
>  +          help or hurt java.util.zip but archivers that support it
>  +          will show the correct file names.</p>
>  +
>  +        <p>For maximum interop it is probably best to set the encoding
>  +          to UTF-8, enable the language encoding flag and create
>  +          Unicode extra fields when writing ZIPs.  Such archives
>  +          should be extracted correctly by java.util.zip, 7Zip,
>  +          WinZIP, PKWARE tools and most likely InfoZIP tools.  They
>  +          will be unusable with Windows' "compressed folders" feature
>  +          and bigger than archives without the Unicode extra fields,
>  +          though.</p>
>  +
>  +        <p>If Windows' "compressed folders" is your primary consumer,
>  +          then your best option is to explicitly set the encoding to
>  +          the target platform.  You may want to enable creation of
>  +          Unicode extra fields so the tools that support them will
>  +          extract the file names correctly.</p>
>  +      </subsection>
>  +
>  +    </section>
>  +  </body>
>  +</document>
>
>  Propchange: commons/proper/compress/trunk/src/site/xdoc/zip.xml
>  ------------------------------------------------------------------------------
>     svn:eol-style = native
>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: svn commit: r759472 - in /commons/proper/compress/trunk/src/site: site.xml xdoc/examples.xml xdoc/index.xml xdoc/zip.xml

Posted by sebb <se...@gmail.com>.
On 30/03/2009, Stefan Bodewig <bo...@apache.org> wrote:
> On 2009-03-28, sebb <se...@gmail.com> wrote:
>
>  > On 28/03/2009, Stefan Bodewig <bo...@apache.org> wrote:
>  >> On 2009-03-28, sebb <se...@gmail.com> wrote:
>
>
> >>> I thought the idea was that the ArchiveInputStreams would not allow
>  >>> one to read past the end of the entry, so one can just read until
>  >>> read() returns -1?
>
>
> >>  I don't think AR is the only archiver that does not return -1 once you
>  >>  read past the end of the current entry, nor am I convinced that it is
>  >>  a good idea to expect the streams to do so.
>
>  > I thought that was the main idea of the archive input stream.
>  > IMO, it makes using the classes much easier.
>
>
> It probably does.  If this is the intended behaviour, we should
>  document it properly.  Ideally at the ArchiveInputStream level.
>

I already did that ;-)

The Output Javadoc still needs to be done; I'll make a start later.

>  >>  BTW, while catching up with mail I saw a lot of discussion going on
>  >>  inside JIRA instead of on the dev list.  This may be a project
>  >>  cultural thing, but to me JIRA is not the correct place for that.
>
>  > I tend to agree, but it can be useful.
>
>  > I've seen JIRA issues that have almost no information and so are hard
>  > to follow; probably there was other information on the mailing list,
>  > but if it's not referenced from the JIRA it can be hard to find later.
>
>
> Personally I prefer to have the JIRA entry point to the mailing list
>  archive of the dev list in such a case.  Not everybody who might be
>  interested in the discussion may be subscribed to the issues list.
>
>  Stefan
>
>
>



Re: svn commit: r759472 - in /commons/proper/compress/trunk/src/site: site.xml xdoc/examples.xml xdoc/index.xml xdoc/zip.xml

Posted by Stefan Bodewig <bo...@apache.org>.
On 2009-03-28, sebb <se...@gmail.com> wrote:

> On 28/03/2009, Stefan Bodewig <bo...@apache.org> wrote:
>> On 2009-03-28, sebb <se...@gmail.com> wrote:

>>> I thought the idea was that the ArchiveInputStreams would not allow
>>> one to read past the end of the entry, so one can just read until
>>> read() returns -1?

>>  I don't think AR is the only archiver that does not return -1 once you
>>  read past the end of the current entry, nor am I convinced that it is
>>  a good idea to expect the streams to do so.

> I thought that was the main idea of the archive input stream.
> IMO, it makes using the classes much easier.

It probably does.  If this is the intended behaviour, we should
document it properly.  Ideally at the ArchiveInputStream level.

>>  BTW, while catching up with mail I saw a lot of discussion going on
>>  inside JIRA instead of on the dev list.  This may be a project
>>  cultural thing, but to me JIRA is not the correct place for that.

> I tend to agree, but it can be useful.

> I've seen JIRA issues that have almost no information and so are hard
> to follow; probably there was other information on the mailing list,
> but if it's not referenced from the JIRA it can be hard to find later.

Personally I prefer to have the JIRA entry point to the mailing list
archive of the dev list in such a case.  Not everybody who might be
interested in the discussion may be subscribed to the issues list.

Stefan



Re: svn commit: r759472 - in /commons/proper/compress/trunk/src/site: site.xml xdoc/examples.xml xdoc/index.xml xdoc/zip.xml

Posted by sebb <se...@gmail.com>.
On 28/03/2009, Stefan Bodewig <bo...@apache.org> wrote:
> On 2009-03-28, sebb <se...@gmail.com> wrote:
>
>  >>>        <p>Reading entries from an ar archive:</p>
>  >>> <source><![CDATA[
>  >>> ArArchiveEntry entry = (ArArchiveEntry) arInput.getNextEntry();
>  >>> byte[] content = new byte[entry.getSize()];
>  >>> LOOP UNTIL entry.getSize() HAS BEEN READ {
>
>  > I thought the idea was that the ArchiveInputStreams would not allow
>  > one to read past the end of the entry, so one can just read until
>  > read() returns -1?
>
>
> I wrote that on the train last night while I was offline and committed
>  it this afternoon before reading my mail 8-)
>
>  I don't think AR is the only archiver that does not return -1 once you
>  read past the end of the current entry, nor am I convinced that it is
>  a good idea to expect the streams to do so.

I thought that was the main idea of the archive input stream.
IMO, it makes using the classes much easier.

AFAICS, the test cases currently assume it.

If users are not prevented from reading past the entry without calling
getNextEntry(), the input stream class can easily get lost.
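The behaviour described here, a stream that simply refuses to hand out bytes beyond the current entry, can be sketched as a bounding wrapper over any InputStream (illustrative only; this is not the actual ArchiveInputStream code):

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// A stream that returns -1 once the declared entry size has been
// consumed, so callers can safely read until EOF per entry.
public class EntryBoundedInputStream extends FilterInputStream {

    private long remaining;

    public EntryBoundedInputStream(InputStream in, long entrySize) {
        super(in);
        this.remaining = entrySize;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) {
            return -1; // at the end of the entry, signal EOF
        }
        int b = in.read();
        if (b != -1) {
            remaining--;
        }
        return b;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (remaining <= 0) {
            return -1;
        }
        // never ask the underlying stream for more than the entry holds
        int n = in.read(b, off, (int) Math.min(len, remaining));
        if (n > 0) {
            remaining -= n;
        }
        return n;
    }
}
```

With such a bound in place the underlying stream never advances past the entry, so getNextEntry() always starts from a known position.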

>  The code of the examples should work and IMHO the API user should
>  rather rely on the entry's size than on the stream returning -1.  Are
>  we sure our streams return -1 on directory entries immediately?

That needs to be tested and fixed if not.

>  BTW, while catching up with mail I saw a lot of discussion going on
>  inside JIRA instead of on the dev list.  This may be a project
>  cultural thing, but to me JIRA is not the correct place for that.

I tend to agree, but it can be useful.

I've seen JIRA issues that have almost no information and so are hard
to follow; probably there was other information on the mailing list,
but if it's not referenced from the JIRA it can be hard to find later.

>  Stefan
>
>
>



Re: svn commit: r759472 - in /commons/proper/compress/trunk/src/site: site.xml xdoc/examples.xml xdoc/index.xml xdoc/zip.xml

Posted by Stefan Bodewig <bo...@apache.org>.
On 2009-03-28, sebb <se...@gmail.com> wrote:

>>>        <p>Reading entries from an ar archive:</p>
>>> <source><![CDATA[
>>> ArArchiveEntry entry = (ArArchiveEntry) arInput.getNextEntry();
>>> byte[] content = new byte[entry.getSize()];
>>> LOOP UNTIL entry.getSize() HAS BEEN READ {

> I thought the idea was that the ArchiveInputStreams would not allow
> one to read past the end of the entry, so one can just read until
> read() returns -1?

I wrote that on the train last night while I was offline and committed
it this afternoon before reading my mail 8-)

I don't think AR is the only archiver that does not return -1 once you
read past the end of the current entry, nor am I convinced that it is
a good idea to expect the streams to do so.

The code of the examples should work and IMHO the API user should
rather rely on the entry's size than on the stream returning -1.  Are
we sure our streams return -1 on directory entries immediately?
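For the java.util.zip analogue at least, the directory question can be answered with a quick in-memory check (this exercises the JDK classes, not the Compress streams; the class name is illustrative):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class DirectoryEntryEofCheck {

    // Builds an in-memory zip containing a single directory entry.
    static byte[] zipWithDirectoryEntry() throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ZipOutputStream zos = new ZipOutputStream(bos);
        zos.putNextEntry(new ZipEntry("some/dir/")); // trailing slash = directory
        zos.closeEntry();
        zos.close();
        return bos.toByteArray();
    }

    // Returns the result of the first read() on the directory entry.
    static int firstReadOnDirectoryEntry() throws IOException {
        ZipInputStream zin = new ZipInputStream(
                new ByteArrayInputStream(zipWithDirectoryEntry()));
        ZipEntry entry = zin.getNextEntry();
        if (entry == null || !entry.isDirectory()) {
            throw new IllegalStateException("expected a directory entry");
        }
        // a directory entry has no content, so read() hits EOF at once
        return zin.read();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(firstReadOnDirectoryEntry());
    }
}
```

A matching check against each of the Compress ArchiveInputStreams would settle the question for ar, cpio and tar as well.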

BTW, while catching up with mail I saw a lot of discussion going on
inside JIRA instead of on the dev list.  This may be a project
cultural thing, but to me JIRA is not the correct place for that.

Stefan
