Posted to dev@commons.apache.org by Christian Grobmeier <gr...@gmail.com> on 2009/02/26 07:31:58 UTC

[compress] Archiver Detection fails

Hi,

I recently figured out that a zip file created by compress doesn't
necessarily match the signature of
ZipArchiveInputStream.matches(...)

For example:
AbstractTestCase.createArchive creates a zip archive with several files in it.

The resulting zip archive cannot be matched in
ArchiveStreamFactory.createArchiveInputStream(xxx)

and a little debugging showed me that the expected signature really
differs from the actual one.
Choosing the implementation manually and extracting the file is no problem.

I know there was some implementation in that area recently, any ideas
why this happens?

Cheers,
Chris.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [compress] Archiver Detection fails

Posted by Stefan Bodewig <bo...@apache.org>.
On 2009-02-26, Christian Grobmeier <gr...@gmail.com> wrote:

>> Note that the method (or better the input stream) is still broken in a
>> more general sense since it will not detect self extracting ZIP files
>> which do have a tiny native bootstrapper tacked to the front of the
>> archive.  The ZipFile class can read them, ZipArchiveInputStream
>> can't.

> Is there a chance that we can fix this in our implementation?

Well, I'll open a JIRA issue for ZipArchiveInputStream anyway, see
below for the biggest problem I see with ZipArchiveInputStream (which
is why Ant never had one).

The specific question of self-extracting archives could be solved by
scanning more of the archive for a local file header and skipping
everything that comes upfront.  The native bootstrap code isn't big,
usually somewhere below 48k, so we could limit the search to a
specific amount of data and avoid scanning several gigabytes.
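
A minimal sketch of the bounded scan described above, assuming the
leading part of the file has already been read into a byte array. The
class name, the method name and the 64k limit used in main are
hypothetical and not part of commons-compress.

```java
final class SfxOffsetScanner {
    /**
     * Returns the offset of the first local file header signature
     * (bytes 50 4B 03 04) that lies completely inside the first
     * 'limit' bytes of 'data', or -1 if there is none.
     */
    static int findFirstLocalFileHeader(byte[] data, int limit) {
        // Last index at which a complete four byte signature can start.
        int end = Math.min(data.length, limit) - 4;
        for (int i = 0; i <= end; i++) {
            if (data[i] == 0x50 && data[i + 1] == 0x4B
                    && data[i + 2] == 0x03 && data[i + 3] == 0x04) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Fake "bootstrapper" of 100 filler bytes followed by a signature.
        byte[] sfx = new byte[104];
        java.util.Arrays.fill(sfx, (byte) 0x90);
        sfx[100] = 0x50;
        sfx[101] = 0x4B;
        sfx[102] = 0x03;
        sfx[103] = 0x04;
        System.out.println(findFirstLocalFileHeader(sfx, 64 * 1024));
    }
}
```

The four signature bytes could also occur by chance inside the native
code, so a real implementation would probably want to validate the
header it finds before trusting the offset.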

I wouldn't want to do that inside the matches method, though.  Rather
I'd say we don't autodetect self-extracting archives but make
ZipArchiveInputStream deal with them when used explicitly.

Generally speaking the InputStream metaphor doesn't work for ZIP
archives.

A ZIP archive contains what is called the "central directory" at the
end of the archive.  This is the only authoritative source telling you
what is inside that archive.

Before the central directory there are the actual contents (among
other things).  For each entry you get a local file header describing
the entry (duplicating information from the central directory) and the
actual contents.  The central directory contains a pointer to the
local file data.

java.util.zip.ZipInputStream reads the stream in sequence and creates
ZipEntries as it finds local file information.

ZipFile (ours, not the one in java.util.zip - I don't know what the
latter does) reads the archive from the back and parses the central
directory to see what is inside the archive.
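
To illustrate what reading from the back can look like, here is a
rough sketch that searches an in-memory archive backwards for the "end
of central directory" record and reads the entry count and the central
directory offset from it. The field offsets follow the ZIP format
description; the class and method names are hypothetical, and this is
not how our ZipFile is actually written.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class CentralDirectoryLocator {
    private static final int EOCD_MIN_SIZE = 22;   // record without a comment
    private static final int MAX_COMMENT = 0xFFFF; // comment length is 16 bit

    /** Scans backwards for the bytes 50 4B 05 06; returns -1 if not found. */
    static int findEocd(byte[] archive) {
        int lowest = Math.max(0, archive.length - EOCD_MIN_SIZE - MAX_COMMENT);
        for (int i = archive.length - EOCD_MIN_SIZE; i >= lowest; i--) {
            if (archive[i] == 0x50 && archive[i + 1] == 0x4B
                    && archive[i + 2] == 0x05 && archive[i + 3] == 0x06) {
                return i;
            }
        }
        return -1;
    }

    /** Total number of central directory entries (two bytes at offset 10). */
    static int entryCount(byte[] archive, int eocd) {
        return (archive[eocd + 10] & 0xFF) | ((archive[eocd + 11] & 0xFF) << 8);
    }

    /** Start of the central directory (four bytes at offset 16, little endian). */
    static long centralDirectoryOffset(byte[] archive, int eocd) {
        return (archive[eocd + 16] & 0xFFL)
                | ((archive[eocd + 17] & 0xFFL) << 8)
                | ((archive[eocd + 18] & 0xFFL) << 16)
                | ((archive[eocd + 19] & 0xFFL) << 24);
    }

    public static void main(String[] args) throws IOException {
        // Build a small archive in memory with java.util.zip.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(buf)) {
            zos.putNextEntry(new ZipEntry("a.txt"));
            zos.write("hello".getBytes(StandardCharsets.US_ASCII));
            zos.closeEntry();
        }
        byte[] archive = buf.toByteArray();
        int eocd = findEocd(archive);
        System.out.println("entries: " + entryCount(archive, eocd));
        System.out.println("central directory at: "
                + centralDirectoryOffset(archive, eocd));
    }
}
```

The sketch ignores ZIP64 and does not check that the comment length
stored in the record is consistent with the position where the
signature was found, which a robust reader should do.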

It is not uncommon for archivers to "update" existing archives by
adding new local file data at the end and rewriting the central
directory without removing the old local file data.  In such a case
java.util.zip.ZipInputStream will find entries that shouldn't be there
or, worse, old data for updated entries.

Stefan



Re: [compress] Archiver Detection fails

Posted by Christian Grobmeier <gr...@gmail.com>.
Thanks Stefan.
I already thought that you would identify the problem within a short time.

> Note that the method (or better the input stream) is still broken in a
> more general sense since it will not detect self extracting ZIP files
> which do have a tiny native bootstrapper tacked to the front of the
> archive.  The ZipFile class can read them, ZipArchiveInputStream
> can't.

Is there a chance that we can fix this in our implementation?

Cheers
Christian



Re: [compress] Archiver Detection fails

Posted by Stefan Bodewig <bo...@apache.org>.
On 2009-02-26, Stefan Bodewig <bo...@apache.org> wrote:

> On 2009-02-26, Christian Grobmeier <gr...@gmail.com> wrote:

>> I recently figured out that a zip file created by compress doesn't
>> necessarily match the signature of
>> ZipArchiveInputStream.matches(...)

> This is because the method is wrong - will be "fixed" in a few
> minutes, see below.

Note that JarArchiveInputStream.matches was just as wrong but with the
added twist that it expected the general purpose bit 3 to be on (while
zip expected it to be off).

What this really means is "if the bit is set, I'm using the DEFLATED
method and store the length inside the data descriptor" - something
that is completely irrelevant to the question of whether it is a jar
or a zip file.
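
As a small illustration, the flag can be read from the first bytes of a
local file header like this. In the ZIP format description bit 3 says
that the CRC and the sizes follow the entry data in a data descriptor;
the class and method names below are hypothetical.

```java
final class GeneralPurposeFlag {
    /** Bit 3 of the general purpose flag. */
    static final int DATA_DESCRIPTOR_BIT = 1 << 3;

    /**
     * Reads the two byte little endian flag that follows the four byte
     * signature and the two byte "version needed to extract" field.
     */
    static int readFlag(byte[] localFileHeader) {
        return (localFileHeader[6] & 0xFF) | ((localFileHeader[7] & 0xFF) << 8);
    }

    static boolean usesDataDescriptor(byte[] localFileHeader) {
        return (readFlag(localFileHeader) & DATA_DESCRIPTOR_BIT) != 0;
    }

    public static void main(String[] args) {
        // Two headers that differ only in version and flag bits.
        byte[] withBit = {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00, 0x08, 0x00};
        byte[] withoutBit = {0x50, 0x4B, 0x03, 0x04, 0x0A, 0x00, 0x00, 0x00};
        System.out.println(usesDataDescriptor(withBit));
        System.out.println(usesDataDescriptor(withoutBit));
    }
}
```

Both sample headers could start a jar just as well as a plain zip,
which is why the bit is no use for telling the two apart.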

Given that jars are zips, you can't really autodetect jars and the new
code reflects this - I had to change the unit test which assumed one
could.

Stefan



Re: [compress] Archiver Detection fails

Posted by Stefan Bodewig <bo...@apache.org>.
On 2009-02-26, Christian Grobmeier <gr...@gmail.com> wrote:

> I recently figured out that a zip file created by compress doesn't
> necessarily match the signature of
> ZipArchiveInputStream.matches(...)

This is because the method is wrong - will be "fixed" in a few
minutes, see below.

The expected signature right now is

"local file header marker" - four bytes
"version needed to extract" - two bytes
"general purpose flag" - two bytes

You can really only rely on the LFH signature since "version needed to
extract" will be different for many archivers, as will be the general
purpose bits.

I'll reduce the signature check to the LFH.
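
A reduced check could look roughly like the following sketch. It only
compares the four bytes of the local file header marker (50 4B 03 04)
and ignores the version and flag fields. The class name is hypothetical
and this is not the committed code.

```java
final class LfhSignatureCheck {
    /** Local file header marker "PK\003\004" as stored in the archive. */
    private static final byte[] LFH_SIG = {0x50, 0x4B, 0x03, 0x04};

    /**
     * Returns true if the first 'length' bytes of 'signature' start with
     * the local file header marker.
     */
    static boolean matches(byte[] signature, int length) {
        if (signature == null || length < LFH_SIG.length
                || signature.length < LFH_SIG.length) {
            return false;
        }
        for (int i = 0; i < LFH_SIG.length; i++) {
            if (signature[i] != LFH_SIG[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Same marker, different "version needed" and general purpose flag.
        byte[] a = {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00, 0x08, 0x00};
        byte[] b = {0x50, 0x4B, 0x03, 0x04, 0x0A, 0x00, 0x00, 0x00};
        System.out.println(matches(a, a.length));
        System.out.println(matches(b, b.length));
    }
}
```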

Note that the method (or better the input stream) is still broken in a
more general sense since it will not detect self extracting ZIP files
which do have a tiny native bootstrapper tacked to the front of the
archive.  The ZipFile class can read them, ZipArchiveInputStream
can't.

Stefan
