Posted to dev@commons.apache.org by Stefan Bodewig <bo...@apache.org> on 2013/05/20 17:48:18 UTC

[compress] ZipFile and Duplicate Entries

Hi,

over in Ant land a bug was raised that points at a problem in ZipFile
(Commons Compress' zip package is a fork of Ant's code and I try to keep
them in sync).

When an archive contains duplicate entries - which is totally valid in
ZIPs - ZipFile's getEntry can sometimes return ZipArchiveEntry instances
for which ZipFile's getInputStream returns null.  This is COMPRESS-227,
which contains the details of the problem.

For now I've fixed it in trunk by ignoring all but the last entry of the
same name seen while parsing the central directory.  I've chosen to pick
the last since this is what ZipFile used to do for duplicate entries
without extra fields anyway.

There may be reasons to return only the first entry and there may even
be reasons to provide a separate method that returns all entries of a
given name, something like

     List<ZipArchiveEntry> getEntries(String name)

The latter would require some more book-keeping but I don't think the
performance impact would be too big.
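
A minimal sketch of the book-keeping this would take, assuming a
name-to-list map that is filled while the central directory is parsed.
The class DuplicateNameIndex and its add method are hypothetical names
used only for illustration; this is not the actual ZipFile code, and the
entry type is left generic so the sketch runs without Commons Compress.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical name index; an illustration, not ZipFile's implementation. */
final class DuplicateNameIndex<E> {
    // Entry name -> all entries of that name, in central-directory order.
    private final Map<String, List<E>> byName = new LinkedHashMap<>();

    /** Records one entry as it is encountered in the central directory. */
    void add(String name, E entry) {
        byName.computeIfAbsent(name, k -> new ArrayList<>()).add(entry);
    }

    /** Same behaviour as the trunk fix: the last entry of a name wins. */
    E getEntry(String name) {
        List<E> all = byName.get(name);
        return all == null ? null : all.get(all.size() - 1);
    }

    /** The proposed accessor: every entry of the given name, possibly none. */
    List<E> getEntries(String name) {
        List<E> all = byName.get(name);
        return all == null
            ? Collections.<E>emptyList()
            : Collections.unmodifiableList(all);
    }
}
```

In this sketch the extra cost over a plain name-to-entry map is one
small list per distinct name, and getEntry keeps returning the last
entry so existing callers would see no change.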

Tools like InfoZIP's zip/unzip list all entries of a given name.

Do you think it is worth it?

Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [compress] ZipFile and Duplicate Entries

Posted by sebb <se...@gmail.com>.
On 21 May 2013 10:17, Stefan Bodewig <bo...@apache.org> wrote:
> On 2013-05-20, sebb wrote:
>
>> On 20 May 2013 16:48, Stefan Bodewig <bo...@apache.org> wrote:
>
>>> I've for now fixed it in trunk by ignoring all but the last entry of the
>>> same name seen while parsing the central directory.  I've chosen to pick
>>> the last since this is what ZipFile used to do for duplicate entries
>>> without extra fields anyway.
>
>>> There may be reasons to return only the first entry and there may even
>>> be reasons to provide a different method that returned all entries of a
>>> given name, something like
>
>>>      List<ZipArchiveEntry> getEntries(String name)
>
>>> The latter would require some more book-keeping but I don't think the
>>> performance impact would be too big.
>
>>> Tools like InfoZIP's zip/unzip list all entries of a given name.
>
>>> Do you think it is worth it?
>
>> What do WinZip and Windows do?
>
> No idea.  This is what the InfoZIP tools do:
>
> ,----
> | $ zip -Tv /tmp/testoutput/test.zip
> | Archive:  /tmp/testoutput/test.zip
> |     testing: test1.txt                OK
> |     testing: test1.txt                OK
> | No errors detected in compressed data of /tmp/testoutput/test.zip.
> | test of /tmp/testoutput/test.zip OK
> | $ unzip -l /tmp/testoutput/test.zip
> | Archive:  /tmp/testoutput/test.zip
> |   Length      Date    Time    Name
> | ---------  ---------- -----   ----
> |         0  2013-05-20 15:45   test1.txt
> |         0  2013-05-20 15:45   test1.txt
> | ---------                     -------
> |         0                     2 files
> | $ unzip /tmp/testoutput/test.zip
> | Archive:  /tmp/testoutput/test.zip
> |   inflating: test1.txt
> | replace test1.txt? [y]es, [n]o, [A]ll, [N]one, [r]ename: y
> |   inflating: test1.txt
> `----
>
>> Are there any sample Zips with multiple entries?
>
> I've attached one to COMPRESS-227 but it is trivial to create one (see
> testDuplicateEntry in ZipFileTest).

Thanks - both WinZip and 7-Zip show the 2 entries.
Not sure about Windows yet - need access to a different system.

> Stefan


Re: [compress] ZipFile and Duplicate Entries

Posted by Stefan Bodewig <bo...@apache.org>.
On 2013-05-20, sebb wrote:

> On 20 May 2013 16:48, Stefan Bodewig <bo...@apache.org> wrote:

>> I've for now fixed it in trunk by ignoring all but the last entry of the
>> same name seen while parsing the central directory.  I've chosen to pick
>> the last since this is what ZipFile used to do for duplicate entries
>> without extra fields anyway.

>> There may be reasons to return only the first entry and there may even
>> be reasons to provide a different method that returned all entries of a
>> given name, something like

>>      List<ZipArchiveEntry> getEntries(String name)

>> The latter would require some more book-keeping but I don't think the
>> performance impact would be too big.

>> Tools like InfoZIP's zip/unzip list all entries of a given name.

>> Do you think it is worth it?

> What do WinZip and Windows do?

No idea.  This is what the InfoZIP tools do:

,----
| $ zip -Tv /tmp/testoutput/test.zip 
| Archive:  /tmp/testoutput/test.zip
|     testing: test1.txt                OK
|     testing: test1.txt                OK
| No errors detected in compressed data of /tmp/testoutput/test.zip.
| test of /tmp/testoutput/test.zip OK
| $ unzip -l /tmp/testoutput/test.zip 
| Archive:  /tmp/testoutput/test.zip
|   Length      Date    Time    Name
| ---------  ---------- -----   ----
|         0  2013-05-20 15:45   test1.txt
|         0  2013-05-20 15:45   test1.txt
| ---------                     -------
|         0                     2 files
| $ unzip /tmp/testoutput/test.zip 
| Archive:  /tmp/testoutput/test.zip
|   inflating: test1.txt               
| replace test1.txt? [y]es, [n]o, [A]ll, [N]one, [r]ename: y
|   inflating: test1.txt 
`----

> Are there any sample Zips with multiple entries?

I've attached one to COMPRESS-227 but it is trivial to create one (see
testDuplicateEntry in ZipFileTest).

Stefan



Re: [compress] ZipFile and Duplicate Entries

Posted by sebb <se...@gmail.com>.
On 20 May 2013 16:48, Stefan Bodewig <bo...@apache.org> wrote:
> Hi,
>
> over in Ant land a bug was raised that points at a problem in ZipFile
> (Commons Compress' zip package is a fork of Ant's code and I try to keep
> them in sync).
>
> When an archive contains duplicate entries - which is totally valid in
> ZIPs - ZipFile's getEntry can sometimes return ZipArchiveEntry instances
> that will receive null for ZipFile's getInputStream.  This is
> COMPRESS-227 which contains the details of the problem.
>
> I've for now fixed it in trunk by ignoring all but the last entry of the
> same name seen while parsing the central directory.  I've chosen to pick
> the last since this is what ZipFile used to do for duplicate entries
> without extra fields anyway.
>
> There may be reasons to return only the first entry and there may even
> be reasons to provide a different method that returned all entries of a
> given name, something like
>
>      List<ZipArchiveEntry> getEntries(String name)
>
> The latter would require some more book-keeping but I don't think the
> performance impact would be too big.
>
> Tools like InfoZIP's zip/unzip list all entries of a given name.
>
> Do you think it is worth it?

What do WinZip and Windows do?
Are there any sample Zips with multiple entries?

> Stefan