Posted to dev@commons.apache.org by Stefan Bodewig <bo...@apache.org> on 2015/01/04 06:27:07 UTC

[compress] zip64 writing seems to be broken

Hi

Building the compress antlib in Gump fails (and has been failing for a
few days, but I didn't notice):

http://vmgump.apache.org/gump/public/antlibs-compress/compress-antlib-test/gump_work/build_antlibs-compress_compress-antlib-test.html

The test creates a ZIP with a single 5GB entry and then asserts it can
locate the entry inside the archive:

java.util.zip.ZipException: archive's ZIP64 end of central directory locator is corrupt.
  at org.apache.commons.compress.archivers.zip.ZipFile.positionAtCentralDirectory64(ZipFile.java:812)
  at org.apache.commons.compress.archivers.zip.ZipFile.positionAtCentralDirectory(ZipFile.java:791)
  at org.apache.commons.compress.archivers.zip.ZipFile.populateFromCentralDirectory(ZipFile.java:519)
  at org.apache.commons.compress.archivers.zip.ZipFile.<init>(ZipFile.java:216)
  at org.apache.commons.compress.archivers.zip.ZipFile.<init>(ZipFile.java:192)
  at org.apache.ant.compress.resources.ZipResource.fetchEntry(ZipResource.java:175)

$ zip -Tv /tmp/testoutput/test.zip
zip warning: Zip64 EOCDR not found where expected - compensating
zip warning: (try -A to adjust offsets)
zip warning: expected 1 entries but found 0

I'll try to reproduce this as a unit test for compress later today, just
wanted to send an early heads up.

Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [compress] zip64 writing seems to be broken

Posted by Stefan Bodewig <bo...@apache.org>.
On 2015-01-04, Stefan Bodewig wrote:

> I'll leave my box alone to run the whole suite of ZIP64 ITs (i.e. enable
> the run-zipit profile) and report back later.

Failed tests:
  Zip64SupportIT.write100KFilesFile:305->withTemporaryArchive:2342
    arrays first differed at element [8]; expected:<52> but was:<122>
  Zip64SupportIT.write100KFilesFileModeAlways:313->withTemporaryArchive:2342
    arrays first differed at element [8]; expected:<52> but was:<-6>
  Zip64SupportIT.write100KFilesStream:309->withTemporaryArchive:2342
    arrays first differed at element [8]; expected:<52> but was:<122>
  Zip64SupportIT.write100KFilesStreamModeAlways:318->withTemporaryArchive:2342
    arrays first differed at element [8]; expected:<52> but was:<-6>

Tests in error:
  Zip64SupportIT.writeAndRead5GBOfZerosUsingZipFile:137->read5GBOfZerosUsingZipFileImpl:2450
  » Zip


Tests run: 79, Failures: 4, Errors: 1, Skipped: 2

and the two skipped tests were skipped because of

Failed to write archive because of: archive's ZIP64 end of central
directory locator is corrupt. - likely not enough disk space.
Failed to write archive because of: archive's ZIP64 end of central
directory locator is corrupt. - likely not enough disk space.

so they are actually failures. The tests are
read3EntriesCreatingBigArchiveFileUsingZipFile and
readSelfGenerated100KFilesUsingZipFile.

Stefan



Re: [compress] zip64 writing seems to be broken

Posted by Stefan Bodewig <bo...@apache.org>.
On 2015-01-04, Kristian Rosenvold wrote:

> Not entirely surprisingly, this broke in r1648585. I'll try to
> understand it tonight.

I might find time before that, not sure, but I'll have a look myself as
well.

It might be a good idea to run the zip and tar ITs in some continuous
build environment. The problem is that they'd take about an hour and
temporarily create files of up to 8GB on disk.

Stefan



Re: [compress] zip64 writing seems to be broken

Posted by Kristian Rosenvold <kr...@apache.org>.
2015-01-05 15:12 GMT+01:00 sebb <se...@gmail.com>:
> On 5 January 2015 at 13:43, Stefan Bodewig <bo...@apache.org> wrote:
>> On 2015-01-04, Kristian Rosenvold wrote:
>>
>>> Most surprising to me is that it seems like the overhead of lots of
>>> small calls to RandomAccessFile.write seems to be a lot costlier than
>>> I thought it would be. It seems like consolidating to a larger byte
>>> array before calling write is a *lot* faster.
>>
>> This surprises me as well.
>
> Could be due to the need to lock data in memory in native code.
> This usually means data has to be copied to a safe buffer.
> A single large copy will be faster than lots of small ones.

All of this disappears into native code pretty quickly, so there might
be OS-specific badness happening on OS X for all I know. I'll check on
Linux to see if there's a difference. But one thing is quite clear: if
I do 1000 writes of 100 bytes each (for a grand total of 100K of data), I
can probably /copy/ the data at least 10 times in memory to make up
for the difference in speed.

Kristian



Re: [compress] zip64 writing seems to be broken

Posted by sebb <se...@gmail.com>.
On 5 January 2015 at 13:43, Stefan Bodewig <bo...@apache.org> wrote:
> On 2015-01-04, Kristian Rosenvold wrote:
>
>> Most surprising to me is that it seems like the overhead of lots of
>> small calls to RandomAccessFile.write seems to be a lot costlier than
>> I thought it would be. It seems like consolidating to a larger byte
>> array before calling write is a *lot* faster.
>
> This surprises me as well.

Could be due to the need to lock data in memory in native code.
This usually means data has to be copied to a safe buffer.
A single large copy will be faster than lots of small ones.

>> So in some places where the upper memory constraint is known (like
>> writing the central directory), it seems to make a lot of sense to do
>> it in a single/a few writes.
>
> Makes sense.
>
>> I'm also looking at modifications to write the full file single pass
>> (without seek operations for sizes), it's reasonably expensive to do
>> all that seeking to establish information we already have.
>
> I'm not sure I understand where you are heading, but I'll see when you
> are ready :-)
>
> Cheers
>
>         Stefan


Re: [compress] zip64 writing seems to be broken

Posted by Stefan Bodewig <bo...@apache.org>.
On 2015-01-04, Kristian Rosenvold wrote:

> Most surprising to me is that it seems like the overhead of lots of
> small calls to RandomAccessFile.write seems to be a lot costlier than
> I thought it would be. It seems like consolidating to a larger byte
> array before calling write is a *lot* faster.

This surprises me as well.

> So in some places where the upper memory constraint is known (like
> writing the central directory), it seems to make a lot of sense to do
> it in a single/a few writes.

Makes sense.

> I'm also looking at modifications to write the full file single pass
> (without seek operations for sizes), it's reasonably expensive to do
> all that seeking to establish information we already have.

I'm not sure I understand where you are heading, but I'll see when you
are ready :-)

Cheers

        Stefan



Re: [compress] zip64 writing seems to be broken

Posted by Kristian Rosenvold <kr...@gmail.com>.
Great stuff!

> getBytesWritten vs getTotalBytesWritten - svn revision 1649322
>
> Maybe we should rename getBytesWritten to something like
> getBytesWrittenForLastEntry to make the difference more obvious?

I had a hard time keeping those "written" counters correct - which you
found out :) I renamed it to getBytesWrittenForLastEntry in r1649374.

Functionally speaking, all the maven testcases now pass with the
parallel zip algorithm.

I have been studying the performance of the gather phase for the last
few days and I have a few interesting findings. Currently it manages to
gather just under 200 megabytes/s on my SSD MBP. Using various small
tweaks I seem to be able to at least double that.

Most surprising to me is that it seems like the overhead of lots of
small calls to RandomAccessFile.write seems to be a lot costlier than
I thought it would be. It seems like consolidating to a larger byte
array before calling write is a *lot* faster. So in some places where
the upper memory constraint is known (like writing the central
directory), it seems to make a lot of sense to do it in a single/a few
writes.
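
To make the comparison concrete, here is a minimal sketch of the two
write patterns: many small RandomAccessFile.write calls versus
collecting the same bytes in a ByteArrayOutputStream and issuing one
write. The class and method names are hypothetical and this is not the
commons-compress code; it only uses java.io calls from the JDK.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

// Hypothetical helper, for illustration only.
final class ConsolidatedWriteSketch {
    /** One RandomAccessFile.write call per chunk. */
    static void writeInSmallCalls(RandomAccessFile raf, byte[] chunk, int count)
            throws IOException {
        for (int i = 0; i < count; i++) {
            raf.write(chunk);
        }
    }

    /** Collects all chunks in memory first, then issues a single write. */
    static void writeConsolidated(RandomAccessFile raf, byte[] chunk, int count)
            throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream(chunk.length * count);
        for (int i = 0; i < count; i++) {
            buf.write(chunk, 0, chunk.length);
        }
        raf.write(buf.toByteArray()); // toByteArray() makes one extra in-memory copy
    }

    /** Writes the same data both ways to temp files; returns both file lengths. */
    static long[] run(int chunkSize, int count) {
        byte[] chunk = new byte[chunkSize];
        try {
            File small = File.createTempFile("small-writes", ".bin");
            File big = File.createTempFile("one-write", ".bin");
            try (RandomAccessFile a = new RandomAccessFile(small, "rw");
                 RandomAccessFile b = new RandomAccessFile(big, "rw")) {
                writeInSmallCalls(a, chunk, count);
                writeConsolidated(b, chunk, count);
                return new long[] { a.length(), b.length() };
            } finally {
                small.delete();
                big.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling run(100, 1000) produces two files of 100,000 bytes each. The
sketch does not measure anything; timing the two methods meaningfully
would need a proper benchmark harness and will vary by OS and disk.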

I'm also looking at modifications to write the full file in a single
pass (without seek operations for sizes); it's reasonably expensive to
do all that seeking to establish information we already have.
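
As an illustration of writing without seeking back, the sketch below
uses the JDK's java.util.zip (not commons-compress, and not the change
described here) to write a STORED entry whose size and CRC are computed
up front, so the header is final when it is first written. The class
and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, for illustration only.
final class SinglePassStoredSketch {
    /** Writes one STORED entry to a non-seekable stream in a single pass. */
    static byte[] writeStored(String name, byte[] data) {
        // Size and CRC are known before the header is written.
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        ZipEntry entry = new ZipEntry(name);
        entry.setMethod(ZipEntry.STORED);
        entry.setSize(data.length);
        entry.setCompressedSize(data.length);
        entry.setCrc(crc.getValue());

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(entry);
            zip.write(data, 0, data.length);
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Reads back the content of the first entry, or null if there is none. */
    static byte[] readFirstEntry(byte[] archive) {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            if (zip.getNextEntry() == null) {
                return null;
            }
            ByteArrayOutputStream content = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = zip.read(buf, 0, buf.length)) != -1) {
                content.write(buf, 0, n);
            }
            return content.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The output stream here cannot seek at all, which is the point: when
sizes are already known, nothing has to be patched in afterwards.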

I'm hoping to finish all this in a few days.

Kristian



Re: [compress] zip64 writing seems to be broken

Posted by Stefan Bodewig <bo...@apache.org>.
On 2015-01-04, Stefan Bodewig wrote:

> I'm just running all ITs again to be sure.

Tests run: 79, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1,400.308 sec - in org.apache.commons.compress.archivers.zip.Zip64SupportIT

Stefan



Re: [compress] zip64 writing seems to be broken

Posted by Stefan Bodewig <bo...@apache.org>.
On 2015-01-04, Kristian Rosenvold wrote:

> Not entirely surprisingly, this broke in r1648585. I'll try to
> understand it tonight.

getBytesWritten vs getTotalBytesWritten - svn revision 1649322

Maybe we should rename getBytesWritten to something like
getBytesWrittenForLastEntry to make the difference more obvious?
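
To spell out the distinction, here is a minimal sketch of two counters
that are easy to confuse: one reset at the start of each entry and one
running total for the whole archive. Only the getter names come from
this thread; the class, its fields and the other methods are
hypothetical and not the actual commons-compress code.

```java
// Hypothetical counter pair, for illustration only.
final class WriteCounterSketch {
    private long bytesWrittenForLastEntry;
    private long totalBytesWritten;

    /** Resets the per-entry counter; the archive-wide total keeps growing. */
    void startEntry() {
        bytesWrittenForLastEntry = 0;
    }

    /** Records that n bytes were written for the current entry. */
    void written(int n) {
        bytesWrittenForLastEntry += n;
        totalBytesWritten += n;
    }

    /** Bytes written since the last startEntry() call. */
    long getBytesWrittenForLastEntry() {
        return bytesWrittenForLastEntry;
    }

    /** Bytes written since this counter was created. */
    long getTotalBytesWritten() {
        return totalBytesWritten;
    }
}
```

After two entries of 10 and 5 bytes, the per-entry getter returns 5 and
the total returns 15. Any offset that must be relative to the start of
the archive needs the total, not the per-entry value.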

I'm just running all ITs again to be sure.

Stefan



Re: [compress] zip64 writing seems to be broken

Posted by Kristian Rosenvold <kr...@apache.org>.
Not entirely surprisingly, this broke in r1648585. I'll try to
understand it tonight.

Kristian


2015-01-04 11:37 GMT+01:00 Stefan Bodewig <bo...@apache.org>:
> On 2015-01-04, Stefan Bodewig wrote:
>
>> I'll try to reproduce this as a unit test for compress later today,
>
> svn revision 1649312 - run via
>
> $ mvn test -Dtest=Zip64SupportIT#writeAndRead5GBOfZerosUsingZipFile
>
> I'll leave my box alone to run the whole suite of ZIP64 ITs (i.e. enable
> the run-zipit profile) and report back later.
>
> Stefan


Re: [compress] zip64 writing seems to be broken

Posted by Stefan Bodewig <bo...@apache.org>.
On 2015-01-04, Stefan Bodewig wrote:

> I'll try to reproduce this as a unit test for compress later today,

svn revision 1649312 - run via

$ mvn test -Dtest=Zip64SupportIT#writeAndRead5GBOfZerosUsingZipFile

I'll leave my box alone to run the whole suite of ZIP64 ITs (i.e. enable
the run-zipit profile) and report back later.

Stefan
