Posted to dev@commons.apache.org by Stefan Bodewig <bo...@apache.org> on 2011/08/02 15:14:19 UTC

[compress] Deflater#getBytesRead and friends

Hi,

one of the main drivers for switching to Java5 was that the methods used
to determine the compressed and uncompressed sizes of data for ZIP
archives returned ints in Java 1.4, and Java5 added new methods that
return longs.

I've committed an ignored unit test inside the ZIP package
(DeflaterInflaterTest) that shows that those methods effectively return
unsigned ints rather than longs on every JDK < 7 that I have tested so
far.  The tests compress 4GByte + 4KByte of data and the methods return
4KByte when asked how many bytes they have seen.
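
The arithmetic behind those numbers can be sketched as follows.  This is
only an illustration of a counter that wraps at 32 bits, not the JDK's
actual code; the class and method names are made up for the example.

```java
// Illustration only: mimics a byte counter that is kept in 32 bits
// and then handed back as a long.
final class WrapDemo {

    /** Truncates a byte count to its lowest 32 bits. */
    static long wrapToUnsignedInt(long count) {
        return count & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        // 4GByte + 4KByte, the amount of data used by the test
        long expected = 4L * 1024 * 1024 * 1024 + 4L * 1024;
        System.out.println(expected);                    // 4294971392
        System.out.println(wrapToUnsignedInt(expected)); // 4096
    }
}
```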

ZipArchiveOutputStream has already been changed to count the bytes
itself and not rely on Deflater, but only for the uncompressed size.
I'm afraid the same "unsigned int" behavior applies to the methods
returning the compressed sizes as well, but I don't have the patience to
wait for Deflater to eat up enough random data for the compressed result
to finally exceed 4GByte.  The existing test case already takes four
minutes on my personal notebook and less than two at work, so maybe it's
time to invest in new hardware.

ZipArchiveOutputStream can intercept the stream and simply count how
many bytes have been written to determine the compressed size, but
ZipArchiveInputStream is a different beast.
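
A minimal sketch of such an intercepting stream, assuming a plain
java.io.FilterOutputStream wrapper is good enough.  The class name
ByteCountingOutputStream is hypothetical and this is not necessarily how
ZipArchiveOutputStream would do it.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical helper: counts every byte passed on to the wrapped
// stream in a long, so the count cannot wrap at 4GByte.
final class ByteCountingOutputStream extends FilterOutputStream {

    private long count;

    ByteCountingOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        count++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // write(byte[]) of FilterOutputStream delegates to this method
        out.write(b, off, len);
        count += len;
    }

    /** Number of bytes written through this stream so far. */
    long getCount() {
        return count;
    }
}
```

The compressed output of Deflater would be written through this wrapper
and getCount() read once the entry has been finished.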

A useful heuristic may be to assume that the value Inflater reports is
correct modulo 2^32.  ZipArchiveInputStream knows how many bytes it has
read, but it may have read more than it needed and then has to push back
the excess bytes when decompressing a file.  The compressed size must
lie between the number of bytes read before the last read operation and
the number of bytes read in total.  As that window is far smaller than
4GByte, at most one multiple of 4GByte can be added to the reported
remainder so that the sum falls inside it, and that sum would be the
real compressed size.
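
A rough sketch of that reconstruction, under the assumption that the
reported value really is correct modulo 2^32 and that both bounds are
inclusive.  The class, method and parameter names are invented for the
example and don't exist in ZipArchiveInputStream.

```java
// Sketch only: recovers a size from its remainder modulo 2^32 and the
// window it is known to lie in.
final class CompressedSizeHeuristic {

    private static final long MASK = 0xFFFFFFFFL;
    private static final long TWO_POW_32 = 1L << 32;

    /**
     * @param reported           value returned by Inflater, possibly wrapped
     * @param readBeforeLastRead bytes read before the last read operation
     * @param readTotal          bytes read including the last read operation
     */
    static long reconstruct(long reported, long readBeforeLastRead,
                            long readTotal) {
        long remainder = reported & MASK;
        // same multiple of 4GByte as the lower bound, plus the remainder
        long candidate = (readBeforeLastRead & ~MASK) + remainder;
        if (candidate < readBeforeLastRead) {
            // the real size crossed the next 4GByte boundary
            candidate += TWO_POW_32;
        }
        if (candidate > readTotal) {
            throw new IllegalStateException(
                "reported value is not correct modulo 2^32");
        }
        return candidate;
    }
}
```

With readBeforeLastRead = 4294968296 and readTotal = 4294976488, a
reported value of 5096 would be turned into 4294972392.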

For this heuristic to work we'd need to be sure the value returned by
Inflater is either correct or correct modulo 2^32 and I'd ask anybody
with a more exotic Java impl than I have used (OpenJDK on Linux,
Sun/Oracle versions of Java5/6/7 on Win7) to remove the @Ignore from the
test case and run it.  It should either pass or fail with something like

Failed tests:
  deflaterBytesRead(org.apache.commons.compress.archivers.zip.DeflaterInflaterTest): expected:<4294971392> but was:<4096>
  inflaterBytesWritten(org.apache.commons.compress.archivers.zip.DeflaterInflaterTest): expected:<4294971392> but was:<4096>

Stefan
