Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2013/08/17 14:45:48 UTC
[jira] [Comment Edited] (PDFBOX-1694) Bug in org.apache.pdfbox.io.Ascii85InputStream
[ https://issues.apache.org/jira/browse/PDFBOX-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742913#comment-13742913 ]
Tilman Hausherr edited comment on PDFBOX-1694 at 8/17/13 12:44 PM:
-------------------------------------------------------------------
Here's the test, to be put into src\test\java\org\apache\pdfbox\io\TestIOUtils.java.
After correcting PDFBOX-1696, the test either never completed or crashed because of GC problems. A profiler run showed huge numbers of java.lang.ref.Finalizer objects and ASCII85OutputStream objects. Deleting finalize() in ASCII85OutputStream.java solved the problem. I see no sense in having that method: per the Java documentation, it isn't known WHEN it will be called, so the flush() that is called there might happen much too late if a user forgot to call close(). The test took 25 minutes on my 4-year-old Windows 7 system, so it isn't for normal builds, only for testing changes made in the two modules.
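To illustrate why close() rather than finalize() should carry the final flush, here is a minimal sketch. It is not PDFBox code: HoldingStream is a hypothetical stand-in for an encoder stream that holds data back until flush(), and the class and method names are made up for this example. It relies only on the documented behaviour that FilterOutputStream.close() calls flush() before closing the underlying stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class CloseInsteadOfFinalizeDemo
{
    /** Hypothetical stand-in for an encoder stream that buffers until flush(). */
    static class HoldingStream extends FilterOutputStream
    {
        private final ByteArrayOutputStream pending = new ByteArrayOutputStream();

        HoldingStream(OutputStream out)
        {
            super(out);
        }

        @Override
        public void write(int b)
        {
            // nothing reaches the underlying stream yet
            pending.write(b);
        }

        @Override
        public void flush() throws IOException
        {
            pending.writeTo(out);
            pending.reset();
            out.flush();
        }
        // No finalize(): the inherited close() calls flush() deterministically.
    }

    /** Writes data and closes the stream via try-with-resources. */
    public static byte[] writeAndClose(byte[] data)
    {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (HoldingStream s = new HoldingStream(sink))
        {
            s.write(data);
        }
        catch (IOException ex)
        {
            throw new UncheckedIOException(ex);
        }
        return sink.toByteArray();
    }

    /** Writes data but "forgets" close(); the buffered bytes never arrive. */
    public static byte[] writeWithoutClose(byte[] data)
    {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        HoldingStream s = new HoldingStream(sink);
        try
        {
            s.write(data);
        }
        catch (IOException ex)
        {
            throw new UncheckedIOException(ex);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args)
    {
        byte[] data = { 1, 2, 3 };
        System.out.println("with close():    " + writeAndClose(data).length + " bytes");
        System.out.println("without close(): " + writeWithoutClose(data).length + " bytes");
    }
}
```

With try-with-resources the flush happens at a known point. A finalize() method could only paper over the missing close() at some unknown later time, and every instance would first have to pass through the finalizer queue.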
was (Author: tilman):
Here's the test, to be put into src\test\java\org\apache\pdfbox\io\TestIOUtils.java.
    // Needs imports: java.io.ByteArrayInputStream, java.io.ByteArrayOutputStream,
    // java.io.IOException, java.util.Date
    /**
     * Tests ASCII85OutputStream and ASCII85InputStream
     */
    public void testAscii85()
    {
        System.out.println(new Date());
        try
        {
            // test 2^32 possible data values in 4 bytes (PDFBOX-1694)
            byte[] buf = new byte[4];
            for (long l = 0; l <= 0xFFFFFFFFL; ++l)
            {
                //too slow
                //testAscii85Buffer(ByteBuffer.allocate(4).putInt((int) l).array());
                buf[0] = (byte) (l & 0xFF);
                buf[1] = (byte) ((l >> 8) & 0xFF);
                buf[2] = (byte) ((l >> 16) & 0xFF);
                buf[3] = (byte) ((l >> 24) & 0xFF);
                testAscii85Buffer(buf);
                // progress output every million values
                if (l % 1000000 == 0)
                {
                    System.out.println(l + ", " + (l * 100 / 0xFFFFFFFFL) + " percent done, " + new Date());
                }
            }
            // test some fixed and some random filled buffers
            for (int size = 1; size < 500; ++size)
            {
                buf = new byte[size];
                for (int cnt = 0; cnt < 500; ++cnt)
                {
                    if (cnt < 256)
                    {
                        // fixed fill: every byte value from -128 to 127
                        for (int i = 0; i < buf.length; ++i)
                        {
                            buf[i] = (byte) (cnt - 128);
                        }
                    }
                    else
                    {
                        // random fill
                        for (int i = 0; i < buf.length; ++i)
                        {
                            buf[i] = (byte) (Math.random() * 256 - 128);
                        }
                    }
                    testAscii85Buffer(buf);
                }
            }
        }
        catch (IOException ex)
        {
            fail("IOException: " + ex);
        }
        System.out.println(new Date());
    }
    private void testAscii85Buffer(byte[] srcBuf) throws IOException
    {
        int size = srcBuf.length;
        /*
        System.out.print("src: ");
        for (int i = 0; i < srcBuf.length; ++i)
            System.out.print(srcBuf[i] + " ");
        System.out.println();
        */
        // encode + write
        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream(size * 2);
        try (ASCII85OutputStream a85out = new ASCII85OutputStream(byteArrayOutputStream))
        {
            a85out.write(srcBuf);
        }
        byte[] encodedByteArray = byteArrayOutputStream.toByteArray();
        byteArrayOutputStream.reset();
        byteArrayOutputStream = null; // for gc
        if (encodedByteArray.length == 0)
        {
            fail("empty encoding for size " + size);
        }
        /*
        System.out.print("cod: ");
        for (int i = 0; i < encodedByteArray.length; ++i)
            System.out.print(encodedByteArray[i] + " ");
        System.out.println();
        */
        // read + decode
        byte[] dstBuf = new byte[size + 1];
        ASCII85InputStream a85in = new ASCII85InputStream(new ByteArrayInputStream(encodedByteArray));
        int newLen = a85in.read(dstBuf);
        a85in.close();
        /*
        System.out.print("dst: ");
        for (int i = 0; i < newLen; ++i)
            System.out.print(dstBuf[i] + " ");
        System.out.println();
        */
        // Compare
        if (newLen != srcBuf.length)
        {
            fail("different length for size " + size + ", src len: " + srcBuf.length + " vs. dst len: " + newLen);
        }
        for (int i = 0; i < srcBuf.length; ++i)
        {
            if (srcBuf[i] != dstBuf[i])
            {
                fail("different content for size " + size);
            }
        }
    }
After correcting PDFBOX-1696, the test either never completed or crashed because of GC problems. A profiler run showed huge numbers of java.lang.ref.Finalizer objects and ASCII85OutputStream objects. Deleting finalize() in ASCII85OutputStream.java solved the problem. I see no sense in having that method: per the Java documentation, it isn't known WHEN it will be called, so the flush() that is called there might happen much too late if a user forgot to call close(). The test took 25 minutes on my 4-year-old Windows 7 system, so it isn't for normal builds, only for testing changes made in the two modules.
> Bug in org.apache.pdfbox.io.Ascii85InputStream
> ----------------------------------------------
>
> Key: PDFBOX-1694
> URL: https://issues.apache.org/jira/browse/PDFBOX-1694
> Project: PDFBox
> Issue Type: Bug
> Affects Versions: 1.7.1
> Environment: Any
> Reporter: Peter Costello
> Labels: Ascii85Decode
> Attachments: test.java
>
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
>
> Method 'org.apache.pdfbox.io.Ascii85InputStream.read()' has bug when reading final set of char that are not modulo-4.
> Test file="www.mzweb.com.br/grupobimbo/web/arquivos/Bimbo_Historia_20070409_Esp.pdf".
> On page#0 there is a dictionary "323 0 obj << /Length 1492 /Filter [/Ascii85Decode /FlateDecode]>>"
> Last set of bytes to decode is "%f" or 0x25, 0x66
> Ascii85InputStream pads this to "%f~!!" and correctly generates the final byte 0x0f.
> Including the '~' end-of-data char in the padding is a major bug.
> If the final padding were "%f!!!", the final byte decoded would be 0x0e (which is wrong).
> The correct padding is the 'u' char, or "%fuuu" (See http://en.wikipedia.org/wiki/Ascii85)
> This is a quick fix.
> The PDF files for corporate website "Grupo Bimbo" include lots of examples using Ascii85Decode/
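The padding arithmetic in the issue description above can be checked with a small standalone sketch. It does not use PDFBox and is not the library's decoder: the class and method names (Ascii85PaddingDemo, decodePartialGroup) are hypothetical and exist only for this illustration. It pads a final partial group to 5 characters, computes the base-85 value, and keeps the first n-1 bytes.

```java
public class Ascii85PaddingDemo
{
    /**
     * Decodes a final partial Ascii85 group of 2 to 4 characters by padding it
     * to 5 characters with padChar. Returns the (group length - 1) decoded bytes.
     */
    public static byte[] decodePartialGroup(String group, char padChar)
    {
        int n = group.length();
        if (n < 2 || n > 4)
        {
            throw new IllegalArgumentException("partial group must have 2 to 4 characters");
        }
        long value = 0;
        for (int i = 0; i < 5; ++i)
        {
            char c = i < n ? group.charAt(i) : padChar;
            // each character is a base-85 digit offset by '!' (0x21)
            value = value * 85 + (c - '!');
        }
        byte[] out = new byte[n - 1];
        for (int i = 0; i < out.length; ++i)
        {
            // take the bytes from the most significant end
            out[i] = (byte) (value >> (24 - 8 * i));
        }
        return out;
    }

    public static void main(String[] args)
    {
        // "%f" is the final group from the issue description
        for (char pad : new char[] { 'u', '!', '~' })
        {
            byte[] decoded = decodePartialGroup("%f", pad);
            System.out.printf("pad '%c' -> 0x%02x%n", pad, decoded[0] & 0xFF);
        }
        // prints:
        // pad 'u' -> 0x0f
        // pad '!' -> 0x0e
        // pad '~' -> 0x0f
    }
}
```

Padding with 'u' (the highest digit, 84) guarantees that truncating the low-order bytes cannot change the bytes that are kept. Padding with '~' gives the right byte for this input, but '~' is not a valid base-85 digit, so the match does not come from a well-defined decoding.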
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira