Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2013/08/17 14:45:48 UTC

[jira] [Comment Edited] (PDFBOX-1694) Bug in org.apache.pdfbox.io.Ascii85InputStream

    [ https://issues.apache.org/jira/browse/PDFBOX-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742913#comment-13742913 ] 

Tilman Hausherr edited comment on PDFBOX-1694 at 8/17/13 12:44 PM:
-------------------------------------------------------------------

Here's the test, to be put into src\test\java\org\apache\pdfbox\io\TestIOUtils.java. 

After correcting PDFBOX-1696, the test never completed, or crashed because of GC problems. A profiler run showed huge numbers of java.lang.ref.Finalizer objects and ASCII85OutputStream objects. Deleting finalize() in ASCII85OutputStream.java solved the problem. I see no sense in having that method: per the Java documentation, it isn't known WHEN it will be called, so the flush() that is called there might happen much too late if a user forgot to call close(). The test took 25 minutes on my 4-year-old Windows 7 system, so it isn't for normal builds, only for testing changes made to the two modules.
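
To illustrate why close() rather than finalize() should be relied on for the final flush, here is a minimal sketch. It uses java.io.BufferedOutputStream as a stand-in for ASCII85OutputStream (an assumption made only for illustration), and the class and method names are hypothetical. The buffered bytes reach the underlying stream only once the try-with-resources block closes the stream, which happens at a known point in the code.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of PDFBox.
final class DeterministicCloseDemo
{
    /**
     * Writes 4 bytes through a buffering stream and returns the size of the
     * underlying sink just before and just after close().
     */
    static int[] sizesBeforeAndAfterClose()
    {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        int before;
        // BufferedOutputStream stands in for ASCII85OutputStream here
        try (OutputStream out = new BufferedOutputStream(sink))
        {
            out.write(new byte[] { 1, 2, 3, 4 });
            before = sink.size(); // data is still held in the buffer
        }
        catch (IOException ex)
        {
            throw new UncheckedIOException(ex);
        }
        // close() has flushed the buffer at a well-defined point
        return new int[] { before, sink.size() };
    }

    public static void main(String[] args)
    {
        int[] sizes = sizesBeforeAndAfterClose();
        System.out.println("before close: " + sizes[0] + ", after close: " + sizes[1]);
        // prints "before close: 0, after close: 4"
    }
}
```

With a finalizer doing the flush instead, the second number would depend on when (or whether) the garbage collector gets around to running it.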
                
      was (Author: tilman):
    Here's the test, to be put into src\test\java\org\apache\pdfbox\io\TestIOUtils.java. 

    /**
     * Tests ASCII85OutputStream and ASCII85InputStream
     */
    public void testAscii85()
    {
        System.out.println(new Date());
        try
        {
            // test 2^32 possible data values in 4 bytes (PDFBOX-1694)
            byte[] buf = new byte[4];
            for (long l = 0; l <= 0xFFFFFFFFL; ++l)
            {
                // allocating a new buffer per iteration is too slow:
                //testAscii85Buffer(ByteBuffer.allocate(4).putInt((int) l).array());
                buf[0] = (byte) (l & 0xFF);
                buf[1] = (byte) ((l >> 8) & 0xFF);
                buf[2] = (byte) ((l >> 16) & 0xFF);
                buf[3] = (byte) ((l >> 24) & 0xFF);
                testAscii85Buffer(buf);

                // progress report every million values
                if (l % 1000000 == 0)
                {
                    System.out.println(l + ", " + (l * 100 / 0xFFFFFFFFL) + " percent done, " + new Date());
                }
            }
            // test some fixed and some random filled buffers
            for (int size = 1; size < 500; ++size)
            {
                buf = new byte[size];
                for (int cnt = 0; cnt < 500; ++cnt)
                {
                    if (cnt < 256)
                    {
                        for (int i = 0; i < buf.length; ++i)
                        {
                            buf[i] = (byte) (cnt - 128);
                        }
                    }
                    else
                    {
                        for (int i = 0; i < buf.length; ++i)
                        {
                            buf[i] = (byte) (Math.random() * 256 - 128);
                        }
                    }
                    testAscii85Buffer(buf);
                }
            }
        }
        catch (IOException ex)
        {
            fail("IOException: " + ex);
        }
        System.out.println(new Date());
    }

    private void testAscii85Buffer(byte[] srcBuf) throws IOException
    {
        int size = srcBuf.length;
        /*
         System.out.print("src: ");
         for (int i = 0; i < srcBuf.length; ++i)
         System.out.print(srcBuf[i] + " ");
         System.out.println();
         */

        // encode + write
        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream(size * 2);
        try (ASCII85OutputStream a85out = new ASCII85OutputStream(byteArrayOutputStream))
        {
            a85out.write(srcBuf);
        }

        byte[] encodedByteArray = byteArrayOutputStream.toByteArray();
        byteArrayOutputStream.reset();
        byteArrayOutputStream = null; // for gc
        if (encodedByteArray.length == 0)
        {
            fail("empty encoding for size " + size);
        }

        /*
         System.out.print("cod: ");
         for (int i = 0; i < encodedByteArray.length; ++i)
         System.out.print (encodedByteArray[i] + " ");
         System.out.println();
         */

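        // read + decode; dstBuf has one spare byte so that any extra
        // decoded byte would show up as a length mismatch below
        byte[] dstBuf = new byte[size + 1];
        int newLen;
        try (ASCII85InputStream a85in = new ASCII85InputStream(new ByteArrayInputStream(encodedByteArray)))
        {
            newLen = a85in.read(dstBuf);
        }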

        /*
         System.out.print ("dst: ");
         for (int i = 0; i < newLen; ++i)
         System.out.print (dstBuf[i] + " ");
         System.out.println();
         */

        // Compare
        if (newLen != srcBuf.length)
        {
            fail("different length for size " + size + ", src len: " + srcBuf.length + " vs. dst len: " + newLen);
        }
        for (int i = 0; i < srcBuf.length; ++i)
        {
            if (srcBuf[i] != dstBuf[i])
            {
                fail("different content for size " + size);
            }
        }
    }
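
One caveat on the "read + decode" step above: the general InputStream contract allows read(byte[]) to return fewer bytes than are available, so a single call is only safe if the stream happens to deliver everything at once. A minimal sketch of a looping read follows; the class and method names are hypothetical and not part of PDFBox or of the test above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of PDFBox.
final class ReadUpToDemo
{
    /**
     * Reads until dst is full or end of stream is reached.
     * Returns the number of bytes actually read (0 if the stream was empty).
     */
    static int readUpTo(InputStream in, byte[] dst)
    {
        int total = 0;
        try
        {
            while (total < dst.length)
            {
                int n = in.read(dst, total, dst.length - total);
                if (n < 0)
                {
                    break; // end of stream
                }
                total += n;
            }
        }
        catch (IOException ex)
        {
            throw new UncheckedIOException(ex);
        }
        return total;
    }

    public static void main(String[] args)
    {
        byte[] dst = new byte[4];
        int len = readUpTo(new ByteArrayInputStream(new byte[] { 1, 2, 3 }), dst);
        System.out.println(len); // prints 3
    }
}
```

Note that this returns 0 rather than -1 for an empty stream, so a length comparison like the one in the test would need to take that into account.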

After correcting PDFBOX-1696, the test never completed, or crashed because of GC problems. A profiler run showed huge numbers of java.lang.ref.Finalizer objects and ASCII85OutputStream objects. Deleting finalize() in ASCII85OutputStream.java solved the problem. I see no sense in having that method: per the Java documentation, it isn't known WHEN it will be called, so the flush() that is called there might happen much too late if a user forgot to call close(). The test took 25 minutes on my 4-year-old Windows 7 system, so it isn't for normal builds, only for testing changes made to the two modules.
                  
> Bug in org.apache.pdfbox.io.Ascii85InputStream
> ----------------------------------------------
>
>                 Key: PDFBOX-1694
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1694
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 1.7.1
>         Environment: Any
>            Reporter: Peter Costello
>              Labels: Ascii85Decode
>         Attachments: test.java
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Method 'org.apache.pdfbox.io.Ascii85InputStream.read()' has a bug when reading a final set of chars that is not modulo-4.
> Test file="www.mzweb.com.br/grupobimbo/web/arquivos/Bimbo_Historia_20070409_Esp.pdf". 
> On page#0 there is a dictionary "323 0 obj << /Length 1492 /Filter [/Ascii85Decode /FlateDecode]>>"
> Last set of bytes to decode is "%f" or  0x25, 0x66
> Ascii85InputStream pads this to "%f~!!" and correctly generates the final byte 0x0f.
> Including the '~' end-of-data char in the padding is a major bug.
> If the final padding were "%f!!!", the final byte decoded would be 0x0e (which is wrong).
> The correct padding is the 'u' char, or "%fuuu" (See http://en.wikipedia.org/wiki/Ascii85)
> This is a quick fix. 
> The PDF files for corporate website "Grupo Bimbo" include lots of examples using Ascii85Decode/
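
The padding arithmetic in the issue description can be checked with a short sketch. It assumes the standard Ascii85 scheme (each character maps to its code minus '!', five characters form a base-85 number, and the result is a big-endian 32-bit value); the class and method names are hypothetical. For the two-character tail "%f" only the first decoded byte is kept, so that is all the sketch returns.

```java
// Hypothetical demo class, not part of PDFBox.
final class Ascii85TailDemo
{
    /**
     * Decodes an already padded 5-character Ascii85 group and returns the
     * first (most significant) byte of the resulting 32-bit value.
     */
    static int firstByte(String paddedGroup)
    {
        if (paddedGroup.length() != 5)
        {
            throw new IllegalArgumentException("need exactly 5 characters");
        }
        long value = 0;
        for (int i = 0; i < 5; ++i)
        {
            // each character contributes one base-85 digit
            value = value * 85 + (paddedGroup.charAt(i) - '!');
        }
        return (int) ((value >> 24) & 0xFF);
    }

    public static void main(String[] args)
    {
        System.out.printf("0x%02x%n", firstByte("%fuuu")); // prints 0x0f
        System.out.printf("0x%02x%n", firstByte("%f!!!")); // prints 0x0e
        System.out.printf("0x%02x%n", firstByte("%f~!!")); // prints 0x0f
    }
}
```

The "%f~!!" padding gives the right byte here, but '~' lies outside the '!' to 'u' digit range, so that result is a coincidence of the arithmetic rather than valid padding.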
