Posted to dev@pdfbox.apache.org by "Phil Varner (JIRA)" <ji...@apache.org> on 2010/01/05 21:11:54 UTC

[jira] Created: (PDFBOX-590) PDFXrefStreamParser iterates when no elements are available

PDFXrefStreamParser iterates when no elements are available
-----------------------------------------------------------

                 Key: PDFBOX-590
                 URL: https://issues.apache.org/jira/browse/PDFBOX-590
             Project: PDFBox
          Issue Type: Bug
            Reporter: Phil Varner


Exception:

org.apache.pdfbox.exceptions.WrappedIOException
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:237)
        at mycompany....
        at java.lang.Thread.run(Thread.java:534)
Caused by: java.util.NoSuchElementException
        at java.util.AbstractList$Itr.next(AbstractList.java:426)
        at org.apache.pdfbox.pdfparser.PDFXrefStreamParser.parse(PDFXrefStreamParser.java:115)
        at org.apache.pdfbox.cos.COSDocument.parseXrefStreams(COSDocument.java:538)
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:203)
        ... 11 more

PDF file: www.oppenheim.pl/plpl/_download/09_05_11_Archiv.pdf
This is happening in the PDFXrefStreamParser.parse() method because there is no objIter.hasNext() test to protect the objIter.next() call on line 115. This is an outright bug.

Specifically, the current code looks like so:

public void parse() throws IOException {
    ...
            Iterator objIter = objNums.iterator(); //<------- here we create the Iterator
            /*
             * Calculating the size of the line in bytes
             */
            int w0 = xrefFormat.getInt(0);
            int w1 = xrefFormat.getInt(1);
            int w2 = xrefFormat.getInt(2);
            int lineSize = w0 + w1 + w2;
            
            while(pdfSource.available() > 0)
            {
                byte[] currLine = new byte[lineSize];
                pdfSource.read(currLine);

                int type = 0;
                /*
                 * Grabs the number of bytes specified for the first column in
                 * the W array and stores it.
                 */
                for(int i = 0; i < w0; i++)
                {
                    type += (currLine[i] & 0x00ff) << ((w0 - i - 1)* 8);
                }
                //Need to remember the current objID
                Integer objID = (Integer)objIter.next(); //<---- here we attempt to pull objects out of it.
                /*
                 * 3 different types of entries.
                 */
                switch(type)
                {
                    // ... do stuff ...
                }
            }
    ...
}
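
For readers unfamiliar with the inner for loop above: it assembles the first w0 bytes of the line into a big-endian integer. The following is a minimal standalone sketch of that same arithmetic; the class name XrefFieldDecoder and the helper readField are hypothetical and are not part of PDFBox.

```java
class XrefFieldDecoder {
    /**
     * Reads `width` bytes of `line`, starting at `offset`, as a big-endian
     * integer. Hypothetical helper; widths of 1 to 4 bytes fit in an int.
     */
    static int readField(byte[] line, int offset, int width) {
        int value = 0;
        for (int i = 0; i < width; i++) {
            // Mask to avoid sign extension, then shift the byte into place.
            value += (line[offset + i] & 0x00ff) << ((width - i - 1) * 8);
        }
        return value;
    }

    public static void main(String[] args) {
        // One made-up line with field widths 1, 2 and 1.
        byte[] line = {0x01, 0x01, 0x02, 0x00};
        System.out.println(readField(line, 0, 1)); // first field (the type)
        System.out.println(readField(line, 1, 2)); // second field, bytes 0x01 0x02
    }
}
```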

The code seems to assume that whenever pdfSource.available() > 0, objIter has another object number to return. That makes it vulnerable to corrupt streams. It is also a logic error, because the stream seems to contain lines of different types that are not processed as Xref objects. At least that is how it looked from my cursory step-through.
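
The failure can be reproduced outside PDFBox with a stripped-down version of the loop. This is only a sketch under assumptions: the class UnguardedXrefLoop and the method consume are made up for illustration, a ByteArrayInputStream stands in for pdfSource, and the input is three lines of data paired with only two object numbers.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class UnguardedXrefLoop {
    /** Mimics the reported loop: one next() per line, with no hasNext() check. */
    static int consume(InputStream pdfSource, List<Integer> objNums, int lineSize) {
        Iterator<Integer> objIter = objNums.iterator();
        int entries = 0;
        try {
            while (pdfSource.available() > 0) {
                byte[] currLine = new byte[lineSize];
                pdfSource.read(currLine);
                objIter.next(); // throws once objNums is exhausted
                entries++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }

    public static void main(String[] args) {
        byte[] data = new byte[3 * 4];               // three 4-byte lines
        List<Integer> objNums = Arrays.asList(1, 2); // only two object numbers
        try {
            consume(new ByteArrayInputStream(data), objNums, 4);
        } catch (NoSuchElementException e) {
            System.out.println("next() failed on a line with no object number left");
        }
    }
}
```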

I think it should be

            while(pdfSource.available() > 0 && objIter.hasNext())

instead, so that next() is only called while the iterator still has an element to return.
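
To illustrate the proposed condition, here is a self-contained sketch of the guarded loop. It is not the change that went into PDFBox; the class GuardedXrefLoop, the method parse and its return type are assumptions made for the example, and only the loop condition and the type decoding are taken from the code quoted above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class GuardedXrefLoop {
    /** Returns {objID, type} pairs, stopping when either input runs out. */
    static List<int[]> parse(InputStream pdfSource, List<Integer> objNums,
                             int w0, int w1, int w2) {
        List<int[]> entries = new ArrayList<>();
        Iterator<Integer> objIter = objNums.iterator();
        int lineSize = w0 + w1 + w2;
        try {
            // The extra hasNext() test is the proposed fix.
            while (pdfSource.available() > 0 && objIter.hasNext()) {
                byte[] currLine = new byte[lineSize];
                pdfSource.read(currLine);
                int type = 0;
                for (int i = 0; i < w0; i++) {
                    type += (currLine[i] & 0x00ff) << ((w0 - i - 1) * 8);
                }
                entries.add(new int[] {objIter.next(), type});
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }

    public static void main(String[] args) {
        // Three made-up lines (widths 1, 2, 1) but only two object numbers.
        byte[] data = {1, 0, 0, 0,  2, 0, 0, 0,  0, 0, 0, 0};
        List<int[]> entries =
            parse(new ByteArrayInputStream(data), Arrays.asList(10, 11), 1, 2, 1);
        for (int[] e : entries) {
            System.out.println("objID=" + e[0] + " type=" + e[1]);
        }
    }
}
```

With the guard in place, the third line is left unread instead of triggering NoSuchElementException. Whether silently stopping is the right behaviour for a malformed stream is a separate question; a caller might prefer to log or report the leftover bytes.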


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Closed: (PDFBOX-590) PDFXrefStreamParser iterates when no elements are available

Posted by "Phil Varner (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phil Varner closed PDFBOX-590.
------------------------------

    Resolution: Fixed

Already fixed in trunk, but with no associated bug.

