Posted to dev@pdfbox.apache.org by "Vincent Hennebert (JIRA)" <ji...@apache.org> on 2009/12/04 20:46:20 UTC

[jira] Created: (PDFBOX-576) Errors in Stream Object Parsing

Errors in Stream Object Parsing
-------------------------------

                 Key: PDFBOX-576
                 URL: https://issues.apache.org/jira/browse/PDFBOX-576
             Project: PDFBox
          Issue Type: Bug
          Components: Parsing
    Affects Versions: 0.8.0-incubator, 1.0.0
            Reporter: Vincent Hennebert
         Attachments: fixStreamParsing.diff

The readUntilEndStream method in org.apache.pdfbox.pdfparser.BaseParser doesn't work properly. The read(byte[]) method doesn't guarantee that it will fill the buffer, and as long as the buffer hasn't been filled, the value of nextIdx will be wrong. For example, say the call to read puts only 5 bytes in the buffer. nextIdx will then have the value 5, i.e. it points to an as-yet uninitialized byte, which will be written to the output stream. The same happens at the next loop iteration, and in the end the stream will start with 4 null bytes that don't belong to the original one.
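
To illustrate the short-read behaviour described above, here is a minimal sketch. It is not the PDFBox code or the attached patch; the class ShortReadDemo and the helpers trickle and copyAll are hypothetical names made up for this example. The trickle stream deliberately returns at most a few bytes per call, and copyAll shows the usual way to cope: trust only the count returned by read(byte[]) and never touch the rest of the buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ShortReadDemo {

    /** Hypothetical stream that hands out at most maxChunk bytes per read call. */
    public static InputStream trickle(byte[] data, final int maxChunk) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                // Simulate a source (socket, filter stream, ...) that returns short reads.
                return super.read(b, off, Math.min(len, maxChunk));
            }
        };
    }

    /** Copies the whole stream, writing only the bytes each read actually delivered. */
    public static byte[] copyAll(InputStream in, int bufferSize) throws IOException {
        byte[] buf = new byte[bufferSize];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int n;
        while ((n = in.read(buf)) != -1) {
            // buf[n..] is stale or still zero; only buf[0..n) is valid here.
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "stream data longer than one chunk".getBytes(StandardCharsets.US_ASCII);
        byte[] copy = copyAll(trickle(data, 5), 16);
        System.out.println(Arrays.equals(data, copy));
    }
}
```

Any index derived from the buffer (such as nextIdx in the report) would have to be computed from the returned count rather than from the buffer length for the same reason.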

Also, if the stream is terminated by endobj instead of endstream, any bytes that sit at the beginning of the buffer and precede endobj won't be written out. The for loop stops at the end of the buffer instead of wrapping around to its beginning when necessary (say, if endobj happens to occupy slots 3 to 8 of the buffer).
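
The wrap-around problem can be sketched as follows. This is an illustrative ring buffer written for this example, not the readUntilEndStream implementation and not the content of fixStreamParsing.diff; RingTerminatorDemo, copyUntilTerminator, endsWith and flush are hypothetical names, and reading one byte at a time is a simplification. The point is that both the terminator check and the final flush walk the buffer in logical order, wrapping past the end of the array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class RingTerminatorDemo {
    private static final byte[] ENDSTREAM = "endstream".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] ENDOBJ = "endobj".getBytes(StandardCharsets.US_ASCII);

    /** True if the newest bytes in the ring, taken in logical order, equal marker. */
    private static boolean endsWith(byte[] ring, int next, int count, byte[] marker) {
        if (count < marker.length) {
            return false;
        }
        for (int i = 0; i < marker.length; i++) {
            // Wrap around the end of the array instead of stopping there.
            int idx = Math.floorMod(next - marker.length + i, ring.length);
            if (ring[idx] != marker[i]) {
                return false;
            }
        }
        return true;
    }

    /** Writes the howMany oldest bytes held in the ring, oldest first. */
    private static void flush(byte[] ring, int next, int count, int howMany, OutputStream out)
            throws IOException {
        int start = Math.floorMod(next - count, ring.length); // slot of the oldest byte
        for (int i = 0; i < howMany; i++) {
            out.write(ring[(start + i) % ring.length]);
        }
    }

    /** Copies bytes to out until "endstream" or "endobj" is seen; the terminator is not copied. */
    public static void copyUntilTerminator(InputStream in, OutputStream out) throws IOException {
        byte[] ring = new byte[ENDSTREAM.length];
        int next = 0;  // slot that receives the next byte
        int count = 0; // number of valid bytes currently held
        int b;
        while ((b = in.read()) != -1) {
            if (count == ring.length) {
                out.write(ring[next]); // ring is full: emit the oldest byte before overwriting it
            } else {
                count++;
            }
            ring[next] = (byte) b;
            next = (next + 1) % ring.length;
            if (endsWith(ring, next, count, ENDSTREAM)) {
                flush(ring, next, count, count - ENDSTREAM.length, out);
                return;
            }
            if (endsWith(ring, next, count, ENDOBJ)) {
                // Up to three payload bytes still precede endobj in the ring.
                flush(ring, next, count, count - ENDOBJ.length, out);
                return;
            }
        }
        flush(ring, next, count, count, out); // no terminator found: emit what is left
    }

    public static void main(String[] args) throws IOException {
        byte[] input = "ABCDEFGHIJKLMNOPQRSendobj".getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copyUntilTerminator(new ByteArrayInputStream(input), out);
        System.out.println(new String(out.toByteArray(), StandardCharsets.US_ASCII));
    }
}
```

With the 19-byte payload used in main, endobj ends up in slots 1 to 6 of the nine-slot ring and the last three payload bytes sit in slots 7, 8 and 0, so the flush has to wrap around to emit them in order.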

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PDFBOX-576) Errors in Stream Object Parsing

Posted by "Vincent Hennebert (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vincent Hennebert updated PDFBOX-576:
-------------------------------------

    Attachment: fixStreamParsing.diff

Patch fixing the issue

