Posted to dev@pdfbox.apache.org by "Timo Boehme (JIRA)" <ji...@apache.org> on 2012/06/06 15:24:22 UTC

[jira] [Created] (PDFBOX-1333) Stream parsing of BaseParser should fall back to scanning if length value is wrong

Timo Boehme created PDFBOX-1333:
-----------------------------------

             Summary: Stream parsing of BaseParser should fall back to scanning if length value is wrong
                 Key: PDFBOX-1333
                 URL: https://issues.apache.org/jira/browse/PDFBOX-1333
             Project: PDFBox
          Issue Type: Improvement
          Components: Parsing
    Affects Versions: 1.7.0
            Reporter: Timo Boehme
            Assignee: Timo Boehme
             Fix For: 1.8.0


In 1.7.0, stream parsing in BaseParser was optimized to use the length value if available. The advantage is faster parsing and independence from 'endstream' byte sequences occurring inside the stream data. The disadvantage is that streams with wrong length values can no longer be parsed (see PDFBOX-1331).
To solve this, we should check whether 'endstream' is really reached when using the length value and, if not, fall back to the 'old' behavior of reading the stream until 'endstream' is found.
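
The proposed check-then-fall-back logic can be sketched roughly as follows. This is a minimal illustration over an in-memory byte array, not the BaseParser code: the class StreamFallbackSketch and its methods are hypothetical names, and treating only CR, LF and space as skippable before 'endstream' is an assumption made for brevity.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch; none of these names come from PDFBox.
class StreamFallbackSketch {
    private static final byte[] ENDSTREAM =
            "endstream".getBytes(StandardCharsets.US_ASCII);

    // True if the bytes at pos are exactly 'endstream'.
    static boolean matchesAt(byte[] data, int pos) {
        if (pos < 0 || pos + ENDSTREAM.length > data.length) {
            return false;
        }
        for (int i = 0; i < ENDSTREAM.length; i++) {
            if (data[pos + i] != ENDSTREAM[i]) {
                return false;
            }
        }
        return true;
    }

    // True if 'endstream' follows pos after optional CR, LF or space bytes
    // (assumption: these are the only separators we skip).
    static boolean endstreamFollows(byte[] data, int pos) {
        while (pos < data.length
                && (data[pos] == '\r' || data[pos] == '\n' || data[pos] == ' ')) {
            pos++;
        }
        return matchesAt(data, pos);
    }

    // Index of the first 'endstream' at or after from, or -1 if absent.
    static int indexOfEndstream(byte[] data, int from) {
        for (int i = from; i + ENDSTREAM.length <= data.length; i++) {
            if (matchesAt(data, i)) {
                return i;
            }
        }
        return -1;
    }

    // Fast path: trust the declared length if 'endstream' really follows.
    // Slow path: rescan from the stream start for 'endstream'.
    static byte[] readStreamBody(byte[] data, int start, int declaredLength) {
        long end = (long) start + declaredLength;
        if (declaredLength >= 0 && end <= data.length
                && endstreamFollows(data, (int) end)) {
            return Arrays.copyOfRange(data, start, (int) end);
        }
        int idx = indexOfEndstream(data, start);
        if (idx < 0) {
            throw new IllegalArgumentException("no 'endstream' found");
        }
        // Note: the scanned body keeps any end-of-line bytes before 'endstream'.
        return Arrays.copyOfRange(data, start, idx);
    }

    public static void main(String[] args) {
        byte[] data = "ABCDE\nendstream".getBytes(StandardCharsets.US_ASCII);
        System.out.println(readStreamBody(data, 0, 5).length); // correct length
        System.out.println(readStreamBody(data, 0, 3).length); // wrong length
    }
}
```

The fast path keeps the benefit described above: stream data that happens to contain the bytes 'endstream' is still read correctly as long as the length value is right. Only a failed check triggers the scan.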

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (PDFBOX-1333) Stream parsing of BaseParser should fall back to scanning if length value is wrong

Posted by "Timo Boehme (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timo Boehme updated PDFBOX-1333:
--------------------------------

    Attachment: 2012-06-06_BaseParser_streamFallBack.patch

Patch for BaseParser that tests whether stream parsing using the length value reaches 'endstream'; if not, the parsed data are pushed back and the stream is parsed again by scanning for 'endstream'.
This patch also increases the push-back buffer to 64 kB so that it can hold larger streams; the size can be modified using the system property org.apache.pdfbox.baseParser.pushBackSize.
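
To illustrate why the buffer size matters, here is a small sketch using the standard java.io.PushbackInputStream. The property name is taken from the comment above. Everything else is an assumption for illustration and not the patch itself: the class PushBackSizeSketch, its methods, reading 64 kB as 65536 bytes, and the way the property is read.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; only the property name comes from the issue comment.
class PushBackSizeSketch {
    static final String PROP = "org.apache.pdfbox.baseParser.pushBackSize";
    static final int DEFAULT_SIZE = 64 * 1024; // assumption: 64 kB = 65536 bytes

    // Reads the size from the system property, falling back to the default
    // when the property is unset, unparseable or not positive.
    static int pushBackSize() {
        int size = Integer.getInteger(PROP, DEFAULT_SIZE);
        return size > 0 ? size : DEFAULT_SIZE;
    }

    // Reads up to n bytes, pushes them back, and reads them again,
    // mimicking "parse by length, then re-parse by scanning".
    static String readTwice(byte[] input, int n) {
        try (PushbackInputStream in = new PushbackInputStream(
                new ByteArrayInputStream(input), pushBackSize())) {
            byte[] first = new byte[n];
            int read = in.read(first, 0, n);
            if (read <= 0) {
                return "";
            }
            // unread() throws IOException if the push-back buffer is too
            // small for 'read' bytes, hence the larger default size.
            in.unread(first, 0, read);
            byte[] second = new byte[read];
            int reread = in.read(second, 0, read);
            return new String(second, 0, reread, StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(pushBackSize());
        System.out.println(readTwice(
                "stream data".getBytes(StandardCharsets.US_ASCII), 6));
    }
}
```

With such a setup, the size could be overridden at JVM start, for example with -Dorg.apache.pdfbox.baseParser.pushBackSize=131072. A stream larger than the push-back buffer cannot be pushed back in full, so the buffer size limits how large a stream the fallback can recover.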
                


        

[jira] [Closed] (PDFBOX-1333) Stream parsing of BaseParser should fall back to scanning if length value is wrong

Posted by "Timo Boehme (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timo Boehme closed PDFBOX-1333.
-------------------------------

    Resolution: Fixed

Fixed by applying the patch in rev. 1346891.
                
