
Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2016/03/29 22:33:25 UTC

[jira] [Created] (PDFBOX-3295) Improve parsing performance of object streams

Andreas Lehmkühler created PDFBOX-3295:
------------------------------------------

             Summary: Improve parsing performance of object streams
                 Key: PDFBOX-3295
                 URL: https://issues.apache.org/jira/browse/PDFBOX-3295
             Project: PDFBox
          Issue Type: Improvement
          Components: Parsing
    Affects Versions: 2.0.0, 2.1.0
            Reporter: Andreas Lehmkühler
            Assignee: Andreas Lehmkühler
             Fix For: 2.0.1, 2.1.0


About a year ago [~torakiki] posted a comment on the dev list about some xref refactoring:
{quote}
A few days ago I was profiling PDFBox when loading medium/large size
documents and I think I found something.
If you try loading the document
http://www.adobe.com/devnet/acrobat/pdfs/pdf_reference_1-7.pdf you'll see
it takes quite some time and that's mostly spent in the
XrefTrailerResolver.getContainedObjectNumbers. The issue is that every time
an object contained in an unparsed object stream is found, the
XrefTrailerResolver performs a full scan of the xref entries found in the
document, in this case hundreds of thousands. If the object streams are
many (like in the given doc), it performs many full scans resulting in poor
performance.
{quote}
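
The cost pattern described in the quote can be sketched roughly as follows. This is a minimal illustration, not PDFBox code: the class ObjectStreamIndexSketch, both method names and the plain Map standing in for the xref table are hypothetical, and the reverse index is only one possible way to avoid the repeated scans, not necessarily the change made for this issue. It assumes each compressed xref entry maps an object number to the number of the object stream containing it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; none of these names are part of the PDFBox API.
class ObjectStreamIndexSketch {

    // Slow variant: every lookup walks the whole xref table.
    // With S object streams and N xref entries this costs O(S * N) overall.
    static Set<Long> scanForContainedObjects(Map<Long, Long> objectToStream, long streamNumber) {
        Set<Long> contained = new TreeSet<>();
        for (Map.Entry<Long, Long> entry : objectToStream.entrySet()) {
            if (entry.getValue() == streamNumber) {
                contained.add(entry.getKey());
            }
        }
        return contained;
    }

    // Faster variant: walk the xref table once and group object numbers
    // by their containing object stream, so each later lookup is a map access.
    static Map<Long, Set<Long>> buildReverseIndex(Map<Long, Long> objectToStream) {
        Map<Long, Set<Long>> index = new HashMap<>();
        for (Map.Entry<Long, Long> entry : objectToStream.entrySet()) {
            index.computeIfAbsent(entry.getValue(), k -> new TreeSet<>()).add(entry.getKey());
        }
        return index;
    }

    public static void main(String[] args) {
        // Assumed toy xref data: object number -> containing object stream number.
        Map<Long, Long> objectToStream = new HashMap<>();
        objectToStream.put(10L, 5L);
        objectToStream.put(11L, 5L);
        objectToStream.put(12L, 6L);
        objectToStream.put(13L, 5L);

        // One full scan per object stream.
        System.out.println(scanForContainedObjects(objectToStream, 5L));

        // One full scan in total, then cheap lookups.
        Map<Long, Set<Long>> index = buildReverseIndex(objectToStream);
        System.out.println(index.get(5L));
        System.out.println(index.get(6L));
    }
}
```

Both variants return the same object numbers for a given stream. The difference is only how often the full xref table is traversed, which is what matters for documents with hundreds of thousands of entries and many object streams.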


