Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (Jira)" <ji...@apache.org> on 2020/07/03 14:44:00 UTC

[jira] [Commented] (PDFBOX-4896) Don't save and restore graphic states around showGlyph in LegacyPDFStreamEngine

    [ https://issues.apache.org/jira/browse/PDFBOX-4896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17151024#comment-17151024 ] 

Andreas Lehmkühler commented on PDFBOX-4896:
--------------------------------------------

[~Faltiska] that's an interesting observation and IMHO you are completely right. The operation {{PDFStreamEngine.showGlyph(Matrix, PDFont, int, Vector)}} should be considered read-only. I've simply removed the save/restore calls and everything still works fine. I've studied the called code as well and can't find anything that proves you wrong.
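
Besides reading the called code, the read-only assumption can also be checked in a test: snapshot the state, call the glyph callback, and compare. The Java sketch below is not PDFBox code. State, leavesStateUnchanged and both callbacks are hypothetical. A real check would have to compare every field of the graphics state (CTM, colors, clipping path, text state, ...), not just the two modelled here.

```java
import java.util.Arrays;
import java.util.function.Consumer;

public class ReadOnlyGlyphCheck {

    // Hypothetical, heavily simplified graphics state with value comparison.
    static final class State {
        double lineWidth;
        double[] ctm;

        State(double lineWidth, double[] ctm) {
            this.lineWidth = lineWidth;
            this.ctm = ctm;
        }

        // Deep copy, so later mutations of 'this' don't leak into the snapshot.
        State copy() {
            return new State(lineWidth, ctm.clone());
        }

        boolean sameAs(State other) {
            return lineWidth == other.lineWidth && Arrays.equals(ctm, other.ctm);
        }
    }

    // Test-only guard: snapshot the state, run the callback, and report whether
    // the callback left the state untouched.
    static boolean leavesStateUnchanged(State state, Consumer<State> glyphCallback) {
        State before = state.copy();
        glyphCallback.accept(state);
        return before.sameAs(state);
    }

    public static void main(String[] args) {
        State state = new State(1.0, new double[] {1, 0, 0, 1, 0, 0});
        StringBuilder extracted = new StringBuilder();

        // Text-extraction style callback: reads the state, writes only to its own buffer.
        Consumer<State> extractText = s -> extracted.append("x@").append(s.ctm[4]);
        // Hypothetical callback that does modify the state.
        Consumer<State> mutating = s -> s.lineWidth = 2.0;

        System.out.println(leavesStateUnchanged(state, extractText)); // true
        System.out.println(leavesStateUnchanged(state, mutating));    // false
    }
}
```

Such a check only catches mutations on the code paths a test actually exercises. It complements reading the code rather than replacing it.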

 

[~tilman] WDYT, are we missing something important?

> Don't save and restore graphic states around showGlyph in LegacyPDFStreamEngine
> -------------------------------------------------------------------------------
>
>                 Key: PDFBOX-4896
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4896
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Text extraction
>    Affects Versions: 2.0.20, 3.0.0 PDFBox
>            Reporter: Alfred
>            Assignee: Andreas Lehmkühler
>            Priority: Minor
>              Labels: Optimization
>         Attachments: PDFBOX-4896.patch
>
>
> One of the major performance bottlenecks in text extraction was the
> clone + push and the pop + clone operations on the graphic state before and after the call to showGlyph.
> Not only was cloning slow, it also consumed large amounts of memory, making the garbage collector work harder.
> When extracting text, showGlyph does not modify the graphic state, so there's no need to save/restore the state.
> The same could be true in general, not just for text extraction, but I do not understand the code well enough to decide.
> To be safe, I have only modified the behavior of LegacyPDFStreamEngine.
> The showGlyph operation sounds like a read-only operation that should not modify anything.
>
> I have the code ready and will submit a patch and a review.
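
The cost described in the issue can be made concrete with a minimal, self-contained Java sketch. It does not use PDFBox. GraphicsState, GlyphStateCost, showText and showGlyph here are hypothetical stand-ins. The sketch models the save as a clone + push and the restore as a plain pop. The real graphics state holds many more fields, so a real clone costs more than this one. The sketch counts one clone per glyph when the state is saved around each glyph and none otherwise. The extracted text is identical in both cases because the glyph callback never touches the stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class GlyphStateCost {

    // Number of GraphicsState clones made so far (for illustration only).
    static int cloneCount = 0;

    // Hypothetical, heavily simplified graphics state.
    static final class GraphicsState implements Cloneable {
        double lineWidth = 1.0;
        double[] ctm = {1, 0, 0, 1, 0, 0};

        @Override
        public GraphicsState clone() {
            cloneCount++;
            try {
                GraphicsState copy = (GraphicsState) super.clone();
                copy.ctm = ctm.clone(); // deep-copy the mutable array field
                return copy;
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e);
            }
        }
    }

    private final Deque<GraphicsState> stack = new ArrayDeque<>();
    private final StringBuilder text = new StringBuilder();

    GlyphStateCost() {
        stack.push(new GraphicsState());
    }

    // Read-only callback: records the glyph's text and never touches the stack.
    private void showGlyph(char unicode) {
        text.append(unicode);
    }

    String showText(String s, boolean saveAroundGlyph) {
        for (char c : s.toCharArray()) {
            if (saveAroundGlyph) {
                stack.push(stack.peek().clone()); // save: clone + push, once per glyph
            }
            showGlyph(c);
            if (saveAroundGlyph) {
                stack.pop(); // restore: pop
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        cloneCount = 0;
        String withSave = new GlyphStateCost().showText("Hello, PDF", true);
        int clonesWithSave = cloneCount;

        cloneCount = 0;
        String withoutSave = new GlyphStateCost().showText("Hello, PDF", false);
        int clonesWithoutSave = cloneCount;

        System.out.println(withSave.equals(withoutSave));                // true
        System.out.println(clonesWithSave + " vs " + clonesWithoutSave); // 10 vs 0
    }
}
```

Skipping the save/restore is only safe because this showGlyph is read-only. A glyph callback that changed the state would need the save/restore (or an equivalent guard) to keep later glyphs unaffected.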



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
