Posted to dev@pdfbox.apache.org by "Jairo Figueroa Jiménez (Jira)" <ji...@apache.org> on 2021/02/03 07:55:00 UTC

[jira] [Commented] (PDFBOX-5049) PlainText.Paragraph.getLines extremely slow on long lines

    [ https://issues.apache.org/jira/browse/PDFBOX-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277779#comment-17277779 ] 

Jairo Figueroa Jiménez commented on PDFBOX-5049:
------------------------------------------------

It makes no sense to have a loop that runs over the same 99760 bytes again and again. The solution is to store the codes, in the order of the Unicode text, in an ArrayList. Afterwards, loop over that list of codes and look up each width with getCodeToWidthMap()... It would be something like this:

https://drive.google.com/file/d/1L9JbPy1wW15hY3Zxv4MFNPUVXIeIwjo3/view?usp=sharing

I have tried it and it works quite well.
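
For readers without access to the linked file, here is a minimal sketch of the general idea, not the linked code and not the PDFBox implementation. It assumes the text is walked only once to collect each code and its width in order, and that the long word is then split in a single pass over those stored widths. The class LongWordSplitter, the method split, and the plain Map used as a stand-in for the font's code-to-width lookup are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of PDFBox.
final class LongWordSplitter {

    // Splits a word that does not fit into maxWidth into chunks that do.
    // codeToWidth is an assumed stand-in for a font's code-to-width lookup.
    static List<String> split(String word, Map<Integer, Float> codeToWidth,
                              float defaultWidth, float maxWidth) {
        // Step 1: walk the text once, storing codes and widths in order.
        List<Integer> codes = new ArrayList<>();
        List<Float> widths = new ArrayList<>();
        for (int i = 0; i < word.length(); ) {
            int cp = word.codePointAt(i);
            codes.add(cp);
            widths.add(codeToWidth.getOrDefault(cp, defaultWidth));
            i += Character.charCount(cp);
        }

        // Step 2: one pass over the stored widths, so no prefix of the
        // word is ever measured a second time.
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        float currentWidth = 0f;
        for (int i = 0; i < codes.size(); i++) {
            float w = widths.get(i);
            // Start a new chunk when the next glyph would overflow,
            // but always keep at least one glyph per chunk.
            if (current.length() > 0 && currentWidth + w > maxWidth) {
                lines.add(current.toString());
                current.setLength(0);
                currentWidth = 0f;
            }
            current.appendCodePoint(codes.get(i));
            currentWidth += w;
        }
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<Integer, Float> widthMap = new HashMap<>();
        widthMap.put((int) 'a', 1.0f);
        widthMap.put((int) 'W', 2.0f);
        // prints [aa, Wa, a]
        System.out.println(split("aaWaa", widthMap, 1.0f, 3.0f));
    }
}
```

With this shape the cost grows linearly with the length of the text, whereas re-measuring a growing or shrinking substring on every iteration grows quadratically, which would be consistent with the slowdown reported for very long field values.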

> PlainText.Paragraph.getLines extremely slow on long lines
> ---------------------------------------------------------
>
>                 Key: PDFBOX-5049
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5049
>             Project: PDFBox
>          Issue Type: Bug
>          Components: AcroForm
>    Affects Versions: 2.0.21
>            Reporter: Tilman Hausherr
>            Priority: Major
>         Attachments: GHOSTSCRIPT-690526-0.pdf, GHOSTSCRIPT-692591-0.pdf, GHOSTSCRIPT-692591-2.pdf
>
>
> The three attached files are very slow when constructing the appearance on the field "gendate" (on the last page). That is a multiline field but with an extremely long text.
> It happens at "// single word does not fit into width".


