Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2019/06/12 19:28:00 UTC

[jira] [Comment Edited] (PDFBOX-4550) Poor performance with corrupt ToUnicode stream

    [ https://issues.apache.org/jira/browse/PDFBOX-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16862381#comment-16862381 ] 

Tilman Hausherr edited comment on PDFBOX-4550 at 6/12/19 7:27 PM:
------------------------------------------------------------------

Text can no longer be extracted from the attached file. The ToUnicode stream is at {{Root/Pages/Kids/[0]/Resources/Font/F3/ToUnicode}}, and it contains:
{code}
1 beginbfrange
<0000> <FFFF> <0000>
endbfrange
{code}
That range is larger than 255, so the entry is ignored.
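
To illustrate why this entry is dropped, here is a minimal sketch of such a size check. It is not the PDFBox parser code: the class name {{BfRangeSizeCheck}}, the method {{isAccepted}}, and the reading of "range > 255" as "end minus start is greater than 255" are all assumptions made for this example.

```java
public class BfRangeSizeCheck {
    // Assumed limit, taken from the comment above: wider ranges are ignored.
    public static final int MAX_RANGE = 255;

    /** Parses a hex code such as "FFFF" (angle brackets already stripped). */
    public static int toCode(String hex) {
        return Integer.parseInt(hex, 16);
    }

    /** Returns true if a bfrange entry would be kept under the assumed rule. */
    public static boolean isAccepted(String startHex, String endHex) {
        int start = toCode(startHex);
        int end = toCode(endHex);
        // Reject reversed ranges and ranges wider than the assumed limit.
        return end >= start && end - start <= MAX_RANGE;
    }

    public static void main(String[] args) {
        // The entry from the attached file: <0000> <FFFF> <0000>
        System.out.println(isAccepted("0000", "FFFF")); // prints false
        // A range that stays within the limit
        System.out.println(isAccepted("0000", "00FF")); // prints true
    }
}
```

Under this assumed rule the only bfrange entry in the stream is rejected, which leaves the font without any usable ToUnicode mapping. That would be consistent with text extraction no longer working for this file.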


was (Author: tilman):
Attached file no longer extracts text, ToUnicode stream is at {{Root/Pages/Kids/[0]/Resources/Font/F3/ToUnicode}}

> Poor performance with corrupt ToUnicode stream
> ----------------------------------------------
>
>                 Key: PDFBOX-4550
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4550
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Rendering, Text extraction
>    Affects Versions: 2.0.15
>            Reporter: Tilman Hausherr
>            Assignee: Andreas Lehmkühler
>            Priority: Major
>             Fix For: 2.0.16, 3.0.0 PDFBox
>
>         Attachments: LG5S35JUXSEH5XJC6QYISY3OBUXCKAKR-p1.pdf, PDFBOX-3442-DirectResources.pdf, PDFBOX-3442-DirectResources_unc.pdf, pdnekz1gvl7.pdf
>
>
> A confidential file with lots of corrupt streams has a ToUnicode stream whose beginbfrange segment is corrupt: the start and end codes have different lengths. This leads to poor performance. Such entries can be skipped.
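
Skipping entries whose start and end codes differ in length could look roughly like the sketch below. It is only an illustration and not the actual fix: the class {{BfRangeLengthCheck}}, its methods, and the representation of an entry as a two-element string array are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class BfRangeLengthCheck {
    /** Number of bytes encoded by a hex string such as "0041" (angle brackets stripped). */
    public static int byteLength(String hex) {
        // An odd number of digits is treated as if a trailing 0 were appended.
        return (hex.length() + 1) / 2;
    }

    /** Keeps only entries whose start and end codes have the same byte length. */
    public static List<String[]> skipMismatched(List<String[]> entries) {
        List<String[]> kept = new ArrayList<>();
        for (String[] entry : entries) {
            // entry[0] = start code, entry[1] = end code (hypothetical layout)
            if (byteLength(entry[0]) == byteLength(entry[1])) {
                kept.add(entry);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String[]> entries = new ArrayList<>();
        entries.add(new String[] {"0000", "00FF"}); // 2 bytes and 2 bytes: kept
        entries.add(new String[] {"00", "FFFF"});   // 1 byte and 2 bytes: skipped
        System.out.println(skipMismatched(entries).size()); // prints 1
    }
}
```

Rejecting a mismatched entry before it is expanded means no time is spent iterating over a range that cannot be valid.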


