Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2014/04/20 13:17:15 UTC

[jira] [Resolved] (PDFBOX-2035) Ignore badly formatted toUnicode CMaps

     [ https://issues.apache.org/jira/browse/PDFBOX-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler resolved PDFBOX-2035.
----------------------------------------

    Resolution: Fixed

I've made the CMap parser more lenient about the different kinds of whitespace used to format CMap files, based on Cheng's proposed patch. I committed those changes in revision 1588736 (trunk) and 1588737 (1.8 branch).

Thanks for the contribution!



> Ignore badly formatted toUnicode CMaps
> --------------------------------------
>
>                 Key: PDFBOX-2035
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2035
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing, PDModel
>    Affects Versions: 1.8.4, 2.0.0
>            Reporter: Cheng Leong
>            Assignee: Andreas Lehmkühler
>             Fix For: 1.8.5, 2.0.0
>
>         Attachments: Ignore_badly-formatted_CMap_ToUnicode_instructions.patch, experienced_java_developer.pdf
>
>
> Copied from PDFBOX-399:
> Submitting a patch for ignoring badly-formatted CMap ToUnicode instructions.
> This allows parsing of some ToUnicode resource streams that would otherwise throw exceptions, which were silently consumed, so text extraction now gets the correctly mapped characters.
> Specifically, the patch parses a token directly followed by a hex value (token<hex>) without whitespace separating them, eats all whitespace within a hex value, and returns a partially constructed CMap instead of throwing an exception.
> I don't see a problem with the previous test case example (BlackHat...) but I've modified the test case based on an example from the wild: http://www.itsix.com/media/experienced_java_developer.pdf
> edit: forgot to mention that this patch was designed on 1.8.3, but also worked on trunk.
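
For readers of the archive, the three lenient behaviours described in the issue can be illustrated with a minimal, self-contained sketch. This is not the code from the attached patch or from PDFBox's CMap parser: the class LenientToUnicodeSketch and its methods are hypothetical names, it only understands beginbfchar/endbfchar sections, and it assumes the input contains no "<<" dictionary delimiters.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not PDFBox's actual parser.
final class LenientToUnicodeSketch {

    // Splits CMap text into tokens. A "<...>" hex value is its own token even
    // when it touches the previous token, and whitespace inside it is dropped.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '<') {
                flush(word, tokens);
                int end = text.indexOf('>', i);
                if (end < 0) {
                    break; // unterminated hex value: keep the tokens read so far
                }
                String hex = text.substring(i + 1, end).replaceAll("\\s+", "");
                tokens.add("<" + hex + ">");
                i = end + 1;
            } else if (Character.isWhitespace(c)) {
                flush(word, tokens);
                i++;
            } else {
                word.append(c);
                i++;
            }
        }
        flush(word, tokens);
        return tokens;
    }

    private static void flush(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString());
            word.setLength(0);
        }
    }

    // Reads <code> <unicode> pairs from bfchar sections. On a malformed entry
    // it stops and returns the mappings collected so far instead of throwing.
    static Map<Integer, String> parseBfChars(String text) {
        Map<Integer, String> map = new LinkedHashMap<>();
        List<String> tokens = tokenize(text);
        boolean inBfChar = false;
        int i = 0;
        try {
            while (i < tokens.size()) {
                String t = tokens.get(i);
                if (t.equals("beginbfchar")) {
                    inBfChar = true;
                    i++;
                } else if (t.equals("endbfchar")) {
                    inBfChar = false;
                    i++;
                } else if (inBfChar && t.startsWith("<")) {
                    int code = Integer.parseInt(stripBrackets(t), 16);
                    String unicode = decodeUtf16(stripBrackets(tokens.get(i + 1)));
                    map.put(code, unicode);
                    i += 2;
                } else {
                    i++; // ignore anything this sketch does not understand
                }
            }
        } catch (RuntimeException e) {
            // Badly formatted entry: fall through and return the partial map.
        }
        return map;
    }

    private static String stripBrackets(String token) {
        if (!token.startsWith("<") || !token.endsWith(">")) {
            throw new IllegalArgumentException("not a hex token: " + token);
        }
        return token.substring(1, token.length() - 1);
    }

    // Interprets every four hex digits as one UTF-16 code unit.
    private static String decodeUtf16(String hex) {
        if (hex.isEmpty() || hex.length() % 4 != 0) {
            throw new IllegalArgumentException("bad UTF-16 hex: " + hex);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < hex.length(); i += 4) {
            sb.append((char) Integer.parseInt(hex.substring(i, i + 4), 16));
        }
        return sb.toString();
    }
}
```

With this sketch, calling parseBfChars on "1 beginbfchar<01><00 41>\n<02>\t<0042>endbfchar" maps code 1 to "A" and code 2 to "B" despite the missing and stray whitespace, and an input whose second entry is malformed still returns the first mapping.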



--
This message was sent by Atlassian JIRA
(v6.2#6252)