Posted to dev@pdfbox.apache.org by "John Hewson (JIRA)" <ji...@apache.org> on 2014/06/13 00:21:02 UTC

[jira] [Comment Edited] (PDFBOX-1919) Font descriptor flags are not implemented

    [ https://issues.apache.org/jira/browse/PDFBOX-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029925#comment-14029925 ] 

John Hewson edited comment on PDFBOX-1919 at 6/12/14 10:20 PM:
---------------------------------------------------------------

I took a detailed look at the PDF in question. Using Acrobat Pro XI, I get "IN NoRtheRN IReLAND", and I always get the same result whether I copy & paste, export plain text, or export accessible text. OS X Preview gives the same result, as does Chrome's PDF viewer.

I really don't think that Acrobat is using the span tags to repair the ToUnicode table (how would it know the table was bad? What if the span tags were bad?). Andreas, what version of Acrobat did you use? Given that every PDF viewer I've tried produces the same text in all cases, I'd say that PDFBox's behaviour is correct.


was (Author: jahewson):
I took a detailed look at the PDF in question, using Acrobat Pro XI I get "IN NoRtheRN IReLAND" and I always get the same result if I copy & paste, export plain text, or export accessible text. OS X Preview gives the same result, as does Chrome's PDF viewer.

I really don't think that Acrobat is using the span tags to repair the unicode table (how would it know the table was bad? What if the span tags were bad?). Andreas, what version of Acrobat did you use? Given that every PDF viewer I've tried produces the same text in all cases, I'd say that PDFBox's behaviour is correct.

> Font descriptor flags are not implemented
> -----------------------------------------
>
>                 Key: PDFBOX-1919
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1919
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.8.5, 1.8.6, 2.0.0
>            Reporter: Corentin Regal
>         Attachments: PDFBOX-1919.AdobeReader.txt, PDFBOX-1919.pdf, PDFBOX-1919.txt
>
>
> The font descriptor flags are not set.
> They are described in the document "PDF reference 1.7", section 5.7.1, Font Descriptor Flags.
> The methods in PDFontDescriptor are ready but never called:
> setFlags()
> setSerif()
> setAllCap(), which is used in a lot of PDFs
> ...
> I saw some TODOs in the code that relate to this issue. Is this planned to be implemented soon?
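
For reference, the flags named in the issue description are single bits in the integer stored under the font descriptor's /Flags entry. The sketch below shows how such bits can be set and tested, using the bit positions from the flags table in section 5.7.1 of PDF Reference 1.7. The class FontDescriptorFlags and its methods isSet() and with() are hypothetical names made up for illustration; this is not PDFBox's PDFontDescriptor code.

```java
// Hypothetical helper, not part of PDFBox: illustrates the /Flags bit layout only.
public final class FontDescriptorFlags {
    // The PDF Reference numbers bits from 1 (lowest-order), so bit n is 1 << (n - 1).
    public static final int FIXED_PITCH = 1 << 0;  // bit 1
    public static final int SERIF       = 1 << 1;  // bit 2
    public static final int SYMBOLIC    = 1 << 2;  // bit 3
    public static final int SCRIPT      = 1 << 3;  // bit 4
    public static final int NONSYMBOLIC = 1 << 5;  // bit 6
    public static final int ITALIC      = 1 << 6;  // bit 7
    public static final int ALL_CAP     = 1 << 16; // bit 17
    public static final int SMALL_CAP   = 1 << 17; // bit 18
    public static final int FORCE_BOLD  = 1 << 18; // bit 19

    private FontDescriptorFlags() {
    }

    // Returns true if any bit of the mask is set in the flags value.
    public static boolean isSet(int flags, int mask) {
        return (flags & mask) != 0;
    }

    // Returns a copy of flags with the masked bit switched on or off.
    public static int with(int flags, int mask, boolean on) {
        return on ? (flags | mask) : (flags & ~mask);
    }

    public static void main(String[] args) {
        // Build a value for a serif, nonsymbolic, all-caps font.
        int flags = 0;
        flags = with(flags, SERIF, true);
        flags = with(flags, NONSYMBOLIC, true);
        flags = with(flags, ALL_CAP, true);
        System.out.println("Flags value: " + flags);
        System.out.println("AllCap set: " + isSet(flags, ALL_CAP));
        System.out.println("SmallCap set: " + isSet(flags, SMALL_CAP));
    }
}
```

Whether and how a text extractor should act on the AllCap or SmallCap bits is a separate question, which is what the comment above discusses.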


