Posted to dev@pdfbox.apache.org by "Tilman Hausherr (Jira)" <ji...@apache.org> on 2019/09/10 17:21:00 UTC

[jira] [Comment Edited] (PDFBOX-4647) Embedded PDF font cannot be parsed: ABCDEE+Arial font (original title: pdf内嵌字体解析不出来 ABCDEE+Arial 字体)

    [ https://issues.apache.org/jira/browse/PDFBOX-4647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926791#comment-16926791 ] 

Tilman Hausherr edited comment on PDFBOX-4647 at 9/10/19 5:20 PM:
------------------------------------------------------------------

The Chinese title translates to "Inline font parsing does not come out".

The "OpenType Layout tables used in font" log message is not relevant here.

You're missing the text because the "ToUnicode" mapping is missing in that font. Try with Adobe Reader: you will not be able to extract it there either. (It is the part with "Boulevard Miguel de Cervantes".) The only solution will be OCR, e.g. with Apache Tika and Tesseract.
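
To illustrate why a missing "ToUnicode" mapping blocks extraction, here is a toy model in Java. It is not PDFBox code: the class, the method names and the sample codes are hypothetical, and it simplifies away the other fallbacks a real extractor may try. The assumption is only that the page content stores font-specific character codes, and that text can be produced solely for codes that have an entry in a code-to-Unicode table.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model of text extraction through a ToUnicode-style table.
 * NOT PDFBox code; every name in this class is hypothetical.
 */
final class ToUnicodeSketch {

    /** Placeholder emitted when a code has no Unicode mapping. */
    static final char UNMAPPED = '\uFFFD';

    /**
     * Translates font-specific character codes to text. A code without an
     * entry in the table cannot be recovered and becomes UNMAPPED.
     */
    static String extract(int[] codes, Map<Integer, String> toUnicode) {
        StringBuilder text = new StringBuilder();
        for (int code : codes) {
            String unicode = toUnicode.get(code);
            if (unicode == null) {
                text.append(UNMAPPED);
            } else {
                text.append(unicode);
            }
        }
        return text.toString();
    }

    /** Counts the codes that have no entry in the table. */
    static int countUnmapped(int[] codes, Map<Integer, String> toUnicode) {
        int missing = 0;
        for (int code : codes) {
            if (!toUnicode.containsKey(code)) {
                missing++;
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Arbitrary sample codes; 24 merely echoes "CID+24" from the log.
        int[] codes = {24, 3, 17};

        // A font that ships a complete table: the text comes out.
        Map<Integer, String> complete = new HashMap<>();
        complete.put(24, "B");
        complete.put(3, "o");
        complete.put(17, "u");
        System.out.println(extract(codes, complete));

        // A font with no table at all: every code is unrecoverable.
        Map<Integer, String> empty = new HashMap<>();
        System.out.println(countUnmapped(codes, empty));
    }
}
```

The glyphs still render on screen because drawing only needs the code-to-glyph path inside the font; extraction needs the separate code-to-Unicode path, which is what this font lacks.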

See also

[https://pdfbox.apache.org/2.0/faq.html#text-extraction]

 


was (Author: tilman):
The Chinese title translates to "Inline font parsing does not come out".

You're missing the text because the "ToUnicode" mapping is missing in that font. Try with Adobe Reader: you will not be able to extract it there either. (It is the part with "Boulevard Miguel de Cervantes".) The only solution will be OCR, e.g. with Apache Tika and Tesseract.

See also

[https://pdfbox.apache.org/2.0/faq.html#text-extraction]

 

> Embedded PDF font cannot be parsed: ABCDEE+Arial font (original title: pdf内嵌字体解析不出来 ABCDEE+Arial 字体)
> -----------------------------
>
>                 Key: PDFBOX-4647
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4647
>             Project: PDFBox
>          Issue Type: Bug
>          Components: FontBox, PDModel
>    Affects Versions: 2.0.4
>            Reporter: wanling
>            Priority: Major
>         Attachments: 5e214f828f164322a6600f183191dda5.pdf
>
>
> The error is as follows:
> OpenType Layout tables used in font ABCDEE+Arial are not implemented in PDFBox and will be ignored;
> No Unicode mapping for CID+24 (24) in font ABCDEE+Arial
> Adobe displays the file normally
>  


