Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2014/10/23 20:27:35 UTC
[jira] [Closed] (PDFBOX-1038) Strange signs after pdftohtml parsing.
[ https://issues.apache.org/jira/browse/PDFBOX-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Lehmkühler closed PDFBOX-1038.
--------------------------------------
Resolution: Fixed
Fix Version/s: 1.6.0
Assignee: Andreas Lehmkühler
Works fine starting with at least 1.6.0, except for a small part of the text that can't be extracted due to a missing mapping. Acrobat provides a similar result.
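For anyone who wants to locate the affected part of the text in their own output, here is a minimal sketch that scans an already extracted string for code points that commonly show up when a glyph has no usable Unicode mapping. It does not call any PDFBox API. The class and method names are hypothetical, and the heuristic itself (private-use, unassigned, non-whitespace control characters and U+FFFD) is an assumption, not a description of how PDFBox handles the missing mapping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox: flags characters in extracted text
// that often indicate a glyph without a usable Unicode mapping.
final class UnmappedGlyphScan {

    /** Heuristic (an assumption): treat these code points as suspicious. */
    static boolean isSuspicious(int codePoint) {
        if (codePoint == 0xFFFD) {
            return true; // Unicode replacement character
        }
        int type = Character.getType(codePoint);
        return type == Character.PRIVATE_USE
                || type == Character.UNASSIGNED
                || (type == Character.CONTROL && !Character.isWhitespace(codePoint));
    }

    /** Returns the char offsets in the text at which a suspicious code point starts. */
    static List<Integer> suspiciousOffsets(String text) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isSuspicious(cp)) {
                offsets.add(i);
            }
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Made-up sample: one private-use character and one replacement character.
        String sample = "Univers\uE001Pro \uFFFD ok";
        System.out.println(suspiciousOffsets(sample));
    }
}
```

An empty result does not prove the extraction is correct; it only means none of the code points covered by this heuristic were found.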
> Strange signs after pdftohtml parsing.
> --------------------------------------
>
> Key: PDFBOX-1038
> URL: https://issues.apache.org/jira/browse/PDFBOX-1038
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 1.5.0
> Environment: windows vista
> Reporter: Funfel
> Assignee: Andreas Lehmkühler
> Fix For: 1.6.0
>
> Attachments: pg0007.html, pg0007.pdf
>
>
> After converting the PDF to HTML, I get strange signs where ordinary letters (not Chinese or Japanese) should appear. I noticed that the font description for them is UniversPro-Roman-Identity-H.
> How can I get it generated properly?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)