Posted to dev@pdfbox.apache.org by "Funfel (JIRA)" <ji...@apache.org> on 2011/06/14 17:33:47 UTC

[jira] [Created] (PDFBOX-1038) Strange signs after pdftohtml parsing.

Strange signs after pdftohtml parsing.
--------------------------------------

                 Key: PDFBOX-1038
                 URL: https://issues.apache.org/jira/browse/PDFBOX-1038
             Project: PDFBox
          Issue Type: Bug
          Components: Text extraction
    Affects Versions: 1.5.0
         Environment: Windows Vista
            Reporter: Funfel


After converting a PDF to HTML, I get strange characters where ordinary letters should appear (the text is not Chinese or Japanese). I've noticed that the font for them is reported as UniversPro-Roman-Identity-H.
How can I get the output generated properly?


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Issue Comment Edited] (PDFBOX-1038) Strange signs after pdftohtml parsing.

Posted by "Funfel (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049239#comment-13049239 ] 

Funfel edited comment on PDFBOX-1038 at 6/14/11 3:35 PM:
---------------------------------------------------------

I've attached the original PDF (one page only) and the generated HTML.

      was (Author: funfel):
    I've atached the originale pdf (one page only) and generated html
  


        

[jira] [Updated] (PDFBOX-1038) Strange signs after pdftohtml parsing.

Posted by "Funfel (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Funfel updated PDFBOX-1038:
---------------------------

    Attachment: pg0007.pdf
                pg0007.html

I've attached the original PDF (one page only) and the generated HTML.

