Posted to dev@pdfbox.apache.org by "Villu Ruusmann (JIRA)" <ji...@apache.org> on 2010/03/17 09:17:27 UTC

[jira] Created: (PDFBOX-664) Incorrect rendering

Incorrect rendering
-------------------

                 Key: PDFBOX-664
                 URL: https://issues.apache.org/jira/browse/PDFBOX-664
             Project: PDFBox
          Issue Type: Bug
          Components: FontBox
    Affects Versions: 1.1.0
            Reporter: Villu Ruusmann
         Attachments: frontpage.png

Peter Zavadsky reported to the PDFBox users' mailing list that he got unsatisfactory results when trying to extract text from the following Slovak-language PDF document:
http://www.justice.gov.sk/kop/ovest/ov10/03/050/OV050A.pdf

While I'm not enough of an expert to say anything about text extraction, I can clearly see numerous rendering problems. Please take a look at the attached image, frontpage.png.

Quite obviously, the Slovak-language text makes use of custom character encoding schemes.
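To make the encoding suspicion concrete, below is a minimal, self-contained Java sketch (not PDFBox code) of how a simple font with a custom encoding can defeat text extraction. Each byte is first looked up in a font-specific code-to-glyph-name table (modeled loosely on a PDF /Differences array), and the glyph name is then looked up in a glyph-name-to-Unicode table (a tiny excerpt using Adobe Glyph List names). The class name, the code assignments and the nonstandard glyph name "g17" are hypothetical, and the sketch ignores /ToUnicode CMaps, which real fonts may carry.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not PDFBox code: decodes single-byte character codes
// the way a simple font with a custom encoding might be decoded.
final class CustomEncodingSketch {

    // Tiny excerpt of glyph-name -> Unicode mappings (names follow the
    // Adobe Glyph List); a real table has thousands of entries.
    static final Map<String, Character> GLYPH_TO_UNICODE = new HashMap<>();

    static {
        for (char c = 'a'; c <= 'z'; c++) {
            GLYPH_TO_UNICODE.put(String.valueOf(c), c);
        }
        GLYPH_TO_UNICODE.put("aacute", '\u00E1');
        GLYPH_TO_UNICODE.put("adieresis", '\u00E4');
        GLYPH_TO_UNICODE.put("ccaron", '\u010D');
        GLYPH_TO_UNICODE.put("dcaron", '\u010F');
        GLYPH_TO_UNICODE.put("lcaron", '\u013E');
        GLYPH_TO_UNICODE.put("ncaron", '\u0148');
        GLYPH_TO_UNICODE.put("ocircumflex", '\u00F4');
        GLYPH_TO_UNICODE.put("scaron", '\u0161');
        GLYPH_TO_UNICODE.put("tcaron", '\u0165');
        GLYPH_TO_UNICODE.put("zcaron", '\u017E');
    }

    // Base code -> glyph-name table: ASCII lowercase letters map to their own names.
    static Map<Integer, String> baseDifferences() {
        Map<Integer, String> table = new HashMap<>();
        for (char c = 'a'; c <= 'z'; c++) {
            table.put((int) c, String.valueOf(c));
        }
        return table;
    }

    // Looks up each byte's glyph name, then that name's Unicode value.
    // Unknown codes or unknown glyph names become U+FFFD (replacement character).
    static String decode(byte[] codes, Map<Integer, String> differences) {
        StringBuilder out = new StringBuilder();
        for (byte b : codes) {
            String glyph = differences.get(b & 0xFF);
            Character ch = (glyph == null) ? null : GLYPH_TO_UNICODE.get(glyph);
            out.append(ch == null ? '\uFFFD' : ch.charValue());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The Slovak word "stat" with s-caron and a-acute, using custom codes 0x80/0x81.
        byte[] word = {(byte) 0x80, 't', (byte) 0x81, 't'};

        Map<Integer, String> standardNames = baseDifferences();
        standardNames.put(0x80, "scaron");
        standardNames.put(0x81, "aacute");

        Map<Integer, String> opaqueNames = baseDifferences();
        opaqueNames.put(0x80, "scaron");
        opaqueNames.put(0x81, "g17"); // hypothetical nonstandard glyph name

        System.out.println(decode(word, standardNames)); // all four letters recovered
        System.out.println(decode(word, opaqueNames));   // third letter becomes U+FFFD
    }
}
```

If a font maps its codes to names like "g17" and provides no other Unicode mapping, an extractor has no standard way to recover the intended character. That would be consistent with the garbled output reported, but whether it is actually what happens in OV050A.pdf is an open question.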

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PDFBOX-664) Incorrect rendering

Posted by "Villu Ruusmann (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Villu Ruusmann updated PDFBOX-664:
----------------------------------

    Attachment: frontpage.png



[jira] Commented: (PDFBOX-664) Incorrect rendering

Posted by "Daniel Wilson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846484#action_12846484 ] 

Daniel Wilson commented on PDFBOX-664:
--------------------------------------

I've seen worse, but ...
1. The outline box is messed up.
2. The word in the black box near the upper right is missing.
3. The word "Register" at the top of the left column has some text missing.

The Slovak encoding may well be to blame for point 2, but I suspect something else is causing the other problems.
