Posted to dev@pdfbox.apache.org by "Villu Ruusmann (JIRA)" <ji...@apache.org> on 2010/03/17 09:17:27 UTC
[jira] Created: (PDFBOX-664) Incorrect rendering
Incorrect rendering
-------------------
Key: PDFBOX-664
URL: https://issues.apache.org/jira/browse/PDFBOX-664
Project: PDFBox
Issue Type: Bug
Components: FontBox
Affects Versions: 1.1.0
Reporter: Villu Ruusmann
Attachments: frontpage.png
Peter Zavadsky reported on the PDFBox users' mailing list that he got unsatisfactory results when trying to extract text from the following Slovak-language PDF document:
http://www.justice.gov.sk/kop/ovest/ov10/03/050/OV050A.pdf
While I'm not expert enough to say anything about text extraction, I can clearly see numerous rendering problems. Please take a look at the attached image, frontpage.png.
Quite obviously, the Slovak language makes use of custom character encoding schemes.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (PDFBOX-664) Incorrect rendering
Posted by "Villu Ruusmann (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Villu Ruusmann updated PDFBOX-664:
----------------------------------
Attachment: frontpage.png
[jira] Commented: (PDFBOX-664) Incorrect rendering
Posted by "Daniel Wilson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846484#action_12846484 ]
Daniel Wilson commented on PDFBOX-664:
--------------------------------------
I've seen worse, but ...
1. The outline box is messed up.
2. The word in the black box near the upper right is missing.
3. The word Register at the top of the left column has some text missing.
The Slovak encoding may well be to blame for point 2, but I suspect something else is causing points 1 and 3.