Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2011/02/02 20:45:29 UTC

[jira] Commented: (PDFBOX-938) Wrong extracted text using PDFBox 1.4

    [ https://issues.apache.org/jira/browse/PDFBOX-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989765#comment-12989765 ] 

Andreas Lehmkühler commented on PDFBOX-938:
-------------------------------------------

@Hesham 
I can confirm the issue with your sample, but I can't help you with it. As I already said, I'm not an AWT expert, but it seems that something is wrong with the encoding or the font used in your application.

Since the current trunk works fine, I'm going to resolve this issue.

> Wrong extracted text using PDFBox 1.4
> -------------------------------------
>
>                 Key: PDFBOX-938
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-938
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.4.0
>            Reporter: Hesham
>             Fix For: 1.5.0
>
>         Attachments: Another book - Wrong extracted f char.pdf, Another+book+-+Wrong+extracted+f+char.txt, Sample.zip, Wrong extracted f char.pdf
>
>
> Hello,
>  
> I am using PDFBox v1.4 to extract some text from a PDF, but some words are not extracted correctly.
> For example, the following words:
> "Nefteiugansk" is read: "Nežeiugansk"
> "fiancee" is read: "Äancée"
> "first" is read: "Ärst"
>  
> Please check the attached file to test this.
> Best regards
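
The three samples in the report share a pattern: the wrong character stands where "ft" or "fi" was expected. A minimal Java sketch for isolating that differing span between an expected word and the extracted one could look like the following. It uses only the standard library and makes no PDFBox calls; the names ExtractionDiff and differingSpan are hypothetical and not part of PDFBox.

```java
// Hypothetical helper, not part of PDFBox: isolates the part of a word
// that differs between the expected text and the extracted text.
class ExtractionDiff {

    // Returns {expectedSpan, extractedSpan}: what is left of each string
    // after removing their common prefix and common suffix.
    static String[] differingSpan(String expected, String extracted) {
        int max = Math.min(expected.length(), extracted.length());

        // Length of the common prefix.
        int prefix = 0;
        while (prefix < max && expected.charAt(prefix) == extracted.charAt(prefix)) {
            prefix++;
        }

        // Length of the common suffix, not overlapping the prefix.
        int suffix = 0;
        while (suffix < max - prefix
                && expected.charAt(expected.length() - 1 - suffix)
                   == extracted.charAt(extracted.length() - 1 - suffix)) {
            suffix++;
        }

        return new String[] {
            expected.substring(prefix, expected.length() - suffix),
            extracted.substring(prefix, extracted.length() - suffix)
        };
    }

    public static void main(String[] args) {
        // Two samples taken from the report; the extracted characters are
        // written as Unicode escapes (U+017E and U+00C4) to avoid
        // source-encoding problems.
        String[][] samples = {
            {"Nefteiugansk", "Ne\u017Eeiugansk"},
            {"first", "\u00C4rst"}
        };
        for (String[] s : samples) {
            String[] d = differingSpan(s[0], s[1]);
            StringBuilder codePoints = new StringBuilder();
            for (int i = 0; i < d[1].length(); i++) {
                codePoints.append(String.format("U+%04X ", (int) d[1].charAt(i)));
            }
            System.out.println("expected \"" + d[0] + "\" but got " + codePoints.toString().trim());
        }
    }
}
```

For the two samples above, the expected spans come out as "ft" and "fi", each replaced by a single character in the extracted text. That is consistent with, though it does not prove, ligature glyphs being mapped to the wrong code points.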
