Posted to dev@pdfbox.apache.org by "Timo Boehme (JIRA)" <ji...@apache.org> on 2013/04/19 12:03:16 UTC

[jira] [Commented] (PDFBOX-1572) PDFBox ExtracText problems with "ª"

    [ https://issues.apache.org/jira/browse/PDFBOX-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13636232#comment-13636232 ] 

Timo Boehme commented on PDFBOX-1572:
-------------------------------------

To me this looks like an OCR problem that is not related to PDFBox. Could you please attach a PDF demonstrating the issue, and/or test the extraction of the problematic text with another program (e.g. Acrobat Reader)?
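
When comparing the output of two programs, it can help to look at the extracted characters as Unicode code points rather than as rendered glyphs. The following is only a minimal sketch under stated assumptions: the class CodePointDump and its methods are hypothetical helpers, not part of PDFBox, and the input strings are simply the values quoted in the report. In real use the string would be whatever the extraction tool returned (for PDFBox, the result of PDFTextStripper.getText).

```java
import java.util.StringJoiner;

// Hypothetical helper for inspecting extracted text; not part of PDFBox.
final class CodePointDump {

    // U+00AA FEMININE ORDINAL INDICATOR, the character from the report.
    static final int FEMININE_ORDINAL = 0x00AA;

    // Renders every code point of the text as U+XXXX, separated by spaces.
    static String dump(String text) {
        StringJoiner out = new StringJoiner(" ");
        text.codePoints().forEach(cp -> out.add(String.format("U+%04X", cp)));
        return out.toString();
    }

    // True if the text contains the feminine ordinal indicator.
    static boolean containsFeminineOrdinal(String text) {
        return text.indexOf(FEMININE_ORDINAL) >= 0;
    }

    public static void main(String[] args) {
        // Assumption: these literals stand in for real extraction results.
        String inPdf = "1\u00AA";   // what the reporter sees in the PDF
        String extracted = "P";     // what the reporter says was extracted

        System.out.println("in PDF:    " + dump(inPdf));
        System.out.println("extracted: " + dump(extracted));
        System.out.println("has U+00AA: " + containsFeminineOrdinal(extracted));
    }
}
```

If another program returns U+00AA for the same text while PDFBox does not, that would point at PDFBox. If both return the same wrong characters, the text layer of the PDF itself (for example one produced by OCR) is the more likely cause.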
                
> PDFBox ExtracText problems with "ª"
> -----------------------------------
>
>                 Key: PDFBOX-1572
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1572
>             Project: PDFBox
>          Issue Type: Improvement
>            Reporter: Daniel Tizon
>
> PDFBox has problems detecting ª in some PDFs.
> Examples: 
> I have in my PDF: 1ª
> When I extract text: P
> I have in my PDF: 2ª
> When I extract text: 22
> I have in my PDF: 3ª
> When I extract text: 32
> and there are many more examples related to "ª".
