Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2013/03/14 12:22:13 UTC

[jira] [Closed] (PDFBOX-188) doesn't convert properly russian characters

     [ https://issues.apache.org/jira/browse/PDFBOX-188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler closed PDFBOX-188.
-------------------------------------

    Resolution: Cannot Reproduce
      Assignee: Andreas Lehmkühler

Cannot reproduce the described issue because no sample PDF was provided -> set to closed
                
> doesn't convert properly russian characters
> -------------------------------------------
>
>                 Key: PDFBOX-188
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-188
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>            Assignee: Andreas Lehmkühler
>            Priority: Minor
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1537323
> Originally submitted by amashtakov on 2006-08-09 04:44.
> Hi,
> I've tried to extract text from the attached PDF using
> both the stable release PDFBox-0.7.2 and the recent nightly
> build PDFBox-0.7.3-dev-20060809 with the following code
> snippet:
> // 1. parse document ("is" is an InputStream opened on the PDF)
> PDFParser parser = new PDFParser(is);
> parser.parse();
> COSDocument cos = parser.getDocument();
> // 2. extract text
> PDFTextStripper stripper = new PDFTextStripper();
> String text = stripper.getText(new PDDocument(cos));
> // 3. dump output as UTF-8
> FileOutputStream os = new FileOutputStream("file.txt");
> OutputStreamWriter ow = new OutputStreamWriter(os, "UTF-8");
> ow.write(text);
> ow.flush();
> ow.close(); // also closes the underlying FileOutputStream
> Despite the Russian content of the original PDF, the
> output file doesn't contain any valid Russian
> characters.
> I've also tried converting the same PDF with the
> foolabs-xpdf tool; its output contains valid
> UTF-8 Russian text.
> PS: I couldn't attach the file because of the SourceForge
>     size limit (the file is ~545K). Is it possible
>     to pass it to the dev team?
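
Note: the report says the output has no "valid" Russian characters. One way
to make that check concrete, sketched here with plain JDK classes only, is to
count the Cyrillic code points in the extracted string before writing it out
as UTF-8. The class and method names (CyrillicCheck, countCyrillic, writeUtf8)
are hypothetical and not part of PDFBox, and the sample string merely stands
in for the value returned by stripper.getText(...). A count of zero would
suggest the characters were already lost during extraction rather than when
the file was written.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class CyrillicCheck {

    // Counts the code points that Unicode assigns to the Cyrillic script.
    static long countCyrillic(String text) {
        return text.codePoints()
                .filter(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.CYRILLIC)
                .count();
    }

    // Writes the text as UTF-8; try-with-resources flushes and closes the writer.
    static void writeUtf8(Path target, String text) throws IOException {
        try (Writer out = new OutputStreamWriter(
                Files.newOutputStream(target), StandardCharsets.UTF_8)) {
            out.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder for the extracted text; escapes avoid source-encoding issues.
        String text = "\u041f\u0440\u0438\u0432\u0435\u0442, PDFBox";
        System.out.println("Cyrillic code points: " + countCyrillic(text));

        // Write to a temporary file so the sketch has no lasting side effects.
        Path target = Files.createTempFile("extracted", ".txt");
        writeUtf8(target, text);
        System.out.println("Wrote " + Files.size(target) + " bytes to " + target);
    }
}
```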
