Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2013/03/14 12:22:13 UTC
[jira] [Closed] (PDFBOX-188) doesn't convert properly russian characters
[ https://issues.apache.org/jira/browse/PDFBOX-188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Lehmkühler closed PDFBOX-188.
-------------------------------------
Resolution: Cannot Reproduce
Assignee: Andreas Lehmkühler
Can't reproduce the described issue because no sample PDF was provided, so the issue is set to closed.
> doesn't convert properly russian characters
> -------------------------------------------
>
> Key: PDFBOX-188
> URL: https://issues.apache.org/jira/browse/PDFBOX-188
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Assignee: Andreas Lehmkühler
> Priority: Minor
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1537323
> Originally submitted by amashtakov on 2006-08-09 04:44.
> Hi,
> I've tried to extract text from the attached PDF using
> both the stable release PDFBox-0.7.2 and the recent nightly
> build PDFBox-0.7.3-dev-20060809 with the following code
> snippet:
> // 1. parse document ('is' is an InputStream opened on the PDF)
> PDFParser parser = new PDFParser(is);
> parser.parse();
> COSDocument cos = parser.getDocument();
> // 2. extract text
> PDFTextStripper stripper = new PDFTextStripper();
> String text = stripper.getText(new PDDocument(cos));
> // 3. dump output as UTF-8
> FileOutputStream os = new FileOutputStream("file.txt");
> OutputStreamWriter ow = new OutputStreamWriter(os, "UTF-8");
> ow.write(text);
> ow.flush();
> ow.close();
> Although the original PDF contains Russian text, the
> output file doesn't contain any "valid" Russian
> characters.
> I've also tried to convert the same PDF with the
> foolabs-xpdf tool, and its output contains valid
> UTF-8 Russian text.
> PS: I couldn't attach the file because of the SourceForge
> size limit (the size is ~545K). Is it possible
> to pass it to the dev team?
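
One way to narrow down a report like this without the sample PDF is to check the dump step (step 3 in the snippet) in isolation. The sketch below is only an illustration and does not use PDFBox at all; the class name Utf8DumpCheck, the method dump, and the sample string are hypothetical. It pushes a short Cyrillic string through an OutputStreamWriter, as the reporter's code does, and confirms that UTF-8 preserves it while a charset without Cyrillic (ISO-8859-1) turns every character into '?'. If the UTF-8 round trip is intact, the output step is probably not the cause, and the extracted String itself would be the next thing to inspect.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox: isolates the "dump output" step.
class Utf8DumpCheck {

    // Encodes text through an OutputStreamWriter and returns the raw bytes.
    static byte[] dump(String text, Charset charset) {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        try (Writer ow = new OutputStreamWriter(os, charset)) {
            ow.write(text);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return os.toByteArray();
    }

    public static void main(String[] args) {
        // "Privet" (hello) in Cyrillic, written with escapes so the
        // source file encoding cannot affect the result.
        String sample = "\u041f\u0440\u0438\u0432\u0435\u0442";

        // UTF-8 can represent Cyrillic, so decoding gives the same string back.
        byte[] utf8 = dump(sample, StandardCharsets.UTF_8);
        String back = new String(utf8, StandardCharsets.UTF_8);
        System.out.println("UTF-8 round trip intact: " + sample.equals(back));

        // ISO-8859-1 has no Cyrillic; the writer substitutes '?' for each character.
        byte[] latin1 = dump(sample, StandardCharsets.ISO_8859_1);
        System.out.println("ISO-8859-1 output: "
                + new String(latin1, StandardCharsets.ISO_8859_1));
    }
}
```

The same comparison can be applied to the String returned by the text stripper: printing its code points shows whether Cyrillic characters (U+0400 to U+04FF) are present before anything is written to disk.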
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira