Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2013/08/06 11:34:49 UTC
[jira] [Updated] (PDFBOX-1680) PDFTextStripper returns garbage characters
[ https://issues.apache.org/jira/browse/PDFBOX-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tilman Hausherr updated PDFBOX-1680:
------------------------------------
Attachment: steveDoc.pdf
> PDFTextStripper returns garbage characters
> ------------------------------------------
>
> Key: PDFBOX-1680
> URL: https://issues.apache.org/jira/browse/PDFBOX-1680
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.0
> Environment: XP
> Reporter: Tilman Hausherr
> Attachments: steveDoc.pdf
>
>
> This code:
>     PDDocument document = PDDocument.loadNonSeq(new File(pdfFilename), null);
>     PDFTextStripper pdfTextStripper = new PDFTextStripper("UTF-8");
>     pdfTextStripper.setStartPage(1);
>     pdfTextStripper.setEndPage(999);
>     System.out.println(pdfTextStripper.getText(document));
> returns this text when used with the file mentioned in PDFBOX-1436:
> ===
> Downloads Stack
> Welcome to Mac OS X Snow Leopard.
> The Dock in Snow Leopard
> includes Stacks, which you
> can use to quickly access
> MYLX\LU[S`\ZLKÄSLZHUK
> applications right from
> the Dock.
> Stacks are simple to create. Just drag any folder to
> the right side of the Dock and it becomes a stack.
> Click a stack and it springs from the Dock in either
> HMHUVYHNYPK;VVWLUHÄSLPUHZ[HJRJSPJR[OL
> ÄSLVUJL
> Mac OS X Snow Leopard includes three premade
> stacks called Documents, Downloads, and Applications
> @V\VWLULK[OPZÄSLMYVT[OL+V^USVHKZZ[HJR
> The Downloads stack captures all of your Internet
> downloads and puts them in one convenient location.
> Files you download in Safari, Mail, and iChat go
> YPNO[PU[V[OL+V^USVHKZZ[HJR>OLUHÄSLÄUPZOLZ
> KV^USVHKPUN[OLZ[HJRUV[PÄLZ`V\I`IV\UJPUNHUK
> W\[Z[OLUL^ÄSLYPNO[VU[VWZVP[»ZLHZ`[VÄUK
> Stacks automatically display their contents in a fan or a
> grid based on the number of items in the stack. You
> can also view the stack as a list. If you prefer one style
> over the other, you can set the stack to always open in
> that style.
> :[HJRZPU[LSSPNLU[S`ZOV^[OLTVZ[YLSL]HU[P[LTZÄYZ[
> or you can set the sort order so that the items you care
> about most always appear at the top of the stack. To
> customize a stack, position the pointer over the stack
> icon and hold down the mouse button until a menu
> appears. Choose the settings you want from the menu.
> ;VYLTV]LHÄSLMYVT
> a stack, just open
> the stack and drag the
> item out to where you
> ^HU[P[;VKLSL[LHÄSL
> move it to the Trash.
> 0UMHJ[^OLU`V\»YL
> done reading this
> document, feel free
> to throw it out.
> Documents Downloads Applications
> TM and © 2009 Apple Inc. All rights reserved.
> ===
> The garbage characters are the same as those fixed by the change in PDFBOX-490, so it's probably a similar cause.
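
A note on the sample output above: the garbled ASCII characters appear to sit a constant 25 code points below the expected ones (for example "M" where "f" is expected, ":" where "S" is expected), and the spaces are missing. The following is a minimal Java sketch for checking that observation. The class name, method name, and the offset of 25 are assumptions inferred only from the text quoted in this report, not from PDFBox internals, and the sketch is not a fix. Non-ASCII characters, such as the one that seems to stand in for the "fi" ligature, are left untouched.

```java
public class ShiftCheck {
    // Assumption: offset inferred from the sample output in this report only.
    static final int OFFSET = 25;

    // Shifts each character up by OFFSET when the result stays within
    // printable ASCII; every other character is copied unchanged.
    static String shiftPrintableAscii(String garbled) {
        StringBuilder sb = new StringBuilder(garbled.length());
        for (int i = 0; i < garbled.length(); i++) {
            char c = garbled.charAt(i);
            int shifted = c + OFFSET;
            if (c >= 0x20 && shifted <= 0x7E) {
                sb.append((char) shifted);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Fragments copied from the garbled lines in the report.
        System.out.println(shiftPrintableAscii("MYLX\\LU[S`\\ZLK")); // prints frequentlyused
        System.out.println(shiftPrintableAscii(":[HJRZ"));            // prints Stacks
        System.out.println(shiftPrintableAscii("+V^USVHKZ"));         // prints Downloads
    }
}
```

If the shifted fragments read as plain English, as they do for these three, that is consistent with a systematic character-code mapping problem rather than random corruption.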
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira