Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2013/08/06 11:34:47 UTC
[jira] [Created] (PDFBOX-1680) PDFTextStripper returns garbage characters
Tilman Hausherr created PDFBOX-1680:
---------------------------------------
Summary: PDFTextStripper returns garbage characters
Key: PDFBOX-1680
URL: https://issues.apache.org/jira/browse/PDFBOX-1680
Project: PDFBox
Issue Type: Bug
Components: Text extraction
Affects Versions: 2.0.0
Environment: XP
Reporter: Tilman Hausherr
Attachments: steveDoc.pdf
This code
PDDocument document = PDDocument.loadNonSeq(new File(pdfFilename), null);
PDFTextStripper pdfTextStripper = new PDFTextStripper("UTF-8");
pdfTextStripper.setStartPage(1);
pdfTextStripper.setEndPage(999);
System.out.println(pdfTextStripper.getText(document));
returns this text when used with the file mentioned in PDFBOX-1436:
===
Downloads Stack
Welcome to Mac OS X Snow Leopard.
The Dock in Snow Leopard
includes Stacks, which you
can use to quickly access
MYLX\LU[S`\ZLKÄSLZHUK
applications right from
the Dock.
Stacks are simple to create. Just drag any folder to
the right side of the Dock and it becomes a stack.
Click a stack and it springs from the Dock in either
HMHUVYHNYPK;VVWLUHÄSLPUHZ[HJRJSPJR[OL
ÄSLVUJL
Mac OS X Snow Leopard includes three premade
stacks called Documents, Downloads, and Applications
@V\VWLULK[OPZÄSLMYVT[OL+V^USVHKZZ[HJR
The Downloads stack captures all of your Internet
downloads and puts them in one convenient location.
Files you download in Safari, Mail, and iChat go
YPNO[PU[V[OL+V^USVHKZZ[HJR>OLUHÄSLÄUPZOLZ
KV^USVHKPUN[OLZ[HJRUV[PÄLZ`V\I`IV\UJPUNHUK
W\[Z[OLUL^ÄSLYPNO[VU[VWZVP[»ZLHZ`[VÄUK
Stacks automatically display their contents in a fan or a
grid based on the number of items in the stack. You
can also view the stack as a list. If you prefer one style
over the other, you can set the stack to always open in
that style.
:[HJRZPU[LSSPNLU[S`ZOV^[OLTVZ[YLSL]HU[P[LTZÄYZ[
or you can set the sort order so that the items you care
about most always appear at the top of the stack. To
customize a stack, position the pointer over the stack
icon and hold down the mouse button until a menu
appears. Choose the settings you want from the menu.
;VYLTV]LHÄSLMYVT
a stack, just open
the stack and drag the
item out to where you
^HU[P[;VKLSL[LHÄSL
move it to the Trash.
0UMHJ[^OLU`V\»YL
done reading this
document, feel free
to throw it out.
Documents Downloads Applications
TM and © 2009 Apple Inc. All rights reserved.
===
The garbage characters are the same as those fixed by the change in PDFBOX-490, so it's probably a similar cause.
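As a rough diagnostic, the garbled runs in the output above look like the intended text with every ASCII character shifted down by 25 code points and the spaces dropped (for example, "@V\" where the page says "You"). The Java sketch below simply adds that offset back. The class name ShiftCheck and the constant OFFSET are hypothetical, the value 25 is inferred only from this one sample, and non-ASCII characters such as "Ä" (which appears to stand in for the "fi" ligature) are passed through unchanged. It does not use PDFBox and says nothing about where the wrong mapping comes from.

```java
class ShiftCheck {
    // Assumption: offset inferred from the sample output in this report,
    // not a constant taken from PDFBox or from the PDF file itself.
    static final int OFFSET = 25;

    // Adds OFFSET to every ASCII character; leaves non-ASCII characters as-is.
    static String shift(String garbled) {
        StringBuilder sb = new StringBuilder(garbled.length());
        for (char c : garbled.toCharArray()) {
            if (c < 0x80) {
                sb.append((char) (c + OFFSET));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // First ASCII run of one garbled line from the output above.
        System.out.println(shift("@V\\VWLULK[OPZ")); // prints "Youopenedthis"
    }
}
```

If the same offset holds for other files, a check like this could help tell an encoding or character-mapping problem apart from genuinely missing text.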