Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2013/08/06 11:34:49 UTC

[jira] [Updated] (PDFBOX-1680) PDFTextStripper returns garbage characters

     [ https://issues.apache.org/jira/browse/PDFBOX-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-1680:
------------------------------------

    Attachment: steveDoc.pdf
    
> PDFTextStripper returns garbage characters
> ------------------------------------------
>
>                 Key: PDFBOX-1680
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1680
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.0
>         Environment: XP
>            Reporter: Tilman Hausherr
>         Attachments: steveDoc.pdf
>
>
> This code:
>     PDDocument document = PDDocument.loadNonSeq(new File(pdfFilename), null);
>     PDFTextStripper pdfTextStripper = new PDFTextStripper("UTF-8");
>     pdfTextStripper.setStartPage(1);
>     pdfTextStripper.setEndPage(999);
>     System.out.println(pdfTextStripper.getText(document));
> returns this text when used with the file mentioned in PDFBOX-1436:
> ===
> Downloads Stack
> Welcome to Mac OS X Snow Leopard.
> The Dock in Snow Leopard 
> includes Stacks, which you 
> can use to quickly access 
> MYLX\LU[S`\ZLKÄSLZHUK
> applications right from  
> the Dock. 
> Stacks are simple to create. Just drag any folder to  
> the right side of the Dock and it becomes a stack.  
> Click a stack and it springs from the Dock in either  
> HMHUVYHNYPK;VVWLUHÄSLPUHZ[HJRJSPJR[OL 
> ÄSLVUJL
> Mac OS X Snow Leopard includes three premade 
> stacks called Documents, Downloads, and Applications 
> @V\VWLULK[OPZÄSLMYVT[OL+V^USVHKZZ[HJR
> The Downloads stack captures all of your Internet 
> downloads and puts them in one convenient location. 
> Files you download in Safari, Mail, and iChat go 
> YPNO[PU[V[OL+V^USVHKZZ[HJR>OLUHÄSLÄUPZOLZ
> KV^USVHKPUN[OLZ[HJRUV[PÄLZ`V\I`IV\UJPUNHUK
> W\[Z[OLUL^ÄSLYPNO[VU[VWZVP[»ZLHZ`[VÄUK
> Stacks automatically display their contents in a fan or a 
> grid based on the number of items in the stack. You 
> can also view the stack as a list. If you prefer one style 
> over the other, you can set the stack to always open in 
> that style.
> :[HJRZPU[LSSPNLU[S`ZOV^[OLTVZ[YLSL]HU[P[LTZÄYZ[
> or you can set the sort order so that the items you care 
> about most always appear at the top of the stack. To 
> customize a stack, position the pointer over the stack 
> icon and hold down the mouse button until a menu 
> appears. Choose the settings you want from the menu.
> ;VYLTV]LHÄSLMYVT 
> a stack, just open  
> the stack and drag the 
> item out to where you 
> ^HU[P[;VKLSL[LHÄSL
> move it to the Trash.  
> 0UMHJ[^OLU`V\»YL
> done reading this 
> document, feel free  
> to throw it out.
> Documents Downloads Applications
> TM and © 2009 Apple Inc. All rights reserved.
> ===
> The garbage characters are the same as those fixed by the change in PDFBOX-490, so it's probably a similar cause.
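
The garbled lines in the sample output above appear consistent with a fixed code-point shift: adding 25 to each garbled ASCII character yields readable letters, and `Ä` seems to stand in for the "fi" ligature. The following is a minimal Java sketch of that check, not PDFBox code. The class name, the method name, the offset of 25, and the ligature mapping are all assumptions inferred from this one sample, and the sketch says nothing about the actual cause inside PDFBox.

```java
public class GarbageShiftCheck {

    // Assumption: offset inferred by comparing the garbled lines with the
    // readable text around them in this report (e.g. 'M' + 25 == 'f').
    static final int OFFSET = 25;

    // Hypothetical helper: shifts each character by OFFSET when the result
    // is an ASCII letter, and maps U+00C4 to "fi" (assumed ligature).
    static String shift(String garbled) {
        StringBuilder sb = new StringBuilder();
        for (char c : garbled.toCharArray()) {
            char shifted = (char) (c + OFFSET);
            if (c == '\u00C4') {
                sb.append("fi");
            } else if ((shifted >= 'A' && shifted <= 'Z')
                    || (shifted >= 'a' && shifted <= 'z')) {
                sb.append(shifted);
            } else {
                sb.append(c); // leave anything else untouched
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Garbled lines copied from the output quoted in this report.
        System.out.println(shift("MYLX\\LU[S`\\ZLK\u00C4SLZHUK"));
        System.out.println(shift("@V\\VWLULK"));
    }
}
```

Run on the two sample lines, the sketch yields text without spaces ("frequentlyusedfilesand", "Youopened"), which matches the missing word breaks in the extracted output.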
