Posted to dev@pdfbox.apache.org by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2010/12/02 23:38:11 UTC
[jira] Resolved: (PDFBOX-354) ClassCastException in FlateFilter

     [ https://issues.apache.org/jira/browse/PDFBOX-354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved PDFBOX-354.
----------------------------------

    Resolution: Cannot Reproduce

The referenced document can no longer be found, so I can't reproduce this.

> ClassCastException in FlateFilter
> ---------------------------------
>
>                 Key: PDFBOX-354
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-354
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>            Reporter: Jukka Zitting
>
> [Issue from SourceForge]
> http://sourceforge.net/tracker/index.php?func=detail&aid=1965266&group_id=78314&atid=552832
> Hi, I'm trying to extract text from a pdf which can be found at
> http://www.vattenfall.com/www/vf_com/vf_com/Gemeinsame_Inhalte/DOCUMENT/360168vatt/5965811xou/643131powe/892253fors/P0288421.pdf
> and am getting a ClassCastException from within PDFBox. The full stacktrace
> is:
> java.lang.ClassCastException: org.pdfbox.cos.COSArray
> at org.pdfbox.filter.FlateFilter.decode(FlateFilter.java:70)
> at org.pdfbox.cos.COSStream.doDecode(COSStream.java:290)
> at org.pdfbox.cos.COSStream.doDecode(COSStream.java:243)
> at org.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:170)
> at org.pdfbox.pdmodel.common.COSStreamArray.getUnfilteredStream(COSStreamArray.java:200)
> at org.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:101)
> at org.pdfbox.pdmodel.common.COSStreamArray.getStreamTokens(COSStreamArray.java:141)
> at org.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:202)
> at org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:174)
> at org.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:336)
> at org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:259)
> at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:216)
> at org.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:149)
> at [my code].
> The code leading up to the call to extract text is as follows:
> org.pdfbox.pdfparser.PDFParser parser = new org.pdfbox.pdfparser.PDFParser(myInputStream);
> parser.parse();
> pdDocument = parser.getPDDocument();
> setContent(new PDFTextStripper().getText(pdDocument));
> I hope the formatting is ok! Have you encountered this error before and can
> you suggest any causes or solutions?
> Thanks,
> Ben Kirby
> kirby.bm@gmail.com
> [Comment on SourceForge]
> Date: 2008-05-22 20:02
> Sender: danielwilson
> Logged In: YES 
> user_id=1737686
> Originator: NO
> Ben,
> I just ran the PDF through our text extraction test -- without error.
> I'm not sure what to say about the error you've encountered.
> I do see that Ben Litchfield has updated the FlateFilter source code since
> version 0.7.3 came out. Have you tried the latest code, or just the 0.7.3 build?
> [Comment on SourceForge]
> Date: 2008-05-29 10:11
> Sender: nobody
> Logged In: NO 
> Hi Daniel, sorry for the delay in responding. I've only tried the 0.7.3
> build - I'll grab the latest code build now, and let you know how I get
> on...
> Thanks,
> Ben
> [Comment on SourceForge]
> Date: 2008-05-29 10:30
> Sender: nobody
> Logged In: NO 
> Hi again, sorry, but your 13/05 and 14/05 nightly build jars and zips seem
> to be corrupt. Maven can't use the Maven jars, and WinRAR throws an error
> while opening all of them, zips and jars.
> Am I doing something wrong, or are they corrupt?
> Thanks,
> Ben
> [Comment on SourceForge]
> Date: 2008-07-31 11:52
> Sender: bmk06
> Logged In: YES 
> user_id=1683216
> Originator: YES
> Hi again. I've just checked back for a response, and as there isn't any,
> have gone to try the nightly builds again. However there don't seem to be
> any! I'm trying http://www.pdfbox.org/dist - have they moved? I really want
> to get this fixed in our build, so could you please let me know what's
> going on.
> Thanks,
> Ben
> [Comment on SourceForge]
> Date: 2008-07-31 12:05
> Sender: danielwilson
> Logged In: YES 
> user_id=1737686
> Originator: NO
> In the near future, http://incubator.apache.org/projects/pdfbox.html will
> be the place to look for PDFBox stuff.
> I don't see nightly builds ... I'll see if I can find out about those.
> [Comment on SourceForge]
> Date: 2008-07-31 13:12
> Sender: bmk06
> Logged In: YES 
> user_id=1683216
> Originator: YES
> Thanks Daniel - I came across the new site myself, but, you're right, I
> couldn't see any nightly builds. If you could let me know what the plan is,
> that'd be great, otherwise I'll check back next week.
> Thanks for your help,
> Ben
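
For anyone who hits the same trace: the thread never established a cause, so what follows is an assumption, not a confirmed diagnosis. The PDF specification allows a stream's /DecodeParms entry to be either a single dictionary (one filter) or, when /Filter is an array, an array of parameter dictionaries parallel to the filters, whose entries may be null. A decoder that casts /DecodeParms straight to a dictionary would then fail with a ClassCastException naming COSArray, as in the trace above. The minimal sketch below models a PDF dictionary as a java.util.Map and a PDF array as a java.util.List instead of PDFBox's COS classes; DecodeParmsSketch and paramsForFilter are hypothetical names, not PDFBox API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

class DecodeParmsSketch {

    // Hypothetical helper (not a PDFBox API): picks the decode-parameter
    // dictionary for the filter at filterIndex, accepting both shapes the
    // PDF spec allows for /DecodeParms instead of assuming a dictionary.
    @SuppressWarnings("unchecked")
    static Map<String, Object> paramsForFilter(Object decodeParms, int filterIndex) {
        if (decodeParms instanceof Map) {
            // Single dictionary: the stream has a single filter.
            return (Map<String, Object>) decodeParms;
        }
        if (decodeParms instanceof List) {
            // Array parallel to the /Filter array; entries may be null.
            List<Object> parms = (List<Object>) decodeParms;
            if (filterIndex >= 0 && filterIndex < parms.size()) {
                Object entry = parms.get(filterIndex);
                if (entry instanceof Map) {
                    return (Map<String, Object>) entry;
                }
            }
        }
        // Absent, null entry, or unexpected type: report "no parameters"
        // rather than casting blindly and throwing ClassCastException.
        return null;
    }

    public static void main(String[] args) {
        // Models /Filter [/ASCII85Decode /FlateDecode]
        // with /DecodeParms [null << /Predictor 12 /Columns 5 >>]
        Map<String, Object> flateParms = Map.of("Predictor", 12, "Columns", 5);
        List<Object> parallel = Arrays.asList(null, flateParms);

        System.out.println(paramsForFilter(parallel, 1) == flateParms);   // true
        System.out.println(paramsForFilter(parallel, 0));                 // null
        System.out.println(paramsForFilter(flateParms, 0) == flateParms); // true
    }
}
```

Returning null lets the caller fall back to the filter's default parameters instead of aborting text extraction for the whole page; whether that is the right policy for malformed files is a design choice, not something the thread settled.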

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.