Posted to users@pdfbox.apache.org by Marius Melzer <ma...@rasumi.net> on 2015/08/18 16:08:17 UTC

Exception on reading PDF

Hi everyone,

The website says to consult the mailing list before opening a bug
report. I encountered an error reading a PDF with PDFBox 1.8.10 and
the following code:

PDDocument document = PDDocument.load(new File("test.pdf"));
PDFTextStripper stripper = new PDFTextStripper();
stripper.setLineSeparator("\n");
StringWriter result = new StringWriter();
stripper.writeText(document, result);
System.out.println(result.toString());

The error does not occur with two other (simpler) PDFs. Here's the error
output:

     [java] Aug 18, 2015 1:48:37 PM org.apache.pdfbox.pdfparser.PDFParser parse
     [java] INFO: Document is encrypted
     [java] Aug 18, 2015 1:48:37 PM org.apache.pdfbox.filter.FlateFilter decode
     [java] SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
     [java] Aug 18, 2015 1:48:37 PM org.apache.pdfbox.filter.FlateFilter decode
     [java] SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
     [java] Aug 18, 2015 1:48:37 PM org.apache.pdfbox.filter.FlateFilter decode
     [java] SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
     [java] Aug 18, 2015 1:48:37 PM org.apache.pdfbox.filter.FlateFilter decode
     [java] SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
     [java] Aug 18, 2015 1:48:37 PM org.apache.pdfbox.filter.FlateFilter decode
     [java] SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
     [java] Aug 18, 2015 1:48:37 PM org.apache.pdfbox.filter.FlateFilter decode
     [java] SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
     [java] Aug 18, 2015 1:48:37 PM org.apache.pdfbox.filter.FlateFilter decode
     [java] SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
     [java] Aug 18, 2015 1:48:37 PM org.apache.pdfbox.filter.FlateFilter decode
     [java] SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
     [java] Aug 18, 2015 1:48:37 PM org.apache.pdfbox.filter.FlateFilter decode
     [java] SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
     [java] Aug 18, 2015 1:48:37 PM org.apache.pdfbox.filter.FlateFilter decode
     [java] SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
     [java] Exception in thread "main" java.io.IOException
     [java] 	at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:109)
     [java] 	at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:379)
     [java] 	at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:291)
     [java] 	at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:225)
     [java] 	at org.apache.pdfbox.pdmodel.common.COSStreamArray.getUnfilteredStream(COSStreamArray.java:197)
     [java] 	at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:117)
     [java] 	at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
     [java] 	at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
     [java] 	at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
     [java] 	at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:458)
     [java] 	at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:383)
     [java] 	at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:342)
     [java] 	at myi.Main.main(Unknown Source)
     [java] Caused by: java.util.zip.DataFormatException: incorrect header check
     [java] 	at java.util.zip.Inflater.inflateBytes(Native Method)
     [java] 	at java.util.zip.Inflater.inflate(Inflater.java:259)
     [java] 	at java.util.zip.Inflater.inflate(Inflater.java:280)
     [java] 	at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:128)
     [java] 	at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:101)
     [java] 	... 12 more
     [java] Java Result: 1

Is the error related to the "Document is encrypted" message? What does
"encrypted" mean here? My PDF reader (Evince) opens the PDF without any
problems, and no password is needed.

Unfortunately, I can't provide the PDF itself because it's a bill and
contains personal information. But if there's another way to isolate the
problem, please let me know.

Thanks for your help,
Marius



Re: Exception on reading PDF

Posted by Tilman Hausherr <TH...@t-online.de>.
On 18.08.2015 at 16:08, Marius Melzer wrote:
> PDDocument document = PDDocument.load(new File("test.pdf"));

if (document.isEncrypted()) {
    DecryptionMaterial decryptionMaterial = new StandardDecryptionMaterial("");
    document.openProtection(decryptionMaterial);
}
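
As a rough illustration of why the decryption step matters: in an encrypted PDF, stream data is compressed first and then encrypted, so a Flate decoder that is handed still-encrypted bytes finds no valid zlib header. The sketch below uses only java.util.zip and makes no PDFBox calls. The class name InflateDemo, its helper methods, and the XOR scrambling (a stand-in for the real RC4/AES encryption) are assumptions made for this example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class InflateDemo {

    // Compress bytes into zlib format, the format a FlateDecode stream holds.
    public static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompress zlib data, returning at most maxLen bytes.
    public static byte[] inflate(byte[] data, int maxLen) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(data);
            byte[] buffer = new byte[maxLen];
            int n = inflater.inflate(buffer);
            return Arrays.copyOf(buffer, n);
        } finally {
            inflater.end();
        }
    }

    // Stand-in for encryption (NOT real PDF encryption): XOR every byte with a fixed key.
    public static byte[] scramble(byte[] data) {
        byte[] result = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            result[i] = (byte) (data[i] ^ 0x5A);
        }
        return result;
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] plain = "BT /F1 12 Tf (Hello) Tj ET".getBytes(StandardCharsets.US_ASCII);
        byte[] compressed = deflate(plain);

        // Intact ("decrypted") data inflates back to the original content.
        byte[] restored = inflate(compressed, plain.length);
        System.out.println(new String(restored, StandardCharsets.US_ASCII));

        // Scrambled ("still encrypted") data is rejected by the inflater.
        try {
            inflate(scramble(compressed), plain.length);
        } catch (DataFormatException e) {
            System.out.println("DataFormatException: " + e.getMessage());
        }
    }
}
```

This would also be consistent with a viewer opening the file without a prompt: a document can be encrypted with an empty user password, which is why the snippet above passes "" to StandardDecryptionMaterial.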



> PDFTextStripper stripper = new PDFTextStripper();