Posted to users@pdfbox.apache.org by Marius Melzer <ma...@rasumi.net> on 2015/08/18 16:08:17 UTC
Exception on reading PDF
Hi everyone,
the website says to consult the mailing list before opening a bug
report. I encountered an error reading a PDF with PDFBox 1.8.10 and
the following code:
PDDocument document = PDDocument.load(new File("test.pdf"));
PDFTextStripper stripper = new PDFTextStripper();
stripper.setLineSeparator("\n");
StringWriter result = new StringWriter();
stripper.writeText(document, result);
System.out.println(result.toString());
The error does not occur with two other (simpler) PDFs. Here's the error
output:
[java] Aug 18, 2015 1:48:37 PM org.apache.pdfbox.pdfparser.PDFParser parse
[java] INFORMATION: Document is encrypted
[java] Aug 18, 2015 1:48:37 PM org.apache.pdfbox.filter.FlateFilter decode
[java] SCHWERWIEGEND: FlateFilter: stop reading corrupt stream due to a DataFormatException
[java] Aug 18, 2015 1:48:37 PM org.apache.pdfbox.filter.FlateFilter decode
[java] SCHWERWIEGEND: FlateFilter: stop reading corrupt stream due to a DataFormatException
[java] Aug 18, 2015 1:48:37 PM org.apache.pdfbox.filter.FlateFilter decode
[java] SCHWERWIEGEND: FlateFilter: stop reading corrupt stream due to a DataFormatException
[java] Aug 18, 2015 1:48:37 PM org.apache.pdfbox.filter.FlateFilter decode
[java] SCHWERWIEGEND: FlateFilter: stop reading corrupt stream due to a DataFormatException
[java] Aug 18, 2015 1:48:37 PM org.apache.pdfbox.filter.FlateFilter decode
[java] SCHWERWIEGEND: FlateFilter: stop reading corrupt stream due to a DataFormatException
[java] Aug 18, 2015 1:48:37 PM org.apache.pdfbox.filter.FlateFilter decode
[java] SCHWERWIEGEND: FlateFilter: stop reading corrupt stream due to a DataFormatException
[java] Aug 18, 2015 1:48:37 PM org.apache.pdfbox.filter.FlateFilter decode
[java] SCHWERWIEGEND: FlateFilter: stop reading corrupt stream due to a DataFormatException
[java] Aug 18, 2015 1:48:37 PM org.apache.pdfbox.filter.FlateFilter decode
[java] SCHWERWIEGEND: FlateFilter: stop reading corrupt stream due to a DataFormatException
[java] Aug 18, 2015 1:48:37 PM org.apache.pdfbox.filter.FlateFilter decode
[java] SCHWERWIEGEND: FlateFilter: stop reading corrupt stream due to a DataFormatException
[java] Aug 18, 2015 1:48:37 PM org.apache.pdfbox.filter.FlateFilter decode
[java] SCHWERWIEGEND: FlateFilter: stop reading corrupt stream due to a DataFormatException
[java] Exception in thread "main" java.io.IOException
[java]     at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:109)
[java]     at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:379)
[java]     at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:291)
[java]     at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:225)
[java]     at org.apache.pdfbox.pdmodel.common.COSStreamArray.getUnfilteredStream(COSStreamArray.java:197)
[java]     at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:117)
[java]     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
[java]     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
[java]     at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
[java]     at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:458)
[java]     at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:383)
[java]     at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:342)
[java]     at myi.Main.main(Unknown Source)
[java] Caused by: java.util.zip.DataFormatException: incorrect header check
[java]     at java.util.zip.Inflater.inflateBytes(Native Method)
[java]     at java.util.zip.Inflater.inflate(Inflater.java:259)
[java]     at java.util.zip.Inflater.inflate(Inflater.java:280)
[java]     at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:128)
[java]     at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:101)
[java]     ... 12 more
[java] Java Result: 1
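(Editorial note: the "incorrect header check" in the "Caused by" line is raised by java.util.zip.Inflater when its input does not start with a valid zlib header. That is what one would expect if the stream bytes handed to the Flate decoder were still encrypted, although the trace alone does not prove it. The sketch below uses only the JDK, not PDFBox; the class name, the helper methods, and the four "scrambled" bytes are made up for illustration and merely stand in for undecrypted stream data.)

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class HeaderCheckDemo {

    /** Returns null if the data inflates cleanly, otherwise a description of the failure. */
    static String inflateError(byte[] data) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        byte[] buffer = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                // No progress and nothing left to read: the input ended early.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    return "truncated input";
                }
            }
            return null;
        } catch (DataFormatException e) {
            // This is the exception type that FlateFilter reports in the trace above.
            return "DataFormatException: " + e.getMessage();
        } finally {
            inflater.end();
        }
    }

    /** Compresses a small input into zlib format (the format a Flate stream holds). */
    static byte[] deflate(byte[] plain) {
        Deflater deflater = new Deflater();
        deflater.setInput(plain);
        deflater.finish();
        byte[] buffer = new byte[1024]; // large enough for the tiny inputs used here
        int n = deflater.deflate(buffer);
        deflater.end();
        return Arrays.copyOf(buffer, n);
    }

    public static void main(String[] args) {
        byte[] valid = deflate("BT /F1 12 Tf (Hello) Tj ET".getBytes(StandardCharsets.US_ASCII));
        // Arbitrary bytes standing in for data that was never decrypted (assumption).
        byte[] scrambled = {0x12, 0x34, 0x56, 0x78};

        System.out.println("valid zlib data: " + inflateError(valid));
        System.out.println("scrambled bytes: " + inflateError(scrambled));
    }
}
```

The first call should report no error and the second should report a DataFormatException, which suggests looking at whether the document was decrypted before its streams were decoded.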
Is the error related to the "Document is encrypted" message? What does
"encrypted" mean here? My PDF reader (Evince) opens the PDF without any
problems and does not ask for a password.
Unfortunately, I can't provide the PDF itself because it's a bill and
contains personal information. But if there's another way to isolate the
problem, please let me know.
Thanks for your help,
Marius
Re: Exception on reading PDF
Posted by Tilman Hausherr <TH...@t-online.de>.
On 18.08.2015 at 16:08, Marius Melzer wrote:
> PDDocument document = PDDocument.load(new File("test.pdf"));
if (document.isEncrypted()) {
    DecryptionMaterial decryptionMaterial = new StandardDecryptionMaterial("");
    document.openProtection(decryptionMaterial);
}
> PDFTextStripper stripper = new PDFTextStripper();