Posted to users@pdfbox.apache.org by jl...@gi-bon.sk on 2012/08/16 16:11:08 UTC
problems with pdf parsing
Hi,
I'm trying to load some sample PDF documents, but only 1 of 4 is parsed by
PDFBox without an exception.
Adobe Reader opens all of those PDF documents without any sign of problems.
public static void main(String[] args) throws Exception {
    InputStream ins = TestGetTexts.class.getResourceAsStream("/034352.pdf"); // sample document
    PDFParser parser = new PDFParser(ins);
    parser.parse();
    COSDocument cosDoc = parser.getDocument();
    PDDocument pdDoc = new PDDocument(cosDoc);
}
It throws exceptions at the line "parser.parse();".
What is wrong with that?
16.8.2012 15:49:49 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
WARNING: Specified stream length 252 is wrong. Fall back to reading stream until 'endstream'.
16.8.2012 15:49:49 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
WARNING: Specified stream length 34 is wrong. Fall back to reading stream until 'endstream'.
16.8.2012 15:49:49 org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
16.8.2012 15:49:49 org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
16.8.2012 15:49:49 org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
16.8.2012 15:49:49 org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
16.8.2012 15:49:49 org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
16.8.2012 15:49:49 org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
16.8.2012 15:49:49 org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
16.8.2012 15:49:49 org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
16.8.2012 15:49:49 org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
16.8.2012 15:49:49 org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Exception in thread "main" java.io.IOException
    at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:138)
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:301)
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:221)
    at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:156)
    at org.apache.pdfbox.pdfparser.PDFXrefStreamParser.<init>(PDFXrefStreamParser.java:61)
    at org.apache.pdfbox.pdfparser.PDFParser.parseXrefStream(PDFParser.java:846)
    at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:574)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:187)
    at test.TestGetTexts.main(TestGetTexts.java:20)
Caused by: java.util.zip.DataFormatException: incorrect header check
    at java.util.zip.Inflater.inflateBytes(Native Method)
    at java.util.zip.Inflater.inflate(Inflater.java:238)
    at java.util.zip.Inflater.inflate(Inflater.java:256)
    at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:169)
    at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:98)
    ... 8 more
the other pdf:
16.8.2012 16:08:44 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
WARNING: Specified stream length 4192 is wrong. Fall back to reading stream until 'endstream'.
16.8.2012 16:08:45 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
WARNING: Specified stream length 576 is wrong. Fall back to reading stream until 'endstream'.
16.8.2012 16:08:45 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
WARNING: Specified stream length 432 is wrong. Fall back to reading stream until 'endstream'.
16.8.2012 16:08:45 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
WARNING: Specified stream length 304 is wrong. Fall back to reading stream until 'endstream'.
16.8.2012 16:08:45 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
WARNING: Specified stream length 480 is wrong. Fall back to reading stream until 'endstream'.
16.8.2012 16:08:45 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
WARNING: Specified stream length 176 is wrong. Fall back to reading stream until 'endstream'.
16.8.2012 16:08:45 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
WARNING: Specified stream length 2096 is wrong. Fall back to reading stream until 'endstream'.
16.8.2012 16:08:45 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
WARNING: Specified stream length 137440 is wrong. Fall back to reading stream until 'endstream'.
Exception in thread "main" org.apache.pdfbox.exceptions.WrappedIOException: Could not push back 137440 bytes in order to reparse stream. Try increasing push back buffer using system property org.apache.pdfbox.baseParser.pushBackSize
    at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:546)
    at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:566)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:187)
    at test.TestGetTexts.main(TestGetTexts.java:20)
Caused by: java.io.IOException: Push back buffer is full
    at java.io.PushbackInputStream.unread(PushbackInputStream.java:215)
    at org.apache.pdfbox.io.PushBackInputStream.unread(PushBackInputStream.java:144)
    at org.apache.pdfbox.io.PushBackInputStream.unread(PushBackInputStream.java:133)
    at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:542)
    ... 3 more
or:
16.8.2012 16:10:27 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
WARNING: Specified stream length 8 is wrong. Fall back to reading stream until 'endstream'.
16.8.2012 16:10:27 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
WARNING: Specified stream length 77788 is wrong. Fall back to reading stream until 'endstream'.
Exception in thread "main" org.apache.pdfbox.exceptions.WrappedIOException: Could not push back 77788 bytes in order to reparse stream. Try increasing push back buffer using system property org.apache.pdfbox.baseParser.pushBackSize
    at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:546)
    at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:566)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:187)
    at test.TestGetTexts.main(TestGetTexts.java:21)
Caused by: java.io.IOException: Push back buffer is full
    at java.io.PushbackInputStream.unread(PushbackInputStream.java:215)
    at org.apache.pdfbox.io.PushBackInputStream.unread(PushBackInputStream.java:144)
    at org.apache.pdfbox.io.PushBackInputStream.unread(PushBackInputStream.java:133)
    at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:542)
    ... 3 more
best regards
Juraj Lonc
Re: problems with pdf parsing
Posted by jl...@gi-bon.sk.
Hi,
thanks for the reply.
Your advice helped.
Best regards
Juraj Lonc
GI-BÓN, spol. s r.o.
Management Systems
Bratislavská 11
SK - 010 01 Žilina
Tel: +421-41-564 3437-8
Mobil: +421-907-815 147
Fax: +421-41-564 3439
e-mail: jlonc@gi-bon.sk
homepage: http://www.gi-bon.sk
From: Andreas Lehmkuehler <an...@lehmi.de>
To: users@pdfbox.apache.org,
Date: 23. 08. 2012 18:21
Subject: Re: problems with pdf parsing
Re: problems with pdf parsing
Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi,
On 16.08.2012 16:11, jlonc@gi-bon.sk wrote:
> hi,
>
> i'm trying to load some sample pdf documents but only 1 of 4 is parsed by
> pdfbox without exception.
> adobe reader opens all those pdf documents without any sign of problems.
>
>
> public static void main(String[] args) throws Exception {
> InputStream ins=TestGetTexts.class.getResourceAsStream(
> "/034352.pdf"); // sample document
>
> PDFParser parser=new PDFParser(ins);
> parser.parse();
> COSDocument cosDoc=parser.getDocument();
> PDDocument pdDoc = new PDDocument(cosDoc);
>
> }
First of all, you should use one of the static load-methods provided by PDDocument.
InputStream ins=TestGetTexts.class.getResourceAsStream("/034352.pdf");
PDDocument pdDoc = PDDocument.load(ins);
> it throws exceptions at line "parser.parse();"
> what is wrong with that?
Hard to say without getting my hands on one of these PDFs. Have you tried the new
non-sequential parser (use loadNonSeq instead of load)?
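For the second and third file, the exception message itself also names a possible workaround: raising org.apache.pdfbox.baseParser.pushBackSize. The following is only a JDK-level sketch of why the unread fails and how the property could be set; it does not call PDFBox at all. The names PushBackDemo, canPushBack and configurePushBackSize are made up for illustration. The default of 65536 bytes is an assumption that merely fits the logs (77788 and 137440 fail, the smaller streams do not), and so is the idea that the property has to be set before the parser classes are loaded.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;

public class PushBackDemo {
    // Property name copied from the WrappedIOException message in the traces.
    static final String PROP = "org.apache.pdfbox.baseParser.pushBackSize";

    // Reads `count` bytes and tries to unread them again through a
    // java.io.PushbackInputStream whose buffer holds `capacity` bytes.
    // Returns false when the JDK rejects the unread with an IOException
    // ("Push back buffer is full" in the traces).
    static boolean canPushBack(int capacity, int count) {
        byte[] data = new byte[count];
        try (PushbackInputStream in =
                 new PushbackInputStream(new ByteArrayInputStream(data), capacity)) {
            byte[] buf = new byte[count];
            int read = in.read(buf, 0, count);
            in.unread(buf, 0, read);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Sets the system property named in the exception message.
    // Assumption: the value is read only once, so this should run
    // before any PDFBox parser class is used.
    static void configurePushBackSize(int bytes) {
        System.setProperty(PROP, Integer.toString(bytes));
    }

    public static void main(String[] args) {
        int largestStream = 137440;  // biggest "Could not push back" size in the traces
        int assumedDefault = 65536;  // assumption, not taken from the logs

        // A buffer smaller than the stream cannot take the bytes back.
        System.out.println(canPushBack(assumedDefault, largestStream)); // false
        // A buffer at least as large as the stream can.
        System.out.println(canPushBack(262144, largestStream));         // true

        configurePushBackSize(262144);
    }
}
```

The same value can be passed on the command line as -Dorg.apache.pdfbox.baseParser.pushBackSize=262144, which avoids any question of when the property is read. Whether a bigger buffer is enough for these particular files is something only a test with the documents can show.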
BR
Andreas Lehmkühler