Posted to users@pdfbox.apache.org by jl...@gi-bon.sk on 2012/08/16 16:11:08 UTC

problems with pdf parsing

Hi,

I'm trying to load some sample PDF documents, but only 1 of 4 is parsed by
PDFBox without an exception.
Adobe Reader opens all of these PDF documents without any sign of problems.


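import java.io.InputStream;
import org.apache.pdfbox.cos.COSDocument;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;

public class TestGetTexts {
    public static void main(String[] args) throws Exception {
        // sample document, loaded from the classpath
        InputStream ins = TestGetTexts.class.getResourceAsStream("/034352.pdf");

        PDFParser parser = new PDFParser(ins);
        parser.parse();  // the exceptions below are thrown here
        COSDocument cosDoc = parser.getDocument();
        PDDocument pdDoc = new PDDocument(cosDoc);
        pdDoc.close();   // release the parsed document
    }
}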


It throws exceptions at the line "parser.parse();".
What is wrong with that?


16.8.2012 15:49:49 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
WARNING: Specified stream length 252 is wrong. Fall back to reading stream until 'endstream'.
16.8.2012 15:49:49 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
WARNING: Specified stream length 34 is wrong. Fall back to reading stream until 'endstream'.
16.8.2012 15:49:49 org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
16.8.2012 15:49:49 org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
16.8.2012 15:49:49 org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
16.8.2012 15:49:49 org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
16.8.2012 15:49:49 org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
16.8.2012 15:49:49 org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
16.8.2012 15:49:49 org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
16.8.2012 15:49:49 org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
16.8.2012 15:49:49 org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
16.8.2012 15:49:49 org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Exception in thread "main" java.io.IOException
        at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:138)
        at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:301)
        at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:221)
        at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:156)
        at org.apache.pdfbox.pdfparser.PDFXrefStreamParser.<init>(PDFXrefStreamParser.java:61)
        at org.apache.pdfbox.pdfparser.PDFParser.parseXrefStream(PDFParser.java:846)
        at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:574)
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:187)
        at test.TestGetTexts.main(TestGetTexts.java:20)
Caused by: java.util.zip.DataFormatException: incorrect header check
        at java.util.zip.Inflater.inflateBytes(Native Method)
        at java.util.zip.Inflater.inflate(Inflater.java:238)
        at java.util.zip.Inflater.inflate(Inflater.java:256)
        at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:169)
        at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:98)
        ... 8 more
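The root cause in this first trace is "java.util.zip.DataFormatException: incorrect header check", raised while PDFBox inflates the cross-reference stream (PDFXrefStreamParser). FlateDecode data normally starts with a 2-byte zlib header, and RFC 1950 requires CMF*256 + FLG to be a multiple of 31; java.util.zip.Inflater rejects data whose first two bytes fail that check. The plain-JDK sketch below reproduces the same message by shifting otherwise valid zlib data by one stray byte. That is only one plausible explanation (for example, stream boundaries being off after the "Specified stream length ... is wrong" fallback); without the file, genuinely corrupt data is just as possible. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class FlateHeaderDemo {

    /** Compresses with the 2-byte zlib header, the format FlateDecode data normally uses. */
    static byte[] zlibCompress(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        byte[] buf = new byte[data.length + 64];  // ample room for small inputs
        int n = deflater.deflate(buf);
        deflater.end();
        return Arrays.copyOf(buf, n);
    }

    /** Inflates the data, returning the text or the DataFormatException message. */
    static String tryInflate(byte[] compressed, int expectedLength) {
        Inflater inflater = new Inflater();  // nowrap=false: a zlib header is required
        inflater.setInput(compressed);
        byte[] out = new byte[expectedLength];
        try {
            int n = inflater.inflate(out);
            return new String(out, 0, n, StandardCharsets.US_ASCII);
        } catch (DataFormatException e) {
            return e.getMessage();
        } finally {
            inflater.end();
        }
    }

    /** Returns a copy of data with one stray byte in front, like a mis-positioned stream start. */
    static byte[] withLeadingByte(byte[] data, byte stray) {
        byte[] shifted = new byte[data.length + 1];
        shifted[0] = stray;
        System.arraycopy(data, 0, shifted, 1, data.length);
        return shifted;
    }

    public static void main(String[] args) {
        byte[] text = "hypothetical xref stream payload".getBytes(StandardCharsets.US_ASCII);
        byte[] compressed = zlibCompress(text);

        // Correctly aligned data inflates normally.
        System.out.println(tryInflate(compressed, text.length));
        // The same bytes shifted by one '\n' fail the zlib header check.
        System.out.println(tryInflate(withLeadingByte(compressed, (byte) '\n'), text.length));
    }
}
```

Running main prints the original text for the aligned data and "incorrect header check" for the shifted copy.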


The other PDF:

16.8.2012 16:08:44 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
WARNING: Specified stream length 4192 is wrong. Fall back to reading stream until 'endstream'.
16.8.2012 16:08:45 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
WARNING: Specified stream length 576 is wrong. Fall back to reading stream until 'endstream'.
16.8.2012 16:08:45 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
WARNING: Specified stream length 432 is wrong. Fall back to reading stream until 'endstream'.
16.8.2012 16:08:45 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
WARNING: Specified stream length 304 is wrong. Fall back to reading stream until 'endstream'.
16.8.2012 16:08:45 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
WARNING: Specified stream length 480 is wrong. Fall back to reading stream until 'endstream'.
16.8.2012 16:08:45 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
WARNING: Specified stream length 176 is wrong. Fall back to reading stream until 'endstream'.
16.8.2012 16:08:45 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
WARNING: Specified stream length 2096 is wrong. Fall back to reading stream until 'endstream'.
16.8.2012 16:08:45 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
WARNING: Specified stream length 137440 is wrong. Fall back to reading stream until 'endstream'.
Exception in thread "main" org.apache.pdfbox.exceptions.WrappedIOException: Could not push back 137440 bytes in order to reparse stream. Try increasing push back buffer using system property org.apache.pdfbox.baseParser.pushBackSize
        at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:546)
        at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:566)
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:187)
        at test.TestGetTexts.main(TestGetTexts.java:20)
Caused by: java.io.IOException: Push back buffer is full
        at java.io.PushbackInputStream.unread(PushbackInputStream.java:215)
        at org.apache.pdfbox.io.PushBackInputStream.unread(PushBackInputStream.java:144)
        at org.apache.pdfbox.io.PushBackInputStream.unread(PushBackInputStream.java:133)
        at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:542)
        ... 3 more
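The WARNING lines and the final "Push back buffer is full" belong together. Judging from the messages and the stack trace (unread() called from BaseParser.parseCOSStream), the parser first reads as many bytes as /Length declares; when 'endstream' does not follow, it pushes those bytes back and re-reads the stream until 'endstream'. A java.io.PushbackInputStream can only take back as many bytes as its buffer holds, so a 137440-byte stream cannot be re-read through a smaller buffer. The simplified sketch below imitates that strategy with plain JDK classes. It is not PDFBox's actual code; the class name, method names and sample data are made up for illustration, and it needs Java 11+ (InputStream.readNBytes(int)).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StreamFallbackSketch {

    static final byte[] END = "endstream".getBytes(StandardCharsets.US_ASCII);

    /** Trusts the declared length first; otherwise rewinds and scans for "endstream". */
    static byte[] readStreamData(PushbackInputStream in, int declaredLength) throws IOException {
        byte[] body = in.readNBytes(declaredLength);
        byte[] tail = in.readNBytes(END.length + 2);  // room for an EOL before the keyword
        int k = 0;
        while (k < 2 && k < tail.length && (tail[k] == '\r' || tail[k] == '\n')) {
            k++;
        }
        if (tail.length - k >= END.length
                && Arrays.equals(tail, k, k + END.length, END, 0, END.length)) {
            // Declared length was right: give back whatever followed the keyword.
            in.unread(tail, k + END.length, tail.length - k - END.length);
            return body;
        }
        // Declared length was wrong: push back everything read so far (last read first)
        // and rescan. This throws "Push back buffer is full" if the buffer is too small.
        in.unread(tail);
        in.unread(body);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] window = new byte[END.length];  // the last END.length bytes seen
        int seen = 0;
        int b;
        while ((b = in.read()) != -1) {
            out.write(b);
            System.arraycopy(window, 1, window, 0, window.length - 1);
            window[window.length - 1] = (byte) b;
            if (++seen >= END.length && Arrays.equals(window, END)) {
                byte[] all = out.toByteArray();
                int end = all.length - END.length;
                if (end > 0 && all[end - 1] == '\n') end--;  // drop one EOL marker
                if (end > 0 && all[end - 1] == '\r') end--;
                return Arrays.copyOf(all, end);
            }
        }
        throw new IOException("missing endstream");
    }

    /** Demo wrapper: returns the stream data as text, or "ERROR: <message>". */
    static String parse(String raw, int declaredLength, int pushBackSize) {
        byte[] bytes = raw.getBytes(StandardCharsets.US_ASCII);
        try (PushbackInputStream in =
                new PushbackInputStream(new ByteArrayInputStream(bytes), pushBackSize)) {
            return new String(readStreamData(in, declaredLength), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            return "ERROR: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        String raw = "HELLO WORLD\nendstream\nendobj";  // bytes right after the 'stream' keyword
        System.out.println(parse(raw, 11, 8));  // correct /Length
        System.out.println(parse(raw, 5, 64));  // wrong /Length, buffer large enough
        System.out.println(parse(raw, 5, 8));   // wrong /Length, buffer too small
    }
}
```

With the sample data, the correct /Length returns HELLO WORLD directly, the wrong /Length with a 64-byte buffer recovers HELLO WORLD through the scan, and the same wrong length with an 8-byte buffer ends in "ERROR: Push back buffer is full", the same IOException message as in the trace.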



or:

16.8.2012 16:10:27 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
WARNING: Specified stream length 8 is wrong. Fall back to reading stream until 'endstream'.
16.8.2012 16:10:27 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
WARNING: Specified stream length 77788 is wrong. Fall back to reading stream until 'endstream'.
Exception in thread "main" org.apache.pdfbox.exceptions.WrappedIOException: Could not push back 77788 bytes in order to reparse stream. Try increasing push back buffer using system property org.apache.pdfbox.baseParser.pushBackSize
        at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:546)
        at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:566)
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:187)
        at test.TestGetTexts.main(TestGetTexts.java:21)
Caused by: java.io.IOException: Push back buffer is full
        at java.io.PushbackInputStream.unread(PushbackInputStream.java:215)
        at org.apache.pdfbox.io.PushBackInputStream.unread(PushBackInputStream.java:144)
        at org.apache.pdfbox.io.PushBackInputStream.unread(PushBackInputStream.java:133)
        at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:542)
        ... 3 more
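Both push-back failures name the remedy in the exception text: enlarge the buffer through the system property org.apache.pdfbox.baseParser.pushBackSize. Below is a minimal sketch, assuming the PDFBox version in these traces reads that property when the parser is created, so it has to be set before PDDocument.load or new PDFParser runs in the same JVM. The helper names and the factor-of-2 headroom are hypothetical choices, not PDFBox API. Passing -Dorg.apache.pdfbox.baseParser.pushBackSize=262144 on the java command line should have the same effect without code changes.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PushBackSizeConfig {

    // Property name copied verbatim from the WrappedIOException message in the log.
    static final String PUSHBACK_PROPERTY = "org.apache.pdfbox.baseParser.pushBackSize";

    // Matches the byte count in "Could not push back 77788 bytes in order to reparse stream."
    private static final Pattern NEEDED = Pattern.compile("Could not push back (\\d{1,9}) bytes");

    /** Extracts the number of bytes the parser tried to push back, if the message has one. */
    static OptionalInt requiredPushBack(String message) {
        Matcher m = NEEDED.matcher(message == null ? "" : message);
        return m.find() ? OptionalInt.of(Integer.parseInt(m.group(1))) : OptionalInt.empty();
    }

    /** Raises the property to at least minBytes; never lowers a larger existing value. */
    static int ensurePushBackSize(int minBytes) {
        Integer current = Integer.getInteger(PUSHBACK_PROPERTY);  // null if unset or not a number
        int size = (current == null) ? minBytes : Math.max(current, minBytes);
        System.setProperty(PUSHBACK_PROPERTY, Integer.toString(size));
        return size;
    }

    public static void main(String[] args) {
        String msg = "Could not push back 137440 bytes in order to reparse stream.";
        int needed = requiredPushBack(msg).orElse(0);
        // Arbitrary headroom (x2, at least 64 KiB) so a slightly larger stream also fits.
        int size = ensurePushBackSize(Math.max(needed * 2, 65536));
        System.out.println(PUSHBACK_PROPERTY + "=" + size);
        // Hypothetical next step: open the PDF again in this JVM, e.g. with PDDocument.load(...).
    }
}
```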





Best regards
Juraj Lonc

Re: problems with pdf parsing

Posted by jl...@gi-bon.sk.
Hi,
thanks for the reply.

Your advice helped.



Best regards
Juraj Lonc


GI-BÓN, spol. s r.o.
Management Systems

Bratislavská 11
SK - 010 01 Žilina
Tel: +421-41-564 3437-8
Mobile: +421-907-815 147
Fax: +421-41-564 3439
e-mail: jlonc@gi-bon.sk
homepage: http://www.gi-bon.sk 





From:   Andreas Lehmkuehler <an...@lehmi.de>
To:     users@pdfbox.apache.org, 
Date:   23. 08. 2012 18:21
Subject:        Re: problems with pdf parsing






Re: problems with pdf parsing

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi,

On 16.08.2012 16:11, jlonc@gi-bon.sk wrote:
> hi,
>
> I'm trying to load some sample PDF documents, but only 1 of 4 is parsed by
> PDFBox without an exception.
> Adobe Reader opens all of these PDF documents without any sign of problems.
>
>
> public static void main(String[] args) throws Exception {
>                  InputStream ins=TestGetTexts.class.getResourceAsStream(
> "/034352.pdf");  // sample document
>
>                  PDFParser parser=new PDFParser(ins);
>                  parser.parse();
>                  COSDocument cosDoc=parser.getDocument();
>                  PDDocument pdDoc = new PDDocument(cosDoc);
>
> }
First of all, you should use one of the static load methods provided by PDDocument:

	InputStream ins = TestGetTexts.class.getResourceAsStream("/034352.pdf");
	// load() creates the parser, parses the stream and returns the PDDocument
	PDDocument pdDoc = PDDocument.load(ins);


> It throws exceptions at the line "parser.parse();".
> What is wrong with that?
Hard to say without having one of these PDFs at hand. Have you tried the new
non-sequential parser (use loadNonSeq instead of load)?



BR
Andreas Lehmkühler