Posted to users@pdfbox.apache.org by "Harper, Brad" <Br...@fiserv.com> on 2011/08/24 00:01:40 UTC

InputStream Being Closed by PDFParser

Hello:

Is there a way to load/parse input and retrieve a PDDocument *without*
having the input stream closed "automatically"? Or alternatively, is
there a way to hook into the parser's processing of the end-of-document
*before* the stream is closed?

I need to process a 'print stream', which is a set of valid PDF
documents concatenated into a single large file. 

I'd like to record the byte offsets into the large file for each of the
'sub-files' ... and was hoping to get this info in a single pass using
the position reported by the input stream's channel, but now I find that
the PDFParser closes its input stream when finished. 

I don't see a way to get visibility on a [still-opened] stream, unless I
sub-class PDFParser or write one-off code to scan the input file for
beginning-/ending-of-document markers during a separate pass. 

Any thoughts?

Regards,
Brad Harper

Re: InputStream Being Closed by PDFParser

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Wed, Aug 24, 2011 at 12:01 AM, Harper, Brad <Br...@fiserv.com> wrote:
> Is there a way to load/parse input and retrieve a PDDocument *without*
> having the input stream closed "automatically"?

One solution would be to use the CloseShieldInputStream decorator
class [1] from Commons IO.

[1] http://commons.apache.org/io/api-release/org/apache/commons/io/input/CloseShieldInputStream.html
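To illustrate the idea, here is a minimal sketch that uses only java.io.
The class NonClosingInputStream is a hypothetical stand-in for
CloseShieldInputStream (a decorator whose close() does nothing), and
consumeAndClose is a hypothetical stand-in for a parser that closes its
input when it is done. Neither is PDFBox or Commons IO code.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class CloseShieldSketch {

    /** Hypothetical stand-in for a close-shield decorator: close() is a no-op. */
    static class NonClosingInputStream extends FilterInputStream {
        NonClosingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public void close() {
            // Deliberately do not forward close() to the wrapped stream.
        }
    }

    /** Test helper: an in-memory stream that records whether close() reached it. */
    static class TrackingStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackingStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Hypothetical stand-in for a parser: reads up to limit bytes, then closes its input. */
    static int consumeAndClose(InputStream in, int limit) throws IOException {
        int count = 0;
        try {
            while (count < limit && in.read() != -1) {
                count++;
            }
        } finally {
            in.close();
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        TrackingStream raw = new TrackingStream(new byte[] {1, 2, 3, 4, 5});

        // The consumer closes the wrapper, but the wrapper swallows the call.
        int read = consumeAndClose(new NonClosingInputStream(raw), 3);

        System.out.println("bytes consumed: " + read);
        System.out.println("underlying stream closed: " + raw.closed);
        System.out.println("next byte still readable: " + raw.read());
    }
}
```

With Commons IO on the classpath, the wrapper would be the real
CloseShieldInputStream instead of the stand-in above. Note that the shield
only keeps the stream open. Whether the stream position afterwards marks the
exact end of a sub-document depends on how far the parser reads ahead, which
this sketch does not address.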

BR,

Jukka Zitting