Posted to users@pdfbox.apache.org by Steve Antoch <SA...@Yuzu.com> on 2015/02/19 01:14:48 UTC

COSParser: re-entering getLength() via parseObjectDynamically()

Hi-

I have a question regarding the limitation on entering getLength() for a second time.

I understand that it is possible to create a malicious PDF which essentially sends the parser into an infinite loop by making it parse nested streams that refer to each other. I do not believe this to be the case with these files (they are from well-known corporate book publishers).

Obviously, pdfbox prohibits this nesting behavior by passing Boolean flags around, setting the inGetLength flag when it first enters getLength() and clearing it upon exit.

I have several PDFs which open fine in Acrobat and Google Chrome (which is based on the pdfium engine), yet when I try to open them using pdfbox they throw the "Object must be defined and must not be compressed object" error.

By observation, pdfium seems to get around this issue by keeping a counter of recursion depth (they use 64 max). This allows nesting up to a short depth, but throws an exception if the nesting gets too deep.
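
To illustrate the depth-counter idea, here is a minimal Java sketch. It is not the code from pdfbox, pdfium, or the pull request: the class LengthResolver, its maps, and its methods are hypothetical stand-ins for a parser that follows indirect length references. Only the limit of 64 comes from the observation above.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a parser that resolves indirect length entries. */
class LengthResolver {
    /** Maximum nesting allowed; 64 is the figure observed for pdfium above. */
    static final int MAX_DEPTH = 64;

    // Object number -> length stored directly in that object.
    private final Map<Integer, Integer> direct = new HashMap<>();
    // Object number -> object number that holds its length.
    private final Map<Integer, Integer> reference = new HashMap<>();

    // Replaces a boolean "already inside" flag with a nesting counter.
    private int depth = 0;

    void putDirect(int objNr, int length) {
        direct.put(objNr, length);
    }

    void putReference(int objNr, int targetObjNr) {
        reference.put(objNr, targetObjNr);
    }

    int resolveLength(int objNr) throws IOException {
        // Shallow re-entry is allowed; runaway or cyclic nesting is rejected.
        if (depth >= MAX_DEPTH) {
            throw new IOException("Length lookup nested deeper than " + MAX_DEPTH);
        }
        depth++;
        try {
            Integer value = direct.get(objNr);
            if (value != null) {
                return value;
            }
            Integer target = reference.get(objNr);
            if (target == null) {
                throw new IOException("Object " + objNr + " is not defined");
            }
            // Re-enter the same method for the referenced object.
            return resolveLength(target);
        } finally {
            // Always unwind, so one failed lookup does not poison later ones.
            depth--;
        }
    }
}
```

With this shape, a chain such as 1 -> 2 -> 3 resolves normally, while two objects that refer to each other fail with an IOException once the counter reaches the limit, instead of looping forever.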

I have forked pdfbox on GitHub and made those minor changes.

This seems to allow me to open the few files in question. I'd like you to take a look at the change and comment on it, if you would.

https://github.com/santoch/pdfbox/pull/1

Please let me know what you think-
Thanks-
Steve
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org

Re: COSParser: re-entering getLength() via parseObjectDynamically()

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.

Hi Steve,

Would you have some samples of such PDFs to test with?

BR
Maruan