Posted to users@pdfbox.apache.org by Maxime GUERREIRO <ma...@gmail.com> on 2016/12/26 11:23:05 UTC

PDDocument.load, password and InputStream

Hello everyone,

I am using PDFBox 2.0.4 and couldn't find the best way to do the following.
I hope you can help me :-)

I have a simple application that accepts PDF files (among other formats),
and I currently pass each one along (as a method argument) as a File. To
clean up my code, I wanted to pass it as an InputStream instead... but I am
missing one feature in PDFBox 2.0: detecting whether the file is
password-protected. I think this was possible in version 1.8, since we
instantiated the document and *then* gave it the password.

Sample code:
```
try (PDDocument ignore = PDDocument.load(inFile)) {
    return null; // not password protected
} catch (InvalidPasswordException ignore) {
}
```

The issue is that when I try to reuse inFile, it tells me it has been closed.
When wrapping it in a CloseShieldInputStream, the whole file is read... leading
to unwanted memory consumption and/or a temporary file.

This was not an issue with Files because I could re-create an InputStream
whenever I needed to.


~> Is there a way to check if a file is password-protected *and* if the given
password is the right one, without PDFBox reading it as a whole?

Thanks,
Maxime

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org


Re: PDDocument.load, password and InputStream

Posted by Tilman Hausherr <TH...@t-online.de>.
Am 26.12.2016 um 14:03 schrieb Maxime GUERREIRO:
> Assuming "isEncrypted" throws the exception, do I need to create
> another PDDocument instance?

The exception is thrown by load(), not by isEncrypted(), if the password 
is incorrect.

Yes, you'd need to call load() again with the correct password.

If it is encrypted and the empty password was correct, then you're done
and don't have to call it again.

sample code in PDFDebugger:

     private void parseDocument( File file, String password ) throws IOException
     {
         while (true)
         {
             try
             {
                 document = PDDocument.load(file, password);
             }
             catch (InvalidPasswordException ipe)
             {
                 // https://stackoverflow.com/questions/8881213/joptionpane-to-get-password
                 JPanel panel = new JPanel();
                 JLabel label = new JLabel("Password:");
                 JPasswordField pass = new JPasswordField(10);
                 panel.add(label);
                 panel.add(pass);
                 String[] options = new String[] {"OK", "Cancel"};
                 int option = JOptionPane.showOptionDialog(null, panel, "Enter password",
                          JOptionPane.NO_OPTION, JOptionPane.PLAIN_MESSAGE,
                          null, options, "");
                 if (option == 0)
                 {
                     password = new String(pass.getPassword());
                     continue;
                 }
                 throw ipe;
             }
             break;
         }
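
A non-interactive variant of the same retry idea can be sketched without Swing. This is only an illustration under stated assumptions: WrongPasswordException, PasswordLoader and PasswordProbe are hypothetical stand-ins so that the example compiles with the JDK alone. With PDFBox 2.0 the loader would presumably wrap PDDocument.load(file, password) and the caught exception would be InvalidPasswordException.

```java
import java.io.IOException;
import java.util.List;
import java.util.Optional;

/** Hypothetical stand-in for PDFBox's InvalidPasswordException. */
class WrongPasswordException extends IOException {
    WrongPasswordException(String message) {
        super(message);
    }
}

/** Hypothetical abstraction over a call such as PDDocument.load(file, password). */
interface PasswordLoader<T> {
    T load(String password) throws IOException;
}

final class PasswordProbe {
    /**
     * Tries each candidate password in order and returns the first document
     * that loads. An empty result means every candidate was rejected.
     * Any other IOException is propagated to the caller.
     */
    static <T> Optional<T> loadWithFirstValid(PasswordLoader<T> loader,
                                              List<String> candidates) throws IOException {
        for (String candidate : candidates) {
            try {
                return Optional.of(loader.load(candidate));
            } catch (WrongPasswordException rejected) {
                // Wrong password: move on to the next candidate.
            }
        }
        return Optional.empty();
    }
}
```

Putting "" first in the candidate list covers the case where the empty password is the correct one. Each attempt is a separate load call, and the caller is responsible for closing whatever document is returned.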

> If so, is there a way for me to prevent the memory/tempfile
> consumption caused by the first (useless) instance?

No.

Tilman
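
Since the copy cannot be avoided when loading from a stream, one option is to make that copy once and then work with a file, which can be reopened as often as needed. Below is a minimal sketch using only the JDK. SpooledUpload and its methods are hypothetical names, not part of PDFBox, and the "upload-" / ".pdf" temp-file naming is an arbitrary choice.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: copies a one-shot stream to disk so it can be reopened. */
final class SpooledUpload implements AutoCloseable {
    private final Path tempFile;

    private SpooledUpload(Path tempFile) {
        this.tempFile = tempFile;
    }

    /** Reads the stream exactly once and stores its bytes in a temporary file. */
    static SpooledUpload from(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("upload-", ".pdf");
        try {
            // createTempFile already created the file, so it must be replaced.
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            Files.deleteIfExists(tmp);
            throw e;
        }
        return new SpooledUpload(tmp);
    }

    /** The path can be handed to any File-based API, as often as needed. */
    Path path() {
        return tempFile;
    }

    /** Returns a fresh stream on every call; closing one does not affect the next. */
    InputStream open() throws IOException {
        return Files.newInputStream(tempFile);
    }

    /** Deletes the temporary file. */
    @Override
    public void close() throws IOException {
        Files.deleteIfExists(tempFile);
    }
}
```

With PDFBox 2.0, spooled.path().toFile() could then be passed to PDDocument.load(file, password), as in the PDFDebugger sample, so a second attempt with another password does not need the original stream again.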





Re: PDDocument.load, password and InputStream

Posted by Maxime GUERREIRO <ma...@me.com>.
Thanks for your answer,

> ... which is not a good idea, as it will make everything slower: PDFBox makes an internal copy of the stream to allow random access.
I noticed this, and I'm asking for an alternative solution ;)

> That code has a memory leak as "ignore" isn't closed
It doesn't: it uses Java 7's try-with-resources, which closes the document
as soon as the try block exits.
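
The closing behaviour can be checked with a small stand-in resource. This is an illustrative sketch only: TrackedResource and TryWithResourcesDemo are hypothetical names, and TrackedResource merely plays the role of a closeable document such as PDDocument.

```java
/** Hypothetical stand-in for a closeable document such as PDDocument. */
final class TrackedResource implements AutoCloseable {
    private boolean closed;

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        closed = true;
    }
}

final class TryWithResourcesDemo {
    /** Returns from inside the try block; close() still runs before the caller resumes. */
    static String probe(TrackedResource resource) {
        try (TrackedResource ignored = resource) {
            return "returned from try";
        }
    }

    /** Throws from inside the try block; close() runs before the catch clause executes. */
    static boolean closedBeforeCatch(TrackedResource resource) {
        try (TrackedResource ignored = resource) {
            throw new IllegalStateException("simulated failure");
        } catch (IllegalStateException e) {
            return resource.isClosed();
        }
    }
}
```

If the resource initializer itself throws, as PDDocument.load does for a wrong password, no resource was ever assigned and there is nothing for the try statement to close.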

> I doubt that. Unless "inFile" is an input stream and not a file, as the name would suggest.
Both work; in this case it was indeed an InputStream. As you said,
it can't get closed if it's a File.

--

Assuming "isEncrypted" throws the exception, do I need to create
another PDDocument instance?
If so, is there a way for me to prevent the memory/tempfile
consumption caused by the first (useless) instance?

Thanks,
Maxime

PS/ Thanks for the empty password tip, I was not aware of this




Re: PDDocument.load, password and InputStream

Posted by Tilman Hausherr <TH...@t-online.de>.
Am 26.12.2016 um 12:23 schrieb Maxime GUERREIRO:
> Hello everyone,
>
> I am using PDFBox 2.0.4 and couldn't find the best way to do the following.
> I hope you can help me :-)
>
> I have a simple application that accepts PDF files (among other formats),
> and I currently pass each one along (as a method argument) as a File. To
> clean up my code, I wanted to pass it as an InputStream instead... but I

... which is not a good idea, as it will make everything slower: PDFBox
makes an internal copy of the stream to allow random access.


> am missing one feature in PDFBox 2.0: detecting whether the file is
> password-protected. I think this was possible in version 1.8, since we
> instantiated the document and *then* gave it the password.
>
> Sample code:
> ```
> try (PDDocument ignore = PDDocument.load(inFile)) {
>      return null; // not password protected
> } catch (InvalidPasswordException ignore) {
> }
> ```

That code has a memory leak as "ignore" isn't closed

>
> The issue is that when I try to reuse inFile, it tells me it has been closed.

I doubt that. Unless "inFile" is an input stream and not a file, as the 
name would suggest.

> When wrapping it in a CloseShieldInputStream, the whole file is read... leading
> to unwanted memory consumption and/or a temporary file.
>
> This was not an issue with Files because I could re-create an InputStream
> whenever I needed to.
>
>
> ~> Is there a way to check if a file is password-protected *and* if the given
> password is the right one, without PDFBox reading it as a whole?

If document.isEncrypted() == true, then the file is encrypted and you gave
the correct password (which can be the empty password!).

If it is encrypted and you gave the wrong password, then you'll get an
InvalidPasswordException.

Tilman

>
> Thanks,
> Maxime