Posted to dev@pdfbox.apache.org by Takashi Komatsubara <ta...@gmail.com> on 2009/09/01 01:52:18 UTC

Re: Should we be able to extract text from owner-password protected PDF files?

Adam,

Sorry if I am wrong, but let me explain a little.

With an owner-password protected PDF file, I could extract the text.
With a user-password protected PDF file, I could NOT extract the text.

When you open an owner-password protected PDF file, you can see the content
without specifying a password.
That's the point.
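
To make the distinction concrete: with the standard PDF security handler, a file that has only an owner password opens with the empty user password, so what stands between a tool and the text is the permissions value (/P) in the encryption dictionary, where bit 5 (counting from 1) covers copying or extracting text. A minimal sketch of that check in plain Java follows. It is not PDFBox code, and the class and method names are hypothetical, chosen only for illustration.

```java
// Illustrative only: these names are hypothetical and not part of PDFBox.
final class PermissionSketch {
    // Bit 5 of the /P value (1-based, lowest bit is bit 1): copy or extract text and graphics.
    static final int EXTRACT_MASK = 1 << 4;

    // True if the permission bits allow text extraction for a user-level open.
    static boolean extractAllowed(int permissionBits) {
        return (permissionBits & EXTRACT_MASK) != 0;
    }

    // Owner access is not limited by the permission bits; user access is.
    static boolean mayExtract(boolean openedAsOwner, int permissionBits) {
        return openedAsOwner || extractAllowed(permissionBits);
    }

    public static void main(String[] args) {
        int allSet = -1;                        // every permission bit set
        int noExtract = allSet & ~EXTRACT_MASK; // extraction bit cleared
        System.out.println(mayExtract(false, allSet));
        System.out.println(mayExtract(false, noExtract));
        System.out.println(mayExtract(true, noExtract));
    }
}
```

Whether a tool chooses to honour the cleared bit when it has only user-level access is a policy decision, which is the question this thread is about.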

Takashi.


----- Original Message ----- 
From: <Ad...@swmc.com>
To: <pd...@incubator.apache.org>
Sent: Tuesday, September 01, 2009 3:17 AM
Subject: Re: Should we be able to extract text from owner-password
protected PDF files?


I tested your patch and confirmed that it does NOT work for encrypted
files.  Here's the stacktrace:

Exception in thread "main" org.apache.pdfbox.exceptions.CryptographyException: Error: The supplied password does not match either the owner or user password in the document.
        at org.apache.pdfbox.pdmodel.encryption.StandardSecurityHandler.decryptDocument(StandardSecurityHandler.java:231)
        at org.apache.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:1014)
        at org.apache.pdfbox.ExtractText.main(ExtractText.java:184)

In case my line numbers are off, line 184 is: document.openProtection( sdm
); which happens before the lines which were commented out by your patch.

I believe you're saying that the text can be extracted from password
protected, non-encrypted files.  If it's possible to password protect PDFs
without using encryption, that's news to me.   I'm not sure what the point
would be of password protecting something if you're not going to encrypt
it, since that would only give a false sense of security, not any actual
security.  So, I just wanted to clear that up so people don't read your
post and think that all PDF security is completely broken.  When I first
read it, I thought you were implying that any password protected document
could be read without the password.

As for whether we "should" be able to do this or not, I'd say the
ExtractText program which comes with PDFBox should respect the permissions
by default, and perhaps have an option to extract password protected,
unencrypted documents (without a password).  I'm not sure what one would
call that option... -bypassPassword ?
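
As a rough illustration of that default-plus-option behaviour, here is a sketch in plain Java. It is not the actual ExtractText code: the class and method names are hypothetical, and -bypassPassword is only the name floated above, not an existing option.

```java
// Hypothetical sketch of the proposed behaviour; not the actual ExtractText code.
final class ExtractPolicySketch {
    // Option name as floated in this thread; assumed here, it is not an existing ExtractText option.
    static final String BYPASS_OPTION = "-bypassPassword";

    // True if the hypothetical bypass option appears among the command-line arguments.
    static boolean bypassRequested(String[] args) {
        for (String arg : args) {
            if (BYPASS_OPTION.equals(arg)) {
                return true;
            }
        }
        return false;
    }

    // Default: respect the document's extract permission. Bypass only on explicit request.
    static boolean shouldExtract(boolean extractPermitted, String[] args) {
        return extractPermitted || bypassRequested(args);
    }
}
```

With this shape, a plain invocation refuses to extract when the permission is disabled, and an auditing user has to opt in explicitly.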

--Adam




"Takashi Komatsubara" <ta...@gmail.com>
08/31/2009 04:05
Please respond to: pdfbox-dev@incubator.apache.org
To: <pd...@incubator.apache.org>
Subject: Should we be able to extract text from owner-password protected
PDF files?






Hi team,

Technically, we can extract text from an owner-password protected PDF file
without specifying the owner password, right?

Should we be able to do that or not?

The reason I'm asking is that I am using PDFBox to audit the content of
PDF files. So, whether or not the user has disabled the "text extract"
permission, I need to look into the content of owner-password protected
PDF files.

The old PDFBox could do this.

What do you think?

Takashi



