Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2010/10/02 20:02:34 UTC

[jira] Closed: (PDFBOX-256) Error decrypting document

     [ https://issues.apache.org/jira/browse/PDFBOX-256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler closed PDFBOX-256.
-------------------------------------

    Resolution: Not A Problem

PDFBox respects the document's missing permission to extract text, so I guess this is not a problem. Setting to closed.
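
For readers unfamiliar with PDF permissions: a document secured with the standard security handler carries an integer permission field (/P) in its encryption dictionary, and bit 5 of that field controls copying or extracting text and graphics. A file with an empty user password opens without a prompt in most viewers, yet extraction can still be disallowed. The sketch below only illustrates how such a bit is tested. The class and method names are hypothetical, this is not PDFBox code, and the sample values are made up (the actual /P value of the attached file is not stated in this issue).

```java
// Minimal sketch: decode a few /P permission bits of the PDF standard security handler.
// PermissionBits and its methods are hypothetical names, not part of PDFBox.
final class PermissionBits {
    // Bit positions are 1-based in the PDF specification, so bit 5 has value 1 << 4.
    public static final int PRINT = 1 << 2;    // bit 3: print the document
    public static final int MODIFY = 1 << 3;   // bit 4: modify the contents
    public static final int EXTRACT = 1 << 4;  // bit 5: copy or extract text and graphics

    // True if the given /P value grants text and graphics extraction.
    public static boolean canExtract(int p) {
        return (p & EXTRACT) != 0;
    }

    // True if the given /P value grants printing.
    public static boolean canPrint(int p) {
        return (p & PRINT) != 0;
    }

    public static void main(String[] args) {
        int allowAll = -4;                     // every bit set except the two reserved low bits
        int noExtract = allowAll & ~EXTRACT;   // same value with bit 5 cleared
        System.out.println(canExtract(allowAll));   // true
        System.out.println(canExtract(noExtract));  // false
        System.out.println(canPrint(noExtract));    // true
    }
}
```

A text extractor that honours these flags would refuse to extract whenever canExtract returns false for the document's /P value, which matches the behaviour described in the resolution above.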

> Error decrypting document
> -------------------------
>
>                 Key: PDFBOX-256
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-256
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>            Priority: Minor
>         Attachments: PDFBOX256-ELERAP_100_cfl.pdf
>
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1682201
> Originally submitted by nobody on 2007-03-16 08:15.
> I get the following exception: 
> WARNING: IOException while extracting full-text of file:/home/sintek/papers/baseweb/ECRA/ELERAP_100_cfl.pdf
> java.io.IOException: Error decrypting document, details: Error: The supplied password does not match either the owner or user password in the document.
>         at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:208)
>         at org.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:149)
> [...]
> when trying to extract the text from the attached PDF. It has no password as far as I can tell; it opens fine in Acrobat, gpdf, pdftotext, etc. pdfinfo tells me it's encrypted, though.
> Any ideas? 
> [attachment on SourceForge]
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552832&aid=1682201&file_id=220882
> ELERAP_100_cfl.pdf (application/pdf), 9371 bytes
> "encrypted" PDF file
> [comment on SourceForge]
> Originally sent by nobody.
> Logged In: NO 
> I noticed something regarding this: PDDocument.decrypt("") does not set the encryption dictionary to null. This causes PDFTextStripper to try to decrypt the document again (I think it looks at the encryptionDictionary?). If the document is already decrypted, this results in an error similar to the one stated in this ticket.
> I now decrypt with the following code: 
> pdDoc.decrypt( passWord ); //password is mostly an empty string
> pdDoc.setEncryptionDictionary(null);
> pdDoc.getDocument().getTrailer().setItem("Encrypt",null);
> That goes fine.
> [comment on SourceForge]
> Originally sent by gromgull.
> Logged In: YES 
> user_id=185674
> Originator: NO
> Ah - indeed it is. I was confused since both Acrobat and gpdf, etc. were able to show the content without prompting me for a password. So you can encrypt PDFs to prevent text extraction? Isn't that a hopeless idea? Oh well :)
> [comment on SourceForge]
> Originally sent by ng_aldridge.
> Logged In: YES 
> user_id=1111818
> Originator: NO
> This file *is* encrypted. I just loaded it up into Acrobat and it's secured with a password.
> [comment on SourceForge]
> Originally sent by gromgull.
> Logged In: YES 
> user_id=185674
> Originator: NO
> Ah yes - that was me reporting that. Sorry for not logging in.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.