Posted to dev@pdfbox.apache.org by "Peter Nordquist (JIRA)" <ji...@apache.org> on 2011/01/31 18:44:28 UTC

[jira] Created: (PDFBOX-953) PDFBox fails to ExtractText from Adobe Acrobat X 256-bit AES encrypted documents

PDFBox fails to ExtractText from Adobe Acrobat X 256-bit AES encrypted documents
--------------------------------------------------------------------------------

                 Key: PDFBOX-953
                 URL: https://issues.apache.org/jira/browse/PDFBOX-953
             Project: PDFBox
          Issue Type: Bug
    Affects Versions: 1.4.0, 1.3.1
         Environment: Java: jdk1.6.0_20
OS: Windows 7, RHEL 5.5
            Reporter: Peter Nordquist


From the command line version of PDFBox, this exception is printed out:

ExtractText failed with the following exception:
java.lang.ArrayIndexOutOfBoundsException
        at java.lang.System.arraycopy(Native Method)
        at org.apache.pdfbox.pdmodel.encryption.StandardSecurityHandler.computeEncryptedKey(StandardSecurityHandler.java:571)
        at org.apache.pdfbox.pdmodel.encryption.StandardSecurityHandler.computeUserPassword(StandardSecurityHandler.java:608)
        at org.apache.pdfbox.pdmodel.encryption.StandardSecurityHandler.isUserPassword(StandardSecurityHandler.java:792)
        at org.apache.pdfbox.pdmodel.encryption.StandardSecurityHandler.decryptDocument(StandardSecurityHandler.java:189)
        at org.apache.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:1091)
        at org.apache.pdfbox.ExtractText.main(ExtractText.java:190)
        at org.apache.pdfbox.PDFBox.main(PDFBox.java:42)

The document I was using was encrypted with Adobe Acrobat X Pro, using only a permissions password. Page Extraction is the only permission disabled in it.
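
One plausible reading of the top two stack frames, offered as an assumption rather than a confirmed diagnosis: a key derivation written for the older revisions hashes its input with MD5 (always 16 bytes) and then copies "key length" bytes out of the digest. With a 256-bit key the copy asks for 32 bytes and System.arraycopy fails. The class and method names below are hypothetical and do not reproduce the PDFBox source; the sketch only shows that such a copy fails in the same way.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class KeyCopySketch {
    // Hypothetical stand-in for a legacy key derivation: hash the input with MD5,
    // then keep the first (keyLengthBits / 8) bytes of the digest as the key.
    public static byte[] deriveKey(byte[] input, int keyLengthBits) {
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("MD5").digest(input); // 16 bytes
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
        int length = keyLengthBits / 8;
        byte[] key = new byte[length];
        // Fails with an out-of-bounds exception when length exceeds digest.length.
        System.arraycopy(digest, 0, key, 0, length);
        return key;
    }

    public static void main(String[] args) {
        // Placeholder input; any bytes behave the same for this demonstration.
        byte[] input = "placeholder".getBytes(StandardCharsets.UTF_8);
        System.out.println("128-bit key bytes: " + deriveKey(input, 128).length);
        try {
            deriveKey(input, 256);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("256-bit copy failed: " + e.getClass().getName());
        }
    }
}
```

If this reading is right, the failure comes from the key length alone and not from anything specific to the attached file.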

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (PDFBOX-953) PDFBox fails to ExtractText from Adobe Acrobat X 256-bit AES encrypted documents

Posted by "Martijn Brinkers (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12997168#comment-12997168 ] 

Martijn Brinkers commented on PDFBOX-953:
-----------------------------------------

The encryption revision of the document is 6. According to this posting, http://forums.adobe.com/thread/763902?tstart=0, that revision is not yet documented (at least not publicly). We have to wait until it has been documented.

        

[jira] [Commented] (PDFBOX-953) PDFBox fails to ExtractText from Adobe Acrobat X 256-bit AES encrypted documents

Posted by "F. Schmitt (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221921#comment-13221921 ] 

F. Schmitt commented on PDFBOX-953:
-----------------------------------

The documentation can be found here:
http://wwwimages.adobe.com/www.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/adobe_supplement_iso32000.pdf
(Chapter 3.5 - Encryption)
The revision of the standard security handler was extended to number 5.

I would also like to see this feature, or at least an exception stating that revision 5 is not yet supported.
Thanks.
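
The clearer failure asked for above could look roughly like the following sketch. The class name, method name, exception type, and the supported range of revisions 2 to 4 are all assumptions made for illustration; this is not PDFBox code.

```java
public class RevisionGuardSketch {
    // Assumption: only standard security handler revisions 2 to 4 are implemented.
    static final int MIN_SUPPORTED_REVISION = 2;
    static final int MAX_SUPPORTED_REVISION = 4;

    // Rejects an unsupported revision up front with a descriptive message,
    // instead of letting a later array copy fail with no explanation.
    public static void requireSupportedRevision(int revision) {
        if (revision < MIN_SUPPORTED_REVISION || revision > MAX_SUPPORTED_REVISION) {
            throw new UnsupportedOperationException(
                "Standard security handler revision " + revision
                    + " is not supported (supported: " + MIN_SUPPORTED_REVISION
                    + " to " + MAX_SUPPORTED_REVISION + ")");
        }
    }

    public static void main(String[] args) {
        requireSupportedRevision(3); // passes silently
        try {
            requireSupportedRevision(5);
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A check like this would run before any password or key computation, so the caller sees the revision number in the message.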
                

        

[jira] [Commented] (PDFBOX-953) PDFBox fails to ExtractText from Adobe Acrobat X 256-bit AES encrypted documents

Posted by "Ralf Hauser (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502595#comment-13502595 ] 

Ralf Hauser commented on PDFBOX-953:
------------------------------------

See also PDFBOX-135 and PDFBOX-1450.
                

[jira] Updated: (PDFBOX-953) PDFBox fails to ExtractText from Adobe Acrobat X 256-bit AES encrypted documents

Posted by "Peter Nordquist (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Nordquist updated PDFBOX-953:
-----------------------------------

    Attachment: lorem-ipsum-256AES.pdf

Attached an example PDF with generated text inside. The permissions password is 'changeit'.


        

[jira] Updated: (PDFBOX-953) PDFBox fails to ExtractText from Adobe Acrobat X 256-bit AES encrypted documents

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-953:
--------------------------------------

    Issue Type: New Feature  (was: Bug)

It's not a bug; it's a missing feature.


       

[jira] Commented: (PDFBOX-953) PDFBox fails to ExtractText from Adobe Acrobat X 256-bit AES encrypted documents

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988833#comment-12988833 ] 

Andreas Lehmkühler commented on PDFBOX-953:
-------------------------------------------

That feature seems to be so new that I can't even open the document using Acrobat Reader 9.4.1.


       

[jira] Issue Comment Edited: (PDFBOX-953) PDFBox fails to ExtractText from Adobe Acrobat X 256-bit AES encrypted documents

Posted by "Peter Nordquist (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989293#comment-12989293 ] 

Peter Nordquist edited comment on PDFBOX-953 at 2/1/11 6:02 PM:
----------------------------------------------------------------

Yes, sorry I didn't put that in the original description, but when securing this PDF via Adobe Acrobat X Pro, it does say that the file can only be opened by Adobe Acrobat X and later.

Tested with:
Mac OS X 10.6.6 with Adobe Reader X 10.0.0
Windows 7 Enterprise 64-bit with Adobe Acrobat X Pro 10.0.0

  

        

[jira] Commented: (PDFBOX-953) PDFBox fails to ExtractText from Adobe Acrobat X 256-bit AES encrypted documents

Posted by "Peter Nordquist (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989293#comment-12989293 ] 

Peter Nordquist commented on PDFBOX-953:
----------------------------------------

Yes, sorry I didn't put that in the original description, but when securing this PDF via Adobe Acrobat X Pro, it does say that the file can only be opened by Adobe Acrobat X and later.
