Posted to dev@pdfbox.apache.org by "Michael Klink (Jira)" <ji...@apache.org> on 2022/10/25 17:42:00 UTC

[jira] [Commented] (PDFBOX-5533) Store password from PDF document in a byte array

    [ https://issues.apache.org/jira/browse/PDFBOX-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17623950#comment-17623950 ] 

Michael Klink commented on PDFBOX-5533:
---------------------------------------

If you use {{byte}} arrays, then the users have to do the conversion from {{String}} themselves.

This sounds trivial but it is not: The exact conversion to apply depends on the encryption algorithm used. For example, for the current (revision 6 as defined in ISO 32000-2) encryption, _the UTF-8 password string shall be generated from Unicode input by processing the input string with the SASLprep (Internet RFC 4013) profile of stringprep (Internet RFC 3454) using the Normalize and BiDi options, and then converting to a UTF-8 representation._

I doubt most users will follow that routine; most will simply call {{getBytes()}} and run into errors in internationalized contexts.
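
To illustrate the kind of error meant here, a minimal sketch using only JDK classes. The JDK has no public SASLprep implementation, so {{java.text.Normalizer}} with NFKC stands in for the normalization step only; it is not a full SASLprep implementation (no character mapping, no prohibited-character or BiDi checks). The class and method names are hypothetical and not part of PDFBox.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;
import java.util.Arrays;

// Hypothetical demo class, not PDFBox API.
final class PasswordEncodingPitfall {

    // Naive conversion: no Unicode normalization at all.
    // UTF-8 is passed explicitly; a bare getBytes() uses the default charset,
    // which is platform dependent on Java versions before 18.
    static byte[] naiveUtf8(String password) {
        return password.getBytes(StandardCharsets.UTF_8);
    }

    // Partial approximation of the required routine: NFKC normalization
    // (one step of SASLprep), then UTF-8. Assumption: the remaining SASLprep
    // steps are left out for brevity.
    static byte[] normalizedUtf8(String password) {
        String nfkc = Normalizer.normalize(password, Normalizer.Form.NFKC);
        return nfkc.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String composed = "caf\u00E9";     // precomposed e-acute (U+00E9)
        String decomposed = "cafe\u0301";  // 'e' followed by combining acute (U+0301)

        // Both strings render identically, but the naive byte sequences differ.
        System.out.println(Arrays.equals(naiveUtf8(composed), naiveUtf8(decomposed)));           // false
        // After normalization both inputs yield the same bytes.
        System.out.println(Arrays.equals(normalizedUtf8(composed), normalizedUtf8(decomposed))); // true
    }
}
```

Depending on the input method, the same visible password can therefore produce different keys if the caller skips normalization.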

Switching from {{String}} to {{char[]}}, on the other hand, would leave the conversion to bytes in PDFBox, allowing for proper conversion.
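
A rough sketch of what a {{char[]}}-based conversion could look like, again with JDK classes only. The names are hypothetical, this is not how PDFBox does it today, and the SASLprep step required for revision 6 is omitted; the point is only that no intermediate {{String}} is created and that every buffer can be wiped afterwards.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not PDFBox API.
final class CharPasswordEncoder {

    // Encodes the password as UTF-8 without creating a String.
    // Assumption: normalization (SASLprep) would happen before this step.
    static byte[] toUtf8(char[] password) {
        ByteBuffer encoded = StandardCharsets.UTF_8.encode(CharBuffer.wrap(password));
        byte[] result = new byte[encoded.remaining()];
        encoded.get(result);
        // Wipe the intermediate buffer if its backing array is accessible.
        if (encoded.hasArray()) {
            Arrays.fill(encoded.array(), (byte) 0);
        }
        return result;
    }

    public static void main(String[] args) {
        char[] password = {'s', 'e', 'c', 'r', '\u00E9', 't'};
        byte[] bytes = toUtf8(password);
        try {
            // Here the bytes would be handed to the key derivation.
            System.out.println(bytes.length);
        } finally {
            // Unlike a String, both arrays can be cleared after use.
            Arrays.fill(bytes, (byte) 0);
            Arrays.fill(password, '\0');
        }
    }
}
```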

> Store password from PDF document in a byte array
> ------------------------------------------------
>
>                 Key: PDFBOX-5533
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5533
>             Project: PDFBox
>          Issue Type: Improvement
>    Affects Versions: 2.0.27
>            Reporter: Aleksandr Beliakov
>            Priority: Minor
>
> Hello,
>  
> I would like to propose a security improvement regarding storing and handling a provided user-password when opening a protected PDF document.
> Currently the class [COSParser|https://github.com/apache/pdfbox/blob/2.0.27/pdfbox/src/main/java/org/apache/pdfbox/pdfparser/COSParser.java#L98] stores the password as a String object, which is not best practice.
> The problem is that sensitive data (such as passwords) stored in memory can be leaked if it is stored in a managed String object. String objects are not pinned, so the garbage collector can relocate these objects at will and leave several copies in memory. These objects are not encrypted by default, so anyone who can read the process's memory will be able to see the contents. Furthermore, if the process's memory gets swapped out to disk, the unencrypted contents of the string will be written to a swap file. Lastly, since String objects are immutable, removing the value of a String from memory can only be done by the JVM garbage collector.
>  
> Therefore, it would be preferable to handle all user passwords as a byte[] or char[] array instead of a String, since an array can be cleared after use. You may also see that when passing a password to JDK classes, the password is converted to an array of characters (e.g. [here|https://github.com/apache/pdfbox/blob/2.0.27/pdfbox/src/main/java/org/apache/pdfbox/pdfparser/COSParser.java#L2979]).
>  
> To avoid unnecessary transformations and improve security, it would be good to handle all passwords as an array starting from the [PDDocument.load(...)|https://github.com/apache/pdfbox/blob/2.0.27/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDDocument.java#L1030] method(s).
>  
> For backward compatibility, you may keep the old constructors and methods.
>  
> Thank you for your great work!
>  
> Best regards,
> Aleksandr.


