Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2019/03/01 07:28:00 UTC

[jira] [Created] (PDFBOX-4477) Large encrypted file takes a day to be parsed

Tilman Hausherr created PDFBOX-4477:
---------------------------------------

             Summary: Large encrypted file takes a day to be parsed
                 Key: PDFBOX-4477
                 URL: https://issues.apache.org/jira/browse/PDFBOX-4477
             Project: PDFBox
          Issue Type: Bug
          Components: Crypto, Parsing
    Affects Versions: 2.0.14
            Reporter: Tilman Hausherr
             Fix For: 2.0.15, 3.0.0 PDFBox


As reported by [~slavago] in TIKA-2832. File is confidential but I have it. Initial findings:
- File is AES256 encrypted with empty user password
- File has about 1000 objects
- File is a tagged PDF
- HashMap in SecurityHandler grows to 100000 entries?!
- Using an IdentityHashMap speeds up the process dramatically, and it may also be a better solution than what was done in PDFBOX-4453
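
For reference, a minimal sketch of how the two map types differ. This is not the SecurityHandler code: CountingKey and register are hypothetical names made up for the illustration, and the key string is arbitrary. A HashMap calls the key's hashCode() and equals(), which can be expensive for keys with a lot of content, while an IdentityHashMap uses System.identityHashCode() and == only. Whether that is what makes this particular file slow has not been confirmed.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

class IdentityMapSketch {
    // Hypothetical key type that counts how often its hashCode() is invoked.
    static final class CountingKey {
        static int hashCalls = 0;
        final String value;

        CountingKey(String value) {
            this.value = value;
        }

        @Override
        public int hashCode() {
            hashCalls++;
            return value.hashCode();
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof CountingKey && ((CountingKey) o).value.equals(value);
        }
    }

    // Puts two equal-but-distinct keys into the map.
    // Returns { map size, number of hashCode() calls made on the keys }.
    static int[] register(Map<Object, Boolean> seen) {
        CountingKey.hashCalls = 0;
        seen.put(new CountingKey("5 0 obj"), Boolean.TRUE);
        seen.put(new CountingKey("5 0 obj"), Boolean.TRUE);
        return new int[] { seen.size(), CountingKey.hashCalls };
    }

    public static void main(String[] args) {
        int[] byEquals = register(new HashMap<>());
        int[] byIdentity = register(new IdentityHashMap<>());
        System.out.println("HashMap:         size=" + byEquals[0] + ", hashCode calls=" + byEquals[1]);
        System.out.println("IdentityHashMap: size=" + byIdentity[0] + ", hashCode calls=" + byIdentity[1]);
    }
}
```

The HashMap treats the two keys as one entry and has to call hashCode() on each put; the IdentityHashMap keeps both instances and never calls hashCode() or equals() on them. Switching therefore changes semantics as well as speed: equal but distinct objects are no longer merged.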

Todo:
- Read description of IdentityHashMap again
- Find out why the HashMap grows so much. Could it be that identical objects are stored twice? Or does the file have many direct objects?
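
One possible way to approach the second item, as a rough sketch only: count the keys once by identity and once by equals() and compare the two numbers. DuplicateKeyProbe and its methods are hypothetical helpers, not part of PDFBox, and the strings stand in for whatever objects the map actually holds.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class DuplicateKeyProbe {
    // Number of distinct instances, compared with ==.
    static int distinctByIdentity(Collection<?> keys) {
        Set<Object> set = Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
        set.addAll(keys);
        return set.size();
    }

    // Number of distinct values, compared with equals()/hashCode().
    static int distinctByEquals(Collection<?> keys) {
        return new HashSet<Object>(keys).size();
    }

    public static void main(String[] args) {
        // Two separate instances with the same content, one of them listed twice.
        String first = new String("abc");
        String second = new String("abc");
        List<Object> keys = Arrays.asList(first, second, first);

        System.out.println("distinct by identity: " + distinctByIdentity(keys));
        System.out.println("distinct by equals:   " + distinctByEquals(keys));
    }
}
```

If the identity count is larger than the equals count, the collection contains equal objects that exist as separate instances, which would point towards many direct objects or repeated parsing of the same content.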


