Posted to dev@pdfbox.apache.org by "Martijn Brinkers (JIRA)" <ji...@apache.org> on 2010/11/21 22:09:17 UTC

[jira] Commented: (PDFBOX-858) Metadata extraction broken on some PDF files

    [ https://issues.apache.org/jira/browse/PDFBOX-858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934351#action_12934351 ] 

Martijn Brinkers commented on PDFBOX-858:
-----------------------------------------

You should decrypt the document before reading its metadata:

            if (document.isEncrypted())
            {
                try
                {
                    /*
                     * Try to decrypt with the default password
                     */
                    document.decrypt(null);
                }
                catch (CryptographyException e)
                {
                    // decryption itself failed; handle or report
                }
                catch (InvalidPasswordException e)
                {
                    // the default password was not accepted; handle or report
                }
            }

The metadata seems to be encrypted with the default password. Once the PDF is decrypted, I can read the metadata correctly.
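
As a rough illustration of why the undecrypted Author value looks like random characters: string values in an encrypted PDF are stored as ciphertext, so printing them without decrypting first shows arbitrary bytes. The sketch below is not PDFBox code and does not reproduce the PDF key derivation. It assumes RC4 (one of the ciphers the PDF standard security handler can use; the attached files may use something else), and the class name, key and author value are made up for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustration only: not PDFBox code, and not the PDF key-derivation algorithm.
class EncryptedStringDemo
{
    // Plain RC4. The same call encrypts and decrypts, because it XORs the
    // data with a keystream derived from the key.
    static byte[] rc4(byte[] key, byte[] data)
    {
        int[] s = new int[256];
        for (int i = 0; i < 256; i++)
        {
            s[i] = i;
        }
        // Key scheduling: permute the state array using the key bytes.
        for (int i = 0, j = 0; i < 256; i++)
        {
            j = (j + s[i] + (key[i % key.length] & 0xFF)) & 0xFF;
            int t = s[i]; s[i] = s[j]; s[j] = t;
        }
        // Keystream generation, XORed with the input byte by byte.
        byte[] out = new byte[data.length];
        for (int n = 0, i = 0, j = 0; n < data.length; n++)
        {
            i = (i + 1) & 0xFF;
            j = (j + s[i]) & 0xFF;
            int t = s[i]; s[i] = s[j]; s[j] = t;
            out[n] = (byte) (data[n] ^ s[(s[i] + s[j]) & 0xFF]);
        }
        return out;
    }

    public static void main(String[] args)
    {
        // Hypothetical key and value, chosen only for this example.
        byte[] key = "hypothetical-key".getBytes(StandardCharsets.ISO_8859_1);
        byte[] author = "Example Author".getBytes(StandardCharsets.ISO_8859_1);

        // What ends up stored in the file when the document is encrypted.
        byte[] stored = rc4(key, author);

        // Reading the stored bytes as text without decrypting gives unreadable output.
        System.out.println("Author=" + new String(stored, StandardCharsets.ISO_8859_1));

        // Decrypting with the same key recovers the original value.
        byte[] decrypted = rc4(key, stored);
        System.out.println("Author=" + new String(decrypted, StandardCharsets.ISO_8859_1));
        System.out.println("round trip ok: " + Arrays.equals(decrypted, author));
    }
}
```

In PDFBox the decrypt call takes care of deriving the key and decrypting the strings, so none of this has to be written by hand; the sketch only shows where the unreadable output comes from.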

> Metadata extraction broken on some PDF files
> --------------------------------------------
>
>                 Key: PDFBOX-858
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-858
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 1.2.1, 1.3.1
>            Reporter: Patrik Stenmark
>         Attachments: 2001Derivatives and Public Debt Mngt.pdf, RethinkingTheFinancialNetwork.pdf
>
>
> On certain PDF files (examples attached), the metadata extraction seems to be broken. Preview (on Mac OS X) and Acrobat Reader are able to read the metadata, but PDFBox gives complete gibberish: 
> Author=è'ÿÆ??kÔ7??ÕªG?
> I've tried both the version included in Tika 0.7 (1.0.0 I believe) and r1021264 from SVN. 
