Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2010/12/02 19:14:13 UTC

[jira] Commented: (PDFBOX-858) Metadata extraction broken on some PDF files

    [ https://issues.apache.org/jira/browse/PDFBOX-858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966214#action_12966214 ] 

Andreas Lehmkühler commented on PDFBOX-858:
-------------------------------------------

Martijn is correct: both PDFs are encrypted, and the metadata is available once they are decrypted with the current trunk.

The ExtractMetadata [1] example was improved in revision 1041509. It now falls back to the document information if the catalog contains no metadata.
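
A minimal sketch of that fallback order, for illustration only. It is not the code from revision 1041509 and does not use PDFBox types: PdfStub, its fields, and describe() are hypothetical stand-ins. In PDFBox 1.x the two sources would, as far as I know, be read through PDDocument.getDocumentCatalog().getMetadata() and PDDocument.getDocumentInformation(), after the document has been decrypted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MetadataFallback {

    /** Hypothetical stand-in for a loaded PDF; NOT a PDFBox class. */
    public static final class PdfStub {
        public final boolean encrypted;         // true while the document is still encrypted
        public final String xmpMetadata;        // null when the catalog has no metadata stream
        public final Map<String, String> info;  // entries of the document information dictionary

        public PdfStub(boolean encrypted, String xmpMetadata, Map<String, String> info) {
            this.encrypted = encrypted;
            this.xmpMetadata = xmpMetadata;
            this.info = info;
        }
    }

    /** Prefers the catalog metadata; falls back to the document information. */
    public static String describe(PdfStub doc) {
        if (doc.encrypted) {
            // Strings read from a still-encrypted document come out as gibberish.
            throw new IllegalStateException("decrypt the document first");
        }
        if (doc.xmpMetadata != null) {
            return "XMP: " + doc.xmpMetadata;
        }
        StringBuilder sb = new StringBuilder("Info:");
        for (Map.Entry<String, String> e : doc.info.entrySet()) {
            sb.append(' ').append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> info = new LinkedHashMap<>();
        info.put("Author", "Jane Doe"); // made-up sample value
        System.out.println(describe(new PdfStub(false, null, info)));
        System.out.println(describe(new PdfStub(false, "<x:xmpmeta/>", info)));
    }
}
```

The encrypted check mirrors the point above: the garbled Author value in the report is what the still-encrypted string looks like, so decryption has to come before either source is read.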

> Metadata extraction broken on some PDF files
> --------------------------------------------
>
>                 Key: PDFBOX-858
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-858
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 1.2.1, 1.3.1
>            Reporter: Patrik Stenmark
>         Attachments: 2001Derivatives and Public Debt Mngt.pdf, RethinkingTheFinancialNetwork.pdf
>
>
> On certain PDF files (examples attached), the metadata extraction seems to be broken. Preview (on Mac OS X) and Acrobat Reader are able to read the metadata, but PDFBox gives complete gibberish: 
> Author=è'ÿÆ??kÔ7??ÕªG?
> I've tried both the version included in Tika 0.7 (1.0.0 I believe) and r1021264 from SVN. 
