Posted to dev@tika.apache.org by Ingo Feltes <in...@itemis.de> on 2009/09/08 13:06:26 UTC

PDFParser fails to decrypt metadata (patch included)

Hi folks,

The fix for TIKA-267 seems to work fine for the content, but Tika still
fails to decrypt the metadata of those PDFs: the extracted metadata is
still encrypted. If you swap the order, so that the text is processed
before the metadata is extracted, the metadata is decrypted correctly.

Cheers,

Ingo

Index: PDFParser.java
===================================================================
--- PDFParser.java	(revision 812208)
+++ PDFParser.java	(working copy)
@@ -63,9 +63,9 @@
                     // Ignore
                 }
             }
+            PDF2XHTML.process(pdfDocument, handler, metadata);
+            extractMetadata(pdfDocument, metadata);
             metadata.set(Metadata.CONTENT_TYPE, "application/pdf");
-            extractMetadata(pdfDocument, metadata);
-            PDF2XHTML.process(pdfDocument, handler, metadata);
         } finally {
             pdfDocument.close();
         }
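
To illustrate why the order matters, here is a minimal self-contained
sketch. It assumes that text extraction decrypts the document as a side
effect, which is what the observed behaviour suggests but which the patch
itself does not confirm. FakeEncryptedDocument, ExtractionOrderSketch and
their methods are hypothetical stand-ins, not Tika or PDFBox classes.

```java
import java.util.HashMap;
import java.util.Map;

public class ExtractionOrderSketch {

    /** Hypothetical stand-in for an encrypted PDF; not a PDFBox class. */
    static class FakeEncryptedDocument {
        private boolean decrypted = false;
        private final String title;

        FakeEncryptedDocument(String title) {
            this.title = title;
        }

        /** Assumption: extracting text decrypts the document as a side effect. */
        String extractText() {
            decrypted = true;
            return "body text";
        }

        /** Returns the title, or a scrambled version while still encrypted. */
        String readTitle() {
            return decrypted ? title : new StringBuilder(title).reverse().toString();
        }
    }

    /** Order before the patch: metadata is read before anything decrypts it. */
    static Map<String, String> metadataFirst(FakeEncryptedDocument doc) {
        Map<String, String> metadata = new HashMap<>();
        metadata.put("title", doc.readTitle());
        metadata.put("text", doc.extractText());
        return metadata;
    }

    /** Order after the patch: text processing runs first, then metadata. */
    static Map<String, String> textFirst(FakeEncryptedDocument doc) {
        Map<String, String> metadata = new HashMap<>();
        metadata.put("text", doc.extractText());
        metadata.put("title", doc.readTitle());
        return metadata;
    }

    public static void main(String[] args) {
        // Scrambled title, because nothing has decrypted the document yet.
        System.out.println(metadataFirst(new FakeEncryptedDocument("Report")).get("title"));
        // Readable title, because text extraction ran first.
        System.out.println(textFirst(new FakeEncryptedDocument("Report")).get("title"));
    }
}
```

If the assumption holds, an alternative to reordering would be to decrypt
the document explicitly before either step, so that neither depends on
the other having run.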