Posted to dev@tika.apache.org by "Gabriel Miklos (JIRA)" <ji...@apache.org> on 2010/03/23 01:27:27 UTC
[jira] Created: (TIKA-389) Garbled metadata when dealing with encrypted PDF files.
Garbled metadata when dealing with encrypted PDF files.
-------------------------------------------------------
Key: TIKA-389
URL: https://issues.apache.org/jira/browse/TIKA-389
Project: Tika
Issue Type: Bug
Components: metadata, parser
Affects Versions: 0.6
Environment: Windows 7 64-bit
Reporter: Gabriel Miklos
Priority: Minor
The code exhibiting this issue is very simple:
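// Assumed setup (not shown in the report): Parser tikaParser = new AutoDetectParser();
// Metadata metadata = new Metadata(); file is the encrypted PDF.
InputStream input = new FileInputStream(file);
try {
    ContentHandler textHandler = new BodyContentHandler();
    tikaParser.parse(input, textHandler, metadata);
} finally {
    input.close();  // close the stream even if parsing throws
}
System.out.println(metadata);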
The output:
title=?a???▬÷&▼??♂?ŢjK???ž?↑M?A→<═]1
=╬\bK Author=═g?═?♦ Content-Type=application/pdf creator=?k?═?♦Ý`;Ý?)??/¶???Ě?3n
Î☼46ËO
Other than that, the extracted text is 100% correct.
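Until this is fixed, a caller could flag metadata values that look garbled before storing or displaying them. The sketch below is illustrative only and not part of Tika: the class GarbledCheck and the method looksGarbled are hypothetical names. The heuristic is an assumption; it treats control characters (other than tab and line breaks) and the Unicode replacement character U+FFFD as signs of corruption. The symbols such as ▬ and ♦ in the output above are consistent with a Windows console rendering control characters, although the report does not confirm this.

```java
public class GarbledCheck {
    // Hypothetical helper: returns true if the value contains characters
    // that rarely appear in genuine metadata (control characters other
    // than tab/newline/carriage return, or the replacement char U+FFFD).
    public static boolean looksGarbled(String value) {
        if (value == null) {
            return false;
        }
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            boolean allowedWhitespace = c == '\t' || c == '\n' || c == '\r';
            if (c == '\uFFFD' || (Character.isISOControl(c) && !allowedWhitespace)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(looksGarbled("Annual Report 2009"));   // prints false
        System.out.println(looksGarbled("a\u0016\u001F\u000Bb")); // prints true
    }
}
```

A check like this only detects suspicious values; it cannot recover the real title or author, and it may miss corrupted values that happen to contain only printable characters.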