Posted to dev@tika.apache.org by "Gabriel Miklos (JIRA)" <ji...@apache.org> on 2010/03/23 01:27:27 UTC

[jira] Created: (TIKA-389) Garbled metadata when dealing with encrypted PDF files.

Garbled metadata when dealing with encrypted PDF files.
-------------------------------------------------------

                 Key: TIKA-389
                 URL: https://issues.apache.org/jira/browse/TIKA-389
             Project: Tika
          Issue Type: Bug
          Components: metadata, parser
    Affects Versions: 0.6
         Environment: Windows 7 64-bit
            Reporter: Gabriel Miklos
            Priority: Minor


The code exhibiting this issue is very simple:

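        // tikaParser (a Tika Parser) and metadata (a Metadata instance) are declared elsewhere
        InputStream input = new FileInputStream(file);
        try {
            ContentHandler textHandler = new BodyContentHandler();
            tikaParser.parse(input, textHandler, metadata);
        } finally {
            input.close(); // close the stream even if parsing throws
        }
        System.out.println(metadata);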

The output:
title=?a???▬÷&▼??♂?ŢjK???ž?↑M?A→<═]1
=╬\bK Author=═g?═?♦ Content-Type=application/pdf creator=?k?═?♦Ý`;Ý?)??/¶???Ě?3n
Î☼46ËO

Apart from the metadata, the extracted body text is 100% correct.
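Several glyphs in the output above (such as ▬, ▼, ♂, ♦, ¶ and ☼) are how a Windows console code page renders raw control bytes, so the garbled values most likely contain control characters rather than text. As a stopgap, a caller could screen metadata values before trusting them. The sketch below is a hypothetical, stdlib-only heuristic (the class GarbledMetadataCheck and its method are illustrative names, not part of Tika): it flags a value that contains an ISO control character other than tab, CR or LF, or the Unicode replacement character U+FFFD.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper, not part of Tika: flags metadata values that look
 * like binary data (e.g. still-encrypted bytes) rather than text.
 */
public final class GarbledMetadataCheck {

    private GarbledMetadataCheck() {}

    /**
     * Returns true if the value contains a control character other than
     * tab, CR or LF, or the Unicode replacement character U+FFFD.
     */
    public static boolean looksGarbled(String value) {
        if (value == null) {
            return false;
        }
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            boolean allowedWhitespace = c == '\t' || c == '\n' || c == '\r';
            if ((Character.isISOControl(c) && !allowedWhitespace) || c == '\uFFFD') {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Sample values modeled loosely on the report's output (illustrative only).
        Map<String, String> sample = new LinkedHashMap<>();
        sample.put("Content-Type", "application/pdf");
        sample.put("title", "\u0016\u001Fgarbled");
        for (Map.Entry<String, String> e : sample.entrySet()) {
            System.out.println(e.getKey() + " garbled? " + looksGarbled(e.getValue()));
        }
    }
}
```

Running main prints "Content-Type garbled? false" followed by "title garbled? true". In the reporter's code the check could be applied to metadata.get(name) for each name in metadata.names(). It is only a heuristic: values that happen to decode to printable but meaningless characters (like the "ž" or "Ě" above) will not be flagged.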

-- 
This message is automatically generated by JIRA.