Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2014/07/24 17:17:38 UTC

[jira] [Created] (TIKA-1376) Improve embedded file name extraction in PDFParser

Tim Allison created TIKA-1376:
---------------------------------

             Summary: Improve embedded file name extraction in PDFParser
                 Key: TIKA-1376
                 URL: https://issues.apache.org/jira/browse/TIKA-1376
             Project: Tika
          Issue Type: Improvement
          Components: parser
            Reporter: Tim Allison
            Assignee: Tim Allison
            Priority: Trivial
             Fix For: 1.6


When we extract embedded files from PDFs, we currently use the key in the PDEmbeddedFilesNameTreeNode as the file name, and we store it as the value of Metadata.RESOURCE_NAME_KEY in the embedded document's metadata.

I think we should try to get the file name from PDComplexFileSpecification's getFilename() first.  If that is null, then we should fall back to the key value.
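
A minimal sketch of the proposed fallback order, for illustration only. The class EmbeddedNameResolver and its resolve method are hypothetical names, not existing Tika or PDFBox code. The sketch takes plain strings so that it stands alone: specFilename is assumed to be whatever PDComplexFileSpecification's getFilename() returned, and nameTreeKey is assumed to be the key from the PDEmbeddedFilesNameTreeNode.

```java
// Hypothetical helper, not part of Tika or PDFBox: it only illustrates
// the "file specification name first, name tree key second" order.
public final class EmbeddedNameResolver {

    private EmbeddedNameResolver() {
        // static utility, no instances
    }

    /**
     * @param specFilename value assumed to come from
     *                     PDComplexFileSpecification's getFilename(); may be null
     * @param nameTreeKey  key under which the file is stored in the
     *                     embedded files name tree
     * @return specFilename if it is not null, otherwise nameTreeKey
     */
    public static String resolve(String specFilename, String nameTreeKey) {
        // Fall back to the name tree key only when the spec has no file name.
        return specFilename != null ? specFilename : nameTreeKey;
    }
}
```

In the parser, the returned string would then be the value stored under Metadata.RESOURCE_NAME_KEY for the embedded document. The sketch treats only null as "missing", as described above; whether an empty string should also trigger the fallback is left open here.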



--
This message was sent by Atlassian JIRA
(v6.2#6252)