Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2014/07/24 17:17:38 UTC
[jira] [Created] (TIKA-1376) Improve embedded file name extraction in PDFParser
Tim Allison created TIKA-1376:
---------------------------------
Summary: Improve embedded file name extraction in PDFParser
Key: TIKA-1376
URL: https://issues.apache.org/jira/browse/TIKA-1376
Project: Tika
Issue Type: Improvement
Components: parser
Reporter: Tim Allison
Assignee: Tim Allison
Priority: Trivial
Fix For: 1.6
When we extract embedded files from PDFs, we currently use the key in the PDEmbeddedFilesNameTreeNode as the file name, and we store it as the value of Metadata.RESOURCE_NAME_KEY in the embedded document's metadata.
I think we should try to get the file name from PDComplexFileSpecification's getFilename() first. If that is null, then we should fall back to the key value.
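A minimal sketch of the proposed fallback, not the actual PDFParser change. To keep it runnable without PDFBox on the classpath, both inputs are modeled as plain strings: specFilename stands in for the result of PDComplexFileSpecification's getFilename(), and nameTreeKey for the key in the PDEmbeddedFilesNameTreeNode. The class and method names (EmbeddedNameSketch, chooseResourceName) are hypothetical.

```java
// Hypothetical helper illustrating the fallback order proposed in this issue.
// It is an assumption-laden sketch, not code from Tika or PDFBox.
final class EmbeddedNameSketch {

    private EmbeddedNameSketch() {
    }

    /**
     * Picks the name to store under Metadata.RESOURCE_NAME_KEY.
     *
     * @param specFilename stand-in for PDComplexFileSpecification.getFilename(); may be null
     * @param nameTreeKey  stand-in for the key in the embedded files name tree
     * @return specFilename if it is non-null, otherwise nameTreeKey
     */
    static String chooseResourceName(String specFilename, String nameTreeKey) {
        // Prefer the file name from the file specification; fall back to the key.
        return specFilename != null ? specFilename : nameTreeKey;
    }

    public static void main(String[] args) {
        // File specification carries a name: it wins over the key.
        System.out.println(chooseResourceName("report.docx", "attachment-1"));
        // No name in the file specification: the key is used instead.
        System.out.println(chooseResourceName(null, "attachment-1"));
    }
}
```

The sketch checks only for null, as described above; whether an empty file name should also trigger the fallback is left open here.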
--
This message was sent by Atlassian JIRA
(v6.2#6252)