Posted to dev@tika.apache.org by "Tim Allison (Jira)" <ji...@apache.org> on 2023/04/11 10:52:00 UTC

[jira] [Created] (TIKA-4012) Improve extraction of embedded documents in PDFs

Tim Allison created TIKA-4012:
---------------------------------

             Summary: Improve extraction of embedded documents in PDFs
                 Key: TIKA-4012
                 URL: https://issues.apache.org/jira/browse/TIKA-4012
             Project: Tika
          Issue Type: New Feature
            Reporter: Tim Allison


We currently look for file specification dictionaries in only two places: the EmbeddedFiles entry in the name tree, and annotations. Unfortunately, PDFs may embed files in many other places. The newly free PDF 2.0 spec (ISO 32000-2) makes this abundantly and painfully clear.
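
As a rough illustration of the gap (a sketch only, not Tika's or PDFBox's actual code), the snippet below models a PDF object graph as nested Python dicts and lists and walks all of it, instead of checking only the name tree and annotations. Everything in it is an assumption made for illustration: the find_file_specs and build_sample_catalog names are hypothetical, a dictionary is treated as a file specification whenever it has an EF entry that is itself a dictionary, and the AF (associated files) array is just one example of another location; the issue does not enumerate them.

```python
from typing import Any, Iterator


def find_file_specs(root: Any) -> Iterator[dict]:
    """Yield each dict in the graph that looks like a file spec with embedded data.

    Heuristic (an assumption of this sketch): the dict has an "EF" entry
    whose value is a dict. "Type" is not required to be "Filespec".
    """
    seen = set()  # ids of containers already visited; guards against cycles
    stack = [root]
    while stack:
        obj = stack.pop()
        if not isinstance(obj, (dict, list)):
            continue  # names, strings, numbers, and stream bytes are leaves here
        if id(obj) in seen:
            continue
        seen.add(id(obj))
        if isinstance(obj, dict):
            if isinstance(obj.get("EF"), dict):
                yield obj
            stack.extend(obj.values())
        else:
            stack.extend(obj)


def build_sample_catalog() -> dict:
    """Build a toy document catalog with file specs in three different places."""
    spec_a = {"Type": "Filespec", "F": "a.txt", "EF": {"F": b"alpha"}}
    spec_b = {"F": "b.csv", "EF": {"F": b"beta"}}  # Type deliberately omitted
    spec_c = {"Type": "Filespec", "F": "c.xml", "EF": {"F": b"gamma"}}
    page = {"Type": "Page", "Annots": [{"Subtype": "FileAttachment", "FS": spec_b}]}
    pages = {"Type": "Pages", "Kids": [page]}
    page["Parent"] = pages  # back-reference, so the graph contains a cycle
    return {
        "Type": "Catalog",
        "Names": {"EmbeddedFiles": {"Names": ["a.txt", spec_a]}},  # name tree
        "Pages": pages,                                            # annotations
        "AF": [spec_c, spec_a],                                    # another location
    }


if __name__ == "__main__":
    found = sorted(fs["F"] for fs in find_file_specs(build_sample_catalog()))
    print(found)  # ['a.txt', 'b.csv', 'c.xml']
```

In this toy graph, c.xml is reachable only through the AF array, so a scan limited to the name tree and annotations would miss it. A real implementation would also have to resolve indirect references and decide how to report the same file reached through more than one path; the sketch simply reports each dictionary once.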



--
This message was sent by Atlassian Jira
(v8.20.10#820010)