Posted to dev@tika.apache.org by "matcha007 (Jira)" <ji...@apache.org> on 2021/08/17 08:54:00 UTC

[jira] [Created] (TIKA-3526) I can't extract content from attachments in the document

matcha007 created TIKA-3526:
-------------------------------

             Summary: I can't extract content from attachments in the document
                 Key: TIKA-3526
                 URL: https://issues.apache.org/jira/browse/TIKA-3526
             Project: Tika
          Issue Type: Bug
    Affects Versions: 1.20
            Reporter: matcha007


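Office documents can contain other files, including other Office documents, as attachments. The table below shows for which combinations of document and attachment type the contents of the attachment can be extracted: (/) means the contents are extracted, (x) means they are not.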

 
|| ||doc||docx||xls||xlsx||ppt||pptx||
|txt|(/)|(/)|(/)|(/)|(x)|(/)|
|pdf|(/)|(/)|(/)|(/)|(x)|(/)|
|xml|(/)|(/)|(/)|(/)|(x)|(/)|
|doc|(/)|(/)|(/)|(/)|(x)|(/)|
|docx|(x)|(/)|(/)|(/)|(x)|(/)|
|xls|(/)|(/)|(/)|(/)|(x)|(/)|
|xlsx|(/)|(/)|(x)|(x)|(x)|(x)|
|ppt|(/)|(/)|(/)|(/)|(x)|(/)|
|pptx|(/)|(/)|(/)|(/)|(x)|(/)|

--
This message was sent by Atlassian Jira
(v8.3.4#803005)