Posted to dev@tika.apache.org by "Claudia Mickiewicz (JIRA)" <ji...@apache.org> on 2019/01/07 12:43:00 UTC
[jira] [Created] (TIKA-2807) .docx text extract leaves out rich text content-control inside of a text box
Claudia Mickiewicz created TIKA-2807:
----------------------------------------
Summary: .docx text extract leaves out rich text content-control inside of a text box
Key: TIKA-2807
URL: https://issues.apache.org/jira/browse/TIKA-2807
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.20
Reporter: Claudia Mickiewicz
Attachments: test-document.docx
When parsing a Microsoft Word .docx, the text of a Rich Text Content Control nested inside of a Text Box is not extracted.
I have attached a .docx file that can be tested against.
"_rich-text-content-control_inside-text-box_" remains unextracted, while "rich-text-content-control" and "_simple text_" are extracted without any problem.
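To confirm independently of Tika that a document really contains the structure described above, the following is a minimal sketch using only the Python standard library. It assumes the text box is stored as a w:txbxContent element and the content control as a w:sdt element in word/document.xml, which is how WordprocessingML normally represents them. The function name text_in_textbox_content_controls is hypothetical and is not part of Tika.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML main namespace (the usual "w:" prefix).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def text_in_textbox_content_controls(docx_bytes):
    """Return the text of each content control (w:sdt) found inside
    text-box content (w:txbxContent) in word/document.xml.

    Hypothetical helper for inspecting a .docx; not a Tika API.
    """
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    found = []
    seen = set()  # avoid reporting the same w:sdt element twice
    for txbx in root.iter(f"{{{W}}}txbxContent"):
        for sdt in txbx.iter(f"{{{W}}}sdt"):
            if id(sdt) in seen:
                continue
            seen.add(id(sdt))
            # Concatenate all w:t text runs below this content control.
            found.append("".join(t.text or "" for t in sdt.iter(f"{{{W}}}t")))
    return found


# Example usage, assuming the attachment is saved locally:
# with open("test-document.docx", "rb") as f:
#     print(text_in_textbox_content_controls(f.read()))
```

If this lists the text that is missing from Tika's output, the content is present in the file and the problem lies in extraction. Note that the same text may be listed more than once if the file stores the text box in more than one representation.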
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)