Posted to dev@tika.apache.org by "Steve Gullion (JIRA)" <ji...@apache.org> on 2016/07/15 23:49:20 UTC
[jira] [Created] (TIKA-2036) Deleted Text from Word File Shows Up in Extract
Steve Gullion created TIKA-2036:
-----------------------------------
Summary: Deleted Text from Word File Shows Up in Extract
Key: TIKA-2036
URL: https://issues.apache.org/jira/browse/TIKA-2036
Project: Tika
Issue Type: Bug
Components: core
Affects Versions: 1.13
Environment: Windows, under TikaOnDotNet
Reporter: Steve Gullion
A .docx file, with "track changes" on, includes deleted text. In this case, there are two overlapping deletions:
9. [DELETED:This Agreement shall be governed by and construed in accordance with [INSERTED, THEN DELETED:Arizona] New York law] (Intentionally omitted.)
The extracted text should contain only "9. (Intentionally omitted.)". However, the output is "9. This Agreement shall be governed and construed in accordance with New York law." So Tika recognizes "Arizona" as deleted, but not the rest of the deleted text around it.
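To make the expected behaviour concrete, here is a minimal sketch in Python (not Tika's code, which goes through Apache POI). It assumes the usual WordprocessingML revision markup, where tracked deletions are wrapped in w:del with their text in w:delText, and an insertion that was later deleted appears as w:del nested inside w:ins. The SAMPLE paragraph and the visible_text helper are hypothetical, built to mirror the clause above; the actual XML of the reporter's file may be structured differently.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical paragraph mirroring the reported clause (an assumption,
# not the reporter's actual document.xml).
SAMPLE = f"""
<w:p xmlns:w="{W}">
  <w:r><w:t xml:space="preserve">9. </w:t></w:r>
  <w:del w:id="1" w:author="A">
    <w:r><w:delText xml:space="preserve">This Agreement shall be governed by and construed in accordance with </w:delText></w:r>
  </w:del>
  <w:ins w:id="2" w:author="A">
    <w:del w:id="3" w:author="B">
      <w:r><w:delText>Arizona</w:delText></w:r>
    </w:del>
  </w:ins>
  <w:del w:id="4" w:author="A">
    <w:r><w:delText xml:space="preserve"> New York law</w:delText></w:r>
  </w:del>
  <w:r><w:t>(Intentionally omitted.)</w:t></w:r>
</w:p>
"""


def visible_text(element):
    """Return the text of w:t elements, skipping every w:del subtree."""
    if element.tag == f"{{{W}}}del":
        # Tracked deletion: drop everything below it, at any nesting depth.
        return ""
    if element.tag == f"{{{W}}}t":
        return element.text or ""
    return "".join(visible_text(child) for child in element)


paragraph = ET.fromstring(SAMPLE)
print(visible_text(paragraph))  # prints: 9. (Intentionally omitted.)
```

Because the walk stops at any w:del element regardless of where it sits, the nested "Arizona" deletion and the surrounding deletions are treated the same way, which is the result the report expects.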
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)