Posted to dev@tika.apache.org by "Steve Gullion (JIRA)" <ji...@apache.org> on 2016/07/15 23:49:20 UTC

[jira] [Created] (TIKA-2036) Deleted Text from Word File Shows Up in Extract

Steve Gullion created TIKA-2036:
-----------------------------------

             Summary: Deleted Text from Word File Shows Up in Extract
                 Key: TIKA-2036
                 URL: https://issues.apache.org/jira/browse/TIKA-2036
             Project: Tika
          Issue Type: Bug
          Components: core
    Affects Versions: 1.13
         Environment: Windows, under TikaOnDotNet
            Reporter: Steve Gullion


A .docx file with "track changes" turned on contains deleted text. In this case, there are two overlapping deletions:

9.	[DELETED:This Agreement shall be governed by and construed in accordance with [INSERTED, THEN DELETED:Arizona] New York law] (Intentionally omitted.)

The extracted text should include only "9. (Intentionally omitted.)". However, the output is "9. This Agreement shall be governed and construed in accordance with New York law." So the extractor recognizes "Arizona" as deleted, but not the rest of the enclosing deletion.
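
To make the expected result concrete, here is a minimal sketch in Python (standard library only) that walks a WordprocessingML paragraph and skips everything inside a tracked deletion. The XML fragment is a hypothetical reconstruction of the paragraph above, not the actual file, and it assumes the deletions are stored as w:del elements (with the inserted-then-deleted word as a w:del nested in a w:ins). The function name visible_text is made up for illustration; this is not how Tika is implemented.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical fragment modelled on the reported paragraph (an assumption,
# not the reporter's actual document.xml).
SAMPLE = f"""<w:p xmlns:w="{W}">
  <w:r><w:t xml:space="preserve">9. </w:t></w:r>
  <w:del w:id="1" w:author="A">
    <w:r><w:delText xml:space="preserve">This Agreement shall be governed by and construed in accordance with </w:delText></w:r>
  </w:del>
  <w:ins w:id="2" w:author="A">
    <w:del w:id="3" w:author="B">
      <w:r><w:delText xml:space="preserve">Arizona </w:delText></w:r>
    </w:del>
  </w:ins>
  <w:del w:id="4" w:author="A">
    <w:r><w:delText>New York law</w:delText></w:r>
  </w:del>
  <w:r><w:t>(Intentionally omitted.)</w:t></w:r>
</w:p>"""


def visible_text(paragraph_xml: str) -> str:
    """Return the text of a paragraph, ignoring all tracked deletions."""
    root = ET.fromstring(paragraph_xml)
    parts = []

    def walk(elem):
        # Anything under <w:del> is deleted content, however deeply nested.
        if elem.tag == f"{{{W}}}del":
            return
        # <w:t> holds visible run text; <w:delText> is never collected.
        if elem.tag == f"{{{W}}}t":
            parts.append(elem.text or "")
        for child in elem:
            walk(child)

    walk(root)
    return "".join(parts)


print(visible_text(SAMPLE))
```

With this fragment the sketch prints "9. (Intentionally omitted.)", which is the output I would expect from the extract.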



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)