Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2017/01/18 14:26:26 UTC

[jira] [Resolved] (TIKA-2077) Special character extracted as AAAAAAAA in docx file extraction

     [ https://issues.apache.org/jira/browse/TIKA-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison resolved TIKA-2077.
-------------------------------
    Resolution: Not A Problem

Hahaha. It turns out that if you extend the text box in MS Word downward, you'll see the AAAAAAA. I don't think this is a problem; or rather, we have the same behavior as MS Word. The AAAA... really is in the document.
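A .docx file is a ZIP package whose main body text lives in `word/document.xml`, so one way to confirm that the AAAA... is really stored in the file (rather than produced by Tika) is to search that part directly. The following is a minimal Python sketch, not part of Tika. The function name `find_letter_runs` and its parameters are hypothetical, and it assumes the text box sits in the main body (text boxes in headers or footers would live in other parts). A text box saved with a VML fallback can appear twice in `document.xml`, so the same run may be reported twice.

```python
import re
import zipfile

# Matches a <w:t> element (optionally with attributes such as
# xml:space="preserve") and captures its character data. Tags like
# <w:tbl>, <w:tc> or <w:txbxContent> do not match because "<w:t" must be
# followed by whitespace or ">".
W_T = re.compile(r"<w:t(?:\s[^>]*)?>([^<]*)</w:t>")


def find_letter_runs(docx_path, letter="A", min_len=4):
    """Hypothetical helper: return runs of `letter` at least `min_len` long
    found in the text of word/document.xml, the main part of a .docx."""
    with zipfile.ZipFile(docx_path) as zf:
        xml = zf.read("word/document.xml").decode("utf-8")
    # Concatenate every text node, including those inside text boxes
    # (<w:txbxContent>), so a run that Word split across several <w:r>
    # elements is still found. Rough check: paragraph breaks are not kept.
    text = "".join(W_T.findall(xml))
    run = re.compile(re.escape(letter) + "{%d,}" % min_len)
    return run.findall(text)
```

For example, `print(find_letter_runs("TestData.docx"))` on the attached file should list any long runs of "A" that Word keeps hidden below the visible edge of the text box.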

> Special character extracted as AAAAAAAA in docx file extraction
> ---------------------------------------------------------------
>
>                 Key: TIKA-2077
>                 URL: https://issues.apache.org/jira/browse/TIKA-2077
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.13
>            Reporter: Akash Sudhakar
>         Attachments: TestData.docx
>
>
> During docx file extraction using Tika 1.13, a special character is extracted as AAAAAAAA.
> How can this be avoided?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)