Posted to dev@tika.apache.org by "Theodor Sjöstedt (JIRA)" <ji...@apache.org> on 2014/09/25 17:37:35 UTC

[jira] [Created] (TIKA-1428) Microsoft Word 97 - 2003 (.doc) footnote references are Unicode Replacement Character

Theodor Sjöstedt created TIKA-1428:
--------------------------------------

             Summary: Microsoft Word 97 - 2003 (.doc) footnote references are Unicode Replacement Character
                 Key: TIKA-1428
                 URL: https://issues.apache.org/jira/browse/TIKA-1428
             Project: Tika
          Issue Type: Bug
    Affects Versions: 1.6, 1.4
            Reporter: Theodor Sjöstedt
            Priority: Minor


Footnotes from {{.doc}} documents are extracted, but the references to the footnotes are replaced by the Unicode Replacement Character (�).

I have tried this in 1.4 and 1.6.

In 1.4, both the reference in the body text and the reference at the footnote itself are replaced.
In 1.6, the reference in the body text is missing completely.
See the attached image for the original document, the 1.4 formatted text, and the 1.6 formatted text.
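
For anyone trying to reproduce this, a minimal sketch of how the symptom could be checked in already-extracted text. The class name, method name, and sample string are all hypothetical and only for illustration; in a real check the text would come from whatever Tika returns for the {{.doc}} file.

```java
// Hypothetical helper, not part of Tika: counts U+FFFD in extracted text.
class ReplacementCharCheck {
    // U+FFFD is the Unicode Replacement Character described in this report.
    static final char REPLACEMENT = '\uFFFD';

    // Returns how many times the replacement character occurs in the text.
    static int countReplacementChars(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == REPLACEMENT) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Invented sample that imitates the 1.4 behaviour described above:
        // one replaced reference in the body text, one at the footnote.
        String sample = "Body text\uFFFD continues.\n\uFFFD The footnote text.";
        System.out.println("Replacement characters found: "
                + countReplacementChars(sample));
    }
}
```

A non-zero count on text extracted from a document with footnotes would match the 1.4 behaviour. The 1.6 behaviour (reference missing from the body text) would need a comparison against the expected reference markers instead, which this sketch does not attempt.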



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)