Posted to dev@tika.apache.org by "Steve Gullion (JIRA)" <ji...@apache.org> on 2015/03/25 19:51:56 UTC

[jira] [Updated] (TIKA-1440) Auto-Paragraph numbers not extracted from Word Document

     [ https://issues.apache.org/jira/browse/TIKA-1440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Gullion updated TIKA-1440:
--------------------------------
    Attachment: Tika test 2003.doc
                Tika Test.docx

These are very simple documents. More complex examples are available if needed.

> Auto-Paragraph numbers not extracted from Word Document 
> --------------------------------------------------------
>
>                 Key: TIKA-1440
>                 URL: https://issues.apache.org/jira/browse/TIKA-1440
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>         Environment: Windows 7, Windows Server 2008, Tomcat
>            Reporter: Steve Gullion
>            Priority: Minor
>              Labels: numbering, paragraph, word
>         Attachments: Tika Test.docx, Tika test 2003.doc
>
>
> When the text is extracted from a Microsoft Word document that uses automatic numbering, the text of the automatic numbers is not extracted. As the numbers can be critical to the meaning of the document (as in the case of cross-references), they should be calculated and extracted if at all possible.
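
To illustrate what "calculated" could mean here, the following is a minimal sketch of deriving decimal labels from the nesting level of each numbered paragraph. The class name AutoNumberSketch and the method labels are hypothetical and are not part of Tika or POI. The sketch assumes plain decimal numbering that starts at 1 and restarts whenever a shallower level is reached; real Word documents also carry number formats, start values, and restart rules that it ignores.

```java
import java.util.ArrayList;
import java.util.List;

public class AutoNumberSketch {

    // Hypothetical helper (not a Tika or POI API): given the nesting level
    // of each numbered paragraph (0 = top level), return labels such as
    // "1", "1.1", "2". Assumes decimal numbering starting at 1.
    public static List<String> labels(List<Integer> levels) {
        List<Integer> counters = new ArrayList<>();
        List<String> out = new ArrayList<>();
        for (int level : levels) {
            // Open any levels not seen yet, with their counters at 0.
            while (counters.size() <= level) {
                counters.add(0);
            }
            // Returning to a shallower level drops the deeper counters,
            // so nested numbering restarts the next time it is used.
            while (counters.size() > level + 1) {
                counters.remove(counters.size() - 1);
            }
            counters.set(level, counters.get(level) + 1);

            // Join the active counters with dots to form the label.
            StringBuilder label = new StringBuilder();
            for (int c : counters) {
                if (label.length() > 0) {
                    label.append('.');
                }
                label.append(c);
            }
            out.add(label.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // Two top-level items, each with nested sub-items.
        System.out.println(labels(List.of(0, 1, 1, 0, 1)));
    }
}
```

A parser would still have to read each paragraph's level and list definition from the document before a step like this could run; the sketch only covers the counting.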



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)