Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2015/03/25 19:16:52 UTC

[jira] [Commented] (TIKA-1440) Auto-Paragraph numbers not extracted from Word Document

    [ https://issues.apache.org/jira/browse/TIKA-1440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380393#comment-14380393 ] 

Tim Allison commented on TIKA-1440:
-----------------------------------

Are you able to post a mock-up document and the expected output? I can't yet tell whether we can handle this at the Tika level or whether we'll need modifications to POI.

> Auto-Paragraph numbers not extracted from Word Document 
> --------------------------------------------------------
>
>                 Key: TIKA-1440
>                 URL: https://issues.apache.org/jira/browse/TIKA-1440
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>         Environment: Windows 7, Windows Server 2008, Tomcat
>            Reporter: Steve Gullion
>            Priority: Minor
>              Labels: numbering, paragraph, word
>
> When the text is extracted from a Microsoft Word document that uses automatic numbering, the text of the automatic numbers is not extracted. As the numbers can be critical to the meaning of the document (as in the case of cross-references), they should be calculated and extracted if at all possible.
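
To make "calculated" concrete, here is a minimal sketch of how multilevel numbers could be derived from the list level of each paragraph. It is an illustration only, not how Word, POI, or Tika compute them: the function name number_paragraphs and the input format (one 0-based level per numbered paragraph) are assumptions, and it ignores the number formats, start values, and restart rules that Word's numbering definitions can specify.

```python
from typing import Iterable, List


def number_paragraphs(levels: Iterable[int]) -> List[str]:
    """Return a label such as '1.2.' for each paragraph's 0-based list level.

    Hypothetical helper: assumes plain decimal numbering that starts at 1
    and that levels are not skipped on the way down.
    """
    counters: List[int] = []
    labels: List[str] = []
    for level in levels:
        if level < 0:
            raise ValueError("list level must be non-negative")
        # Entering a deeper level: create any missing counters at zero.
        while len(counters) <= level:
            counters.append(0)
        # Returning to a shallower level: drop deeper counters so they restart.
        del counters[level + 1:]
        counters[level] += 1
        labels.append(".".join(str(c) for c in counters) + ".")
    return labels


# Hypothetical outline: two top-level items, the first with two sub-items.
print(number_paragraphs([0, 1, 1, 0]))  # ['1.', '1.1.', '1.2.', '2.']
```

A mock-up document paired with the labels expected for each paragraph, in a form like the list above, would make it easier to judge whether the extracted text matches what Word displays.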


