Posted to dev@tika.apache.org by "Johannes Wirkkala Westlund (Jira)" <ji...@apache.org> on 2022/03/30 14:29:00 UTC

[jira] [Updated] (TIKA-3709) RuntimeException when parsing word (.doc) document

     [ https://issues.apache.org/jira/browse/TIKA-3709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Johannes Wirkkala Westlund updated TIKA-3709:
---------------------------------------------
    Description: 
Hi,

I have a Word file that throws the following error when I try to parse it with Tika:
{code:java}
Caused by: java.lang.IllegalArgumentException: This paragraph is not the first one in the table
    at org.apache.poi.hwpf.usermodel.Range.getTable(Range.java:810)
    at org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:272)
    at org.apache.tika.parser.microsoft.WordExtractor.handleHeaderFooter(WordExtractor.java:255)
    at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:210)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:216)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:173)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:289)
    ... 5 more {code}
I have attached the document with this issue.

Might be related to: https://issues.apache.org/jira/browse/TIKA-1251

  was:
Hi,

I have a Word file that throws the following error when I try to parse it with Tika:


{code:java}
Caused by: java.lang.IllegalArgumentException: This paragraph is not the first one in the table
    at org.apache.poi.hwpf.usermodel.Range.getTable(Range.java:810)
    at org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:272)
    at org.apache.tika.parser.microsoft.WordExtractor.handleHeaderFooter(WordExtractor.java:255)
    at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:210)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:216)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:173)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:289)
    ... 5 more {code}

I have attached the document with this issue.


> RuntimeException when parsing word (.doc) document
> --------------------------------------------------
>
>                 Key: TIKA-3709
>                 URL: https://issues.apache.org/jira/browse/TIKA-3709
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 2.3.0
>            Reporter: Johannes Wirkkala Westlund
>            Priority: Minor
>         Attachments: Avtalsvillkor (1).doc
>
>
> Hi,
> I have a Word file that throws the following error when I try to parse it with Tika:
> {code:java}
> Caused by: java.lang.IllegalArgumentException: This paragraph is not the first one in the table
>     at org.apache.poi.hwpf.usermodel.Range.getTable(Range.java:810)
>     at org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:272)
>     at org.apache.tika.parser.microsoft.WordExtractor.handleHeaderFooter(WordExtractor.java:255)
>     at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:210)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:216)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:173)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:289)
>     ... 5 more {code}
> I have attached the document with this issue.
> Might be related to: https://issues.apache.org/jira/browse/TIKA-1251


