Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/03/08 02:32:40 UTC

[jira] [Resolved] (TIKA-1836) Conversion DOC->TXT failed due to POI issue

     [ https://issues.apache.org/jira/browse/TIKA-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison resolved TIKA-1836.
-------------------------------
    Resolution: Fixed

> Conversion DOC->TXT failed due to POI issue
> -------------------------------------------
>
>                 Key: TIKA-1836
>                 URL: https://issues.apache.org/jira/browse/TIKA-1836
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.11
>         Environment: Distributor ID:	Ubuntu
> Description:	Ubuntu 12.04.5 LTS
> Release:	12.04
> Codename:	precise
> java version "1.7.0_91"
> OpenJDK Runtime Environment (IcedTea 2.6.3) (7u91-2.6.3-0ubuntu0.12.04.1)
> OpenJDK 64-Bit Server VM (build 24.91-b01, mixed mode)
>            Reporter: Jorge Spinsanti
>         Attachments: test.doc
>
>
> When I try to convert DOC -> TXT, I get the following stack trace:
> {code}
> Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@1ddeedb6
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> 	... 15 more
> Caused by: java.lang.UnsupportedOperationException: Non-extended character Pascal strings are not supported right now. Please, contact POI developers for update.
> 	at org.apache.poi.hwpf.model.Sttb.fillFields(Sttb.java:82)
> 	at org.apache.poi.hwpf.model.Sttb.<init>(Sttb.java:61)
> 	at org.apache.poi.hwpf.model.SttbUtils.readSttbSavedBy(SttbUtils.java:52)
> 	at org.apache.poi.hwpf.model.SavedByTable.<init>(SavedByTable.java:53)
> 	at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:361)
> 	at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:144)
> 	at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:146)
> 	at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> 	... 22 more
> {code}
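
The trace above shows the POI UnsupportedOperationException wrapped inside a TikaException. A caller who wants to tell this failure apart from other parse errors could walk the cause chain to its root. The following is only a minimal sketch using the JDK alone: the class and method names (RootCause, find) are hypothetical, and the exception built in main is a hand-made stand-in, not one actually thrown by Tika or POI.

```java
public class RootCause {
    // Walk the cause chain and return the deepest Throwable.
    static Throwable find(Throwable t) {
        Throwable cur = t;
        // Stop at the end of the chain; also guard against a self-referential cause.
        while (cur.getCause() != null && cur.getCause() != cur) {
            cur = cur.getCause();
        }
        return cur;
    }

    public static void main(String[] args) {
        // Assumption: stand-in exceptions mimicking the nesting in the trace above.
        Exception inner = new UnsupportedOperationException(
                "Non-extended character Pascal strings are not supported right now.");
        Exception wrapped = new RuntimeException(
                "Unexpected RuntimeException from parser", inner);

        Throwable root = find(wrapped);
        // Print the root cause type so the caller can branch on it.
        System.out.println(root.getClass().getName());
    }
}
```

In real use, the Throwable passed to find would be the exception caught around the parse call; checking the root type lets the caller log or skip the affected document rather than treating every failure alike.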


