Posted to dev@tika.apache.org by "Konstantin Gribov (JIRA)" <ji...@apache.org> on 2016/03/29 21:05:25 UTC
[jira] [Commented] (TIKA-1836) Conversion DOC->TXT failed due to POI issue
[ https://issues.apache.org/jira/browse/TIKA-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216653#comment-15216653 ]
Konstantin Gribov commented on TIKA-1836:
-----------------------------------------
It seems to be a regression: http://stackoverflow.com/questions/36282432/tika-1-12-officeparser-error-unexpected-runtimeexception-from-org-apache-tika
[~tallison@apache.org], I can't find your fix for this issue in git; maybe it got lost somewhere =(
> Conversion DOC->TXT failed due to POI issue
> -------------------------------------------
>
> Key: TIKA-1836
> URL: https://issues.apache.org/jira/browse/TIKA-1836
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.11
> Environment: Distributor ID: Ubuntu
> Description: Ubuntu 12.04.5 LTS
> Release: 12.04
> Codename: precise
> java version "1.7.0_91"
> OpenJDK Runtime Environment (IcedTea 2.6.3) (7u91-2.6.3-0ubuntu0.12.04.1)
> OpenJDK 64-Bit Server VM (build 24.91-b01, mixed mode)
> Reporter: Jorge Spinsanti
> Attachments: test.doc
>
>
> When we try to convert DOC -> TXT, we get the following stack trace:
> {code}
> Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@1ddeedb6
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> ... 15 more
> Caused by: java.lang.UnsupportedOperationException: Non-extended character Pascal strings are not supported right now. Please, contact POI developers for update.
> at org.apache.poi.hwpf.model.Sttb.fillFields(Sttb.java:82)
> at org.apache.poi.hwpf.model.Sttb.<init>(Sttb.java:61)
> at org.apache.poi.hwpf.model.SttbUtils.readSttbSavedBy(SttbUtils.java:52)
> at org.apache.poi.hwpf.model.SavedByTable.<init>(SavedByTable.java:53)
> at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:361)
> at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:144)
> at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:146)
> at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> ... 22 more
> {code}
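
For anyone triaging similar reports: the trace above is a nested cause chain, with the TikaException on the outside and POI's UnsupportedOperationException as the innermost cause. The following is a minimal sketch of how calling code might unwrap such a chain to decide whether a failure matches this issue. It uses only the JDK; the class and method names (CauseChain, rootCause, isUnsupportedPascalString) are hypothetical, plain exceptions stand in for the Tika and POI types, and matching on the message text is an assumption that would break if POI rewords the message.

```java
final class CauseChain {
    private CauseChain() {}

    /** Follows getCause() links until the innermost exception is reached. */
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    /**
     * Heuristic check (assumption): the root cause is an
     * UnsupportedOperationException whose message mentions Pascal strings,
     * as in the stack trace reported in this issue.
     */
    static boolean isUnsupportedPascalString(Throwable t) {
        Throwable root = rootCause(t);
        String message = root.getMessage();
        return root instanceof UnsupportedOperationException
                && message != null
                && message.contains("Pascal strings");
    }

    public static void main(String[] args) {
        // Plain exceptions stand in for the POI and Tika exception types.
        Throwable poiFailure = new UnsupportedOperationException(
                "Non-extended character Pascal strings are not supported right now.");
        Exception wrapped = new Exception(
                "Unexpected RuntimeException from parser", poiFailure);

        Throwable root = rootCause(wrapped);
        System.out.println(root.getClass().getName() + ": " + root.getMessage());
        System.out.println("Matches this issue: " + isUnsupportedPascalString(wrapped));
    }
}
```

In real code the outer exception would be whatever the parse call throws; the unwrapping logic stays the same because it relies only on Throwable.getCause().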