Posted to dev@tika.apache.org by "Jorge Spinsanti (JIRA)" <ji...@apache.org> on 2016/12/22 13:24:58 UTC
[jira] [Commented] (TIKA-2225) Parse DOCX file due to NullPointerException on POI code
[ https://issues.apache.org/jira/browse/TIKA-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15770044#comment-15770044 ]
Jorge Spinsanti commented on TIKA-2225:
---------------------------------------
I created an issue on POI too: https://bz.apache.org/bugzilla/show_bug.cgi?id=60484
> Parse DOCX file due to NullPointerException on POI code
> -------------------------------------------------------
>
> Key: TIKA-2225
> URL: https://issues.apache.org/jira/browse/TIKA-2225
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.14
> Reporter: Jorge Spinsanti
>
> I'm trying to extract text from a DOCX file, but I get an exception caused by a NullPointerException in POI code. Stack trace:
> {code}
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@4f5692fe
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> ... 16 more
> Caused by: java.lang.NullPointerException
> at org.apache.poi.hwpf.usermodel.Picture.getRawContent(Picture.java:422)
> at org.apache.poi.hwpf.usermodel.Picture.fillImageContent(Picture.java:131)
> at org.apache.poi.hwpf.usermodel.Picture.getContent(Picture.java:286)
> at org.apache.tika.parser.microsoft.WordExtractor.handlePictureCharacterRun(WordExtractor.java:609)
> at org.apache.tika.parser.microsoft.WordExtractor.handleSpecialCharacterRuns(WordExtractor.java:517)
> at org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:346)
> at org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:273)
> at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:179)
> at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:169)
> at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:130)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> {code}
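
The trace above runs through OfficeParser, WordExtractor and org.apache.poi.hwpf, which are the code paths for the legacy OLE2 Word format rather than the ZIP-based OOXML format, so one thing worth checking is whether the file is really a DOCX despite its extension. Below is a minimal sketch of such a check, assuming Java 11+ and using only the JDK. The class name ContainerSniffer, its Kind enum and both method names are hypothetical and are not part of Tika or POI. It only inspects the leading bytes and does not explain the NullPointerException itself.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper (not a Tika or POI class): classifies a file by its leading bytes.
final class ContainerSniffer {

    /** Container kinds this sketch distinguishes. */
    enum Kind { OLE2, ZIP, UNKNOWN }

    // OLE2 compound document signature (legacy .doc/.xls/.ppt).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header signature (.docx/.xlsx/.pptx are ZIP packages).
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    /** Classifies an in-memory header; at least 8 bytes are needed to recognise OLE2. */
    static Kind sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return Kind.OLE2;
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return Kind.ZIP;
        }
        return Kind.UNKNOWN;
    }

    /** Reads up to the first 8 bytes of the file and classifies them. */
    static Kind sniffFile(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return sniff(in.readNBytes(OLE2_MAGIC.length));
        }
    }

    // True when data is non-null, long enough, and begins with prefix.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data != null
            && data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

Calling ContainerSniffer.sniffFile(Path.of("sample.docx")) (the file name is a placeholder) and getting OLE2 would suggest the document is a binary .doc saved with a .docx extension; ZIP would be consistent with a genuine DOCX package.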