Posted to dev@tika.apache.org by "Jorge Spinsanti (JIRA)" <ji...@apache.org> on 2016/12/22 13:24:58 UTC

[jira] [Commented] (TIKA-2225) Parse DOCX file due to NullPointerException on POI code

    [ https://issues.apache.org/jira/browse/TIKA-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15770044#comment-15770044 ] 

Jorge Spinsanti commented on TIKA-2225:
---------------------------------------

I created an issue on POI too: https://bz.apache.org/bugzilla/show_bug.cgi?id=60484

> Parse DOCX file due to NullPointerException on POI code
> -------------------------------------------------------
>
>                 Key: TIKA-2225
>                 URL: https://issues.apache.org/jira/browse/TIKA-2225
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.14
>            Reporter: Jorge Spinsanti
>
> I'm trying to get text from a DOCX file, but I got an exception caused by a NullPointerException in POI code. Stack trace:
> {code}
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@4f5692fe
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> 	... 16 more
> Caused by: java.lang.NullPointerException
> 	at org.apache.poi.hwpf.usermodel.Picture.getRawContent(Picture.java:422)
> 	at org.apache.poi.hwpf.usermodel.Picture.fillImageContent(Picture.java:131)
> 	at org.apache.poi.hwpf.usermodel.Picture.getContent(Picture.java:286)
> 	at org.apache.tika.parser.microsoft.WordExtractor.handlePictureCharacterRun(WordExtractor.java:609)
> 	at org.apache.tika.parser.microsoft.WordExtractor.handleSpecialCharacterRuns(WordExtractor.java:517)
> 	at org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:346)
> 	at org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:273)
> 	at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:179)
> 	at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:169)
> 	at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:130)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)