Posted to dev@poi.apache.org by bu...@apache.org on 2013/03/19 16:45:22 UTC

[Bug 54725] New: NullPointerException parsing ms doc file

https://issues.apache.org/bugzilla/show_bug.cgi?id=54725

            Bug ID: 54725
           Summary: NullPointerException parsing ms doc file
           Product: POI
           Version: unspecified
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: HWPF
          Assignee: dev@poi.apache.org
          Reporter: ushi+apache@honkgong.info
    Classification: Unclassified

Created attachment 30079
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=30079&action=edit
Tika failed to parse this doc file.

I am coming here from:
https://issues.apache.org/jira/browse/TIKA-1093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605364#comment-13605364

I get a NullPointerException when parsing an MS Word .doc file with Tika.

% java -Djava.awt.headless=false -jar tika-app-1.3.jar -t < test.doc       
Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@2443906f
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:139)
    at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:400)
    at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:112)
Caused by: java.lang.NullPointerException
    at org.apache.poi.hwpf.sprm.CharacterSprmUncompressor.uncompressCHP(CharacterSprmUncompressor.java:48)
    at org.apache.poi.hwpf.model.StyleSheet.createChp(StyleSheet.java:288)
    at org.apache.poi.hwpf.model.StyleSheet.<init>(StyleSheet.java:121)
    at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:346)
    at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:79)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:186)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:161)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)

The test.doc file is attached.
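The trace shows the exception is raised while the HWPFDocument constructor builds its StyleSheet: StyleSheet.createChp passes a style's character properties (CHP) to CharacterSprmUncompressor.uncompressCHP. In the Word binary format each style names a base style (istdBase), and its character properties are computed by applying the style's own sprms on top of the base style's properties. The Java sketch below is a simplified, hypothetical model of that inheritance, not POI's code: the class StyleChpSketch, its methods, the string "sprms" standing in for binary ones, and the assumption that a null parent CHP is what triggers the NPE are all illustrative. It shows how a missing or out-of-range base style can leave a null parent for the uncompress step, and one defensive fallback.

```java
/*
 * Hypothetical, simplified model of Word style-sheet inheritance.
 * Not Apache POI's implementation; all names here are illustrative.
 */
public class StyleChpSketch {
    // In the MS-DOC format, istdBase == 0x0FFF means "no base style".
    static final int NIL_STYLE = 0x0FFF;

    // Stand-in for character properties (CHP): only two flags are modelled.
    static final class Chp implements Cloneable {
        boolean bold;
        boolean italic;

        @Override
        public Chp clone() {
            try {
                return (Chp) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // cannot happen: Chp is Cloneable
            }
        }
    }

    // Stand-in for a style entry: base-style index plus this style's own modifiers.
    static final class Style {
        final int baseIndex;
        final String[] sprms; // hypothetical: "bold"/"italic" instead of binary sprms
        Chp chp;              // resolved lazily

        Style(int baseIndex, String... sprms) {
            this.baseIndex = baseIndex;
            this.sprms = sprms;
        }
    }

    // Copy the parent's properties and apply this style's modifiers on top.
    // Throws NullPointerException if parent is null (the assumed failure mode).
    static Chp uncompress(Chp parent, String[] sprms) {
        Chp result = parent.clone();
        for (String s : sprms) {
            if (s.equals("bold")) {
                result.bold = true;
            } else if (s.equals("italic")) {
                result.italic = true;
            }
        }
        return result;
    }

    // Resolve the CHP for styles[index] without ever passing a null parent on.
    static Chp resolve(Style[] styles, int index) {
        return resolve(styles, index, 0);
    }

    private static Chp resolve(Style[] styles, int index, int depth) {
        Style style = styles[index];
        if (style == null) {
            return new Chp(); // empty style slot: fall back to defaults
        }
        if (style.chp != null) {
            return style.chp; // already resolved
        }
        boolean validBase = style.baseIndex != NIL_STYLE
                && style.baseIndex >= 0
                && style.baseIndex < styles.length
                && depth < styles.length; // stops inheritance cycles
        // Missing or invalid base style: use default properties instead of null.
        Chp parent = validBase ? resolve(styles, style.baseIndex, depth + 1) : new Chp();
        style.chp = uncompress(parent, style.sprms);
        return style.chp;
    }
}
```

Whether the actual fix (r1614926) takes this approach is not stated in the report. Falling back to default properties is only one possible design; it lets parsing continue with slightly wrong formatting rather than aborting the whole document.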

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 54725] NullPointerException parsing ms doc file

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=54725

Nick Burch <ap...@gagravarr.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #1 from Nick Burch <ap...@gagravarr.org> ---
Fixed in r1614926.
