Posted to dev@poi.apache.org by bu...@apache.org on 2013/11/01 18:11:02 UTC
[Bug 55733] New: NullPointerException when attempting to parse a
Word document with no headers
https://issues.apache.org/bugzilla/show_bug.cgi?id=55733
Bug ID: 55733
Summary: NullPointerException when attempting to parse a Word
document with no headers
Product: POI
Version: 3.9
Hardware: PC
Status: NEW
Severity: normal
Priority: P2
Component: XWPF
Assignee: dev@poi.apache.org
Reporter: david.patrone@jhuapl.edu
Created attachment 30990
--> https://issues.apache.org/bugzilla/attachment.cgi?id=30990&action=edit
Two Word test files without headers - one throws NullPointerException, one
doesn't
I was given a programmatically generated Word document that did not contain any
headers. MS Word is able to open it; however, I get a NullPointerException
when attempting to read it with XWPFWordExtractor.getText(). Specifically:
java.lang.NullPointerException
    at org.apache.poi.xwpf.extractor.XWPFWordExtractor.extractHeaders(XWPFWordExtractor.java:162)
    at org.apache.poi.xwpf.extractor.XWPFWordExtractor.getText(XWPFWordExtractor.java:87)
    at Test.testPrintDoc(Test.java:16)
    at Test.main(Test.java:26)
Looking at the code, it looks like hfPolicy is passed in as null to
XWPFWordExtractor.extractHeaders() from XWPFWordExtractor.getText():
    public String getText() {
        StringBuffer text = new StringBuffer();
        XWPFHeaderFooterPolicy hfPolicy = document.getHeaderFooterPolicy();
        // Start out with all headers
        extractHeaders(text, hfPolicy);
This suggests that the headerFooterPolicy of the document (as returned by
document.getHeaderFooterPolicy()) is never set for this file, and that this
null is what propagates into extractHeaders() and causes the error.
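To illustrate the failure mode described above, here is a minimal,
self-contained sketch of a null guard on the policy object. It does not use
POI: the HeaderFooterPolicy and Doc classes are hypothetical stand-ins for
XWPFHeaderFooterPolicy and XWPFDocument, and getDefaultHeader() and getBody()
are invented for the example. It is only one way the null could be tolerated,
not necessarily how the extractor should be changed.

```java
public class NullGuardSketch {
    /** Hypothetical stand-in for XWPFHeaderFooterPolicy (not the POI class). */
    static class HeaderFooterPolicy {
        private final String defaultHeader;
        HeaderFooterPolicy(String defaultHeader) { this.defaultHeader = defaultHeader; }
        String getDefaultHeader() { return defaultHeader; }
    }

    /** Hypothetical stand-in for XWPFDocument; the policy may be null. */
    static class Doc {
        private final HeaderFooterPolicy policy;
        private final String body;
        Doc(HeaderFooterPolicy policy, String body) {
            this.policy = policy;
            this.body = body;
        }
        HeaderFooterPolicy getHeaderFooterPolicy() { return policy; }
        String getBody() { return body; }
    }

    /** Appends header text, skipping quietly when no policy was set. */
    static void extractHeaders(StringBuilder text, HeaderFooterPolicy hfPolicy) {
        if (hfPolicy == null) {
            return; // assumption: a missing policy simply means "no headers"
        }
        String header = hfPolicy.getDefaultHeader();
        if (header != null) {
            text.append(header).append('\n');
        }
    }

    /** Mirrors the shape of getText(): headers first, then the body. */
    static String getText(Doc document) {
        StringBuilder text = new StringBuilder();
        extractHeaders(text, document.getHeaderFooterPolicy());
        text.append(document.getBody());
        return text.toString();
    }

    public static void main(String[] args) {
        // Document with no policy at all: no exception, body text only.
        System.out.println(getText(new Doc(null, "body only")));
        // Document with a policy and a default header.
        System.out.println(getText(new Doc(new HeaderFooterPolicy("Header"), "body")));
    }
}
```

With a guard like this, a document whose policy was never set is treated the
same as a document that has a policy but no headers.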
I'd chalk it up to an invalid Word document, except that MS Word can open the
file. If you open it in Word and re-save it without making any changes, it
still reports that it doesn't have headers, but the new file can be read by
XWPFWordExtractor.getText() without the NullPointerException.
Example Word documents without a header, one that throws the error and one
that doesn't, are attached. Here's the test code I used to print out what was
in each file.
import java.io.FileInputStream;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
public class Test {
    public static void testPrintDoc(String file) throws Exception {
        FileInputStream fis = new FileInputStream(file);
        System.err.println("Reading " + file);
        try {
            XWPFDocument doc = new XWPFDocument(fis);
            XWPFWordExtractor textExtractor = new XWPFWordExtractor(doc);
            System.err.println(textExtractor.getText());
        } finally {
            fis.close();
        }
    }
    public static void main(String[] args) {
        try {
            Test.testPrintDoc("noHeaders.docx");
        } catch (Exception e) {
            e.printStackTrace();
        }
        try {
            Test.testPrintDoc("noHeaders_resaved.docx");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org
[Bug 55733] NullPointerException when attempting to parse a Word
document with no headers
Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=55733
david.patrone@jhuapl.edu changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |david.patrone@jhuapl.edu
                 OS|                            |All
[Bug 55733] NullPointerException when attempting to parse a Word
document with no headers
Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=55733
Nick Burch <ap...@gagravarr.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
--- Comment #1 from Nick Burch <ap...@gagravarr.org> ---
Thanks for the test file. Fixed in r1538044.