Posted to user@poi.apache.org by "Ahmed, Sana R (IS)" <Sa...@ngc.com> on 2009/08/24 22:01:07 UTC
POI WordExtractor Not Extracting Entire Document
Hi.
We are using POI 3.5 beta 6 in production to extract text from Office documents. We came across a document that was not extracted completely: the extracted text is missing a couple of paragraphs from the middle of the document.
Here is a link to the document. http://www.mediafire.com/?sharekey=2e6a7badb4ab32e07f7ec40ad
The following is the snippet of code we are using to extract the document.
WordExtractor we = new WordExtractor(new FileInputStream(args[0]));
BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(outputFile), "UTF-8"));
// Convert "\n" to the platform line separator before writing.
bw.write(we.getText().replaceAll("\n", System.getProperty("line.separator")));
bw.close(); // close() flushes the writer first
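To pin down exactly which paragraphs are dropped, the extracted text could be compared against a reference copy of the document text (for example, text pasted out of Word, or the output of another extraction method if the POI version in use offers one). The following is only a minimal sketch under that assumption. The class name MissingParagraphs and the method missingFrom are hypothetical helpers, not part of POI, and the sample strings stand in for the real reference and extracted text.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not part of POI): reports paragraphs that are present
// in a reference text but absent from the extracted text.
public class MissingParagraphs {

    // Returns the non-empty paragraphs of `reference` that do not appear in
    // `extracted`. Paragraphs are split on any run of CR/LF characters and
    // compared after trimming surrounding whitespace.
    static List<String> missingFrom(String reference, String extracted) {
        Set<String> seen = new HashSet<String>();
        for (String p : extracted.split("[\\r\\n]+")) {
            seen.add(p.trim());
        }
        List<String> missing = new ArrayList<String>();
        for (String p : reference.split("[\\r\\n]+")) {
            String t = p.trim();
            if (t.length() > 0 && !seen.contains(t)) {
                missing.add(t);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Placeholder data; in practice `extracted` would be the string
        // returned by we.getText() and `reference` the known-good text.
        String reference = "First paragraph\nSecond paragraph\nThird paragraph";
        String extracted = "First paragraph\nThird paragraph";
        for (String p : missingFrom(reference, extracted)) {
            System.out.println("MISSING: " + p);
        }
    }
}
```

Listing the missing paragraphs this way would make it easier to report which part of the document is affected.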
This is a major production problem, so please respond as soon as possible.
Thanks!