Posted to user@poi.apache.org by "Satheesh.Subramaniam" <sa...@gmail.com> on 2009/07/18 08:44:15 UTC

Reading Microsoft Word Document in JAVA


I'm using the following code to read a Word document in Java with the
Apache POI package:

import org.apache.poi.poifs.filesystem.*;
import org.apache.poi.hwpf.*;
import org.apache.poi.hwpf.extractor.*;
import java.io.*;

public class readDoc
{
    public static void main( String[] args )
    {
        String filesname = "Hello.doc";
        POIFSFileSystem fs = null;
        try
        {
            // Open the .doc file as an OLE2 (POIFS) file system
            fs = new POIFSFileSystem(new FileInputStream(filesname));

            HWPFDocument doc = new HWPFDocument(fs);

            WordExtractor we = new WordExtractor(doc);

            String[] paragraphs = we.getParagraphText();

            System.out.println( "Word Document has " + paragraphs.length + " paragraphs" );
            for( int i=0; i<paragraphs.length; i++ ) {
                // Strip the trailing line terminator from each paragraph
                paragraphs[i] = paragraphs[i].replaceAll("\\cM?\r?\n","");
                System.out.println( "Length:"+paragraphs[ i ].length());
            }
        }
        catch(Exception e) {
            e.printStackTrace();
        }
    }
}


But I'm getting the following exception:

java.io.IOException: Unable to read entire header; -1 bytes read; expected 512 bytes
    at org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:78)
    at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:83)


How can I solve this issue? Any suggestions would be appreciated; it's urgent.
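
One possible reading of the trace, offered as a sketch and not a confirmed diagnosis: a count of -1 is what InputStream.read returns at end of stream, so the file being opened may be empty (zero bytes) in the directory the program runs from. The standalone check below uses only the JDK, no POI classes. The class name HeaderCheck and the method readHeaderBytes are hypothetical names made up for this illustration; the 512-byte size is taken from the exception message.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical diagnostic helper (not part of Apache POI).
public class HeaderCheck {
    // Header size taken from the exception message ("expected 512 bytes").
    static final int HEADER_SIZE = 512;

    // Returns how many bytes (0 to 512) could be read from the start of the file.
    public static int readHeaderBytes(File file) throws IOException {
        try (InputStream in = new FileInputStream(file)) {
            byte[] header = new byte[HEADER_SIZE];
            int total = 0;
            while (total < HEADER_SIZE) {
                int n = in.read(header, total, HEADER_SIZE - total);
                if (n < 0) {
                    break; // end of stream reached before the header was complete
                }
                total += n;
            }
            return total;
        }
    }

    public static void main(String[] args) throws IOException {
        // "Hello.doc" matches the file name used in the code above.
        File file = new File(args.length > 0 ? args[0] : "Hello.doc");
        System.out.println("Resolved path: " + file.getAbsolutePath());
        if (!file.isFile()) {
            System.out.println("Not a regular file, or it does not exist.");
            return;
        }
        System.out.println("Size on disk: " + file.length() + " bytes");
        int read = readHeaderBytes(file);
        if (read < HEADER_SIZE) {
            System.out.println("Only " + read + " header bytes available; expected " + HEADER_SIZE);
        } else {
            System.out.println("A full " + HEADER_SIZE + "-byte header is present.");
        }
    }
}
```

If this reports fewer than 512 bytes, the problem would lie with the file itself or the working directory, not with the extraction code.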
-- 
View this message in context: http://www.nabble.com/Reading-Microsoft-Word-Document-in-JAVA-tp24545152p24545152.html
Sent from the POI - User mailing list archive at Nabble.com.

