Posted to user@poi.apache.org by sarna <sa...@isi.edu> on 2009/07/01 20:54:31 UTC
Re: reading Tables
This works, with one bug: the ArrayList tables is null. Correcting it to
ArrayList tables = new ArrayList(); solves the null pointer exception.
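For anyone following the quoted class below, here is a minimal stand-alone sketch of its flag logic, with no POI dependency. It is only an illustration under stated assumptions: the class name TableRuns, the method findRunStarts and the list of booleans are hypothetical stand-ins, where each boolean plays the role of what Paragraph.isInTable() would report for one paragraph. The result list is initialised at its declaration, so it can never be null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TableRuns {

    /**
     * Returns the index of the first paragraph of each run of
     * "in table" paragraphs. inTableFlags.get(i) is a hypothetical
     * stand-in for Paragraph.isInTable() on paragraph i.
     */
    public static List<Integer> findRunStarts(List<Boolean> inTableFlags) {
        // Initialise at the declaration so the list is never null.
        List<Integer> starts = new ArrayList<Integer>();
        boolean inTable = false;
        for (int i = 0; i < inTableFlags.size(); i++) {
            if (inTableFlags.get(i)) {
                // Record only the first paragraph of each run.
                if (!inTable) {
                    starts.add(i);
                    inTable = true;
                }
            } else {
                // A non-table paragraph ends the current run.
                inTable = false;
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        List<Boolean> flags =
                Arrays.asList(false, true, true, false, true, false);
        // Two runs, starting at paragraphs 1 and 4.
        System.out.println("Run starts: " + findRunStarts(flags));
    }
}
```

In the real class, the point where an index is recorded is where range.getTable(para) would be called. Note that two tables with no ordinary paragraph between them would be seen as a single run by this scan.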
MSB wrote:
>
> Right, managed to make some progress this evening. Have a look at the
> class below. It currently assembles an ArrayList of the tables in a Word
> document, but you could easily change that if you wanted to. I have added
> lots of comments, probably too many, but if you have any questions just
> drop a line to the forum.
>
> You will need to change this line:
>
> inputFile = new File("C:\\temp\\table.doc");
>
> to point to the file you wish to process by the way.
>
> import org.apache.poi.hwpf.HWPFDocument;
> import org.apache.poi.hwpf.usermodel.Paragraph;
> import org.apache.poi.hwpf.usermodel.Table;
> import org.apache.poi.hwpf.usermodel.Range;
>
> import java.io.*;
> import java.util.ArrayList;
>
> /**
> *
> * TEST/DEMONSTRATION CODE ONLY.
> *
> * @author Mark B.
> * @version 1.00 8th April 2009.
> */
> public class Main {
>
> /**
> */
> public static void main(String[] args) {
>
> BufferedInputStream bufIStream = null;
> FileInputStream fileIStream = null;
> File inputFile = null;
> HWPFDocument doc = null;
> Range range = null;
> Table table = null;
> ArrayList<Table> tables = null;
> Paragraph para = null;
> boolean inTable = false;
> int numParas = 0;
>
> try {
> tables = new ArrayList<Table>();
> inputFile = new File("C:\\temp\\table.doc");
> fileIStream = new FileInputStream(inputFile);
> bufIStream = new BufferedInputStream(fileIStream);
> //
> // Open a Word document.
> //
> doc = new HWPFDocument(bufIStream);
> //
> // Get the highest level Range object that represents the
> // contents of the document.
> //
> range = doc.getRange();
> //
> // Get the number of paragraphs
> //
> numParas = range.numParagraphs();
> //
> // Step through each Paragraph.
> //
> for(int i = 0; i < numParas; i++) {
> para = range.getParagraph(i);
> //
> // Is the Paragraph within a table?
> //
> if(para.isInTable()) {
> //
> // The inTable flag is used to ensure that a call is made
> // to the getTable() method of the Range class once only,
> // when the first Paragraph that is within a table is
> // recovered. So......
> //
> if(!inTable) {
> //
> // Get the table and add it to an ArrayList for later
> // processing. You do not have to do this; it would
> // be possible to process the table here. There are
> // methods defined on the Table class that allow you
> // to get at the number of rows in the table and to
> // recover a reference to each row in turn. Once you
> // have a row, it is possible then to get at each cell
> // in turn. Look at the Table, TableRow and TableCell
> // classes.
> //
> table = range.getTable(para);
> tables.add(table);
> inTable = true;
> }
> }
> else {
> //
> // Set the flag false to indicate that all of the paragraphs
> // in the table have been processed. A single blank line is
> // sufficient to indicate the end of the table within the
> // Word document.
> //
> // This is also the place to deal with any non-table paragraphs.
> //
> inTable = false;
> }
> }
> //
> // This line simply prints out the number of tables found in the
> // document - used for testing purposes here.
> //
> System.out.println("Found " + tables.size() + " tables in the document.");
> }
> catch(Exception ex) {
> System.out.println("Caught an: " + ex.getClass().getName());
> System.out.println("Message: " + ex.getMessage());
> System.out.println("Stacktrace follows:..............");
> ex.printStackTrace(System.out);
> }
> finally {
> if(bufIStream != null) {
> try {
> bufIStream.close();
> bufIStream = null;
> fileIStream = null;
> }
> catch(Exception ex) {
> // I G N O R E //
> }
> }
> }
>
> }
> }
>
>
>
> codaditasso@tiscali.it wrote:
>>
>> Hi
>> Could anyone post the minimal code to get the list of all tables
>> in a Word document? In particular, I want a class from which I can
>> extract the "Table" object for each table in the document.
>> Greetings
>>
>> Enrico
>
>
--
View this message in context: http://www.nabble.com/reading-Tables-tp22911444p24295285.html
Sent from the POI - User mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org
Re: reading Tables
Posted by MSB <ma...@tiscali.co.uk>.
Thanks for testing, and correcting(!!), that code.
Yours
Mark B
sarna wrote:
>
>
> This works, with one bug: the ArrayList tables is null. Correcting it to
> ArrayList tables = new ArrayList(); solves the null pointer exception.
--
View this message in context: http://www.nabble.com/reading-Tables-tp22911444p24302009.html
Sent from the POI - User mailing list archive at Nabble.com.