Posted to user@poi.apache.org by sarna <sa...@isi.edu> on 2009/07/01 20:54:31 UTC

Re: reading Tables


This works, with one bug: ArrayList tables is null, so just correcting it to
ArrayList tables = new ArrayList(); solves the null pointer exception.
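
A minimal, stdlib-only sketch of the failure described above: calling add() on a list reference that was declared but never initialized throws a NullPointerException, while creating the list first works. String stands in for POI's Table here so the example runs without POI, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ListInitDemo {

    // Hypothetical helper: adds to a list that was declared but never
    // initialized, like "ArrayList<Table> tables = null;" with no later assignment.
    static boolean addToNullListThrows() {
        List<String> tables = null;
        try {
            tables.add("table 1");
            return false;
        } catch (NullPointerException ex) {
            return true;
        }
    }

    // The fix: create the list before adding anything to it.
    static int addToInitializedList() {
        List<String> tables = new ArrayList<>();
        tables.add("table 1");
        tables.add("table 2");
        return tables.size();
    }

    public static void main(String[] args) {
        System.out.println("null list throws NPE: " + addToNullListThrows());
        System.out.println("initialized list size: " + addToInitializedList());
    }
}
```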


MSB wrote:
> 
> Right, I managed to make some progress this evening. Have a look at the
> class below. It currently assembles an ArrayList of the tables in a Word
> document, but you could easily change that if you wanted to. I have added
> lots of comments, probably too many, but if you have any questions, just
> drop a line to the forum.
> 
> By the way, you will need to change this line to point to the file you
> wish to process:
> 
> inputFile = new File("C:\\temp\\table.doc");
> 
> import org.apache.poi.hwpf.HWPFDocument;
> import org.apache.poi.hwpf.usermodel.Paragraph;
> import org.apache.poi.hwpf.usermodel.Table;
> import org.apache.poi.hwpf.usermodel.Range;
> 
> import java.io.*;
> import java.util.ArrayList;
> 
> /**
>  *
>  * TEST/DEMONSTRATION CODE ONLY.
>  * 
>  * @author Mark B.
>  * @version 1.00 8th April 2009.
>  */
> public class Main {
>     
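>     /**
>      * Collects the tables in a Word (.doc) document into an ArrayList and
>      * prints how many were found.
>      */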
>     public static void main(String[] args) {
>         
>         BufferedInputStream bufIStream = null;
>         FileInputStream fileIStream = null;
>         File inputFile = null;
>         HWPFDocument doc = null;
>         Range range = null;
>         Table table = null;
>         ArrayList<Table> tables = null;
>         Paragraph para = null;
>         boolean inTable = false;
>         int numParas = 0;
>         
>         try {
>             tables = new ArrayList<Table>();
>             inputFile = new File("C:\\temp\\table.doc");
>             fileIStream = new FileInputStream(inputFile);
>             bufIStream = new BufferedInputStream(fileIStream);
>             //
>             // Open a Word document.
>             //
>             doc = new HWPFDocument(bufIStream);
>             //
>             // Get the highest level Range object that represents the
>             // contents of the document.
>             //
>             range = doc.getRange();
>             //
>             // Get the number of paragraphs
>             //
>             numParas = range.numParagraphs();
>             //
>             // Step through each Paragraph.
>             //
>             for(int i = 0; i < numParas; i++) {
>                 para = range.getParagraph(i);
>                 //
>                 // Is the Paragraph within a table?
>                 //
>                 if(para.isInTable()) {
>                     //
>                     // The inTable flag ensures that the getTable() method of
>                     // the Range class is called only once per table, when the
>                     // first Paragraph within that table is recovered. So......
>                     //
>                     if(!inTable) {
>                         //
>                         // Get the table and add it to an ArrayList for later
>                         // processing. You do not have to do this; it would
>                         // be possible to process the table here. There are
>                         // methods defined on the Table class (numRows() and
>                         // getRow(int)) that allow you to get at the number of
>                         // rows in the table and to recover a reference to each
>                         // row in turn. Once you have a row, it is possible to
>                         // get at each cell in turn (numCells() and getCell(int)
>                         // on TableRow). Look at the Table, TableRow and
>                         // TableCell classes.
>                         //
>                         table = range.getTable(para);
>                         tables.add(table);
>                         inTable = true;
>                     }
>                 }
>                 else {
>                     //
>                     // Set the flag false to indicate that all of the paragraphs
>                     // in the table have been processed. A single blank line
>                     // (any paragraph outside the table) is sufficient to
>                     // indicate the end of the table within the Word document.
>                     //
>                     // This is also the place to deal with any non-table
>                     // paragraphs.
>                     //
>                     inTable = false;
>                 }
>             }
>             //
>             // This line simply prints out the number of tables found in the
>             // document; used for testing purposes here.
>             //
>             System.out.println("Found " + tables.size()
>                     + " tables in the document.");
>         }
>         catch(Exception ex) {
>             System.out.println("Caught an: " + ex.getClass().getName());
>             System.out.println("Message: " + ex.getMessage());
>             System.out.println("Stacktrace follows:..............");
>             ex.printStackTrace(System.out);
>         }
>         finally {
>             if(bufIStream != null) {
>                 try {
>                     bufIStream.close();
>                     bufIStream = null;
>                     fileIStream = null;
>                 }
>                 catch(Exception ex) {
>                     // I G N O R E //
>                 }
>             }
>         }
>         
>     }
> }
> 
> 
> 
> codaditasso@tiscali.it wrote:
>> 
>> Hi,
>> Could anyone post the minimal code to get a list of all the tables
>> in a Word document? In particular, I want a class from which I can
>> extract the "Table" object for each table in the document.
>> Greetings
>> 
>> Enrico 
>> 
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
>> For additional commands, e-mail: user-help@poi.apache.org
>> 
>> 
>> 
> 
> 
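The table-detection loop in MSB's class comes down to one rule: call range.getTable(para) only for a paragraph that is in a table when the paragraph before it was not. The sketch below isolates that rule in plain Java so it can be run without POI. A boolean array stands in for the results of Paragraph.isInTable(), and the class and method names are hypothetical. As in the original loop, two tables with no ordinary paragraph between them would be counted as one.

```java
import java.util.ArrayList;
import java.util.List;

public class TableStartFinder {

    // Returns the index of every paragraph that begins a run of in-table
    // paragraphs. In the original loop these are the only paragraphs
    // passed to range.getTable(para).
    static List<Integer> findTableStarts(boolean[] paraInTable) {
        List<Integer> starts = new ArrayList<>();
        boolean inTable = false;
        for (int i = 0; i < paraInTable.length; i++) {
            if (paraInTable[i]) {
                if (!inTable) {
                    starts.add(i);   // first paragraph of a new table
                    inTable = true;
                }
            } else {
                inTable = false;     // a non-table paragraph ends the current table
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        // Paragraphs: text, 3 paragraphs of table A, text, 2 paragraphs of table B
        boolean[] paras = {false, true, true, true, false, true, true};
        System.out.println(findTableStarts(paras)); // prints [1, 5]
    }
}
```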

-- 
View this message in context: http://www.nabble.com/reading-Tables-tp22911444p24295285.html
Sent from the POI - User mailing list archive at Nabble.com.




Re: reading Tables

Posted by MSB <ma...@tiscali.co.uk>.
Thanks for testing, and correcting(!!), that code.

Yours

Mark B


-- 
View this message in context: http://www.nabble.com/reading-Tables-tp22911444p24302009.html
Sent from the POI - User mailing list archive at Nabble.com.

