You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@poi.apache.org by Levi Strope <ls...@besttransport.com> on 2007/04/16 22:11:10 UTC

problem with HSSF eventusermodel

I'm trying to read in an XLS and output the contents of each row exactly
as they are read in by the file.
 
I have to do this with the evenusermodel because of memory limitations,
however it is proving to be very difficult.
 
In short, my problem is I need to keep track of how many columns there
are in a row, so that I can start a new line for the next row of data.
All I am doing is outputting this to the console.  If I use getLastCol()
it returns the correct number, BUT if the row has 1 less records there
is no way for me to tell it to go to the next row and my ouput of the
next row bleeds into the prior row.
 
Here is my output:
 
run:
SHID; Carrier; Origin City; Origin County; Origin State; Origin Zip;
Destination City; Destination County; Destination State;
2267.0; ADTS; IL; 600; MI;
2267.0; ADTS; IL; 600; MI;
2267.0; ADTS; IL; 600; 2267.0; ADTS; IL; 600; MI;
2267.0; ADTS; IL; 600; MI;
2267.0; ADTS; IL; 600; MI;
 
**********  Finished Processing File  ***********

As you can see, the 3rd row is longer than the rest.  The third row does
not contain information in the last column, and so the next row starts
there when it should be on a new line.  This would seem simple to
overcome but I'm finding it difficult to do so within the
eventusermodel.
 
I am not asking for anyone to give me code.  If anyone has ideas on how
I could get around this I would love to hear them.
 
Here is my code:
 
[BEGIN CODE]
import org.apache.poi.poifs.filesystem.*;
import org.apache.poi.hssf.eventusermodel.*;
import org.apache.poi.hssf.record.*;
import java.io.*;
 
/**
 *
 * @author lstrope
 */
public class PoiXLStest implements HSSFListener {
    private SSTRecord sstrec;
    int rowRecLen[] = new int[RowRecord.MAX_ROW_NUMBER];
    int rowNum = 0;
 
    
    public PoiXLStest()
    {
    }
    
    public PoiXLStest(InputStream in)
    {   
        PoiXLStest noargs = new PoiXLStest();
        HSSFRequest req = new HSSFRequest();
        req.addListener(noargs, SSTRecord.sid);
        req.addListener(noargs, LabelSSTRecord.sid);
        req.addListener(noargs, RowRecord.sid);
        req.addListener(noargs, NumberRecord.sid);
        HSSFEventFactory factory = new HSSFEventFactory();
       
        try{
            
            factory.processEvents(req, in);
            
        }
        catch(IOException E){
            System.out.println("Problem in constructor");
        }
    }
    
    public void processRecord(Record record)  
    {
        short sidVal;
        sidVal = record.getSid();
        
        if(sidVal == RowRecord.sid){
            RowRecord rowRec = (RowRecord) record;
            rowRecLen[rowRec.getRowNumber()] = rowRec.getLastCol();  //
Setting array to hold the row at it's physical position with its Column
length.
        }
        
        if(sidVal == SSTRecord.sid){
            sstrec = (SSTRecord) record;
        }
        
        if(sidVal == LabelSSTRecord.sid){
            LabelSSTRecord lrec = (LabelSSTRecord) record;
            if(lrec.getColumn() < (rowRecLen[lrec.getRow()] - 1)){
//using array to determine the max columns for comparison
                System.out.print(sstrec.getString(lrec.getSSTIndex()) +
"; ");
            }
            else{
                System.out.print(sstrec.getString(lrec.getSSTIndex()) +
";\n");  //if it is the last column start a new line.
            }
            
        }
        
        if(sidVal == NumberRecord.sid){
            NumberRecord nrec = (NumberRecord) record;
            if(nrec.getColumn() < rowRecLen[nrec.getRow()] -1){
                System.out.print(nrec.getValue() + "; ");
            }
            else{
                System.out.print(nrec.getValue() + ";\n");
            }
        }
    }
    
     public static void main(String[] args) throws IOException,
FileNotFoundException
     {  
        POIFSFileSystem wbook = new POIFSFileSystem(new
FileInputStream("c:\\test.xls"));
        InputStream docIn = wbook.createDocumentInputStream("Workbook");
//if the file has a 'read only recommendation' this will fail.
        PoiXLStest start = new PoiXLStest(docIn);
        
        docIn.close();
        
        System.out.println("\n**********  Finished Processing File
***********");
        
     }
}    
[END CODE]

RE: problem with HSSF eventusermodel

Posted by Levi Strope <ls...@besttransport.com>.
Thank you for the quick response.

I can achieve very close to what I want if I add a record listener for
blank record types.  This is preferable because I need to reproduce the
data in a delimited format for insertion into a Topic, so my output now
looks like this:

SHID; Carrier; Origin City; Origin County; Origin State; Origin Zip;
Destination City; Destination County; Destination State;
2267.0; ADTS; ; ; IL; 600; ; ; MI;
2267.0; ADTS; ; ; IL; 600; ; ; MI;
2267.0; ADTS; ; ; IL; 600; ; ; ;
2267.0; ADTS; ; ; IL; 600; ; ; MI;
2267.0; ADTS; ; ; IL; 600; ; ; MI;
2267.0; ADTS; ; ; IL; 600; ; ; MI;
; ; ; ; ; ; ; 

This is great!!!  It gives me what I need and it is pretty clean.
However, I'd like to draw your attention to the last line.  In this
situation the last line is a group of cells that I applied a format but
no data to.

I agree with you, the XLS format is <insert another word for horrible>.
I do think that pre-processing the excel file as another format is
prefered, but it adds an extra step with a lot of overhead.  We are
trying to implement a solution using poi as part of our asynchronus
messaging system, as such if there is any way to limit the amount of
overhead or synchronus processing of the file that would be great.

If I can just find a way to not to process that last row(s) I'll have a
good working solution.

-Levi  
 

-----Original Message-----
From: Andrew C. Oliver [mailto:acoliver@buni.org] 
Sent: Monday, April 16, 2007 4:20 PM
To: POI Users List
Subject: Re: problem with HSSF eventusermodel

Theoretically col and row records can appear anywhere in that section
out of order (but don't when coming from excel).  You have to either
write them in some kind of better format to disk -- such as a temporary
db file -- and then read that (to avoid storing it all in memory) in a
guaranteed sort...  or keep it in memory in a map/list structure of some
sort.  Which gets you back into the memory problem.  XLS format BLOWS!

-Andy

Levi Strope wrote:
> I'm trying to read in an XLS and output the contents of each row 
> exactly as they are read in by the file.
>  
> I have to do this with the evenusermodel because of memory 
> limitations, however it is proving to be very difficult.
>  
> In short, my problem is I need to keep track of how many columns there

> are in a row, so that I can start a new line for the next row of data.
> All I am doing is outputting this to the console.  If I use 
> getLastCol() it returns the correct number, BUT if the row has 1 less 
> records there is no way for me to tell it to go to the next row and my

> ouput of the next row bleeds into the prior row.
>  
> Here is my output:
>  
> run:
> SHID; Carrier; Origin City; Origin County; Origin State; Origin Zip; 
> Destination City; Destination County; Destination State; 2267.0; ADTS;

> IL; 600; MI; 2267.0; ADTS; IL; 600; MI; 2267.0; ADTS; IL; 600; 2267.0;

> ADTS; IL; 600; MI; 2267.0; ADTS; IL; 600; MI; 2267.0; ADTS; IL; 600; 
> MI;
>  
> **********  Finished Processing File  ***********
>
> As you can see, the 3rd row is longer than the rest.  The third row 
> does not contain information in the last column, and so the next row 
> starts there when it should be on a new line.  This would seem simple 
> to overcome but I'm finding it difficult to do so within the 
> eventusermodel.
>  
> I am not asking for anyone to give me code.  If anyone has ideas on 
> how I could get around this I would love to hear them.
>  
> Here is my code:
>  
> [BEGIN CODE]
> import org.apache.poi.poifs.filesystem.*;
> import org.apache.poi.hssf.eventusermodel.*;
> import org.apache.poi.hssf.record.*;
> import java.io.*;
>  
> /**
>  *
>  * @author lstrope
>  */
> public class PoiXLStest implements HSSFListener {
>     private SSTRecord sstrec;
>     int rowRecLen[] = new int[RowRecord.MAX_ROW_NUMBER];
>     int rowNum = 0;
>  
>     
>     public PoiXLStest()
>     {
>     }
>     
>     public PoiXLStest(InputStream in)
>     {   
>         PoiXLStest noargs = new PoiXLStest();
>         HSSFRequest req = new HSSFRequest();
>         req.addListener(noargs, SSTRecord.sid);
>         req.addListener(noargs, LabelSSTRecord.sid);
>         req.addListener(noargs, RowRecord.sid);
>         req.addListener(noargs, NumberRecord.sid);
>         HSSFEventFactory factory = new HSSFEventFactory();
>        
>         try{
>             
>             factory.processEvents(req, in);
>             
>         }
>         catch(IOException E){
>             System.out.println("Problem in constructor");
>         }
>     }
>     
>     public void processRecord(Record record)  
>     {
>         short sidVal;
>         sidVal = record.getSid();
>         
>         if(sidVal == RowRecord.sid){
>             RowRecord rowRec = (RowRecord) record;
>             rowRecLen[rowRec.getRowNumber()] = rowRec.getLastCol();  
> // Setting array to hold the row at it's physical position with its 
> Column length.
>         }
>         
>         if(sidVal == SSTRecord.sid){
>             sstrec = (SSTRecord) record;
>         }
>         
>         if(sidVal == LabelSSTRecord.sid){
>             LabelSSTRecord lrec = (LabelSSTRecord) record;
>             if(lrec.getColumn() < (rowRecLen[lrec.getRow()] - 1)){ 
> //using array to determine the max columns for comparison
>                 System.out.print(sstrec.getString(lrec.getSSTIndex()) 
> + "; ");
>             }
>             else{
>                 System.out.print(sstrec.getString(lrec.getSSTIndex()) 
> + ";\n");  //if it is the last column start a new line.
>             }
>             
>         }
>         
>         if(sidVal == NumberRecord.sid){
>             NumberRecord nrec = (NumberRecord) record;
>             if(nrec.getColumn() < rowRecLen[nrec.getRow()] -1){
>                 System.out.print(nrec.getValue() + "; ");
>             }
>             else{
>                 System.out.print(nrec.getValue() + ";\n");
>             }
>         }
>     }
>     
>      public static void main(String[] args) throws IOException, 
> FileNotFoundException
>      {  
>         POIFSFileSystem wbook = new POIFSFileSystem(new 
> FileInputStream("c:\\test.xls"));
>         InputStream docIn = 
> wbook.createDocumentInputStream("Workbook");
> //if the file has a 'read only recommendation' this will fail.
>         PoiXLStest start = new PoiXLStest(docIn);
>         
>         docIn.close();
>         
>         System.out.println("\n**********  Finished Processing File 
> ***********");
>         
>      }
> }    
> [END CODE]
>
>   
> ----------------------------------------------------------------------
> --
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: poi-user-unsubscribe@jakarta.apache.org
> Mailing List:     http://jakarta.apache.org/site/mail2.html#poi
> The Apache Jakarta Poi Project:  http://jakarta.apache.org/poi/


--
>From Windows/Exchange to Linux/Meldware
Buni Meldware Communication Suite
Email, Calendaring, ease of configuration/administration http://buni.org


---------------------------------------------------------------------
To unsubscribe, e-mail: poi-user-unsubscribe@jakarta.apache.org
Mailing List:     http://jakarta.apache.org/site/mail2.html#poi
The Apache Jakarta Poi Project:  http://jakarta.apache.org/poi/


---------------------------------------------------------------------
To unsubscribe, e-mail: poi-user-unsubscribe@jakarta.apache.org
Mailing List:     http://jakarta.apache.org/site/mail2.html#poi
The Apache Jakarta Poi Project:  http://jakarta.apache.org/poi/


Re: problem with HSSF eventusermodel

Posted by "Andrew C. Oliver" <ac...@buni.org>.
Theoretically col and row records can appear anywhere in that section 
out of order (but don't when coming from excel).  You have to either 
write them in some kind of better format to disk -- such as a temporary 
db file -- and then read that (to avoid storing it all in memory) in a 
guaranteed sort...  or keep it in memory in a map/list structure of some 
sort.  Which gets you back into the memory problem.  XLS format BLOWS!

-Andy

Levi Strope wrote:
> I'm trying to read in an XLS and output the contents of each row exactly
> as they are read in by the file.
>  
> I have to do this with the evenusermodel because of memory limitations,
> however it is proving to be very difficult.
>  
> In short, my problem is I need to keep track of how many columns there
> are in a row, so that I can start a new line for the next row of data.
> All I am doing is outputting this to the console.  If I use getLastCol()
> it returns the correct number, BUT if the row has 1 less records there
> is no way for me to tell it to go to the next row and my ouput of the
> next row bleeds into the prior row.
>  
> Here is my output:
>  
> run:
> SHID; Carrier; Origin City; Origin County; Origin State; Origin Zip;
> Destination City; Destination County; Destination State;
> 2267.0; ADTS; IL; 600; MI;
> 2267.0; ADTS; IL; 600; MI;
> 2267.0; ADTS; IL; 600; 2267.0; ADTS; IL; 600; MI;
> 2267.0; ADTS; IL; 600; MI;
> 2267.0; ADTS; IL; 600; MI;
>  
> **********  Finished Processing File  ***********
>
> As you can see, the 3rd row is longer than the rest.  The third row does
> not contain information in the last column, and so the next row starts
> there when it should be on a new line.  This would seem simple to
> overcome but I'm finding it difficult to do so within the
> eventusermodel.
>  
> I am not asking for anyone to give me code.  If anyone has ideas on how
> I could get around this I would love to hear them.
>  
> Here is my code:
>  
> [BEGIN CODE]
> import org.apache.poi.poifs.filesystem.*;
> import org.apache.poi.hssf.eventusermodel.*;
> import org.apache.poi.hssf.record.*;
> import java.io.*;
>  
> /**
>  *
>  * @author lstrope
>  */
> public class PoiXLStest implements HSSFListener {
>     private SSTRecord sstrec;
>     int rowRecLen[] = new int[RowRecord.MAX_ROW_NUMBER];
>     int rowNum = 0;
>  
>     
>     public PoiXLStest()
>     {
>     }
>     
>     public PoiXLStest(InputStream in)
>     {   
>         PoiXLStest noargs = new PoiXLStest();
>         HSSFRequest req = new HSSFRequest();
>         req.addListener(noargs, SSTRecord.sid);
>         req.addListener(noargs, LabelSSTRecord.sid);
>         req.addListener(noargs, RowRecord.sid);
>         req.addListener(noargs, NumberRecord.sid);
>         HSSFEventFactory factory = new HSSFEventFactory();
>        
>         try{
>             
>             factory.processEvents(req, in);
>             
>         }
>         catch(IOException E){
>             System.out.println("Problem in constructor");
>         }
>     }
>     
>     public void processRecord(Record record)  
>     {
>         short sidVal;
>         sidVal = record.getSid();
>         
>         if(sidVal == RowRecord.sid){
>             RowRecord rowRec = (RowRecord) record;
>             rowRecLen[rowRec.getRowNumber()] = rowRec.getLastCol();  //
> Setting array to hold the row at it's physical position with its Column
> length.
>         }
>         
>         if(sidVal == SSTRecord.sid){
>             sstrec = (SSTRecord) record;
>         }
>         
>         if(sidVal == LabelSSTRecord.sid){
>             LabelSSTRecord lrec = (LabelSSTRecord) record;
>             if(lrec.getColumn() < (rowRecLen[lrec.getRow()] - 1)){
> //using array to determine the max columns for comparison
>                 System.out.print(sstrec.getString(lrec.getSSTIndex()) +
> "; ");
>             }
>             else{
>                 System.out.print(sstrec.getString(lrec.getSSTIndex()) +
> ";\n");  //if it is the last column start a new line.
>             }
>             
>         }
>         
>         if(sidVal == NumberRecord.sid){
>             NumberRecord nrec = (NumberRecord) record;
>             if(nrec.getColumn() < rowRecLen[nrec.getRow()] -1){
>                 System.out.print(nrec.getValue() + "; ");
>             }
>             else{
>                 System.out.print(nrec.getValue() + ";\n");
>             }
>         }
>     }
>     
>      public static void main(String[] args) throws IOException,
> FileNotFoundException
>      {  
>         POIFSFileSystem wbook = new POIFSFileSystem(new
> FileInputStream("c:\\test.xls"));
>         InputStream docIn = wbook.createDocumentInputStream("Workbook");
> //if the file has a 'read only recommendation' this will fail.
>         PoiXLStest start = new PoiXLStest(docIn);
>         
>         docIn.close();
>         
>         System.out.println("\n**********  Finished Processing File
> ***********");
>         
>      }
> }    
> [END CODE]
>
>   
> ------------------------------------------------------------------------
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: poi-user-unsubscribe@jakarta.apache.org
> Mailing List:     http://jakarta.apache.org/site/mail2.html#poi
> The Apache Jakarta Poi Project:  http://jakarta.apache.org/poi/


-- 
>From Windows/Exchange to Linux/Meldware
Buni Meldware Communication Suite
Email, Calendaring, ease of configuration/administration
http://buni.org


---------------------------------------------------------------------
To unsubscribe, e-mail: poi-user-unsubscribe@jakarta.apache.org
Mailing List:     http://jakarta.apache.org/site/mail2.html#poi
The Apache Jakarta Poi Project:  http://jakarta.apache.org/poi/