You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@poi.apache.org by Andy Ch <an...@hotmail.com> on 2008/03/20 17:15:01 UTC

identify excel sheet in eventusermodel

hi, new to this forum here
i am reading a large excel file which the usermodel does not support (got
the infamous out of heap memory) which forces me to use the eventusermodel.
after digging the api, and example i find no way of identifying which sheet
the record come from (even though the row/col can be identified by getters
supplied).

for example of eventusermodel  http://poi.apache.org/hssf/how-to.html
http://poi.apache.org/hssf/how-to.html 

for apache poi api  http://poi.apache.org/apidocs/index.html
http://poi.apache.org/apidocs/index.html 

thanks for reading and replying, appreciate it


p.s. i tried dumping the sheet records and data record out thinking that the
eventusermodel api it would process the records in pecific order that allows
identification of the sheet. but apparently there is no order to what gets
processed first (easy to tell because all the sheet records gets dumped
first).
-- 
View this message in context: http://www.nabble.com/identify-excel-sheet-in-eventusermodel-tp16182268p16182268.html
Sent from the POI - User mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


RE: identify excel sheet in eventusermodel

Posted by rr...@us.imshealth.com.
I believe that increase the heap for Eclipse, not for the JVM that
Eclipse runs when running/debugging a program.  Depending on how your
launching your app, you may need to specify it elsewhere.  For example,
in the Run or Debug dialog, add the -Xmx option to the VM Args field.

-Russ

-----Original Message-----
From: Andy Ch [mailto:andy_chien@hotmail.com] 
Sent: Thursday, March 20, 2008 2:34 PM
To: user@poi.apache.org
Subject: Re: identify excel sheet in eventusermodel


> Are you aware that you can increase the heap size java uses? Just pass

> -Xmx to java (java -X gives you full details)

i use eclipse as IDE attmpeted to play with 2 memory settings below,
given maxmium i can (512mb and 1g) the excel file i am trying to open is
about 25mb. is this normal? to put in context here's a link to explain
what the following command does. it should be the same with what you
proposed
http://wiki.eclipse.org/FAQ_How_do_I_increase_the_heap_size_available_to
_Eclipse%3F
http://wiki.eclipse.org/FAQ_How_do_I_increase_the_heap_size_available_to
_Eclipse%3F
eclipse -vmargs -XX:MaxPermSize=<memory size> eclipse -vmargs
-Xmx<memory size>


> If memory serves correct, you need to keep track of the sheet start 
> records as they go past. All the cells for a given sheet go together,
so 
> they'll belong to the most recent sheet you've seen

could you be more specific at what you mean by "sheet start record", i
dumped out "BoundSheetRecord" however these appears to be the indices
only
and they get dumped out first before any of the data cell did. which
makes
it not possible to infer sheet name of a given cell based on the
sequence
they are listed. also i reviewed the api (specifically
org.apache.poi.hssf.record ) and did not find the sheet start record
similar
to what you proposed. (dump code is the same as on the poi example page
posted earlier). 

api here  http://poi.apache.org/apidocs/index.html
http://poi.apache.org/apidocs/index.html 

dump sample
=================
New sheet named: Sheet1
New sheet named: Sheet2
New sheet named: Sheet3
New sheet named: Sheet4
String table value 0 = Value1
String table value 1 = Value2
String table value 2 = Value3
String table value 3 = Value4
-- 
View this message in context:
http://www.nabble.com/identify-excel-sheet-in-eventusermodel-tp16182268p
16185818.html
Sent from the POI - User mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: identify excel sheet in eventusermodel

Posted by Nick Burch <ni...@torchbox.com>.
On Thu, 20 Mar 2008, Andy Ch wrote:
> > If memory serves correct, you need to keep track of the sheet start
> > records as they go past. All the cells for a given sheet go together,
> > so they'll belong to the most recent sheet you've seen
>
> could you be more specific at what you mean by "sheet start record", i
> dumped out "BoundSheetRecord" however these appears to be the indices
> only and they get dumped out first before any of the data cell did.

Sounds about right. The metadata on the sheet gets stored in a different
bit of the record stream to the sheet's data

If memory serves, you get an EOFRecord then a BOFRecord between each
sheet's data. (You get them in a few other places too, but if you get one
then a little while later some cells, it's one sheet, then another and
some cells is the next)

Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: identify excel sheet in eventusermodel

Posted by Andy Ch <an...@hotmail.com>.
> Are you aware that you can increase the heap size java uses? Just pass 
> -Xmx to java (java -X gives you full details)

i use eclipse as IDE attmpeted to play with 2 memory settings below, given
maxmium i can (512mb and 1g) the excel file i am trying to open is about
25mb. is this normal? to put in context here's a link to explain what the
following command does. it should be the same with what you proposed
http://wiki.eclipse.org/FAQ_How_do_I_increase_the_heap_size_available_to_Eclipse%3F
http://wiki.eclipse.org/FAQ_How_do_I_increase_the_heap_size_available_to_Eclipse%3F 
eclipse -vmargs -XX:MaxPermSize=<memory size>
eclipse -vmargs -Xmx<memory size>


> If memory serves correct, you need to keep track of the sheet start 
> records as they go past. All the cells for a given sheet go together, so 
> they'll belong to the most recent sheet you've seen

could you be more specific at what you mean by "sheet start record", i
dumped out "BoundSheetRecord" however these appears to be the indices only
and they get dumped out first before any of the data cell did. which makes
it not possible to infer sheet name of a given cell based on the sequence
they are listed. also i reviewed the api (specifically
org.apache.poi.hssf.record ) and did not find the sheet start record similar
to what you proposed. (dump code is the same as on the poi example page
posted earlier). 

api here  http://poi.apache.org/apidocs/index.html
http://poi.apache.org/apidocs/index.html 

dump sample
=================
New sheet named: Sheet1
New sheet named: Sheet2
New sheet named: Sheet3
New sheet named: Sheet4
String table value 0 = Value1
String table value 1 = Value2
String table value 2 = Value3
String table value 3 = Value4
-- 
View this message in context: http://www.nabble.com/identify-excel-sheet-in-eventusermodel-tp16182268p16185818.html
Sent from the POI - User mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: identify excel sheet in eventusermodel

Posted by Nick Burch <ni...@torchbox.com>.
On Mon, 24 Mar 2008, Andy Ch wrote:
> BoundSheetRecord does have method that gets the BOFRecord offset, if i 
> obtain the offset of each sheet, then sort the offset. can i safely 
> assume that the BOFRecord i encountered would be in that sequential 
> order as the sorted (ascending) offset?

That should be fine. The other option is to track the byte count as you 
make your way along, and when you get to the offset specified in the 
BoundSheetRecord, you're there. However, sorting by the offset then 
matching that way seems to me as likely to be ok.

(Or, the other way around - scan along to the BOFRecord tracking the bytes 
so far. When you find one, loop over the BoundSheetRecords until you get 
to one that specifies the offset you're at, then you know what sheet you 
have)

Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: identify excel sheet in eventusermodel

Posted by Andy Ch <an...@hotmail.com>.

Nick Burch wrote:
> 
> If memory serves correct, you need to keep track of the sheet start 
> records as they go past. All the cells for a given sheet go together, so 
> they'll belong to the most recent sheet you've seen
> 
> If memory serves, you get an EOFRecord then a BOFRecord between each
> sheet's data. (You get them in a few other places too, but if you get one
> then a little while later some cells, it's one sheet, then another and
> some cells is the next)
> 

quick thanks to nick and russ. i scanned the accessor methods (getters) of
BOFRecord and EOFRecord as well as the excel field specification. EOFRecord
and BOFRecord does helps differentiate between sheet but i cannot find a way
to associate sheet name to the corresponding BOFRecord.

BoundSheetRecord does have method that gets the BOFRecord offset, if i
obtain the offset of each sheet, then sort the offset. can i safely assume
that the BOFRecord i encountered would be in that sequential order as the
sorted (ascending) offset? is this how i keep track of sheets in
eventusermodel? my reading from poi sorce suggest this way it would work but
i'd like some confirmation because this is a tiny bit different from the
earlier proposal (which almost seems to suggest that metadata can be
obtained from  the BOF/EOFRecord). thanks.
-- 
View this message in context: http://www.nabble.com/identify-excel-sheet-in-eventusermodel-tp16182268p16259943.html
Sent from the POI - User mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: identify excel sheet in eventusermodel

Posted by Nick Burch <ni...@torchbox.com>.
On Thu, 20 Mar 2008, Andy Ch wrote:
> i am reading a large excel file which the usermodel does not support 
> (got the infamous out of heap memory)

Are you aware that you can increase the heap size java uses? Just pass 
-Xmx to java (java -X gives you full details)

> which forces me to use the eventusermodel. after digging the api, and 
> example i find no way of identifying which sheet the record come from 
> (even though the row/col can be identified by getters supplied).

If memory serves correct, you need to keep track of the sheet start 
records as they go past. All the cells for a given sheet go together, so 
they'll belong to the most recent sheet you've seen

(Sheet names etc are defined seperately from where their data comes)

Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org