Posted to user@poi.apache.org by victorbelo <vi...@hotmail.com> on 2007/11/03 02:36:24 UTC

Breaking up workbook in byte chunks

I'm using POI in a reporting application in which several users can produce
large workbooks (30 columns, 2 worksheets, each with up to 60,000 rows).
I'm currently chopping the output down significantly due to memory limitations.
I need to use POI (or JExcel) to produce a workbook in chunks (let's say 5 MB
at a time) in order to limit memory usage and avoid OutOfMemory errors. These
chunks would be saved to the database as multiple BLOBs and later streamed back
piece by piece when the workbook is downloaded. My first thought was to use the
getBytes method of the workbook class, save to a BLOB whenever the output grows
beyond 5 MB, then create a new workbook and repeat the process. I'm not sure
whether all the pieces streamed back together would still produce a valid
workbook.
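
To make the idea concrete, here is a rough sketch of the chunk-and-reassemble
half of it (the class and method names are made up for illustration, and I used
write() into a ByteArrayOutputStream rather than getBytes, on the assumption
that write produces the complete file). Note that it only slices up a workbook
that has already been built in memory once, so by itself it does not reduce
memory usage:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;

    import org.apache.poi.hssf.usermodel.HSSFWorkbook;

    public class WorkbookChunker {

        // Split the finished workbook into <= 5 MB pieces; if the pieces are
        // streamed back and concatenated in the same order, they reproduce
        // the original file byte for byte.
        public static byte[][] splitWorkbook(HSSFWorkbook wb) throws IOException {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            wb.write(bos);                       // the whole file, written once
            byte[] all = bos.toByteArray();

            int chunkSize = 5 * 1024 * 1024;     // 5 MB per BLOB
            int count = (all.length + chunkSize - 1) / chunkSize;
            byte[][] chunks = new byte[count][];
            for (int i = 0; i < count; i++) {
                int off = i * chunkSize;
                int len = Math.min(chunkSize, all.length - off);
                chunks[i] = new byte[len];
                System.arraycopy(all, off, chunks[i], 0, len);
                // each chunks[i] would be INSERTed into its own BLOB row here
            }
            return chunks;
        }
    }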

I checked several options in the API, but I was not able to find any other
solution. Has anyone ever implemented something similar? Any good ideas on how
to accomplish this with POI?


Re: Breaking up workbook in byte chunks

Posted by Anthony Andrews <py...@yahoo.com>.
I could very well be wrong, but I do not think that you can build a workbook incrementally, as it were. As I understand the API, it is necessary to construct the entire workbook in memory and then call the write method once that has been done.

One further thought did occur to me, however, though I believe it would require a lot of extra work and I am not at all sure it would even work. It would seem logical to use Java's object serialization feature; for example, you could create each row/cell and serialize that object. Then, when the write method was called, it could de-serialize each row/cell and write it out. However, the write method as provided will not do anything of the sort, and I would guess that you would be looking at creating your own classes to wrap HSSF's and offer this functionality. Even then, it may still be the case that you run into memory issues at write time, as the complete worksheet will still have to be built in memory, I guess. So, on second thought, discard this idea - sorry, it is quite early on a Sunday morning here.

Just as a final thought, which version(s) of Excel do you have to support? Are you lucky enough to be supporting only the latest version? If so, could you confine yourself to producing the XML version of the spreadsheet file rather than the binary .xls version? Finally, is there any way to increase the amount of memory available to WebSphere? I have been forced to use the -Xmx and -Xms switches before.
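
For what it is worth, the switches look something like the following when
starting a stand-alone JVM (WebSphere normally exposes the same initial and
maximum heap settings through the server's JVM configuration; ReportJob is just
a made-up class name):

    java -Xms256m -Xmx1024m ReportJob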


Re: Breaking up workbook in byte chunks

Posted by victorbelo <vi...@hotmail.com>.
First of all, thanks for your reply.
My main issue is controlling memory usage. For example, if I try to create a
30-column, 25,000-row Excel workbook with POI, my WebSphere server crashes
with an OutOfMemory error. I tried to create the workbook using a
FileOutputStream, but the results are the same. Using JExcel, I can go up to
50,000 rows. I need to find a way to reduce the amount of memory used by POI
without cutting down the size of the workbook - either writing it to disk to
release memory after a certain number of rows, or something else. However,
when I tried to write the workbook to a FileOutputStream after a few thousand
rows, I still ran out of memory. How can I release the memory used by a POI
workbook? I know I cannot control how the JVM claims more memory, but I need
to at least keep writing parts of the workbook to disk or the database, and
release the data for the garbage collector.
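
The kind of pattern I have in mind is something like this rough sketch -
writing the report out as a series of smaller workbooks and dropping each one
before building the next (the names and dummy data are made up, and I have not
yet verified that this actually keeps the heap flat):

    import java.io.FileOutputStream;

    import org.apache.poi.hssf.usermodel.HSSFRow;
    import org.apache.poi.hssf.usermodel.HSSFSheet;
    import org.apache.poi.hssf.usermodel.HSSFWorkbook;

    public class BatchedReport {

        // Only one small workbook (ROWS_PER_FILE rows) is ever held in memory.
        private static final int ROWS_PER_FILE = 10000;

        public static void main(String[] args) throws Exception {
            int totalRows = 60000;
            int part = 0;
            for (int start = 0; start < totalRows; start += ROWS_PER_FILE) {
                HSSFWorkbook wb = new HSSFWorkbook();
                HSSFSheet sheet = wb.createSheet("report");
                int end = Math.min(start + ROWS_PER_FILE, totalRows);
                for (int r = start; r < end; r++) {
                    HSSFRow row = sheet.createRow(r - start);
                    for (short c = 0; c < 30; c++) {
                        row.createCell(c).setCellValue(r * 100.0 + c); // dummy data
                    }
                }
                FileOutputStream out =
                        new FileOutputStream("report-part" + (part++) + ".xls");
                wb.write(out);
                out.close();
                wb = null; // nothing else references this part, so the GC can reclaim it
            }
        }
    }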

As far as saving the workbook to the database goes, I could break the
workbook's bytes up into pieces and zip them if needed.
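
For the zipping part I was thinking of something as simple as this sketch using
java.util.zip (GZIPInputStream would be used the same way on the read side):

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.util.zip.GZIPOutputStream;

    public class ChunkZipper {

        // Compress one chunk before it goes into its BLOB column.
        public static byte[] gzip(byte[] chunk) throws IOException {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            GZIPOutputStream gz = new GZIPOutputStream(bos);
            gz.write(chunk);
            gz.close();   // finishes the stream and writes the gzip trailer
            return bos.toByteArray();
        }
    }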

Any ideas would be really appreciated. Thanks a lot, guys!


Re: Breaking up workbook in byte chunks

Posted by Anthony Andrews <py...@yahoo.com>.
Are you saying that you want to read one large workbook and separate it into a series of smaller workbooks such that each of these smaller workbooks is a 'valid' file from Excel's point of view - i.e. that each smaller file can be opened successfully using Excel?

If the answer is yes, then it may be worthwhile looking at POI's alternative architecture - perhaps using the eventmodel API to scavenge data from the large file and then pass it to another class that will create and save away a 'smaller' .xls file. Take advantage of Java's multi-threading capabilities and this may solve the problem.
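
I have not tried this against a file of your size, but the eventmodel side of
it looks roughly like the following sketch (the file name is only an example;
each low-level record is handed to the listener one at a time rather than the
whole workbook being held in memory):

    import java.io.FileInputStream;

    import org.apache.poi.hssf.eventusermodel.HSSFEventFactory;
    import org.apache.poi.hssf.eventusermodel.HSSFListener;
    import org.apache.poi.hssf.eventusermodel.HSSFRequest;
    import org.apache.poi.hssf.record.LabelSSTRecord;
    import org.apache.poi.hssf.record.NumberRecord;
    import org.apache.poi.hssf.record.Record;
    import org.apache.poi.poifs.filesystem.POIFSFileSystem;

    public class ScavengeListener implements HSSFListener {

        // Called once per record as the file is read.
        public void processRecord(Record record) {
            switch (record.getSid()) {
                case NumberRecord.sid:
                    NumberRecord num = (NumberRecord) record;
                    System.out.println("cell (" + num.getRow() + ","
                            + num.getColumn() + ") = " + num.getValue());
                    break;
                case LabelSSTRecord.sid:
                    // string cell: look the text up in the shared string table
                    break;
            }
        }

        public static void main(String[] args) throws Exception {
            FileInputStream fin = new FileInputStream("big-report.xls"); // example name
            POIFSFileSystem fs = new POIFSFileSystem(fin);
            HSSFRequest request = new HSSFRequest();
            request.addListenerForAllRecords(new ScavengeListener());
            new HSSFEventFactory().processWorkbookEvents(request, fs);
            fin.close();
        }
    }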

If the answer to the original question was 'no', then what is to prevent you from using a file archive tool such as PKZip or WinRAR to 'chop' the large file into smaller chunks? Java itself includes classes to read and write zip files, and this approach would free you from needing an additional API to parse the Excel files. I do not know whether zip files can be stored in a database as a binary large object, but I would guess that it is possible. You would need somehow to keep track of all of the chunks required to re-create the original file, and simply use the zip tool to re-assemble them into a large .xls file that could then be streamed to the user.
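
Re-assembling and streaming the chunks back should not need anything exotic
either - something along these lines (the chunk file names are made up; in your
case the pieces would presumably come back out of the BLOB columns):

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.io.SequenceInputStream;
    import java.util.Vector;

    public class ChunkReassembler {

        // Concatenate the stored chunks, in order, into one stream and copy
        // it to wherever the .xls has to go (HTTP response, file, etc.).
        public static void streamBackTo(OutputStream out, int chunkCount)
                throws IOException {
            Vector parts = new Vector();
            for (int i = 0; i < chunkCount; i++) {
                parts.add(new FileInputStream("report.part" + i)); // made-up names
            }
            InputStream whole = new SequenceInputStream(parts.elements());
            byte[] buf = new byte[8192];
            int read;
            while ((read = whole.read(buf)) != -1) {
                out.write(buf, 0, read);
            }
            whole.close();
        }
    }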
