You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@poi.apache.org by no...@kaigrabfelder.de on 2016/05/10 16:59:27 UTC

How to "clone" values of a workbook

Hi there,

because the creation of workbook instances is pretty costly (about 200ms 
to 15 seconds for the workbooks we are dealing with) we implemented a 
pooling of workbook instances which seems to work pretty well.

Since we are only performing limited write-operations on the workbook 
instances (basically only calling Cell.setCellType() and 
Cell.setCellValue()) we need a way of "cloning" the original workbook 
values when retrieving a workbook instance from the pool, remembering 
them and "reseting" the workbook instance when returning it to the pool 
by setting all those remembered values to the workbook instance.

I'm wondering if there is already a functionality like this in poi or if 
somebody already did sth. similar? Otherwhise I'll have to write sth. by 
my own (which shouldn't be too hard) - but I thought that I'll ask 
anyway.

Performing a deep clone of the complete workbook is not an option as 
this would make the whole pooling idea pointless.

Best Regards

Kai

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org

Re: How to "clone" values of a workbook

Posted by Kai Grabfelder <no...@kaigrabfelder.de>.

Hi Blake,

thank you - so I'm not the only one doing such crazy things ;-). With deep clone I meant writing the workbook
to some stream and reading it again). And you are right if there would be a (not resource intensive) clone I
could you cache one instance and clone it after loading from cache.

Best Regards

Kai

Blake Watson schrieb am 10.05.16 um 22:18:
>>>because the creation of workbook instances is pretty costly (about 200ms
> to 15 seconds for the workbooks we are dealing with) we implemented a
> pooling of workbook instances which seems to work pretty well.<<
> 
> This is what I did, as well.
> 
>>>Since we are only performing limited write-operations on the workbook
> instances (basically only calling Cell.setCellType() and
> Cell.setCellValue()) we need a way of "cloning" the original workbook
> values when retrieving a workbook instance from the pool, remembering them
> and "reseting" the workbook instance when returning it to the pool by
> setting all those remembered values to the workbook instance.<<
> 
> I, too, ran into this situation.
> 
>>>I'm wondering if there is already a functionality like this in poi or if
> somebody already did sth. similar? Otherwhise I'll have to write sth. by my
> own (which shouldn't be too hard) - but I thought that I'll ask anyway.<<
> 
> I don't think there is. In my case, only certain values are settable
> (specifically, named ranges that are not formulae), so I capture those
> values when I load the spreadsheet and "reset" them when I've gotten the
> results I wanted. So it's: set-all-inputs, recalculate as needed,
> retrieve-all-outputs, reset-all-inputs.
> 
>>>Performing a deep clone of the complete workbook is not an option as this
> would make the whole pooling idea pointless.
> 
> Ah, but if you a deep clone feature, you wouldn't need to pool (presuming
> cloning was not as resource intensive as loading a workbook is).
> \u200b
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org

Re: How to "clone" values of a workbook

Posted by Blake Watson <bl...@pnmac.com>.

>>because the creation of workbook instances is pretty costly (about 200ms
to 15 seconds for the workbooks we are dealing with) we implemented a
pooling of workbook instances which seems to work pretty well.<<

This is what I did, as well.

>>Since we are only performing limited write-operations on the workbook
instances (basically only calling Cell.setCellType() and
Cell.setCellValue()) we need a way of "cloning" the original workbook
values when retrieving a workbook instance from the pool, remembering them
and "reseting" the workbook instance when returning it to the pool by
setting all those remembered values to the workbook instance.<<

I, too, ran into this situation.

>>I'm wondering if there is already a functionality like this in poi or if
somebody already did sth. similar? Otherwhise I'll have to write sth. by my
own (which shouldn't be too hard) - but I thought that I'll ask anyway.<<

I don't think there is. In my case, only certain values are settable
(specifically, named ranges that are not formulae), so I capture those
values when I load the spreadsheet and "reset" them when I've gotten the
results I wanted. So it's: set-all-inputs, recalculate as needed,
retrieve-all-outputs, reset-all-inputs.

>>Performing a deep clone of the complete workbook is not an option as this
would make the whole pooling idea pointless.

Ah, but if you a deep clone feature, you wouldn't need to pool (presuming
cloning was not as resource intensive as loading a workbook is).

Re: How to "clone" values of a workbook

Posted by Kai Grabfelder <no...@kaigrabfelder.de>.

Hi Mark,

we are generating a WEB UI (tables) based on data in the calculated workbook. Basically we are iterating over
all cells in all rows of all sheets and transforming them to HTML.

We did some performance analysis and the bad part was always the instantiation of the workbook - depending on
the formulas used in the workbook (e.g. workbooks using VLOOKUP are pretty bad for example).

I like your idea of not reusing the instances but just creating new instances (in advance) - we'll have to
discuss that - thx ;-)

Best Regards

Kai

Murphy, Mark schrieb am 12.05.16 um 16:09:
> Hmm... I'm not real sure what you mean by "read the data from excel and transform them to an output format" are you using POI or an alternate tool to do that? It seems to me that this might be the bottleneck rather than the create the workbook part. But if "create a new instance of the workbook" is the problem, maybe you just need to preload some workbooks, and when you use one, spawn a thread to load another from the database. That way you don't have to go through the trouble of doing a reset, and the create happens before the request is made. I am saying this based on the assumption that your pooling solution eventually has all the documents loaded in the pool at least once. If this is the case you might as well preload them and avoid the long wait for the requester of the first document of each type.
> 
> -----Original Message-----
> From: Kai Grabfelder [mailto:nospam@kaigrabfelder.de] 
> Sent: Wednesday, May 11, 2016 6:05 PM
> To: POI Users List
> Subject: Re: How to "clone" values of a workbook
> 
> Hi Mark,
> 
> in our system we have a limited number of workbooks that we mainly use for calculation. The process withouth pooling looks like this
> 
> - create a new instance of the workbook (one of about 20 different workbook stored in the database)
> - fill in some data (entered by the user)
> - recalculate formula (incl. creating a new formula evaluator)
> - read the data from excel and transform them to an output format (the workbook in the database is not changed it's kind of a template and the main purpose of poi/excel is to perform calculations)
> 
> The creation of the workbook takes 200ms till 15 seconds which is too slow for the required use case (it's online processing in a web application).
> 
> The process with pooling looks like this:
> 
> - ask the pool to either acquire an instance of the workbook
> - the pool checks if there is an idle instance of the workbook available in the pool
> -- if not a new instance of the workbook (and the formula evaluator) is created and put in the pool
> -- if yes the idle instance is returned
> - fill in some data (entered by the user)
> - recalculate formula (without creating a new formula evaluator
> - read the data from excel and transform them to an output format
> - ==> "reset" the instance to it's state when acquired from the pool <== I'm currently thinking how to implement this - I guess I'll have to "remember" all changes to the workbook and "revert" them at this step.
> - return the instance to the pool
> 
> I think SXSSF is not an option as imho it doesn't support formula evaluation. In any way memory is not a big constraint here - the issue is the time that the system requires for instantiating the workbook instances.
> 
> Best Regards
> 
> Kai
> 
> Murphy, Mark schrieb am 11.05.16 um 16:19:
>> I am trying to understand just what it is you are trying to do
>> 
>>>> because the creation of workbook instances is pretty costly (about 200ms to 15 seconds for the workbooks we are dealing with) we implemented a pooling of workbook instances which seems to work pretty well.
>> 
>> What do you mean by this? How does it work?
>> 
>>>> Since we are only performing limited write-operations on the workbook instances (basically only calling Cell.setCellType() and Cell.setCellValue()) we need a way of "cloning" the original workbook values when retrieving a workbook instance from the pool, remembering them and "reseting" the workbook instance when returning it to the pool by setting all those remembered values to the workbook instance.
>> 
>> Are you holding a group of workbooks in memory, then using them as templates to write new workbooks with changed values? Or are you changing the values and saving it with a new file name?
>> 
>>>> I'm wondering if there is already a functionality like this in poi or if somebody already did sth. similar? Otherwhise I'll have to write sth. by my own (which shouldn't be too hard) - but I thought that I'll ask anyway.
>> 
>>>> Performing a deep clone of the complete workbook is not an option as this would make the whole pooling idea pointless.
>> 
>> It may be possible to write your new workbook using SXSSF which does not hold everything in memory so long. You can specify how many rows to keep in memory to help control the amount of memory used.
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org For additional 
>> commands, e-mail: user-help@poi.apache.org
>> 
>> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org For additional commands, e-mail: user-help@poi.apache.org
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
> For additional commands, e-mail: user-help@poi.apache.org
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org

Re: How to "clone" values of a workbook

Posted by Blake Watson <bl...@pnmac.com>.

I am doing exactly what Kai is, with the specification that the user input
and output can come from web representations (or RESTfully) of the
spreadsheet, so POI isn't exactly involved in those. (I used the To-HTML
example from Ken Arnold—and what a treat to run into one of the fathers of
Rogue here!—as a jumping off point.) And because it's a web app, it has an
unknown and variable number of users. The page updates dynamically, as
well, so with each cell entered (or even each key pressed!), the input data
is sent to the server, the spreadsheet is populated, and the results sent
back. Each action is done on a "fresh" copy; there's no persistent state
for any given user.

For most of the spreadsheets, it doesn't really matter. They're under 100K
and it's a trivial thing to load, parse and process things of that size.
One of the spreadsheets is 4MB (taking up around 500MB in memory!) which
takes POI some time to parse—10 seconds!*—which is too much time to wait
for a recalc (which only starts after the 10s parsing). You can anticipate
the need, as you say, and preload, but it becomes a lot trickier (and
ultimately not much different from caching and pooling) with multiple users
or (as happens with a web interface) the user makes requests in short
order, such that the previous request may not have finished by the time the
next request comes in.

That's one reason a clone would be useful—presuming, of course, that
cloning would be much faster than loading and parsing.

*N.B. this spreadsheet will load nigh instantly in Excel, but on the other
hand it's a thing of beauty that we're loading up a massive data-intensive,
calculation heavy workbook without Excel.

RE: How to "clone" values of a workbook

Posted by "Murphy, Mark" <mu...@metalexmfg.com>.

Hmm... I'm not real sure what you mean by "read the data from excel and transform them to an output format" are you using POI or an alternate tool to do that? It seems to me that this might be the bottleneck rather than the create the workbook part. But if "create a new instance of the workbook" is the problem, maybe you just need to preload some workbooks, and when you use one, spawn a thread to load another from the database. That way you don't have to go through the trouble of doing a reset, and the create happens before the request is made. I am saying this based on the assumption that your pooling solution eventually has all the documents loaded in the pool at least once. If this is the case you might as well preload them and avoid the long wait for the requester of the first document of each type.

-----Original Message-----
From: Kai Grabfelder [mailto:nospam@kaigrabfelder.de] 
Sent: Wednesday, May 11, 2016 6:05 PM
To: POI Users List
Subject: Re: How to "clone" values of a workbook

Hi Mark,

in our system we have a limited number of workbooks that we mainly use for calculation. The process withouth pooling looks like this

- create a new instance of the workbook (one of about 20 different workbook stored in the database)
- fill in some data (entered by the user)
- recalculate formula (incl. creating a new formula evaluator)
- read the data from excel and transform them to an output format (the workbook in the database is not changed it's kind of a template and the main purpose of poi/excel is to perform calculations)

The creation of the workbook takes 200ms till 15 seconds which is too slow for the required use case (it's online processing in a web application).

The process with pooling looks like this:

- ask the pool to either acquire an instance of the workbook
- the pool checks if there is an idle instance of the workbook available in the pool
-- if not a new instance of the workbook (and the formula evaluator) is created and put in the pool
-- if yes the idle instance is returned
- fill in some data (entered by the user)
- recalculate formula (without creating a new formula evaluator
- read the data from excel and transform them to an output format
- ==> "reset" the instance to it's state when acquired from the pool <== I'm currently thinking how to implement this - I guess I'll have to "remember" all changes to the workbook and "revert" them at this step.
- return the instance to the pool

I think SXSSF is not an option as imho it doesn't support formula evaluation. In any way memory is not a big constraint here - the issue is the time that the system requires for instantiating the workbook instances.

Best Regards

Kai

Murphy, Mark schrieb am 11.05.16 um 16:19:
> I am trying to understand just what it is you are trying to do
> 
>>> because the creation of workbook instances is pretty costly (about 200ms to 15 seconds for the workbooks we are dealing with) we implemented a pooling of workbook instances which seems to work pretty well.
> 
> What do you mean by this? How does it work?
> 
>>> Since we are only performing limited write-operations on the workbook instances (basically only calling Cell.setCellType() and Cell.setCellValue()) we need a way of "cloning" the original workbook values when retrieving a workbook instance from the pool, remembering them and "reseting" the workbook instance when returning it to the pool by setting all those remembered values to the workbook instance.
> 
> Are you holding a group of workbooks in memory, then using them as templates to write new workbooks with changed values? Or are you changing the values and saving it with a new file name?
> 
>>> I'm wondering if there is already a functionality like this in poi or if somebody already did sth. similar? Otherwhise I'll have to write sth. by my own (which shouldn't be too hard) - but I thought that I'll ask anyway.
> 
>>> Performing a deep clone of the complete workbook is not an option as this would make the whole pooling idea pointless.
> 
> It may be possible to write your new workbook using SXSSF which does not hold everything in memory so long. You can specify how many rows to keep in memory to help control the amount of memory used.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org For additional 
> commands, e-mail: user-help@poi.apache.org
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org For additional commands, e-mail: user-help@poi.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org

Re: How to "clone" values of a workbook

Posted by Kai Grabfelder <no...@kaigrabfelder.de>.

Hi Mark,

in our system we have a limited number of workbooks that we mainly use for calculation. The process withouth
pooling looks like this

- create a new instance of the workbook (one of about 20 different workbook stored in the database)
- fill in some data (entered by the user)
- recalculate formula (incl. creating a new formula evaluator)
- read the data from excel and transform them to an output format (the workbook in the database is not changed
it's kind of a template and the main purpose of poi/excel is to perform calculations)

The creation of the workbook takes 200ms till 15 seconds which is too slow for the required use case (it's
online processing in a web application).

The process with pooling looks like this:

- ask the pool to either acquire an instance of the workbook
- the pool checks if there is an idle instance of the workbook available in the pool
-- if not a new instance of the workbook (and the formula evaluator) is created and put in the pool
-- if yes the idle instance is returned
- fill in some data (entered by the user)
- recalculate formula (without creating a new formula evaluator
- read the data from excel and transform them to an output format
- ==> "reset" the instance to it's state when acquired from the pool <== I'm currently thinking how to
implement this - I guess I'll have to "remember" all changes to the workbook and "revert" them at this step.
- return the instance to the pool

I think SXSSF is not an option as imho it doesn't support formula evaluation. In any way memory is not a big
constraint here - the issue is the time that the system requires for instantiating the workbook instances.

Best Regards

Kai

Murphy, Mark schrieb am 11.05.16 um 16:19:
> I am trying to understand just what it is you are trying to do
> 
>>> because the creation of workbook instances is pretty costly (about 200ms to 15 seconds for the workbooks we are dealing with) we implemented a pooling of workbook instances which seems to work pretty well.
> 
> What do you mean by this? How does it work?
> 
>>> Since we are only performing limited write-operations on the workbook instances (basically only calling Cell.setCellType() and Cell.setCellValue()) we need a way of "cloning" the original workbook values when retrieving a workbook instance from the pool, remembering them and "reseting" the workbook instance when returning it to the pool by setting all those remembered values to the workbook instance.
> 
> Are you holding a group of workbooks in memory, then using them as templates to write new workbooks with changed values? Or are you changing the values and saving it with a new file name?
> 
>>> I'm wondering if there is already a functionality like this in poi or if somebody already did sth. similar? Otherwhise I'll have to write sth. by my own (which shouldn't be too hard) - but I thought that I'll ask anyway.
> 
>>> Performing a deep clone of the complete workbook is not an option as this would make the whole pooling idea pointless.
> 
> It may be possible to write your new workbook using SXSSF which does not hold everything in memory so long. You can specify how many rows to keep in memory to help control the amount of memory used.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
> For additional commands, e-mail: user-help@poi.apache.org
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org

RE: How to "clone" values of a workbook

Posted by "Murphy, Mark" <mu...@metalexmfg.com>.

I am trying to understand just what it is you are trying to do

>> because the creation of workbook instances is pretty costly (about 200ms to 15 seconds for the workbooks we are dealing with) we implemented a pooling of workbook instances which seems to work pretty well.

What do you mean by this? How does it work?

>> Since we are only performing limited write-operations on the workbook instances (basically only calling Cell.setCellType() and Cell.setCellValue()) we need a way of "cloning" the original workbook values when retrieving a workbook instance from the pool, remembering them and "reseting" the workbook instance when returning it to the pool by setting all those remembered values to the workbook instance.

Are you holding a group of workbooks in memory, then using them as templates to write new workbooks with changed values? Or are you changing the values and saving it with a new file name?

>> I'm wondering if there is already a functionality like this in poi or if somebody already did sth. similar? Otherwhise I'll have to write sth. by my own (which shouldn't be too hard) - but I thought that I'll ask anyway.

>> Performing a deep clone of the complete workbook is not an option as this would make the whole pooling idea pointless.

It may be possible to write your new workbook using SXSSF which does not hold everything in memory so long. You can specify how many rows to keep in memory to help control the amount of memory used.


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org