You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@poi.apache.org by Marek Slama <ms...@email.cz> on 2016/07/07 08:41:38 UTC

Lazy loading of sheets in SS user model?

Hi,

I have workbook with ~ 10 sheets. Some of them are quite big. I need to 
update just one. Is it possible to load given sheet on request ie. not to 
load whole workbook model? (provided I have no links/dependency between 
sheets). Is it possible in principle? I want to ask before I try to dig into
code. Then on write I would save only sheets which are loaded. (main reason 
for this is memoru usage and also performance as creating complete model 
takes long).

I read some questions/discussions regarding this but it was lazy loading on 
row level. I think on sheet level it might be easier as sheets are separated
in xlsx file.

Thanks

Marek


Re: Lazy loading of sheets in SS user model?

Posted by Javen O'Neal <ja...@gmail.com>.
If developed, this would be a really valuable feature to the community. At
my day job, we can't load xlsx files larger than 4MB because they bloat to
1GB in the Java heap, the maximum that Oracle JRE 7 can allocate.

If you search through the mailing lists 6-12 months back, you'll find a
similar question, and the response was to use the lower level OPCPackage
classes to read in the files, but then you're operating with CTSheets
instead of XSSFSheets.

Another possibility is to keep a fixed number of sheets in memory, similar
to how SXSSFSheet keeps a fixed number of rows in memory.

Let us know what you're able to come up with, even if it's just a hack. We
can post the hack in the documentation or example code and hopefully make
progress towards a general purpose solution.
On Jul 7, 2016 2:34 AM, "Markus Kirsten" <mk...@gmail.com> wrote:

Hi,
There’s a parseSheet method in XSSFWorkbook (the relevant class if you’d
like to read/update xlsx). You could possibly subclass XSSFWorkbook and
then override parseSheet to not call onDocumentRead() for the sheets you
are not interested in based on their names. I haven’t tried it myself, and
the community possibly have a much neater solution, but you might want to
spend a few minutes to see if you can get it to work.

See code for the (default) implementation of parseSheet:

        public void parseSheet(Map<String, XSSFSheet> shIdMap, CTSheet
ctSheet) {
                XSSFSheet sh = shIdMap.get(ctSheet.getId());
                if(sh == null) {
                        logger.log(POILogger.WARN, "Sheet with name " +
ctSheet.getName() + " and r:id " + ctSheet.getId()+ " was defined, but
didn't exist in package, skipping");
                        return;
                }
                sh.sheet = ctSheet;
                sh.onDocumentRead();
                sheets.add(sh);
        }


Best of luck,
Markus


> On 07 Jul 2016, at 10:41, Marek Slama <ms...@email.cz> wrote:
>
> Hi,
>
> I have workbook with ~ 10 sheets. Some of them are quite big. I need to
> update just one. Is it possible to load given sheet on request ie. not to
> load whole workbook model? (provided I have no links/dependency between
> sheets). Is it possible in principle? I want to ask before I try to dig
into
> code. Then on write I would save only sheets which are loaded. (main
reason
> for this is memoru usage and also performance as creating complete model
> takes long).
>
> I read some questions/discussions regarding this but it was lazy loading
on
> row level. I think on sheet level it might be easier as sheets are
separated
> in xlsx file.
>
> Thanks
>
> Marek
>

Re: Lazy loading of sheets in SS user model?

Posted by Markus Kirsten <mk...@gmail.com>.
Hi,
There’s a parseSheet method in XSSFWorkbook (the relevant class if you’d like to read/update xlsx). You could possibly subclass XSSFWorkbook and then override parseSheet to not call onDocumentRead() for the sheets you are not interested in based on their names. I haven’t tried it myself, and the community possibly have a much neater solution, but you might want to spend a few minutes to see if you can get it to work.

See code for the (default) implementation of parseSheet:

	public void parseSheet(Map<String, XSSFSheet> shIdMap, CTSheet ctSheet) {
		XSSFSheet sh = shIdMap.get(ctSheet.getId());
		if(sh == null) {
			logger.log(POILogger.WARN, "Sheet with name " + ctSheet.getName() + " and r:id " + ctSheet.getId()+ " was defined, but didn't exist in package, skipping");
			return;
		}
		sh.sheet = ctSheet;
		sh.onDocumentRead();
		sheets.add(sh);
	}


Best of luck,
Markus


> On 07 Jul 2016, at 10:41, Marek Slama <ms...@email.cz> wrote:
> 
> Hi,
> 
> I have workbook with ~ 10 sheets. Some of them are quite big. I need to 
> update just one. Is it possible to load given sheet on request ie. not to 
> load whole workbook model? (provided I have no links/dependency between 
> sheets). Is it possible in principle? I want to ask before I try to dig into
> code. Then on write I would save only sheets which are loaded. (main reason 
> for this is memoru usage and also performance as creating complete model 
> takes long).
> 
> I read some questions/discussions regarding this but it was lazy loading on 
> row level. I think on sheet level it might be easier as sheets are separated
> in xlsx file.
> 
> Thanks
> 
> Marek
>