You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@poi.apache.org by "Murphy, Mark" <mu...@metalexmfg.com> on 2015/11/03 20:58:14 UTC

Drawing Borders is SLOW

I am sure you all know this. But the problem increases as the number of styles grows. In looking at the code, I am convinced that the problem can be found in the fact that when borders are drawn, the cell style is retrieved, the border is applied, and all styles are searched for a matching style. I one is not found, then a new one is created (even if it is not going to ultimately be used because this happens for each border segment individually), and if colors are included, the process will take 2 to 4 times as long because only one style property is applied at a time. You can see the exponential growth in time as borders are added around ranges.

If you were to create a setCellProperty(Cell, Workbook, Map<String, Object>) where the map is a set of cell style properties, you could then use the Map.putAll method to apply a whole set of border properties all in one step, and then search for, and create if necessary, the final style, rather than searching and creating each intermediate style. The benefit, if you discount the fact that the number of styles is growing all the time you are drawing borders is that you are going to get up to a 75% time reduction, and another 50% beyond that if border colors are being used. Now empirical evidence suggests that I end up with 70 to 80% unused cell styles due to my border drawing activities. And since the real time in drawing borders is in searching for a style match, if we simply avoid creating all those intermediate styles, we will see additional gains.

A second method of speeding things up would be to hash and cache the styles so that instead of comparing the entire map, we just compare hash values.

Maybe a combination of the two methods would be the best of both worlds.

Am I crazy?

RE: Drawing Borders is SLOW

Posted by "Murphy, Mark" <mu...@metalexmfg.com>.
Would you prefer just overriding setCellStyleProperty with a secondary set of parameters, or creating a new method named setCellStyleProperties? In either case, setCellStyleProperty would change to call the new method for a single property.

-----Original Message-----
From: Nick Burch [mailto:apache@gagravarr.org] 
Sent: Thursday, November 05, 2015 9:48 AM
To: POI Users List
Subject: RE: Drawing Borders is SLOW

On Thu, 5 Nov 2015, Murphy, Mark wrote:
> My thought was to build the borders on their own, then once the 
> borders are created, apply them to the cell styles in a single step. 
> This would require a few new objects.
>
> You are going to have to change 
> poi.apache.ss.util.CellUtil.setCellStyleProperty

Oh, you're using CellUtil! That does change things

I don't see why we couldn't add a method onto CellUtil which took muliple properties, and found/created a style with all of those on in one go

That's different to working on the CellStyle objects directly, which is what I thought you were doing


Please feel free to create an enhancement in bugzilla for this (stressing 
it's for CellUtil not CellStyle, to avoid others getting confused), then 
even better work up a patch if you can!

Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


RE: Drawing Borders is SLOW

Posted by Nick Burch <ap...@gagravarr.org>.
On Thu, 5 Nov 2015, Murphy, Mark wrote:
> My thought was to build the borders on their own, then once the borders 
> are created, apply them to the cell styles in a single step. This would 
> require a few new objects.
>
> You are going to have to change poi.apache.ss.util.CellUtil.setCellStyleProperty

Oh, you're using CellUtil! That does change things

I don't see why we couldn't add a method onto CellUtil which took muliple 
properties, and found/created a style with all of those on in one go

That's different to working on the CellStyle objects directly, which is 
what I thought you were doing


Please feel free to create an enhancement in bugzilla for this (stressing 
it's for CellUtil not CellStyle, to avoid others getting confused), then 
even better work up a patch if you can!

Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


RE: Drawing Borders is SLOW

Posted by "Murphy, Mark" <mu...@metalexmfg.com>.
My thought was to build the borders on their own, then once the borders are created, apply them to the cell styles in a single step. This would require a few new objects. A CellBorder object to hold the row index, the cell index, and the border properties (including color) for the cell. And a collection object to hold the CellBorder objects. The collection is built one property at a time using a method named createBorder(int row, int col, int border, int color) or createBorder(int row, int col, int border) border uses the BORDER_XXXXX constants, color uses the COLOR_XXXX constants. When the borders are completely built, loop through the CellBorders, and apply each to the appropriate CellStyle (creating the row and/or cell if necessary) But you will want to apply all the properties to the CellStyle at once.

You are going to have to change poi.apache.ss.util.CellUtil.setCellStyleProperty


	/**
	 * This method attempt to find an already existing CellStyle that matches what you want the
	 * style to be. If it does not find the style, then it creates a new one. If it does create a
	 * new one, then it applies the propertyName and propertyValue to the style. This is necessary
	 * because Excel has an upper limit on the number of Styles that it supports.
	 *
	 *@param workbook The workbook that is being worked with.
	 *@param propertyName The name of the property that is to be changed.
	 *@param propertyValue The value of the property that is to be changed.
	 *@param cell The cell that needs it's style changes
	 */
	public static void setCellStyleProperty(Cell cell, Workbook workbook, String propertyName,
			Object propertyValue) {
		CellStyle originalStyle = cell.getCellStyle();
		CellStyle newStyle = null;
		Map<String, Object> values = getFormatProperties(originalStyle);
		values.put(propertyName, propertyValue);

		// index seems like what index the cellstyle is in the list of styles for a workbook.
		// not good to compare on!
		short numberCellStyles = workbook.getNumCellStyles();

		for (short i = 0; i < numberCellStyles; i++) {
			CellStyle wbStyle = workbook.getCellStyleAt(i);
			Map<String, Object> wbStyleMap = getFormatProperties(wbStyle);

			if (wbStyleMap.equals(values)) {
				newStyle = wbStyle;
				break;
			}
		}

		if (newStyle == null) {
			newStyle = workbook.createCellStyle();
			setFormatProperties(newStyle, workbook, values);
		}

		cell.setCellStyle(newStyle);
	}


Override the method:

	/**
	 * This method attempt to find an already existing CellStyle that matches what you want the
	 * style to be. If it does not find the style, then it creates a new one. If it does create a
	 * new one, then it applies the properties to the style. This is necessary
	 * because Excel has an upper limit on the number of Styles that it supports.
	 *
	 *@param workbook The workbook that is being worked with.
	 *@param properties A HashMap containing property, value pairs.
	 *@param cell The cell that needs it's style changes
	 */
	public static void setCellStyleProperty(Cell cell, Workbook workbook, Map<String, Object> properties) {
		CellStyle originalStyle = cell.getCellStyle();
		CellStyle newStyle = null;
		Map<String, Object> values = getFormatProperties(originalStyle);
		values.putAll(properties);

		// index seems like what index the cellstyle is in the list of styles for a workbook.
		// not good to compare on!
		short numberCellStyles = workbook.getNumCellStyles();

		for (short i = 0; i < numberCellStyles; i++) {
			CellStyle wbStyle = workbook.getCellStyleAt(i);
			Map<String, Object> wbStyleMap = getFormatProperties(wbStyle);

			if (wbStyleMap.equals(values)) {
				newStyle = wbStyle;
				break;
			}
		}

		if (newStyle == null) {
			newStyle = workbook.createCellStyle();
			setFormatProperties(newStyle, workbook, values);
		}

		cell.setCellStyle(newStyle);
	}


	/**
	 * This method attempt to find an already existing CellStyle that matches what you want the
	 * style to be. If it does not find the style, then it creates a new one. If it does create a
	 * new one, then it applies the propertyName and propertyValue to the style. This is necessary
	 * because Excel has an upper limit on the number of Styles that it supports.
	 *
	 *@param workbook The workbook that is being worked with.
	 *@param propertyName The name of the property that is to be changed.
	 *@param propertyValue The value of the property that is to be changed.
	 *@param cell The cell that needs it's style changes
	 */
	public static void setCellStyleProperty(Cell cell, Workbook workbook, String propertyName,
			Object propertyValue) {
		HashMap<String, Object> property = new HashMap(propertyName, propertyValue);
		setCellStyleProperty(cell, workbook, property); 
	}

I think the above change is the key. But it needs a lot of support built around it. The classes and methods to build the borders, and the method to apply those borders to the styles by calling setCellStyleProperty(Cell, Workbook, Map). This will prevent creating all those intermediate styles.


-----Original Message-----
From: Dominik Stadler [mailto:dominik.stadler@gmx.at] 
Sent: Wednesday, November 04, 2015 3:25 PM
To: POI Users List
Subject: Re: Drawing Borders is SLOW

Hi,

I think there is a related bug entry at
https://bz.apache.org/bugzilla/show_bug.cgi?id=54593 where we did some investigation and discussed caching things, but we could not create a complete fix yet... Ideas/Patches welcome!

Thanks... Dominik.

On Wed, Nov 4, 2015 at 3:18 PM, Murphy, Mark <mu...@metalexmfg.com> wrote:
> I do create a bunch of styles at the front, but they do not have borders. I would need 3 styles for column headers (just because of borders), and the data portion of the table would require at least 9 styles, just because of borders. And that assumes that the data in each cell is formatted the same way. Each different type of formatting, number formats, special highlighting, bolding, alignment would require multiple styles, and the program complexity needed to add table data location awareness to the mix (just to ensure the correct border goes with the correct cell) is unnecessary. The logic is much simpler, and you have to preload fewer styles, if you just leave the borders off, and draw them after the spreadsheet is built. Unfortunately that is a very slow option. It is POI that creates unnecessary styles due to its one property at a time processing of borders.
>
> What's easier to read and maintain (this is RPG that I am calling the POI methods from):
>
>     for rowNum = 1 to something;
>          row = ssSheet_createRow(sheet: rowNum);
>          ss_num(row: 1: customerNumber: sty_num);
>          ss_text(row: 2: customerName: sty_text);
>          ss_text(row: 3: status: sty_code);
>          ss_date(row: 4: createDate: *MDY): sty_date);
>     endfor;
>
>     ss_drawBorders(book: sheet: 1: rowNum-1: 1: 4:
>                     BORDER_THIN: EXTENT_INSIDE);
>     ss_drawBorders(book: sheet: 1: rowNum-1: 1: 4:
>                     BORDER_MEDIUM: EXTENT_OUTSIDE);
>
> Or
>
>     for rowNum = 1 to something;
>          row = ssSheet_createRow(sheet: rowNum);
>          if rowNum = 1;
>              style = sty_num_tl;
>          else;
>              style = sty_num_l;
>          endif;
>          ss_num (row: 1: customerNumber: style);
>          if rowNum = 1;
>              style = sty_text_t;
>          else;
>              style = sty_text;
>          endif;
>          ss_text(row: 2: customerName: sty_text);
>          if rowNum = 1;
>              style = sty_code_t;
>          else;
>              style = sty_code;
>          endif;
>          ss_text(row: 3: status: sty_code);
>          if rowNum = 1;
>              style = sty_date_tr;
>          else;
>              style = sty_date_r;
>          endif;
>          ss_date(row: 4: createDate: *MDY): sty_date);
>     endfor;
>
>     // Still need more code to replace the styles on the last row of 
> cells to get the borders right
>
> This is the simplest of examples. Most of my spreadsheets are far more complex with some having styles set based on the data, is the part discontinued, is the process late, etc. This just makes for nested if's in trying to decide which style should be applied to each cell. The first code is simpler, and if all the border properties could be set for a cell all at once, much quicker. Just a single search, just a single setCellStyleProperty, and no unused intermediate styles created.
>
> -----Original Message-----
> From: Nick Burch [mailto:apache@gagravarr.org]
> Sent: Tuesday, November 03, 2015 3:59 PM
> To: POI Users List
> Subject: Re: Drawing Borders is SLOW
>
> On Tue, 3 Nov 2015, Murphy, Mark wrote:
>> I am sure you all know this. But the problem increases as the number 
>> of styles grows. In looking at the code, I am convinced that the 
>> problem can be found in the fact that when borders are drawn, the 
>> cell style is retrieved, the border is applied, and all styles are 
>> searched for a matching style. I one is not found, then a new one is 
>> created
>
> This is the bit where I'm loosing you. Surely you create a dozen or so styles at the start of creating your workbook, with the various colours and borders that you want, then you simply apply them to your cells as you work through creating your workbook. You shouldn't need to be creating styles as you go, adding various bits of borders in to them.
>
> Styles in Excel are, due to how the file format works, scoped at the workbook level and not the cell level. You shouldn't therefore be creating styles as you go, or you'll run out of available styles!
>
> Nick
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org For additional 
> commands, e-mail: user-help@poi.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org For additional 
> commands, e-mail: user-help@poi.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org For additional commands, e-mail: user-help@poi.apache.org


Re: Drawing Borders is SLOW

Posted by Dominik Stadler <do...@gmx.at>.
Hi,

I think there is a related bug entry at
https://bz.apache.org/bugzilla/show_bug.cgi?id=54593 where we did some
investigation and discussed caching things, but we could not create a
complete fix yet... Ideas/Patches welcome!

Thanks... Dominik.

On Wed, Nov 4, 2015 at 3:18 PM, Murphy, Mark <mu...@metalexmfg.com> wrote:
> I do create a bunch of styles at the front, but they do not have borders. I would need 3 styles for column headers (just because of borders), and the data portion of the table would require at least 9 styles, just because of borders. And that assumes that the data in each cell is formatted the same way. Each different type of formatting, number formats, special highlighting, bolding, alignment would require multiple styles, and the program complexity needed to add table data location awareness to the mix (just to ensure the correct border goes with the correct cell) is unnecessary. The logic is much simpler, and you have to preload fewer styles, if you just leave the borders off, and draw them after the spreadsheet is built. Unfortunately that is a very slow option. It is POI that creates unnecessary styles due to its one property at a time processing of borders.
>
> What's easier to read and maintain (this is RPG that I am calling the POI methods from):
>
>     for rowNum = 1 to something;
>          row = ssSheet_createRow(sheet: rowNum);
>          ss_num(row: 1: customerNumber: sty_num);
>          ss_text(row: 2: customerName: sty_text);
>          ss_text(row: 3: status: sty_code);
>          ss_date(row: 4: createDate: *MDY): sty_date);
>     endfor;
>
>     ss_drawBorders(book: sheet: 1: rowNum-1: 1: 4:
>                     BORDER_THIN: EXTENT_INSIDE);
>     ss_drawBorders(book: sheet: 1: rowNum-1: 1: 4:
>                     BORDER_MEDIUM: EXTENT_OUTSIDE);
>
> Or
>
>     for rowNum = 1 to something;
>          row = ssSheet_createRow(sheet: rowNum);
>          if rowNum = 1;
>              style = sty_num_tl;
>          else;
>              style = sty_num_l;
>          endif;
>          ss_num (row: 1: customerNumber: style);
>          if rowNum = 1;
>              style = sty_text_t;
>          else;
>              style = sty_text;
>          endif;
>          ss_text(row: 2: customerName: sty_text);
>          if rowNum = 1;
>              style = sty_code_t;
>          else;
>              style = sty_code;
>          endif;
>          ss_text(row: 3: status: sty_code);
>          if rowNum = 1;
>              style = sty_date_tr;
>          else;
>              style = sty_date_r;
>          endif;
>          ss_date(row: 4: createDate: *MDY): sty_date);
>     endfor;
>
>     // Still need more code to replace the styles on the last row of cells to get the borders right
>
> This is the simplest of examples. Most of my spreadsheets are far more complex with some having styles set based on the data, is the part discontinued, is the process late, etc. This just makes for nested if's in trying to decide which style should be applied to each cell. The first code is simpler, and if all the border properties could be set for a cell all at once, much quicker. Just a single search, just a single setCellStyleProperty, and no unused intermediate styles created.
>
> -----Original Message-----
> From: Nick Burch [mailto:apache@gagravarr.org]
> Sent: Tuesday, November 03, 2015 3:59 PM
> To: POI Users List
> Subject: Re: Drawing Borders is SLOW
>
> On Tue, 3 Nov 2015, Murphy, Mark wrote:
>> I am sure you all know this. But the problem increases as the number
>> of styles grows. In looking at the code, I am convinced that the
>> problem can be found in the fact that when borders are drawn, the cell
>> style is retrieved, the border is applied, and all styles are searched
>> for a matching style. I one is not found, then a new one is created
>
> This is the bit where I'm loosing you. Surely you create a dozen or so styles at the start of creating your workbook, with the various colours and borders that you want, then you simply apply them to your cells as you work through creating your workbook. You shouldn't need to be creating styles as you go, adding various bits of borders in to them.
>
> Styles in Excel are, due to how the file format works, scoped at the workbook level and not the cell level. You shouldn't therefore be creating styles as you go, or you'll run out of available styles!
>
> Nick
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org For additional commands, e-mail: user-help@poi.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
> For additional commands, e-mail: user-help@poi.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


RE: Drawing Borders is SLOW

Posted by "Murphy, Mark" <mu...@metalexmfg.com>.
I do create a bunch of styles at the front, but they do not have borders. I would need 3 styles for column headers (just because of borders), and the data portion of the table would require at least 9 styles, just because of borders. And that assumes that the data in each cell is formatted the same way. Each different type of formatting, number formats, special highlighting, bolding, alignment would require multiple styles, and the program complexity needed to add table data location awareness to the mix (just to ensure the correct border goes with the correct cell) is unnecessary. The logic is much simpler, and you have to preload fewer styles, if you just leave the borders off, and draw them after the spreadsheet is built. Unfortunately that is a very slow option. It is POI that creates unnecessary styles due to its one property at a time processing of borders.

What's easier to read and maintain (this is RPG that I am calling the POI methods from):

    for rowNum = 1 to something;
         row = ssSheet_createRow(sheet: rowNum);
         ss_num(row: 1: customerNumber: sty_num);
         ss_text(row: 2: customerName: sty_text);
         ss_text(row: 3: status: sty_code);
         ss_date(row: 4: createDate: *MDY): sty_date);
    endfor;

    ss_drawBorders(book: sheet: 1: rowNum-1: 1: 4:
                    BORDER_THIN: EXTENT_INSIDE);
    ss_drawBorders(book: sheet: 1: rowNum-1: 1: 4:
                    BORDER_MEDIUM: EXTENT_OUTSIDE);

Or

    for rowNum = 1 to something;
         row = ssSheet_createRow(sheet: rowNum);
         if rowNum = 1;
             style = sty_num_tl;
         else;
             style = sty_num_l;
         endif;
         ss_num (row: 1: customerNumber: style);
         if rowNum = 1;
             style = sty_text_t;
         else;
             style = sty_text;
         endif;         
         ss_text(row: 2: customerName: sty_text);
         if rowNum = 1;
             style = sty_code_t;
         else;
             style = sty_code;
         endif;   
         ss_text(row: 3: status: sty_code);
         if rowNum = 1;
             style = sty_date_tr;
         else;
             style = sty_date_r;
         endif;
         ss_date(row: 4: createDate: *MDY): sty_date);
    endfor;

    // Still need more code to replace the styles on the last row of cells to get the borders right

This is the simplest of examples. Most of my spreadsheets are far more complex with some having styles set based on the data, is the part discontinued, is the process late, etc. This just makes for nested if's in trying to decide which style should be applied to each cell. The first code is simpler, and if all the border properties could be set for a cell all at once, much quicker. Just a single search, just a single setCellStyleProperty, and no unused intermediate styles created.

-----Original Message-----
From: Nick Burch [mailto:apache@gagravarr.org] 
Sent: Tuesday, November 03, 2015 3:59 PM
To: POI Users List
Subject: Re: Drawing Borders is SLOW

On Tue, 3 Nov 2015, Murphy, Mark wrote:
> I am sure you all know this. But the problem increases as the number 
> of styles grows. In looking at the code, I am convinced that the 
> problem can be found in the fact that when borders are drawn, the cell 
> style is retrieved, the border is applied, and all styles are searched 
> for a matching style. I one is not found, then a new one is created

This is the bit where I'm loosing you. Surely you create a dozen or so styles at the start of creating your workbook, with the various colours and borders that you want, then you simply apply them to your cells as you work through creating your workbook. You shouldn't need to be creating styles as you go, adding various bits of borders in to them.

Styles in Excel are, due to how the file format works, scoped at the workbook level and not the cell level. You shouldn't therefore be creating styles as you go, or you'll run out of available styles!

Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org For additional commands, e-mail: user-help@poi.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Drawing Borders is SLOW

Posted by Nick Burch <ap...@gagravarr.org>.
On Tue, 3 Nov 2015, Murphy, Mark wrote:
> I am sure you all know this. But the problem increases as the number of 
> styles grows. In looking at the code, I am convinced that the problem 
> can be found in the fact that when borders are drawn, the cell style is 
> retrieved, the border is applied, and all styles are searched for a 
> matching style. I one is not found, then a new one is created

This is the bit where I'm loosing you. Surely you create a dozen or so 
styles at the start of creating your workbook, with the various colours 
and borders that you want, then you simply apply them to your cells as you 
work through creating your workbook. You shouldn't need to be creating 
styles as you go, adding various bits of borders in to them.

Styles in Excel are, due to how the file format works, scoped at the 
workbook level and not the cell level. You shouldn't therefore be creating 
styles as you go, or you'll run out of available styles!

Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org