You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@poi.apache.org by bu...@apache.org on 2019/03/05 20:33:35 UTC

[Bug 63234] New: XSSFFont.equals, hashCode very slow; revised code presented

https://bz.apache.org/bugzilla/show_bug.cgi?id=63234

            Bug ID: 63234
           Summary: XSSFFont.equals, hashCode very slow; revised code
                    presented
           Product: POI
           Version: 4.0.x-dev
          Hardware: All
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: XSSF
          Assignee: dev@poi.apache.org
          Reporter: dmgauntt@uab.edu
  Target Milestone: ---

Created attachment 36475
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=36475&action=edit
XSSFFont.java revised to speed hashCode, toString, and equals by ~1000

I recently discovered that when writing Excel workbooks, 75% of my CPU time was
occupied by the method XSSFFont.equals.

I worked around this by the following:
1) Add instance fields _hashCode and _text to XSSFFont, initially set to null.
2) When either hashCode() or toString() are called and _hashCode==null, then
both _hashCode and _text are calculated and saved, and the value of _hashCode
or _text are returned.
3) Set the value of each of these fields to null whenever a setter (or
getCTFont) is called.
3) When equals is called, it uses the saved values of _hashCode and _text.

The result is significant speed improvement of more than x1000.  The following
method calculates the time required to calculate font.equals(font), and the
time required to calculate font.oldEquals(font) (where oldEquals is the
original equals method).  The results on my iMac are

equals: timePerCall=139 ns
oldEquals: timePerCall=216113 ns

The test code is below; my revised XSSFFont.java source is attached.

        private static void testXSSFFontEquals(XSSFWorkbook workbook) {
                final int numTests = 1000;
                final XSSFFont font = workbook.getFontAt(0);

                long start = System.nanoTime();
                for (int n = 0; n < numTests; ++n) {
                        font.equals(font);
                }
                long end = System.nanoTime();

                System.out.println(String.format("\r\nequals: timePerCall=%d
ns", (end - start) / numTests));

                start = System.nanoTime();
                for (int n = 0; n < numTests; ++n) {
                        font.oldEquals(font);
                }
                end = System.nanoTime();
                System.out.println(String.format("oldEquals: timePerCall=%d
ns", (end - start) / numTests));

        }

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 63234] XSSFFont.equals, hashCode very slow; revised code presented

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=63234

--- Comment #10 from PJ Fanning <fa...@yahoo.com> ---
The attachment is not a patch file - it is a very out of date java file.

I also wonder why we need to optimise the equals and hashCode. I ran some of
the XSSFWorkbook unit tests and found no evidence of repeated calls to
XSSFFont.equals.

I think it would be better to track down the use case where the excessive calls
to equals are made and to try to avoid repeated equals calls if possible.

In theory, if we have a use case for calling equals a lot on our XSSF classes,
then we have a potential issue on many other XSSF classes.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 63234] XSSFFont.equals, hashCode very slow; revised code presented

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=63234

--- Comment #5 from David Gauntt <dm...@uab.edu> ---
I found a bug in my proposed code for XSSFFont.equals.  If hashCode() had not
been called before calling equals(), a NullPointException would have been
thrown.

The code below replaces the reference to _hashCode with a call to hashCode()

        @Override
        public boolean equals(Object o) {
                if (this == o) {
                        return true;
                }
                if (!(o instanceof XSSFFont)) {
                        return false;
                }

                //if (_hashCode != o.hashCode()) {

                if (hashCode() != o.hashCode()) {
                        return false;
                }

                XSSFFont cf = (XSSFFont) o;
                return toString().equals(cf.toString());
        }

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 63234] XSSFFont.equals, hashCode very slow; revised code presented

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=63234

--- Comment #8 from Dominik Stadler <do...@gmx.at> ---
Sorry, but this is still not better, there are still many changes, it seems
long-lines are cut, e.g.

     * By default, Microsoft Office Excel 2007 uses the Calibri font in font
size 11

becomes 

     * By default, Microsoft Office Excel 2007 uses the Calibri font in font
     * size 11

in your file. 

Also things like

     * @param font CTFont

become 

     * @param font
     *            CTFont


These make reviewing the patch hard. 

Maybe you can adjust your IDE settings to not reformat the code and then send
the changes as a minimal patch, see
http://poi.apache.org/devel/guidelines.html#SubmittingPatches for how you can
create those.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 63234] XSSFFont.equals, hashCode very slow; revised code presented

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=63234

--- Comment #1 from PJ Fanning <fa...@yahoo.com> ---
Comment on attachment 36475
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=36475
XSSFFont.java revised to speed hashCode, toString, and equals by ~1000

Wouldn't it be better to have:


        private void updateHashCode() {
                _text = _ctFont.toString();
                _hashCode = _text.hashCode();
        }

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 63234] XSSFFont.equals, hashCode very slow; revised code presented

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=63234

David Gauntt <dm...@uab.edu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #36483|0                           |1
        is obsolete|                            |

--- Comment #9 from David Gauntt <dm...@uab.edu> ---
Created attachment 36487
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=36487&action=edit
Patch directly from version 4.0.1

This version of the patch was generated as follows:

1) Download poi source into Eclipse using git.
2) Create a branch at tag REL_4_0_1
3) Make changes in XSSFFont.java
4) Commit changes

I have compared the code to tag REL_4_0_1, and verified that the only changes
are those associated with the bug fix.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 63234] XSSFFont.equals, hashCode very slow; revised code presented

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=63234

David Gauntt <dm...@uab.edu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 63234] XSSFFont.equals, hashCode very slow; revised code presented

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=63234

--- Comment #2 from David Gauntt <dm...@uab.edu> ---
I hang my head in shame; yes, of course that would be better.  The whole point
of this suggestion is to avoid repeating the same calculation over and over.

I have made the change in my code.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 63234] XSSFFont.equals, hashCode very slow; revised code presented

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=63234

David Gauntt <dm...@uab.edu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #36475|0                           |1
        is obsolete|                            |

--- Comment #7 from David Gauntt <dm...@uab.edu> ---
Created attachment 36483
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=36483&action=edit
XSSFFont.java with code-style changes reverted

This is the same as the original patch, but with the code style changes
reverted and one bug fixed.

• Every tab character in the file has been replaced with 4 spaces

• The references to _hashCode in equals() has been replaced by calls to
hashCode()

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 63234] XSSFFont.equals, hashCode very slow; revised code presented

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=63234

Dominik Stadler <do...@gmx.at> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO

--- Comment #6 from Dominik Stadler <do...@gmx.at> ---
Unfortunately your changed file contains many unrelated code-style changes, are
you able to produce a patch with only the changes that are necessary?

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 63234] XSSFFont.equals, hashCode very slow; revised code presented

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=63234

--- Comment #3 from PJ Fanning <fa...@yahoo.com> ---
Thanks David. I'm still a little bit worried that the old code and the new code
are a little flawed in that the hashCode logic is meant to match the equals
logic, ie if you change state that means the equals state will change should
mean that the hashCode changes.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 63234] XSSFFont.equals, hashCode very slow; revised code presented

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=63234

--- Comment #4 from David Gauntt <dm...@uab.edu> ---
PJ,

Thank you for taking the time to look at this.

The only time that the behavior of equals() changes is when the value of _text
changes.  The only time that _text changes is when updateHashCode() is called,
and the only time that happens is when hashCode() or toString() are called
after there is a change of state.

Thus, the only time that the behavior of equals() changes is when _hashCode has
been recalculated, and the only time that _hashCode is recalculated is after a
change of state.

Does that address your concern?

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org