You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@poi.apache.org by bu...@apache.org on 2011/05/09 08:02:00 UTC

DO NOT REPLY [Bug 51171] New: [PATCH] Performance opening large XLS-Files (InternalWorkbook.getStyleRecord)

https://issues.apache.org/bugzilla/show_bug.cgi?id=51171

             Bug #: 51171
           Summary: [PATCH] Performance opening large XLS-Files
                    (InternalWorkbook.getStyleRecord)
           Product: POI
           Version: 3.8-dev
          Platform: PC
            Status: NEW
          Severity: normal
          Priority: P2
         Component: HSSF
        AssignedTo: dev@poi.apache.org
        ReportedBy: marcel.may.de@gmail.com
    Classification: Unclassified


Created attachment 26970
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=26970
Patch against SVN trunk

Opening ~5M XLS-Files takes about half a second and high CPU load.
By a quick profiling I discovered a 55-million-times loop for getting the user
style name for my XLS-files.

This patch against trunk fixes the number of loops significantly, by fixing the
'iterative search' causing the high loop count.

I'll attach the profiling details in a moment.
As a summary, for my test cases I measured an average speedup =  old_time /
new_time = ~ 1.3 .

Would be great if the patch could be included in the next 3.8 release.

Cheers,
Marcel

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


DO NOT REPLY [Bug 51171] [PATCH] Performance opening large XLS-Files (InternalWorkbook.getStyleRecord)

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51171

--- Comment #8 from Marcel May <ma...@gmail.com> 2011-05-26 06:42:34 UTC ---
Created attachment 27067
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=27067
Improves SharedValueManager.findFormulaGroup

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


DO NOT REPLY [Bug 51171] [PATCH] Performance opening large XLS-Files (InternalWorkbook.getStyleRecord)

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51171

--- Comment #6 from Marcel May <ma...@gmail.com> 2011-05-25 14:26:48 UTC ---
Created attachment 27064
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=27064
Profiled hotspots (opening of XLS file using current trunk v1127506)

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


DO NOT REPLY [Bug 51171] [PATCH] Performance opening large XLS-Files (InternalWorkbook.getStyleRecord)

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51171

--- Comment #10 from Marcel May <ma...@gmail.com> 2011-05-26 06:49:41 UTC ---
Created attachment 27069
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=27069
Shows profiled calltree after second patch for
SharedValueManager.findFormulaGroup

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


DO NOT REPLY [Bug 51171] [PATCH] Performance opening large XLS-Files (InternalWorkbook.getStyleRecord)

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51171

--- Comment #2 from Marcel May <ma...@gmail.com> 2011-05-09 06:09:29 UTC ---
Created attachment 26972
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=26972
Including the patch, fixing the previous top hotspots (these were all related,
up to the SharedValueManger thing).

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


DO NOT REPLY [Bug 51171] [PATCH] Performance opening large XLS-Files (InternalWorkbook.getStyleRecord)

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51171

--- Comment #7 from Marcel May <ma...@gmail.com> 2011-05-25 14:28:48 UTC ---
Created attachment 27065
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=27065
Profiled cpu calltree (opening of XLS file using current trunk v1127506)

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


DO NOT REPLY [Bug 51171] [PATCH] Performance opening large XLS-Files (InternalWorkbook.getStyleRecord)

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51171

Marcel May <ma...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |marcel.may.de@gmail.com

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


DO NOT REPLY [Bug 51171] [PATCH] Performance opening large XLS-Files (InternalWorkbook.getStyleRecord)

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51171

--- Comment #9 from Marcel May <ma...@gmail.com> 2011-05-26 06:47:56 UTC ---
Created attachment 27068
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=27068
Shows profiled hotspots after second patch for
SharedValueManager.findFormulaGroup

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


DO NOT REPLY [Bug 51171] [PATCH] Performance opening large XLS-Files (InternalWorkbook.getStyleRecord)

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51171

Yegor Kozlov <ye...@dinom.ru> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

--- Comment #4 from Yegor Kozlov <ye...@dinom.ru> 2011-05-15 18:54:35 UTC ---
Fixed in r1103502, but in a diffrent way.

The real problem was that HSSFCell.setCellStyle was called for every cell when
constructing a workbook. This method is expensive and designed for assigning
styles to individual cell and applying it to workbook scope causes performance
issues. It appears that we don't need to call setCellStyle in the HSSFCell
constructior at all, this line remained from all times (POI-3.5 or earlier) and
does not make any sense in POI-3.8. So, I removed it.  

This fix should boost performance of opening .xls files even greater than ~1.3. 

Yegor

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


DO NOT REPLY [Bug 51171] [PATCH] Performance opening large XLS-Files (InternalWorkbook.getStyleRecord)

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51171

--- Comment #1 from Marcel May <ma...@gmail.com> 2011-05-09 06:06:44 UTC ---
Created attachment 26971
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=26971
CPU Hotspots, before to the patch. Notice the 55m inv count on
WorkbookRecordList

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


DO NOT REPLY [Bug 51171] [PATCH] Performance opening large XLS-Files (InternalWorkbook.getStyleRecord)

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51171

Marcel May <ma...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         OS/Version|                            |All

--- Comment #3 from Marcel May <ma...@gmail.com> 2011-05-09 06:18:32 UTC ---
The profiling screenshots attached show opening a single XLS file once:

new HSSFWorkbook(new FileInputStream("...."));

You can see that the top hotspots (first 7 are related) got fixed when you
compare before/after the patch. Looping 55 million times does not occur
anymore.

Another micro bench mark including warmup phase followed by timed loops shows
similiar results (speedup of ~1.3, or only ~75% of original time on average).

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


DO NOT REPLY [Bug 51171] [PATCH] Performance opening large XLS-Files (InternalWorkbook.getStyleRecord)

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51171

--- Comment #5 from Marcel May <ma...@gmail.com> 2011-05-25 14:22:29 UTC ---
(In reply to comment #4)
> Fixed in r1103502, but in a diffrent way.
> 
> The real problem was that HSSFCell.setCellStyle was called for every cell when
> constructing a workbook. This method is expensive and designed for assigning
> styles to individual cell and applying it to workbook scope causes performance
> issues. It appears that we don't need to call setCellStyle in the HSSFCell
> constructior at all, this line remained from all times (POI-3.5 or earlier) and
> does not make any sense in POI-3.8. So, I removed it.  
> 
> This fix should boost performance of opening .xls files even greater than ~1.3. 
> 
> Yegor

Thanks, Yegor - your fix was even better :-)

I profiled again against latest version 1127506 and noticed another hotspot in
the SharedValueManager class. Here's an 8 million times iteration doing
'findFormularGroup'.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org