You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@poi.apache.org by bu...@apache.org on 2015/05/05 18:23:34 UTC

[Bug 57893] New: XSSFSheet.getMergedRegion(int) takes O(n^2) time

https://bz.apache.org/bugzilla/show_bug.cgi?id=57893

            Bug ID: 57893
           Summary: XSSFSheet.getMergedRegion(int) takes O(n^2) time
           Product: POI
           Version: unspecified
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: XSSF
          Assignee: dev@poi.apache.org
          Reporter: cmb-apache@corefiling.co.uk

Created attachment 32715
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=32715&action=edit
Sample code demonstrating slow performance

The attached Java compares the time taken to loop over sheet.getMergedRegion(i)
for all i, versus the same operation by direct access to the deprecated
CTMergeCells.getMergeCellArray().

The sample input I will attach, many-merges.xlsx, has 50k merged regions and I
get 8000ms vs 80ms. The real-world input I found this with has 250k merged
regions and takes ~300 seconds vs ~300ms.

The reason seems to be:

XSSFSheet.getMergedRegion(int)
-> CTMergeCells.getMergeCellArray(int)
-> Xobj.find_element_user(QName, int)

which does:

    for ( Xobj x = _firstChild ; x != null ; x = x._nextSibling )
      if (x.isElem() && x._name.equals( name ) && --i < 0)
        return x.getUser();

So for each mergeCell you access you must compare the element name of every
previous mergeCell, whereas the deprecated CTMergeCells.getMergeCellArray()
uses find_all_element_users(), avoiding O(n^2).

Would you be interested in a patch? The only thing is, it seems the change
should be in CTMergeCells in ooxml-schemas, which seems to be generated(?) and
I'm not sure what generates it, or why getMergeCellArray() (the only fast
method) is deprecated. getMergeCellList() exists but only hands out a wrapper
around the slow getMergeCellArray(int).

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 57893] [PATCH] XSSFSheet.getMergedRegion(int) takes O(n^2) time

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57893

Chris Boyle <cm...@corefiling.co.uk> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|XSSFSheet.getMergedRegion(i |[PATCH]
                   |nt) takes O(n^2) time       |XSSFSheet.getMergedRegion(i
                   |                            |nt) takes O(n^2) time

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 57893] XSSFSheet.getMergedRegion(int) takes O(n^2) time

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57893

--- Comment #4 from Chris Boyle <cm...@corefiling.co.uk> ---
Created attachment 32724
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=32724&action=edit
Patch

Here's a patch adding Sheet.getMergedRegions(), implemented with the deprecated
array getter as discussed.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 57893] [PATCH] XSSFSheet.getMergedRegion(int) takes O(n^2) time

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57893

Javen ONeal <ja...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |58350

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 57893] [PATCH] XSSFSheet.getMergedRegion(int) takes O(n^2) time

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57893

Dominik Stadler <do...@gmx.at> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |PatchAvailable

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 57893] [PATCH] XSSFSheet.getMergedRegion(int) takes O(n^2) time

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57893

David North <dt...@corefiling.co.uk> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #6 from David North <dt...@corefiling.co.uk> ---
Patch applied. Fixed in r1690661

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 57893] [PATCH] XSSFSheet.getMergedRegion(int) takes O(n^2) time

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57893

--- Comment #5 from Chris Boyle <cm...@corefiling.co.uk> ---
(This is a git patch, including 1 binary file for the test.)

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 57893] XSSFSheet.getMergedRegion(int) takes O(n^2) time

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57893

Chris Boyle <cm...@corefiling.co.uk> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #32716|0                           |1
        is obsolete|                            |

--- Comment #3 from Chris Boyle <cm...@corefiling.co.uk> ---
Created attachment 32723
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=32723&action=edit
Sample input with 50000 merged regions

Previous sample input was smaller than stated (999 merges); here's one with the
intended size (50000 merges).

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 57893] [PATCH] XSSFSheet.getMergedRegion(int) takes O(n^2) time

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57893

--- Comment #8 from David North <dt...@corefiling.co.uk> ---
Note: comment 7 was fixed under bug 58350

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 57893] XSSFSheet.getMergedRegion(int) takes O(n^2) time

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57893

--- Comment #1 from Chris Boyle <cm...@corefiling.co.uk> ---
Created attachment 32716
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=32716&action=edit
Sample input with 50000 merged regions

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 57893] XSSFSheet.getMergedRegion(int) takes O(n^2) time

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57893

--- Comment #2 from Nick Burch <ap...@gagravarr.org> ---
We tried asking the XMLBeans project about the deprecation of the Array getter,
but never really got an answer... In other places where speed matters, we've
used the array getter with an @ignore for the deprecation. If that would help
here, we'd love a patch for it!

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 57893] [PATCH] XSSFSheet.getMergedRegion(int) takes O(n^2) time

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57893

--- Comment #7 from Javen ONeal <ja...@gmail.com> ---
If there are no merged regions on a sheet, HSSF returns an empty List while
XSSF raises an IllegalStateException. I would assume that both would behave the
same: return an empty list and let the calling code check for empty list. No
merged regions doesn't seem exceptional, and asking a user to check if
numMergedRegions before all getMergedRegions seems unnecessary. My 2 cents.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 57893] [PATCH] XSSFSheet.getMergedRegion(int) takes O(n^2) time

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57893

Javen O'Neal <on...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |58651

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org