Posted to dev@poi.apache.org by bu...@apache.org on 2011/10/21 16:54:36 UTC
DO NOT REPLY [Bug 52069] New: Heap out of memory errors for large
xlsx files - even when using PipedReader to read file
https://issues.apache.org/bugzilla/show_bug.cgi?id=52069
Bug #: 52069
Summary: Heap out of memory errors for large xlsx files - even
when using PipedReader to read file
Product: POI
Version: unspecified
Platform: PC
OS/Version: Windows XP
Status: NEW
Severity: major
Priority: P2
Component: XSSF
AssignedTo: dev@poi.apache.org
ReportedBy: meghana.vishwanath@gmail.com
Classification: Unclassified
While parsing an xlsx file of about 4 MB using Apache Tika 0.9, I hit a heap
out-of-memory error. I am using PipedReader and PipedWriter to access the file
content, and I have been running the same code on much larger files, so I do
not believe the heap size setting is the real problem.
Looking at memory consumption with a profiler, I found that instances of two
classes, org.apache.xmlbeans.impl.store.Xobj$AttrXobj and Xobj$ElementXobj,
seem to grow exponentially with file size. For the above-mentioned file, there
were more than 1,600,000 objects of type Xobj$AttrXobj.
I am attaching the xlsx file that caused this error.
Note: this error also occurs for .docx files.
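For readers unfamiliar with the piped approach mentioned above, here is a
minimal sketch of the PipedReader/PipedWriter pattern using only java.io. It is
not the reporter's code: the class name PipedTextDemo, the method
countCharsViaPipe, and the producer thread that stands in for the parser are
all hypothetical. It only illustrates that the pipe bounds the memory held for
the extracted text; it does not limit whatever the producer builds internally
while parsing, which would be consistent with the Xobj counts reported above.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.UncheckedIOException;

public class PipedTextDemo {

    // Hypothetical helper: streams the given chunks through a pipe and
    // returns how many characters the consumer read. The producer thread
    // is a stand-in for a parser emitting extracted text.
    static long countCharsViaPipe(String[] chunks)
            throws IOException, InterruptedException {
        PipedWriter writer = new PipedWriter();
        // The pipe buffer is fixed at 1024 chars; the producer blocks when it is full.
        PipedReader reader = new PipedReader(writer, 1024);

        Thread producer = new Thread(() -> {
            // Closing the writer signals end-of-stream to the reader.
            try (PipedWriter w = writer) {
                for (String chunk : chunks) {
                    w.write(chunk);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        producer.start();

        long total = 0;
        char[] buf = new char[256];
        try (PipedReader r = reader) {
            int n;
            // Consume in small buffers so the text is never held in full.
            while ((n = r.read(buf)) != -1) {
                total += n;
            }
        }
        producer.join();
        return total;
    }

    public static void main(String[] args) throws Exception {
        long n = countCharsViaPipe(new String[] {"cell A1\n", "cell B1\n"});
        System.out.println("characters read: " + n);
    }
}
```

The consumer side stays small regardless of input size, so an out-of-memory
error in this setup points at the producer side rather than at the pipe.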
--
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org
DO NOT REPLY [Bug 52069] Heap out of memory errors for large xlsx
files - even when using PipedReader to read file
Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=52069
Nick Burch <ni...@alfresco.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |LATER
--- Comment #2 from Nick Burch <ni...@alfresco.com> 2011-10-21 15:02:56 UTC ---
Please retry with Tika 0.10; this should be fixed there.
DO NOT REPLY [Bug 52069] Heap out of memory errors for large xlsx
files - even when using PipedReader to read file
Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=52069
--- Comment #1 from Meghana <me...@gmail.com> 2011-10-21 14:56:39 UTC ---
Created attachment 27835
--> https://issues.apache.org/bugzilla/attachment.cgi?id=27835
Dodgy xlsx file