Posted to dev@poi.apache.org by bu...@apache.org on 2011/10/21 16:54:36 UTC

DO NOT REPLY [Bug 52069] New: Heap out of memory errors for large xlsx files - even when using PipedReader to read file

https://issues.apache.org/bugzilla/show_bug.cgi?id=52069

             Bug #: 52069
           Summary: Heap out of memory errors for large xlsx files - even
                    when using PipedReader to read file
           Product: POI
           Version: unspecified
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: major
          Priority: P2
         Component: XSSF
        AssignedTo: dev@poi.apache.org
        ReportedBy: meghana.vishwanath@gmail.com
    Classification: Unclassified


While parsing an xlsx file of about 4 MB using Apache Tika 0.9, I came across
this error. I am using PipedReader and PipedWriter to access the file content,
so I believe the heap size setting is not really the problem: I have been
running the same code with much larger files.
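
For readers unfamiliar with the setup described above, here is a minimal
sketch of streaming text through a PipedWriter/PipedReader pair using only
java.io. It is not the reporter's code and it makes no Tika or POI calls: the
class name PipedTextDemo, the method streamThroughPipe, and the producer
thread (a stand-in for whatever parser writes the extracted text) are
hypothetical names chosen for illustration. The sketch shows that the pipe
bounds only the buffer between writer and reader; it cannot limit whatever
memory the producer allocates internally before it writes.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;

public class PipedTextDemo {

    /**
     * Streams the given text through a pipe of pipeSize chars and returns
     * the number of chars the reading side received.
     * The producer thread is a hypothetical stand-in for a parser.
     */
    public static long streamThroughPipe(String text, int pipeSize)
            throws IOException, InterruptedException {
        PipedWriter writer = new PipedWriter();
        // Connect the reader to the writer with a bounded buffer.
        PipedReader reader = new PipedReader(writer, pipeSize);

        // Producer: blocks whenever the pipe buffer is full, so at most
        // pipeSize chars are in flight between the two threads.
        Thread producer = new Thread(() -> {
            try (PipedWriter w = writer) {
                w.write(text);
            } catch (IOException e) {
                // The reader was closed early; nothing more to write.
            }
        });
        producer.start();

        // Consumer: drains the pipe in small chunks until end of stream.
        long total = 0;
        char[] buf = new char[1024];
        try (PipedReader r = reader) {
            int n;
            while ((n = r.read(buf)) != -1) {
                total += n;
            }
        }
        producer.join();
        return total;
    }

    public static void main(String[] args) throws Exception {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100_000; i++) {
            sb.append("cell text ");
        }
        System.out.println(streamThroughPipe(sb.toString(), 4096));
    }
}
```

In this sketch the consumer never holds more than one 1024-char chunk at a
time, which is why a pipe helps with large extracted output. If the producer
builds a full in-memory model of the document before writing anything, that
allocation happens on the producer side and the pipe does not reduce it.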

Looking at the memory consumption using a profiler, I found that instances of
two classes, org.apache.xmlbeans.impl.store.Xobj$AttrXobj and Xobj$ElementXobj,
seem to grow exponentially with file size. For the above-mentioned file, there
were more than 1,600,000 objects of type Xobj$AttrXobj.

I am attaching the xlsx file which caused this error. 

Note: this error also occurs for .docx files.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


DO NOT REPLY [Bug 52069] Heap out of memory errors for large xlsx files - even when using PipedReader to read file

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=52069

Nick Burch <ni...@alfresco.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |LATER

--- Comment #2 from Nick Burch <ni...@alfresco.com> 2011-10-21 15:02:56 UTC ---
Please retry with Tika 0.10; this should be fixed there.



DO NOT REPLY [Bug 52069] Heap out of memory errors for large xlsx files - even when using PipedReader to read file

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=52069

--- Comment #1 from Meghana <me...@gmail.com> 2011-10-21 14:56:39 UTC ---
Created attachment 27835
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=27835
Dodgy xlsx file
