You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@poi.apache.org by bu...@apache.org on 2016/02/03 10:35:47 UTC

[Bug 58963] New: OutOfMemoryError while reading some Excel files

https://bz.apache.org/bugzilla/show_bug.cgi?id=58963

            Bug ID: 58963
           Summary: OutOfMemoryError while reading some Excel files
           Product: POI
           Version: 3.13-FINAL
          Hardware: PC
            Status: NEW
          Severity: critical
          Priority: P2
         Component: XSSF
          Assignee: dev@poi.apache.org
          Reporter: zmircmircea@gmail.com

Created attachment 33518
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=33518&action=edit
sample project + file to reproduce the error

Hi!

Today we received an Excel file which can't be added into the system, because
POI triggers OOM while trying to open it.
The xlsx file has ~300KB and the application -Xmx750m. It does the same with
-Xmx2750m so it's definitely not this.

This OOM happens with both 3.13 and 3.14-beta1.

I will attach a sample maven project + the problematic file called
"eu-triggers-oom.xlsx" available in /src/test/resources

To replicate the issue, just execute the test from class POIExcelOOMNGTest.
I will also a VisualVM memory usage screenshot while running the test.

Here is the stacktrace of the test project:
shouldNotThrowOOMWhileReadingExcel(poi.excel.oom.POIExcelOOMNGTest)  Time
elapsed: 136.287 sec  <<< FAILURE!
org.apache.poi.POIXMLException: java.lang.reflect.InvocationTargetException
    at
org.apache.poi.xssf.usermodel.XSSFFactory.createDocumentPart(XSSFFactory.java:62)
    at org.apache.poi.POIXMLDocumentPart.read(POIXMLDocumentPart.java:465)
    at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:173)
    at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:278)
    at
poi.excel.oom.POIExcelOOMNGTest.shouldNotThrowOOMWhileReadingExcel(POIExcelOOMNGTest.java:12)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at
org.apache.poi.xssf.usermodel.XSSFFactory.createDocumentPart(XSSFFactory.java:60)
    ... 37 more
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.xmlbeans.impl.store.CharUtil.allocate(CharUtil.java:397)
    at org.apache.xmlbeans.impl.store.CharUtil.saveChars(CharUtil.java:506)
    at org.apache.xmlbeans.impl.store.CharUtil.saveChars(CharUtil.java:419)
    at org.apache.xmlbeans.impl.store.CharUtil.saveChars(CharUtil.java:489)
    at org.apache.xmlbeans.impl.store.Cur$CurLoadContext.text(Cur.java:2927)
    at
org.apache.xmlbeans.impl.store.Cur$CurLoadContext.stripText(Cur.java:3130)
    at org.apache.xmlbeans.impl.store.Cur$CurLoadContext.text(Cur.java:3143)
    at
org.apache.xmlbeans.impl.store.Locale$SaxHandler.characters(Locale.java:3291)
    at
org.apache.xmlbeans.impl.piccolo.xml.Piccolo.reportCdata(Piccolo.java:992)
    at
org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseXMLNS(PiccoloLexer.java:1290)
    at
org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseXML(PiccoloLexer.java:1261)
    at
org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.yylex(PiccoloLexer.java:4812)
    at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yylex(Piccolo.java:1290)
    at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yyparse(Piccolo.java:1400)
    at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.parse(Piccolo.java:714)
    at org.apache.xmlbeans.impl.store.Locale$SaxLoader.load(Locale.java:3479)
    at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1277)
    at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1264)
    at
org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.parse(SchemaTypeLoaderBase.java:345)
    at
org.openxmlformats.schemas.spreadsheetml.x2006.main.SstDocument$Factory.parse(Unknown
Source)
    at
org.apache.poi.xssf.model.SharedStringsTable.readFrom(SharedStringsTable.java:119)
    at
org.apache.poi.xssf.model.SharedStringsTable.<init>(SharedStringsTable.java:106)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at
org.apache.poi.xssf.usermodel.XSSFFactory.createDocumentPart(XSSFFactory.java:60)
    at org.apache.poi.POIXMLDocumentPart.read(POIXMLDocumentPart.java:465)
    at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:173)
    at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:278)
    at
poi.excel.oom.POIExcelOOMNGTest.shouldNotThrowOOMWhileReadingExcel(POIExcelOOMNGTest.java:12)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58963] OutOfMemoryError while reading some Excel files

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58963

Dominik Stadler <do...@gmx.at> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 OS|                            |All

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58963] OutOfMemoryError while reading some Excel files

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58963

--- Comment #5 from Mircea <zm...@gmail.com> ---
Created attachment 33690
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=33690&action=edit
VisualVM memory sampler screenshot

I have just reproduced the error and took a screenshot from VisualVM's memory
sampler.
It seems that char[] is taking most of the memory.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58963] OutOfMemoryError while reading some Excel files

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58963

--- Comment #9 from Mircea <zm...@gmail.com> ---
I looked into that link but I couldn't find the working dist version.
I tried what's in build/dist, but no success. Apache POI classes were missing.

As you probably already have POI setup in your IDE, can you please send me (via
Dropbox, OneDrive etc) the working dist version?

Otherwise, what should I do to get it working?
I spent quite a lot of time taking the src zip and trying to fix all the
missing dependencies, but there are many which aren't there.

On the other side, I provided the test case XML and Java code (3 lines).
It's 1000x easier for one of POI developers to copy those 3 lines + the file in
a project compiled against the latest version of POI.
Then I wouldn't have to spend a lot of time trying to build POI myself.

Thanks a lot. :)

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58963] OutOfMemoryError while reading some Excel files

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58963

Mircea <zm...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |RESOLVED
         Resolution|---                         |DUPLICATE

--- Comment #11 from Mircea <zm...@gmail.com> ---
Great! Thanks for the clarification.
I built it with ant, then added the remaining jars.

It seems to work properly now. No more memory issue.
Most probably bug 57031 has taken care of it.

Thanks, Apache POI!

*** This bug has been marked as a duplicate of bug 57031 ***

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58963] OutOfMemoryError while reading some Excel files

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58963

--- Comment #1 from Mircea <zm...@gmail.com> ---
Created attachment 33519
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=33519&action=edit
VisualVM screenshot after the test failed - OOM

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58963] OutOfMemoryError while reading some Excel files

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58963

--- Comment #4 from Javen O'Neal <on...@apache.org> ---
Looking through eu-triggers-oom.xlsx, I didn't see any XML bombs likely to
bloat memory. The main memory consumer here is shared strings table containing
5700 unique values. The rest of the XML files are mostly empty.

To build up a test case to make sure there isn't anything else in the Excel
file, a unit test could be: read in a dictionary of random Norwegian words
(including æ, ø, å--if that matters), generate 5700 "sentences" composed of 10
random words each, and look at memory consumption.

The SharedStringsTable [1] uses an Array<CTRst> and a Map<String, Int> to store
the strings. The Map makes string lookup faster at the cost of increasing the
memory requirements, but both structures should be able to handle 6700 entries
without OOM'ing.

What kinds of objects did VisualVM indicate were consuming the largest amount
of memory?

[1]
https://svn.apache.org/viewvc/poi/trunk/src/ooxml/java/org/apache/poi/xssf/model/SharedStringsTable.java?view=markup

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58963] OutOfMemoryError while reading some Excel files

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58963

--- Comment #2 from Javen O'Neal <on...@apache.org> ---
By any chance have you used the same file on an older version of POI without
getting an OOM?

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58963] OutOfMemoryError while reading some Excel files

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58963

Kai G <no...@kaigrabfelder.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nospam@kaigrabfelder.de

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58963] OutOfMemoryError while reading some Excel files

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58963

--- Comment #10 from Nick Burch <ap...@gagravarr.org> ---
(In reply to Mircea from comment #9)
> Otherwise, what should I do to get it working?
> I spent quite a lot of time taking the src zip and trying to fix all the
> missing dependencies, but there are many which aren't there.

Just run "ant jar" from a svn checkout / git checkout / source download, and
all the dependencies you need will be fetched for you on demand

Otherwise, the nightly builds are available from Jenkins at
https://builds.apache.org/job/POI/lastSuccessfulBuild/artifact/build/dist/ -
grab the POI jars from the bin, and add in any dependencies from the previous
full POI release's bin package

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58963] OutOfMemoryError while reading some Excel files

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58963

--- Comment #6 from Javen O'Neal <on...@apache.org> ---
> It seems that char[] is taking most of the memory.
Which would make sense because the shared strings table is disproportionately
large, and rest of the xml nodes are just arrays/pointers+strings.

Andi fixed some OOM's in the current trunk build on bug 57031. Read through
that bug to see if it's relevant to you, and if so see if you still get OOM's
on a 3.15 beta 1 nightly [1]

[1] https://builds.apache.org/job/POI/lastSuccessfulBuild/artifact/

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58963] OutOfMemoryError while reading some Excel files

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58963

--- Comment #3 from Mircea <zm...@gmail.com> ---
We received the file just today from a complaining user, then we tested it
ourselves with 3.13 and 3.14-beta1, but nothing else.
I am sorry.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58963] OutOfMemoryError while reading some Excel files

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58963

--- Comment #7 from Mircea <zm...@gmail.com> ---
I'll give it a try.
What should I write in my Maven pom.xml file in order to get the nightly build
instead of 3.13?
I didn't find the nightly builds Maven repository.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58963] OutOfMemoryError while reading some Excel files

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58963

Dominik Stadler <do...@gmx.at> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO

--- Comment #8 from Dominik Stadler <do...@gmx.at> ---
Nightly builds are not available via Maven but only as manual download via
https://builds.apache.org/view/POI/job/POI/lastSuccessfulBuild/artifact/build/dist

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org