Posted to j-dev@xerces.apache.org by bu...@apache.org on 2003/05/05 17:59:46 UTC
DO NOT REPLY [Bug 19672] New: - StringBuffer idiom in DeferredDocumentImpl causes large memory usage
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=19672
Summary: StringBuffer idiom in DeferredDocumentImpl causes large memory usage
Product: Xerces2-J
Version: 2.4.0
Platform: PC
OS/Version: Windows NT/2K
Status: NEW
Severity: Normal
Priority: Other
Component: DOM
AssignedTo: xerces-j-dev@xml.apache.org
ReportedBy: Scott.Nygren@Thomson.com
We have a 3 MB document that uses over 1.5 GB of memory to parse and causes an
OutOfMemoryError on our web server. I traced it to the fact that the document
has a 16K text node at the beginning, followed by 93,000 much shorter text
nodes. Each of those text nodes is allocated 16K of memory, even though it may
hold only a few characters; 93,000 nodes at roughly 16K each comes to about
1.5 GB, which matches what we see. The document that causes this is too big to
include here, but the code below shows the problem in the abstract.
This problem is due to the way the Sun Windows JDK 1.4.1 shares memory between
Strings and StringBuffers (it likely happens on other versions too, but I
haven't tested them). When StringBuffer.toString is called, the new String is
given access to the StringBuffer's internal char array, which in my problem
case is 16K. Then, when the next StringBuffer method that modifies the object
(such as setLength) is called, a new char array is allocated for the
StringBuffer at its full capacity (another 16K), while the String keeps the
old array.
public class TestStringBuffer {
    // Run with: java -Xms30m -Xmx30m TestStringBuffer

    /** Main program entry point. */
    public static void main(String[] argv) {
        StringBuffer buf = new StringBuffer(10000);
        String[] ans1 = new String[1000];
        String[] ans2 = new String[1000];
        Runtime rt = Runtime.getRuntime();

        rt.gc();
        long free1 = rt.freeMemory();

        // Loop 1 (on the Sun 1.4.1 JDK): each String from toString() keeps
        // a full 10000-char array alive, so storing 1000 two-char strings
        // uses over 10 Meg.
        for (int i = 0; i < ans1.length; i++) {
            buf.setLength(0);
            buf.append("a");
            buf.append("b");
            ans1[i] = buf.toString();
        }
        rt.gc();
        long free2 = rt.freeMemory();

        // Loop 2: substring(0) copies only the two chars in use,
        // so storing the same strings takes about 60 K.
        for (int i = 0; i < ans2.length; i++) {
            buf.setLength(0);
            buf.append("a");
            buf.append("b");
            ans2[i] = buf.substring(0);
        }
        rt.gc();
        long free3 = rt.freeMemory();

        System.out.println("Loop 1 used (toString) " + (free1 - free2));
        System.out.println("Loop 2 used (substring) " + (free2 - free3));
    }
}
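The sharing described in this report can also be modelled without depending on which JDK runs it. The sketch below is a simplified model under stated assumptions, not the JDK 1.4.1 source: SharingModel, ModelBuffer and ModelString are hypothetical names, the buffer never grows, and the model only counts how many chars each result keeps reachable. It assumes, as the report describes, that toString() hands out the internal array and the next mutation copies it at full capacity, while substring(0) copies only the characters in use.

```java
import java.util.Arrays;

/**
 * Simplified, hypothetical model of the JDK 1.4.1 String/StringBuffer sharing
 * described in this report. NOT the JDK source: it only tracks how many chars
 * each resulting string keeps reachable.
 */
public class SharingModel {

    /** Stand-in for a String whose backing array may be larger than its length. */
    static final class ModelString {
        private final char[] value; // backing array, possibly the buffer's whole array
        private final int count;    // number of chars actually in the string

        ModelString(char[] value, int count) {
            this.value = value;
            this.count = count;
        }

        int retainedChars() { return value.length; }

        @Override
        public String toString() { return new String(value, 0, count); }
    }

    /** Stand-in for a fixed-capacity StringBuffer with copy-on-write after toString(). */
    static final class ModelBuffer {
        private char[] value;
        private int count;
        private boolean shared; // true once a ModelString points at value

        ModelBuffer(int capacity) { value = new char[capacity]; }

        /** Before any mutation, copy a shared array at FULL capacity (the costly step). */
        private void unshare() {
            if (shared) {
                value = value.clone(); // same length as the old array, not just count
                shared = false;
            }
        }

        void setLength(int newLength) {
            if (newLength < 0 || newLength > value.length) {
                throw new IllegalArgumentException("model buffer does not grow");
            }
            unshare();
            if (newLength > count) {
                Arrays.fill(value, count, newLength, '\0');
            }
            count = newLength;
        }

        void append(String s) {
            if (count + s.length() > value.length) {
                throw new IllegalArgumentException("model buffer does not grow");
            }
            unshare();
            s.getChars(0, s.length(), value, count);
            count += s.length();
        }

        /** Models the 1.4.1 toString(): hands out the internal array without copying. */
        ModelString toModelString() {
            shared = true;
            return new ModelString(value, count);
        }

        /** Models substring(0): copies exactly the chars in use. */
        ModelString substringCopy() {
            return new ModelString(Arrays.copyOf(value, count), count);
        }
    }

    public static void main(String[] args) {
        ModelBuffer buf = new ModelBuffer(16 * 1024);
        long viaToString = 0;
        long viaSubstring = 0;
        for (int i = 0; i < 3; i++) {
            buf.setLength(0);
            buf.append("ab");
            viaToString += buf.toModelString().retainedChars();
        }
        for (int i = 0; i < 3; i++) {
            buf.setLength(0);
            buf.append("ab");
            viaSubstring += buf.substringCopy().retainedChars();
        }
        System.out.println("toString model retains " + viaToString + " chars");
        System.out.println("substring model retains " + viaSubstring + " chars");
    }
}
```

Running main prints 49152 chars for the toString path (three 16K arrays) against 6 for the substring path. As far as I know, later JDKs copy in StringBuffer.toString(), so the real TestStringBuffer may show little difference there; the model keeps the 1.4.1 behaviour regardless.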
I was able to fix my problem by changing
org/apache/xerces/dom/DeferredDocumentImpl.getNodeValueString to use
    value = fBufferStr.substring(0);
instead of
    value = fBufferStr.toString();
at every place where the latter is called.
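A minimal sketch of the same fix applied to a generic reuse-one-buffer helper. TextJoiner and join() are hypothetical names, and this is not the actual DeferredDocumentImpl code; it only shows the pattern of clearing a shared StringBuffer with setLength(0) and returning substring(0) instead of toString().

```java
/**
 * Hypothetical sketch of a reused-buffer helper with the report's fix applied.
 * TextJoiner and join() are illustrative names, not Xerces API.
 */
public class TextJoiner {
    // A single buffer reused across calls, similar in spirit to fBufferStr.
    private final StringBuffer buffer = new StringBuffer(16 * 1024);

    /** Concatenates the chunks and returns a String sized to its contents. */
    public String join(String... chunks) {
        buffer.setLength(0); // keep the existing capacity, drop the old contents
        for (String chunk : chunks) {
            buffer.append(chunk);
        }
        // substring(0) copies only the chars in use, so the result cannot pin the
        // buffer's full capacity (per the report, toString() could on JDK 1.4.1).
        return buffer.substring(0);
    }

    public static void main(String[] args) {
        TextJoiner joiner = new TextJoiner();
        System.out.println(joiner.join("Hello, ", "world")); // prints "Hello, world"
        System.out.println(joiner.join("a", "b").length());  // prints 2
    }
}
```

The design keeps the large buffer for reuse but never lets a returned String point into it; the cost is one right-sized copy per call.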
Also, org/apache/xerces/parsers/AbstractDOMParser uses the same idiom, which may
cause the same problem, but I did not take the time to test it.
I am also going to submit a bug to Sun recommending that the StringBuffer
documentation at least mention that a String returned by toString can hold on
to the buffer's full capacity, and so can be much larger than its length
suggests.