Posted to j-dev@xerces.apache.org by bu...@apache.org on 2003/05/05 17:59:46 UTC
DO NOT REPLY [Bug 19672] New: - StringBuffer idiom in DeferredDocumentImpl causes large memory usage

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=19672


           Summary: StringBuffer idiom in DeferredDocumentImpl causes large
                    memory usage
           Product: Xerces2-J
           Version: 2.4.0
          Platform: PC
        OS/Version: Windows NT/2K
            Status: NEW
          Severity: Normal
          Priority: Other
         Component: DOM
        AssignedTo: xerces-j-dev@xml.apache.org
        ReportedBy: Scott.Nygren@Thomson.com


We have a 3 MB document that uses over 1.5 GB of memory to parse and causes an 
OutOfMemoryError on our web server.  I traced it to the fact that the document 
has a 16K text node at the beginning, followed by 93,000 much shorter text 
nodes.  Each of those text nodes is allocated 16K of memory, even though it may 
hold only a few characters.  The document that causes this is too big to 
include here, but the TestStringBuffer program below shows the problem in 
abstract.
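
As a rough sanity check on those figures, the arithmetic can be written out. 
This is only a sketch: it assumes that "16K" means 16 * 1024 bytes retained per 
text node and it ignores per-object overhead.  The class and method names are 
made up for illustration.

```java
// Back-of-the-envelope estimate; MemoryEstimate is a hypothetical name.
final class MemoryEstimate {

    /** Bytes retained if every node pins a backing array of bytesPerNode bytes. */
    static long retainedBytes(long nodeCount, long bytesPerNode) {
        return nodeCount * bytesPerNode;
    }

    public static void main(String[] args) {
        long nodes = 93000L;          // short text nodes, figure taken from the report
        long perNode = 16L * 1024L;   // assumption: "16K" = 16 * 1024 bytes per node
        System.out.println(retainedBytes(nodes, perNode) + " bytes");
    }
}
```

Under those assumptions the total is 1,523,712,000 bytes, which is in line with 
the memory use reported above.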

This problem is due to the way the Sun Windows JDK 1.4.1 shares memory between 
Strings and StringBuffers (likely other versions too, but I haven't tested 
them).  When StringBuffer.toString is called, a String is created that shares 
the StringBuffer's internal char array, which in my problem case is 16K.  Then, 
when the next StringBuffer method that modifies the object is called (like 
setLength), a new char array is allocated for the StringBuffer at the full 
capacity (another 16K), and the String keeps the old one.

public class TestStringBuffer {
    // run with java -Xms30m -Xmx30m TestStringBuffer
    /** Main program entry point. */
    public static void main(String argv[]) {
        StringBuffer buf = new StringBuffer(10000);
        String [] ans1 = new String[1000];
        String [] ans2 = new String[1000];
        Runtime rt = Runtime.getRuntime();
        rt.gc();
        long free1 = rt.freeMemory();
        // toString: every String keeps the buffer's full 10000-char array
        // uses over 10 Meg to store array
        for (int i=0; i < ans1.length; i++) {
            buf.setLength(0);
            buf.append("a");
            buf.append("b");
            ans1[i] = buf.toString();
        }
        rt.gc();
        long free2 = rt.freeMemory();
        // substring(0): every String gets a copy sized to its contents
        // uses about 60 K to store array
        for (int i=0; i < ans2.length; i++) {
            buf.setLength(0);
            buf.append("a");
            buf.append("b");
            ans2[i] = buf.substring(0);
        }
        rt.gc();
        long free3 = rt.freeMemory();
        System.out.println("Loop 1 used (toString) "+(free1 - free2));
        System.out.println("Loop 2 used (substring) "+(free2 - free3));
    }
}
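
On a JVM that does not share arrays this way, the two loops may report similar 
numbers, so the mechanism described above can also be illustrated with a toy 
model.  The sketch below is not the JDK source: ModelBuffer and ModelString are 
hypothetical stand-ins that implement only the behaviour described in this 
report (toString hands out the buffer's own array, the next modification gives 
the buffer a fresh full-capacity array, and substring copies only the live 
characters).

```java
import java.util.Arrays;

// Toy model of the sharing behaviour described in this report. It is NOT the
// JDK source: ModelBuffer and ModelString are hypothetical stand-ins.
final class SharingModel {

    /** Stand-in for String: holds a char array that may be larger than its text. */
    static final class ModelString {
        final char[] value;
        final int count;

        ModelString(char[] value, int count) {
            this.value = value;
            this.count = count;
        }

        /** Number of chars kept alive by this string. */
        int retainedChars() {
            return value.length;
        }
    }

    /** Stand-in for StringBuffer: fixed capacity, copy-on-write after toModelString(). */
    static final class ModelBuffer {
        private char[] value;
        private int count;
        private boolean shared;   // true while a ModelString points at 'value'

        ModelBuffer(int capacity) {
            value = new char[capacity];
        }

        /** Before any modification, give the buffer its own full-capacity array. */
        private void unshare() {
            if (shared) {
                value = Arrays.copyOf(value, value.length);
                shared = false;
            }
        }

        /** Simplified setLength: this model only supports truncation. */
        void setLength(int newLength) {
            if (newLength < 0 || newLength > count) {
                throw new IllegalArgumentException("model only supports truncation");
            }
            unshare();
            count = newLength;
        }

        /** Appends one char; the model does not grow beyond its capacity. */
        void append(char c) {
            unshare();
            value[count++] = c;
        }

        /** Like toString in the report: hands out the buffer's own array. */
        ModelString toModelString() {
            shared = true;
            return new ModelString(value, count);
        }

        /** Like substring(start) in the report: copies only the live characters. */
        ModelString substring(int start) {
            char[] copy = Arrays.copyOfRange(value, start, count);
            return new ModelString(copy, copy.length);
        }
    }

    /** Chars retained by one two-character string produced with the given idiom. */
    static int retained(int capacity, boolean useToString) {
        ModelBuffer buf = new ModelBuffer(capacity);
        buf.setLength(0);
        buf.append('a');
        buf.append('b');
        ModelString s = useToString ? buf.toModelString() : buf.substring(0);
        buf.setLength(0);   // next iteration: buffer gets a fresh array, s keeps its own
        return s.retainedChars();
    }

    public static void main(String[] args) {
        System.out.println("toString idiom retains  " + retained(10000, true) + " chars");
        System.out.println("substring idiom retains " + retained(10000, false) + " chars");
    }
}
```

In this model the toString idiom retains 10000 chars per two-character string, 
while the substring idiom retains 2.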

I was able to fix my problem by changing 
org/apache/xerces/dom/DeferredDocumentImpl.getNodeValueString to use 
           value = fBufferStr.substring(0);
instead of 
           value = fBufferStr.toString();
wherever it is referenced.
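
If relying on how substring(0) happens to be implemented feels fragile, an 
explicit copy gives the same effect.  The following is only a sketch of that 
alternative, not the change made to Xerces: CompactStrings and compactCopy are 
hypothetical names, and fBufferStr is reused here merely as a local variable 
name taken from the report.

```java
// Sketch of an explicit right-sized copy; CompactStrings and compactCopy are
// hypothetical names and are not part of Xerces.
final class CompactStrings {

    /** Copies the buffer's current characters into a String built from an exact-size array. */
    static String compactCopy(StringBuffer buf) {
        char[] chars = new char[buf.length()];
        buf.getChars(0, buf.length(), chars, 0);   // copy only the live characters
        return new String(chars);                  // String(char[]) copies its argument
    }

    public static void main(String[] args) {
        StringBuffer fBufferStr = new StringBuffer(16 * 1024);  // large reusable buffer
        fBufferStr.append("ab");
        String value = compactCopy(fBufferStr);
        fBufferStr.setLength(0);   // reuse the buffer for the next node
        System.out.println(value);
    }
}
```

The design choice is to pay for one small copy per node so that no String can 
hold on to the buffer's full capacity, whatever the StringBuffer implementation 
does internally.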

Also, org/apache/xerces/parsers/AbstractDOMParser uses the same idiom, which 
may be a problem, but I did not take the time to test it.

I am also going to submit a bug to Sun recommending that the StringBuffer 
documentation at least mention that Strings returned by toString could be very 
large.
