Posted to j-dev@xerces.apache.org by ji...@apache.org on 2004/05/27 11:44:02 UTC
[jira] Commented: (XERCESJ-724) StringBuffer idiom in DeferredDocumentImpl causes large memory usage

The following comment has been added to this issue:

     Author: Tony Butterfield
    Created: Thu, 27 May 2004 2:42 AM
       Body:
>Also, org/apache/xerces/parsers/AbstractDOMParser uses the same idiom which may
>be a problem, but I did not take the time to test it.

This is a problem too. It has the same consequences: text nodes are created that are backed by large (200 KB) StringBuffers. A fix is
to change the two references to fStringBuffer.setLength(0) to fStringBuffer = new StringBuffer();
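
A minimal sketch of that change, for illustration only. It is not the actual AbstractDOMParser code: the class FreshBufferSketch and its methods characters, endText, values and currentCapacity are hypothetical names, and only the field name fStringBuffer comes from the comment above. It shows the buffer being replaced after each text node instead of being reset with setLength(0), so a large backing array is never carried over to the next node.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical accumulator; only the field name fStringBuffer is from the report. */
public class FreshBufferSketch {
    private StringBuffer fStringBuffer = new StringBuffer();
    private final List<String> values = new ArrayList<String>();

    /** Collects character data for the text node currently being built. */
    public void characters(String chunk) {
        fStringBuffer.append(chunk);
    }

    /** Finishes the current text node and prepares for the next one. */
    public void endText() {
        values.add(fStringBuffer.toString());
        // Previously: fStringBuffer.setLength(0);
        // Allocating a new buffer drops the old (possibly very large) array.
        fStringBuffer = new StringBuffer();
    }

    public List<String> values() {
        return values;
    }

    /** Capacity of the buffer that the next text node will start with. */
    public int currentCapacity() {
        return fStringBuffer.capacity();
    }

    public static void main(String[] args) {
        FreshBufferSketch sketch = new FreshBufferSketch();
        // One large text node followed by a short one.
        sketch.characters(new String(new char[200000]).replace('\0', 'x'));
        sketch.endText();
        sketch.characters("ab");
        sketch.endText();
        System.out.println("values stored: " + sketch.values().size());
        System.out.println("capacity for next node: " + sketch.currentCapacity());
    }
}
```

The trade-off is one small allocation per text node in exchange for never keeping an oversized array alive.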

This issue still exists in XercesJ 2.6.2. However, a fix has gone into JDK 1.4.2 that changes the behaviour of StringBuffer and resolves the problem on that platform; see:
http://developer.java.sun.com/developer/bugParade/bugs/4724129.html

---------------------------------------------------------------------
View this comment:
  http://issues.apache.org/jira/browse/XERCESJ-724?page=comments#action_35743

---------------------------------------------------------------------
View the issue:
  http://issues.apache.org/jira/browse/XERCESJ-724

Here is an overview of the issue:
---------------------------------------------------------------------
        Key: XERCESJ-724
    Summary: StringBuffer idiom in DeferredDocumentImpl causes large memory usage
       Type: Bug

     Status: Open

    Project: Xerces2-J
 Components: 
             DOM
   Versions:
             2.4.0

   Assignee: Xerces-J Developers Mailing List
   Reporter: Scott Nygren

    Created: Mon, 5 May 2003 3:59 PM
    Updated: Thu, 27 May 2004 2:42 AM
Environment: Operating System: Windows NT/2K
Platform: PC

Description:
We have a 3 MB document that uses over 1.5 GB of memory to parse and causes an 
OutOfMemory error on our webserver.  I traced it to the fact that the document 
has a 16K text node at the beginning, followed by 93,000 more text nodes of 
much shorter length.  Each text node is allocated 16K of memory even though it 
may hold only a few characters.  The document that causes this is too big to 
include here, but the code below shows the problem in abstract.

This problem is due to the way the Sun Windows JDK 1.4.1 shares memory between 
Strings and StringBuffers (likely other versions too, but I haven't tested 
them).  When StringBuffer.toString is called, a String is created that shares 
the StringBuffer's internal char array, which in my problem case is 16K.  Then, 
when the next StringBuffer method that changes the object (like setLength) is 
called, a new char array is created for the StringBuffer with the full 
capacity (another 16K).

public class TestStringBuffer {
    // run with java -Xms30m -Xmx30m TestStringBuffer
    /** Main program entry point. */
    public static void main(String argv[]) {
        StringBuffer buf = new StringBuffer(10000);
        String [] ans1 = new String[1000];
        String [] ans2 = new String[1000];
        Runtime rt = Runtime.getRuntime();
        rt.gc();
        long free1 = rt.freeMemory();
        // each String from toString() keeps the buffer's full 10000-char array,
        // so this loop uses over 10 Meg to store the array
        for (int i=0; i < ans1.length; i++) {
            buf.setLength(0);
            buf.append("a");
            buf.append("b");
            ans1[i] = buf.toString();
        }
        rt.gc();
        long free2 = rt.freeMemory();
        // substring(0) copies only the characters in use,
        // so this loop uses about 60 K to store the array
        for (int i=0; i < ans2.length; i++) {
            buf.setLength(0);
            buf.append("a");
            buf.append("b");
            ans2[i] = buf.substring(0);
        }
        rt.gc();
        long free3 = rt.freeMemory();
        System.out.println("Loop 1 used (toString) "+(free1 - free2));
        System.out.println("Loop 2 used (substring) "+(free2 - free3));
    }
}
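
The sharing behaviour described above can also be pictured with a toy model. This is a simplified sketch written from the description in this report, not the JDK source: SharedBufferModel, Snapshot, share, retainedChars and retainedAfter are all hypothetical names. share() stands in for toString(), and Snapshot stands in for a String that keeps a reference to the buffer's whole array.

```java
import java.util.Arrays;

/** Toy model of the array sharing described in the report; not JDK source. */
public class SharedBufferModel {
    private char[] value;
    private int count;
    private boolean shared;

    public SharedBufferModel(int capacity) {
        value = new char[capacity];
    }

    /** Stands in for a String that references the buffer's entire array. */
    public static final class Snapshot {
        private final char[] backing;
        private final int length;

        Snapshot(char[] backing, int length) {
            this.backing = backing;
            this.length = length;
        }

        /** Number of chars kept alive by this snapshot, used or not. */
        public int retainedChars() {
            return backing.length;
        }

        public String text() {
            return new String(backing, 0, length);
        }
    }

    /** Stands in for toString(): hands out the array and marks it shared. */
    public Snapshot share() {
        shared = true;
        return new Snapshot(value, count);
    }

    /** Before any change, a shared array is replaced by a full-capacity copy. */
    private void copyIfShared() {
        if (shared) {
            char[] fresh = new char[value.length];
            System.arraycopy(value, 0, fresh, 0, count);
            value = fresh;
            shared = false;
        }
    }

    /** Truncation only, which is all this model needs. */
    public void setLength(int newLength) {
        if (newLength < 0 || newLength > count) {
            throw new IllegalArgumentException("model supports truncation only");
        }
        copyIfShared();
        count = newLength;
    }

    public void append(char c) {
        copyIfShared();
        if (count == value.length) {
            value = Arrays.copyOf(value, Math.max(16, value.length * 2));
        }
        value[count++] = c;
    }

    /** Total chars retained by 'strings' two-character snapshots. */
    public static long retainedAfter(int capacity, int strings) {
        SharedBufferModel buf = new SharedBufferModel(capacity);
        long total = 0;
        for (int i = 0; i < strings; i++) {
            buf.setLength(0);
            buf.append('a');
            buf.append('b');
            total += buf.share().retainedChars();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("chars retained: " + retainedAfter(10000, 1000));
    }
}
```

In this model every snapshot pins an array of the full buffer capacity, however short its text is, which matches the growth seen in the first loop of the test program.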

I was able to fix my problem by changing 
org/apache/xerces/dom/DeferredDocumentImpl.getNodeValueString to use 
           value = fBufferStr.substring(0);
instead of 
           value = fBufferStr.toString();
wherever it's referenced.

Also, org/apache/xerces/parsers/AbstractDOMParser uses the same idiom which may 
be a problem, but I did not take the time to test it.

I am also going to submit a bug to Sun to recommend at least saying something 
in the StringBuffer doc that Strings from toString could be very large.

