Posted to j-users@xalan.apache.org by Michael Heinrichs <mi...@klocwork.com> on 2005/10/19 21:51:25 UTC

Extremely high memory usage (and OutOfMemory) when using EXSLT str:tokenize or str:split

There appears to be a severe memory usage issue with the EXSLT
functions str:tokenize and str:split as included in the latest version of
Xalan (2.7.0; the problem also occurs in 2.6.1).  Even for relatively
small XML files, repeated calls to these functions lead to OutOfMemory
errors, even with a large (2GB) maximum heap size.  The code for these
extension functions looks quite straightforward, so I suspect the
problem lies deeper.

I created a simple test case to demonstrate the problem; please see the
details below.  In this particular example, I hit a different exception
(No more DTM IDs are available) before I reach OutOfMemory, but I expect
that if I tweak my example appropriately, I would hit OutOfMemory
instead.

If I use the stylesheet version of the EXSLT function, I don't encounter
this memory issue.

I searched the mailing list archives and bug databases, but didn't find
any references to this issue.

Thanks,

Mike

test.xml (vary the number of 'row' elements)
========
<?xml version="1.0" encoding="ISO-8859-1"?>
<document>
<row>1.2.3.4.5.6.7.8.9.0</row>
</document>
====================================
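
To vary the number of 'row' elements without editing the file by hand, a
small generator along the following lines could be used.  This is only a
sketch and not part of the original report: the class name GenerateTestXml
and its command-line arguments are made up for illustration, and it assumes
Java 7 or later for java.nio.file.  It writes CRLF line endings, which gives
about 32 bytes per row and is roughly in line with the file sizes quoted in
the tests below.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not from the original post): builds a test.xml
// with the requested number of identical <row> elements.
final class GenerateTestXml {

    // Returns the full document text; each row matches the one in the report.
    static String build(int rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\r\n");
        sb.append("<document>\r\n");
        for (int i = 0; i < rows; i++) {
            sb.append("<row>1.2.3.4.5.6.7.8.9.0</row>\r\n");
        }
        sb.append("</document>\r\n");
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumed arguments: [row count] [output file]; both are optional.
        int rows = args.length > 0 ? Integer.parseInt(args[0]) : 1000;
        Path out = Paths.get(args.length > 1 ? args[1] : "test.xml");
        Files.write(out, build(rows).getBytes(StandardCharsets.ISO_8859_1));
        System.out.println("Wrote " + rows + " rows to " + out);
    }
}
```

For example, "java GenerateTestXml 10000 test.xml" would produce an input
comparable to the one used in Test 2.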

test.xsl
========
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:str="http://exslt.org/strings">

  <xsl:template match="row">
    <xsl:variable name="test" select="str:tokenize(.,'.')"/>
  </xsl:template>
</xsl:stylesheet>
====================================

Command-line:
> java -cp "C:\tmp\xalan-j_2_7_0\xalan.jar;C:\tmp\xalan-j_2_7_0\serializer.jar;C:\tmp\xalan-j_2_7_0\xml-apis.jar;C:\tmp\xalan-j_2_7_0\xercesImpl.jar" -Xmx1024m org.apache.xalan.xslt.Process -IN test.xml -XSL test.xsl -OUT test_out.xml

Test 1: test.xml with 1,000 rows; file size: 32KB
Result: Transformation completes successfully; maximum process size:
~163MB

Test 2: test.xml with 10,000 rows; file size: 312KB
Result: Transformation aborts with exception; maximum process size:
~900MB

file:///c:/tmp/test.xsl; Line #12; Column #61; XSLT Error (javax.xml.transform.TransformerException): No more DTM IDs are available
Exception in thread "main" java.lang.RuntimeException: No more DTM IDs are available
        at org.apache.xalan.xslt.Process.doExit(Process.java:1153)
        at org.apache.xalan.xslt.Process.main(Process.java:1126)

> java -version
java version "1.5.0_04"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_04-b05)
Java HotSpot(TM) Client VM (build 1.5.0_04-b05, mixed mode, sharing)

Re: Extremely high memory usage (and OutOfMemory) when using EXSLT str:tokenize or str:split

Posted by John Gentilin <ge...@eyecatching.com>.
Mike,

Can you create a JIRA entry so that this issue is discussed at the next
bug triage and is tracked properly?  Please include a zipped version of
the large data file so that someone can just download it and run the test.

Thank you
John G

