Posted to j-users@xalan.apache.org by John Harrison <jh...@cas.org> on 2005/10/28 15:46:27 UTC
out of memory when using XSLT extensions

I'm running out of memory when processing largish node sets. I have XSLT 
code like this:

<xsl:for-each select="data">
  <xsl:variable name="split_data" select="ext:split(., '_', 4)"/>
   ...
</xsl:for-each>

ext:split is an XSLT extension which calls the Java String.split method 
and returns a node set (see below). This code throws an OutOfMemoryError 
when 'data' has 400 nodes, which doesn't seem very much to me. 
The error is thrown at the point where the XSLT transformer is 
trying to call my extension.

I'm pretty new to XSLT (and Java), so maybe I'm doing something stupid, 
although I'm not sure what. Why does XSLT need so much memory to process 
a fairly small amount of data, and how can I code this more efficiently?

/**
  * Split a string into tokens and return as a node set to XSLT.
  */
public static NodeIterator split(String str, String regex, int limit) {
     NodeIterator nodes = null;
     try {
         String[] tokens = str.split(regex, limit);
         StringBuffer xmlTokens = new StringBuffer();
         xmlTokens.append("<root>");
         for (int i = 0; i < tokens.length; ++i) {
             xmlTokens.append("<tok>");
             xmlTokens.append(escapeXmlChars(tokens[i]));
             xmlTokens.append("</tok>");
         }
         xmlTokens.append("</root>");
         nodes = topLevelNodes(xmlTokens.toString());
     } catch (Exception e) {
         // todo log the error
     }
     return nodes;
}

public static NodeIterator topLevelNodes(String str)
        throws SAXException, IOException {
     DOMParser parser = new DOMParser();
     parser.parse(new InputSource(new StringReader(str)));
     DocumentImpl doc = (DocumentImpl) parser.getDocument();
     return doc.createNodeIterator(doc.getDocumentElement(),
             NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_TEXT,
             new TopLevelOnly(),
             false);
}

private static class TopLevelOnly implements NodeFilter {
     public short acceptNode(Node node) {
         Element root = node.getOwnerDocument().getDocumentElement();
         return node.getParentNode() == root ?
                 FILTER_ACCEPT : FILTER_REJECT;
     }
}

I'm aware I could use the EXSLT extensions to do this, and it would 
probably be a whole lot more efficient than my own efforts, but I would 
like to understand what is wrong with the code above.
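
For comparison, here is a minimal sketch of a variant that builds the 
<tok> elements directly with the standard DOM API instead of serialising 
the tokens to a string and running a parser on every call. The class 
name SplitSketch is hypothetical, and returning a DocumentFragment is an 
assumption: I have not confirmed how Xalan converts that return type, or 
that this removes the memory problem. It only shows that the 
string-and-reparse step is not strictly required.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;

// Hypothetical helper class; the name is illustrative only.
final class SplitSketch {

    /**
     * Split a string into tokens and wrap each one in a <tok> element.
     * The elements are created in memory, so no XML text is generated
     * or parsed.
     */
    static DocumentFragment split(String str, String regex, int limit)
            throws ParserConfigurationException {
        // A new, empty document acts as the factory for the nodes.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
        DocumentFragment fragment = doc.createDocumentFragment();
        for (String token : str.split(regex, limit)) {
            Element tok = doc.createElement("tok");
            // createTextNode takes raw text, so no manual escaping is needed.
            tok.appendChild(doc.createTextNode(token));
            fragment.appendChild(tok);
        }
        return fragment;
    }
}
```

Calling SplitSketch.split("a_b_c_d_e", "_", 4) yields four <tok> 
children, the last holding "d_e", because String.split stops splitting 
once the limit is reached.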

Thanks,
John