Posted to dev@xalan.apache.org by bu...@apache.org on 2002/04/29 06:39:10 UTC

DO NOT REPLY [Bug 8612] New: - Performance Enhancement to Xalan distinct function

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=8612>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=8612

           Summary: Performance Enhancement to Xalan distinct function
           Product: XalanJ2
           Version: 2.3
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: Enhancement
          Priority: Other
         Component: org.apache.xalan.lib
        AssignedTo: xalan-dev@xml.apache.org
        ReportedBy: lbecker10@yahoo.com


In Extensions.java, the distinct function uses a Hashtable to track the string 
values of the nodes it has already seen.  Hashtable synchronizes all access to 
itself, so every containsKey and put call acquires the table's monitor.  In 
Xalan 2.3.1, the current code is as follows:

  public static NodeSet distinct(ExpressionContext myContext, NodeIterator ni)
          throws javax.xml.transform.TransformerException
  {

    // Set up our resulting NodeSet and the hashtable we use to keep track of
    // duplicate strings.

    NodeSet dist = new NodeSet();
    dist.setShouldCacheNodes(true);

    Hashtable stringTable = new Hashtable();

    Node currNode = ni.nextNode();

    while (currNode != null)
    {
      String key = myContext.toString(currNode);

      if (!stringTable.containsKey(key))
      {
        stringTable.put(key, currNode);
        dist.addElement(currNode);
      }
      currNode = ni.nextNode();
    }

    return dist;
  }
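
The synchronization cost mentioned above can be checked directly.  The 
following is a minimal sketch, not part of Xalan: the class name SyncCheck and 
its helper are made up for illustration, and it assumes a JDK whose Hashtable 
declares containsKey and put with the synchronized modifier (as the standard 
class library does).  It uses reflection to report whether a method is declared 
synchronized, with HashSet.add shown for contrast.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.HashSet;
import java.util.Hashtable;

// Hypothetical helper class, for illustration only.
class SyncCheck {

    // Returns true if the named public method is declared synchronized.
    static boolean isSynchronized(Class<?> type, String name, Class<?>... params)
            throws NoSuchMethodException {
        Method m = type.getMethod(name, params);
        return Modifier.isSynchronized(m.getModifiers());
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Hashtable.containsKey synchronized: "
                + isSynchronized(Hashtable.class, "containsKey", Object.class));
        System.out.println("Hashtable.put synchronized: "
                + isSynchronized(Hashtable.class, "put", Object.class, Object.class));
        System.out.println("HashSet.add synchronized: "
                + isSynchronized(HashSet.class, "add", Object.class));
    }
}
```

In the distinct loop, the two Hashtable calls run once per node, so the lock is 
taken up to twice per iteration even though no other thread can see the table.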

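Since the Hashtable instance is used only locally within the method, there is 
no need for a collection that synchronizes access to itself.  To improve 
performance, a HashSet should be used.  Its add method returns false when the 
key is already present, so a single call replaces the containsKey/put pair.  
The HashSet can also be cleared manually at the end of the method to release 
its entries promptly, although a local set becomes unreachable, and therefore 
eligible for garbage collection, once the method returns anyway.  The enhanced 
code is as follows: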

  public static NodeSet distinct(ExpressionContext myContext, NodeIterator ni)
          throws javax.xml.transform.TransformerException
  {

    // Set up our resulting NodeSet and the HashSet we use to keep track of
    // duplicate strings.

    NodeSet dist = new NodeSet();
    dist.setShouldCacheNodes(true);

    HashSet stringSet = new HashSet();

    Node currNode = ni.nextNode();

    while (currNode != null)
    {
      String key = myContext.toString(currNode);

      if (stringSet.add(key))
      {
        dist.addElement(currNode);
      }
      currNode = ni.nextNode();
    }

    stringSet.clear();

    return dist;
  }
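
The same idiom can be tried outside Xalan.  The following is a minimal, 
self-contained sketch in which plain objects stand in for DOM nodes and a 
caller-supplied function stands in for myContext.toString; the class name 
DistinctSketch and the toKey parameter are assumptions made for illustration.  
It uses generics and lambdas, which post-date Xalan 2.3.1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical stand-in for the distinct extension function.
class DistinctSketch {

    // Keeps the first item seen for each distinct key, in encounter order.
    static <T> List<T> distinct(Iterable<T> items, Function<T, String> toKey) {
        List<T> result = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (T item : items) {
            // Set.add returns false when the key is already present,
            // so one call does the work of containsKey followed by put.
            if (seen.add(toKey.apply(item))) {
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "A", "c", "b");
        System.out.println(distinct(input, s -> s));                // [a, b, A, c]
        System.out.println(distinct(input, String::toLowerCase));   // [a, b, c]
    }
}
```

As in the enhanced method, only the key strings are stored; the Hashtable 
version also stored each node as a value that was never read back.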


To make sure the HashSet is also cleared when a TransformerException is thrown, 
the following code, which moves the clear call into a finally block, could be 
used instead of the enhanced code above:

  public static NodeSet distinct(ExpressionContext myContext, NodeIterator ni)
          throws javax.xml.transform.TransformerException
  {

    // Set up our resulting NodeSet and the HashSet we use to keep track of
    // duplicate strings.

    NodeSet dist = new NodeSet();
    dist.setShouldCacheNodes(true);

    HashSet stringSet = new HashSet();

    try
    {
      Node currNode = ni.nextNode();

      while (currNode != null)
      {
        String key = myContext.toString(currNode);

        if (stringSet.add(key))
        {
          dist.addElement(currNode);
        }
        currNode = ni.nextNode();
      }
    }
    finally
    {
      stringSet.clear();
    }

    return dist;
  }
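
The effect of the finally block can be observed in isolation.  The following is 
a minimal sketch under stated assumptions: the class FinallyClearSketch and its 
method are made up for illustration, a null key stands in for whatever would 
make myContext.toString throw, and the set is passed in as a parameter only so 
that the caller can inspect it afterwards (in the method above it is a local 
variable).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Hypothetical demonstration of clearing a set in a finally block.
class FinallyClearSketch {

    // Counts distinct keys; throws on a null key to simulate a failure
    // part-way through the loop. The set is cleared on every exit path.
    static int countDistinct(Iterator<String> keys, Set<String> seen) {
        int count = 0;
        try {
            while (keys.hasNext()) {
                String key = keys.next();
                if (key == null) {
                    throw new IllegalStateException("simulated failure");
                }
                if (seen.add(key)) {
                    count++;
                }
            }
        } finally {
            // Runs on normal return and when an exception propagates.
            seen.clear();
        }
        return count;
    }

    public static void main(String[] args) {
        Set<String> seen = new HashSet<>();
        try {
            countDistinct(Arrays.asList("a", "b", null, "c").iterator(), seen);
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getMessage());
        }
        System.out.println("set empty after failure: " + seen.isEmpty());
    }
}
```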