Posted to j-dev@xerces.apache.org by ji...@apache.org on 2004/05/28 18:49:00 UTC

[jira] Created: (XERCESJ-970) Large comments are extremely slow to parse

Message:

  A new issue has been created in JIRA.

---------------------------------------------------------------------
View the issue:
  http://issues.apache.org/jira/browse/XERCESJ-970

Here is an overview of the issue:
---------------------------------------------------------------------
        Key: XERCESJ-970
    Summary: Large comments are extremely slow to parse
       Type: Bug

     Status: Unassigned
   Priority: Minor

    Project: Xerces2-J
 Components: 
             XNI
   Versions:
             2.2.0
             2.2.1
             2.3.0
             2.4.0
             2.5.0
             2.6.0
             2.6.1
             2.6.2

   Assignee: 
   Reporter: Sean Griffin

    Created: Fri, 28 May 2004 9:48 AM
    Updated: Fri, 28 May 2004 9:48 AM
Environment: Windows XP running Java 1.4.2

Description:
Very large comments drastically increase parsing time in both the SAX and DOM implementations.  Running the sax.Counter and dom.Counter samples on a 410KB file with nothing commented out gives parse times in the 100ms to 300ms range.  However, if I comment out 95% of the file and run the same samples, the parse times jump to between 40 and 50 seconds.  I ran the same samples using the Aelfred parser shipped with Saxon 7.9 and, while the file with the large comment was slower than the file without it, the parse time increased by only 100ms or so.

I briefly compared the code of the two parsers, and they don't look significantly different in how they handle comments.  The only notable difference I saw was in the low/high byte character checks.  I suspect an inefficiency in the XMLStringBuffer class, but I haven't been able to spot one.
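
For anyone who wants to try the comparison without the original file, here is a minimal sketch, not the reporter's actual test.  It builds a synthetic document in memory, optionally wraps the first 95% of its elements in a single comment, and times one SAX parse through the standard JAXP API.  The class name, the element layout and the item count are assumptions made for illustration; the parser that runs is whichever one JAXP finds on the classpath, so the timings will depend on the Xerces version in use.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class LargeCommentTiming {

    /** Counts start-element events, a rough stand-in for what a counter sample reports. */
    static final class ElementCounter extends DefaultHandler {
        int elements;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            elements++;
        }
    }

    /** Builds a document with `items` children; if requested, the first 95% sit inside one comment. */
    public static String buildDocument(int items, boolean commentOut) {
        int commented = commentOut ? (items * 95) / 100 : 0;
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\"?>\n<root>\n");
        if (commented > 0) {
            sb.append("<!--\n");
        }
        for (int i = 0; i < items; i++) {
            if (commented > 0 && i == commented) {
                sb.append("-->\n"); // close the comment before the remaining elements
            }
            sb.append("  <item id=\"").append(i).append("\">some text</item>\n");
        }
        sb.append("</root>\n");
        return sb.toString();
    }

    /** Parses the document once and returns {element count, elapsed milliseconds}. */
    public static long[] parseOnce(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        ElementCounter counter = new ElementCounter();
        long start = System.nanoTime();
        parser.parse(new InputSource(new StringReader(xml)), counter);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        return new long[] { counter.elements, elapsedMs };
    }

    public static void main(String[] args) throws Exception {
        int items = 12_000; // assumed size: gives a document on the order of 400KB
        long[] plain = parseOnce(buildDocument(items, false));
        long[] commented = parseOnce(buildDocument(items, true));
        System.out.println("no comment:    " + plain[0] + " elements, " + plain[1] + " ms");
        System.out.println("large comment: " + commented[0] + " elements, " + commented[1] + " ms");
    }
}
```

A single timed parse per document includes class loading and JIT warm-up, so the absolute numbers are only indicative; the point of the sketch is the relative gap between the two documents.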


---------------------------------------------------------------------
JIRA INFORMATION:
This message is automatically generated by JIRA.

If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa

If you want more information on JIRA, or have a bug to report see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-dev-help@xml.apache.org


[jira] Commented: (XERCESJ-970) Large comments are extremely slow to parse

Posted by ji...@apache.org.
The following comment has been added to this issue:

     Author: Sean Griffin
    Created: Tue, 15 Jun 2004 2:48 PM
       Body:
You're right, the second parse was much faster.  I ran the parse through a profiler and found that the time is concentrated in the XMLEntityScanner.scanData(String, XMLStringBuffer) method.
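
The first-parse versus second-parse difference mentioned above can be checked by timing several parses of the same document inside one JVM.  The following is a rough sketch under assumptions: the class name, the helper methods and the comment text are made up for illustration, and a fresh JAXP parser is created outside the timed region for each run so that only the parse itself is measured.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class RepeatedParseTiming {

    /** Builds a small document whose bulk is one comment of `commentLines` lines. */
    public static String documentWithComment(int commentLines) {
        StringBuilder sb = new StringBuilder("<root>\n<!--\n");
        for (int i = 0; i < commentLines; i++) {
            sb.append("commented out line ").append(i).append('\n');
        }
        return sb.append("-->\n<kept/>\n</root>\n").toString();
    }

    /** Parses `xml` `runs` times and returns the elapsed nanoseconds of each run, in order. */
    public static long[] timeRuns(String xml, int runs) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            SAXParser parser = factory.newSAXParser(); // created before the timer starts
            long start = System.nanoTime();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    public static void main(String[] args) throws Exception {
        String xml = documentWithComment(20_000); // assumed size, chosen only for illustration
        long[] nanos = timeRuns(xml, 5);
        for (int i = 0; i < nanos.length; i++) {
            System.out.println("run " + (i + 1) + ": " + (nanos[i] / 1_000_000) + " ms");
        }
    }
}
```

If the later runs stay slow as well, the cost is in the parse itself rather than in one-time start-up work, which is the case a profiler run would then need to explain.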
---------------------------------------------------------------------
View this comment:
  http://issues.apache.org/jira/browse/XERCESJ-970?page=comments#action_36188

---------------------------------------------------------------------
View the issue:
  http://issues.apache.org/jira/browse/XERCESJ-970

Here is an overview of the issue:
---------------------------------------------------------------------
        Key: XERCESJ-970
    Summary: Large comments are extremely slow to parse
       Type: Bug

     Status: Unassigned
   Priority: Minor

    Project: Xerces2-J
 Components: 
             XNI
   Versions:
             2.2.0
             2.2.1
             2.3.0
             2.4.0
             2.5.0
             2.6.0
             2.6.1
             2.6.2

   Assignee: 
   Reporter: Sean Griffin

    Created: Fri, 28 May 2004 9:48 AM
    Updated: Tue, 15 Jun 2004 2:48 PM
Environment: Windows XP running Java 1.4.2
