Posted to j-dev@xerces.apache.org by ji...@apache.org on 2004/05/28 18:49:00 UTC
[jira] Created: (XERCESJ-970) Large comments are extremely slow to parse
Message:
A new issue has been created in JIRA.
---------------------------------------------------------------------
View the issue:
http://issues.apache.org/jira/browse/XERCESJ-970
Here is an overview of the issue:
---------------------------------------------------------------------
Key: XERCESJ-970
Summary: Large comments are extremely slow to parse
Type: Bug
Status: Unassigned
Priority: Minor
Project: Xerces2-J
Components:
XNI
Versions:
2.2.0
2.2.1
2.3.0
2.4.0
2.5.0
2.6.0
2.6.1
2.6.2
Assignee:
Reporter: Sean Griffin
Created: Fri, 28 May 2004 9:48 AM
Updated: Fri, 28 May 2004 9:48 AM
Environment: Windows XP running Java 1.4.2
Description:
Very large comments drastically increase the parsing time for both the SAX and DOM implementations. Running the sax.Counter and dom.Counter samples on a 410KB file with nothing commented out gives parse times in the 100ms to 300ms range. However, if I comment out 95% of the file and run the same samples, the parse times jump to between 40 and 50 seconds. I ran the same samples using the Aelfred parser shipped with Saxon 7.9 and, while the file with the large comment was slower than the one without, the parse time rose by only 100ms or so.
I briefly compared the code of the two parsers, and they don't look significantly different when it comes to handling comments. The only notable difference I noticed was in the low/high byte character checks. I suspect an inefficiency in the XMLStringBuffer class, but I haven't spotted anything specific.
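A rough way to reproduce the comparison described above, without the original 410KB file, is to generate two documents of similar size (one plain, one with nearly all of its content inside a single comment) and time a SAX parse of each. The sketch below is only an illustration: the class name CommentParseTiming, the document shape, and the item count are assumptions, it uses modern Java syntax rather than Java 1.4.2, and it goes through JAXP, so it measures whichever parser implementation is on the classpath. The Xerces2-J jars would have to be on the classpath to exercise the versions listed in this issue.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class CommentParseTiming {

    /**
     * Builds a document with the given number of item elements.
     * When commented is true, all items sit inside one large comment.
     */
    public static String buildDocument(int items, boolean commented) {
        StringBuilder sb = new StringBuilder("<root>");
        if (commented) {
            sb.append("<!--");
        }
        for (int i = 0; i < items; i++) {
            // The generated text never contains "--", so the comment stays well-formed.
            sb.append("<item id=\"").append(i).append("\">text</item>\n");
        }
        if (commented) {
            sb.append("-->");
        }
        return sb.append("</root>").toString();
    }

    /** Parses the document once with a no-op handler and returns the elapsed milliseconds. */
    public static long timeParseMillis(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            long start = System.nanoTime();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return (System.nanoTime() - start) / 1_000_000;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        int items = 15_000; // assumed size, on the order of 400 KB of markup
        System.out.println("plain:     " + timeParseMillis(buildDocument(items, false)) + " ms");
        System.out.println("commented: " + timeParseMillis(buildDocument(items, true)) + " ms");
    }
}
```

Each document is parsed only once here, so the first measurement also includes parser start-up cost; the numbers are indicative at best.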
---------------------------------------------------------------------
JIRA INFORMATION:
This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
If you want more information on JIRA, or have a bug to report see:
http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-dev-help@xml.apache.org
[jira] Commented: (XERCESJ-970) Large comments are extremely slow to parse
Posted by ji...@apache.org.
The following comment has been added to this issue:
Author: Sean Griffin
Created: Tue, 15 Jun 2004 2:48 PM
Body:
You're right, the second parse was much faster. I ran the parse through a profiler and found that the problem is localized to the XMLEntityScanner.scanData(String, XMLStringBuffer) method.
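To see how much of the slowdown belongs to the first parse only, the same document can be parsed several times in one JVM and each run timed separately. The sketch below is a minimal harness under stated assumptions: the class RepeatedParseTiming, its helper methods, and the generated input are hypothetical and not part of Xerces or of this report, and the code parses through JAXP with whatever implementation is on the classpath. It does not replace a profiler; it only separates first-run cost from later runs.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class RepeatedParseTiming {

    /** Runs the task the given number of times and returns each run's elapsed nanoseconds. */
    public static long[] timeRuns(int runs, Runnable task) {
        long[] elapsed = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            elapsed[i] = System.nanoTime() - start;
        }
        return elapsed;
    }

    /** Parses the document once with the default JAXP SAX parser, discarding all events. */
    public static void parseOnce(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical input: a root element holding one large comment.
        StringBuilder sb = new StringBuilder("<root><!--");
        for (int i = 0; i < 20_000; i++) {
            sb.append("commented out line ").append(i).append('\n');
        }
        String xml = sb.append("--></root>").toString();

        long[] nanos = timeRuns(5, () -> parseOnce(xml));
        for (int i = 0; i < nanos.length; i++) {
            System.out.println("run " + (i + 1) + ": " + nanos[i] / 1_000_000 + " ms");
        }
    }
}
```

If the later runs are consistently faster than the first, part of the measured time is one-off overhead rather than the cost of scanning the comment itself.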
---------------------------------------------------------------------
View this comment:
http://issues.apache.org/jira/browse/XERCESJ-970?page=comments#action_36188
---------------------------------------------------------------------
View the issue:
http://issues.apache.org/jira/browse/XERCESJ-970
Updated: Tue, 15 Jun 2004 2:48 PM