Posted to j-dev@xerces.apache.org by "Michael Glavassevich (JIRA)" <xe...@xml.apache.org> on 2007/08/21 21:25:31 UTC

[jira] Assigned: (XERCESJ-1264) Reduce performance penalty for using an EOFException to signal the end of the document.

     [ https://issues.apache.org/jira/browse/XERCESJ-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Glavassevich reassigned XERCESJ-1264:
---------------------------------------------

    Assignee: Michael Glavassevich

> Reduce performance penalty for using an EOFException to signal the end of the document.
> ---------------------------------------------------------------------------------------
>
>                 Key: XERCESJ-1264
>                 URL: https://issues.apache.org/jira/browse/XERCESJ-1264
>             Project: Xerces2-J
>          Issue Type: Improvement
>          Components: JAXP (javax.xml.parsers)
>    Affects Versions: 2.9.0
>            Reporter: Michael Glavassevich
>            Assignee: Michael Glavassevich
>
> As part of its normal control flow, the XMLEntityScanner throws an EOFException when it reaches the end of the document. For small documents, this can account for as much as 20-25% of the parser's total execution time. Without changing the current programming model, most of this time can be recovered by caching the exception, which eliminates the very expensive fillInStackTrace() call on creation.
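A minimal Java sketch of the caching idea, assuming a hypothetical and heavily simplified scanner (CachedEofScanner is not part of Xerces): one EOFException is allocated once, so its stack trace is filled in only during class initialization, and that same instance is thrown every time the input runs out. Callers keep the existing exception-based control flow unchanged.

```java
import java.io.EOFException;

// Hypothetical, simplified scanner (NOT the Xerces XMLEntityScanner): it reads
// characters from a string and signals end of input with an EOFException.
class CachedEofScanner {
    // One preallocated instance: fillInStackTrace() runs once, here,
    // instead of on every end of document.
    private static final EOFException END_OF_INPUT = new EOFException();

    private final String text;
    private int position;

    CachedEofScanner(String text) {
        this.text = text;
    }

    // Returns the next character, or throws the shared EOFException when the
    // input is exhausted (the control-flow model described in the issue).
    char next() throws EOFException {
        if (position == text.length()) {
            throw END_OF_INPUT;
        }
        return text.charAt(position++);
    }
}

class CachedEofDemo {
    // Counts characters by reading until the scanner signals end of input.
    static int countChars(String text) {
        CachedEofScanner scanner = new CachedEofScanner(text);
        int count = 0;
        try {
            while (true) {
                scanner.next();
                count++;
            }
        } catch (EOFException endOfInput) {
            // Normal termination: the cached exception marks end of input.
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countChars("<a/>")); // prints 4
    }
}
```

The trade-off: the shared instance's stack trace points at the static initializer rather than the throw site, so it is useless for debugging. That is tolerable only because the exception is a normal end-of-input signal, not an error report.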
> Wolfgang Hoschek's post [1] to the j-dev list on this subject in 2004:
> =====================================================
> I have a server app that parses millions of smallish documents.
> Performance has been improved a lot by reusing XMLReaders. It's pretty good, but could perhaps get better by studying the (perhaps dubious?) hints given by the java -server -Xprof snippet below (JDK 1.5 RC, Xerces CVS head, not using the JDK-internal Xerces, which appears to be twice as slow in this case, for whatever reason).
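The XMLReader reuse mentioned here can be sketched with the standard JAXP and SAX APIs: the factory, parser and XMLReader are created once, and the reader is handed each small document in turn. The ElementCounter handler is a hypothetical stand-in for real per-document work. SAX allows an XMLReader to be reused once the previous parse has finished, but a reader must not be shared between threads.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Sketch: parse many small documents with one XMLReader instead of creating
// a new parser per document. ElementCounter is hypothetical.
class ReusedReaderDemo {
    // Counts start tags in the current document; reset before each parse.
    static final class ElementCounter extends DefaultHandler {
        int elements;

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes attributes) {
            elements++;
        }
    }

    static int[] countElements(String[] documents) throws Exception {
        // Create the factory, parser and reader once...
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        ElementCounter counter = new ElementCounter();
        reader.setContentHandler(counter);

        int[] counts = new int[documents.length];
        for (int i = 0; i < documents.length; i++) {
            // ...and reuse them for every document, sequentially on one thread.
            counter.elements = 0;
            reader.parse(new InputSource(new StringReader(documents[i])));
            counts[i] = counter.elements;
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        int[] counts = countElements(new String[] {"<a/>", "<a><b/><b/></a>"});
        System.out.println(counts[0] + " " + counts[1]); // prints 1 3
    }
}
```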
> Accordingly, the theory is that throwing an (artificial) EOFException in XMLEntityScanner.load() at the end of each document consumes some 25% of the total execution time, probably due to the heavy nature of exceptions and in particular Throwable.fillInStackTrace(). Would it perhaps be possible (and correct) to avoid raising artificial exceptions for what appears to be normal program control flow (the documents and streams are fine)?
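The alternative asked about here, reporting end of input through a return value instead of an exception, might look like the sketch below. SentinelScanner is hypothetical and deliberately mirrors the -1 convention of java.io.Reader.read(); it is not how XMLEntityScanner works.

```java
// Sketch of reporting end of input with a sentinel return value instead of
// an exception. SentinelScanner is hypothetical, unrelated to the Xerces API.
class SentinelScanner {
    // Sentinel returned when no more input is available (cf. Reader.read()).
    static final int END_OF_INPUT = -1;

    private final String text;
    private int position;

    SentinelScanner(String text) {
        this.text = text;
    }

    // Returns the next character as an int, or END_OF_INPUT; no exception
    // object is ever created at the end of the document.
    int next() {
        return position < text.length() ? text.charAt(position++) : END_OF_INPUT;
    }

    static int countChars(String text) {
        SentinelScanner scanner = new SentinelScanner(text);
        int count = 0;
        while (scanner.next() != END_OF_INPUT) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countChars("<a/>")); // prints 4
    }
}
```

The catch is that every caller up the scanning chain must check for the sentinel, which changes the programming model; the issue favours caching the exception precisely to avoid that.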
> Here is the trace snippet:
>           Stub + native   Method
>   28.6%     0  +   487    java.lang.Throwable.fillInStackTrace
>   28.6%     0  +   487    Total stub
>    Thread-local ticks:
>    0.1%     1             Blocked (of total)
>    0.1%     2             Class loader
>    0.1%     2             Compilation
>    0.2%     3             Unknown: thread_state
> Flat profile of 0.01 secs (1 total ticks): DestroyJavaVM
>    Thread-local ticks:
> 100.0%     1             Blocked (of total)
> Global summary of 35.44 seconds:
> 100.0%  1718             Received ticks
>    0.7%    12             Received GC ticks
>    9.7%   167             Compilation
>    0.1%     2             Class loader
>    0.2%     3             Unknown code
> real    0m35.715s
> user    0m34.170s
> sys     0m0.190s
> TRACE 300347:
>          java.lang.Throwable.fillInStackTrace(Throwable.java:Unknown line)
>          java.lang.Throwable.<init>(Throwable.java:181)
>          java.lang.Exception.<init>(Exception.java:29)
>          java.io.IOException.<init>(IOException.java:28)
>          java.io.EOFException.<init>(EOFException.java:32)
>          org.apache.xerces.impl.XMLEntityScanner.load(<Unknown Source>:Unknown line)
>          org.apache.xerces.impl.XMLEntityScanner.skipSpaces(<Unknown Source>:Unknown line)
>          org.apache.xerces.impl.XMLDocumentScannerImpl$TrailingMiscDispatcher.dispatch(<Unknown Source>:Unknown line)
>          org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(<Unknown Source>:Unknown line)
>          org.apache.xerces.parsers.DTDConfiguration.parse(<Unknown Source>:Unknown line)
>          org.apache.xerces.parsers.DTDConfiguration.parse(<Unknown Source>:Unknown line)
>          org.apache.xerces.parsers.XMLParser.parse(<Unknown Source>:Unknown line)
>          org.apache.xerces.parsers.AbstractSAXParser.parse(<Unknown Source>:Unknown line)
>          nu.xom.Builder.build(Builder.java:786)
>          nu.xom.Builder.build(Builder.java:569)
>          gov.lbl.dsd.firefish.trash.XMLXomBench.main(XMLXomBench.java:62)
> I guess the relevant block is
> XMLEntityScanner.load(...):
>              ...
>              if (changeEntity) {
>                  fEntityManager.endEntity();
>                  if (fCurrentEntity == null) {
>                      throw new EOFException();
>                  }
>                  // handle the trailing edges
>                  if (fCurrentEntity.position == fCurrentEntity.count) {
>                      load(0, true);
>                  }
>              }
> [1] http://mail-archives.apache.org/mod_mbox/xerces-j-dev/200409.mbox/%3c25BEC610-FD4A-11D8-AA38-000A95BD16CE@lbl.gov%3e
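For comparison, a different technique (not the one proposed in the issue) removes the fillInStackTrace() cost at its source: an EOFException subclass whose fillInStackTrace() returns immediately. An object is still allocated on each throw, but no stack walk occurs, and existing catch (EOFException) blocks keep working; it could also be combined with caching. StacklessEOFException is a hypothetical name used only for this sketch.

```java
import java.io.EOFException;

// Hypothetical EOFException subclass for illustration only (not Xerces code).
// It skips stack-trace capture, so constructing it avoids the native stack walk.
class StacklessEOFException extends EOFException {
    private static final long serialVersionUID = 1L;

    // Throwable's constructor calls fillInStackTrace(); returning `this`
    // without delegating to super leaves the stack trace empty.
    @Override
    public synchronized Throwable fillInStackTrace() {
        return this;
    }
}

class StacklessDemo {
    public static void main(String[] args) {
        try {
            throw new StacklessEOFException();
        } catch (EOFException endOfInput) {
            // Still an EOFException, so existing handlers catch it as before.
            System.out.println(endOfInput.getStackTrace().length); // prints 0
        }
    }
}
```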

-- 
This message is automatically generated by JIRA.