Posted to dev@xalan.apache.org by bu...@apache.org on 2003/07/11 09:07:19 UTC

DO NOT REPLY [Bug 21491] - UTF-8 output is much slower for large chunks of character output

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=21491>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=21491

UTF-8 output is much slower for large chunks of character output

------- Additional Comments From minchau@ca.ibm.com  2003-07-11 07:07 -------
In the case where the characters, when converted into UTF-8 encoded bytes, might 
exceed the internal buffer, the bytes are written one at a time directly to the 
OutputStream. This works, but the sheer volume of calls is much slower than 
accumulating the bytes in an internal buffer.

The "trick" in the patch that I am about to append is to logically break the 
character array up into chunks, each of which will not blow the output buffer, 
and for the write() method to call itself recursively with these chunks. This 
retains the buffering performance.
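
A minimal sketch of the chunking idea (again with illustrative names and a 
simplified encoder, not the patch itself): write() splits any oversized 
request in half and recurses, so each chunk that reaches the encoding loop is 
guaranteed to fit in the buffer after at most one flush.

import java.io.IOException;
import java.io.OutputStream;

class ChunkedUtf8Writer {
    private final OutputStream out;
    private final byte[] buf;   // internal output buffer
    private int count;          // bytes currently buffered

    ChunkedUtf8Writer(OutputStream out, int bufferSize) {
        if (bufferSize < 3) {
            throw new IllegalArgumentException("buffer must hold >= 3 bytes");
        }
        this.out = out;
        this.buf = new byte[bufferSize];
    }

    void write(char[] chars, int start, int length) throws IOException {
        // Worst case (ignoring surrogate pairs) a char becomes 3 UTF-8
        // bytes, so buf.length / 3 chars can never overflow the buffer.
        int maxChars = buf.length / 3;
        if (length > maxChars) {
            // Logically split the array and recurse on the two halves.
            int half = length / 2;
            write(chars, start, half);
            write(chars, start + half, length - half);
            return;
        }
        if (count + 3 * length > buf.length) {
            flushBuffer();      // make room for the whole chunk up front
        }
        for (int i = start; i < start + length; i++) {
            char c = chars[i];
            if (c < 0x80) {
                buf[count++] = (byte) c;
            } else if (c < 0x800) {
                buf[count++] = (byte) (0xC0 | (c >> 6));
                buf[count++] = (byte) (0x80 | (c & 0x3F));
            } else {
                buf[count++] = (byte) (0xE0 | (c >> 12));
                buf[count++] = (byte) (0x80 | ((c >> 6) & 0x3F));
                buf[count++] = (byte) (0x80 | (c & 0x3F));
            }
        }
    }

    void flush() throws IOException {
        flushBuffer();
        out.flush();
    }

    private void flushBuffer() throws IOException {
        out.write(buf, 0, count);
        count = 0;
    }
}

With this shape, a large write() costs a handful of recursive calls plus one 
OutputStream.write(byte[], int, int) per buffer-full, instead of one 
OutputStream call per encoded byte.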