Posted to j-dev@xerces.apache.org by Glenn Marcy <gm...@us.ibm.com> on 2001/08/02 00:07:53 UTC

Re: UTF-8 parsing faster than US-ASCII

Xerces has hard-wired encoding support for UTF-8.  Other encodings
(US-ASCII, ISO-8859-1, ISO-Latin-1, etc.) are passed to the JDK's own
converters, so results may differ between environments and platforms.

-Glenn



From:           "Sandeep Randhawa" <sand@glide.net.in>
Date:           08/01/2001 12:07 PM
To:             <xe...@xml.apache.org>
cc:
Subject:        UTF-8 parsing faster than US-ASCII
Please respond to xerces-j-dev



Hi,
    Somebody noticed this on NetBeans. I ran a few tests of my own and
found similar results. Is this a known issue? It seems quite contrary to
the docs.

Sandeep Randhawa

Sandeep Randhawa wrote:
>
> <?xml version="1.0" encoding="UTF-8" ?>
>
> If there is no specific reason to use "utf-8", stick with "us-ascii".
> Parsing is faster. Also, I noticed all of Netbeans Settings are stored
> without encoding attribute in the prolog. Xerces defaults to "utf-8" if
> no encoding attribute is present. So for Petr Nejedly, add the attribute
> in the prolog, we might catch a few more milliseconds.

I tried it, but with the opposite results.
I made a simple test that created a filesystem over all the modules
layers (it is a part of IDE startup sequence) and measured the time.
Then I replaced every encoding="UTF-8" with "us-ascii", and added the
attribute where it was missing. Parsing was then slower, though not by
much, so I guess we can stick with UTF-8.

--
Petr Nejedly
NetBeans/Sun Microsystems
http://www.netbeans.org
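
For anyone who wants to repeat the comparison discussed above, here is a
minimal timing sketch. It is not the test Petr ran (that one built a
filesystem over the NetBeans module layers). It assumes a plain JAXP SAX
parser, an ASCII-only synthetic document, and hypothetical names
(EncodingTiming, buildDocument, timeParse). Numbers from a loop like this
are only indicative and will vary with the parser and JDK in use.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class EncodingTiming {

    // Builds an ASCII-only document whose prolog declares the given encoding.
    // The bytes are identical for both encodings apart from the declaration.
    public static byte[] buildDocument(String encoding, int elements) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"")
          .append(encoding).append("\"?>\n<root>");
        for (int i = 0; i < elements; i++) {
            sb.append("<item id=\"").append(i).append("\">value ")
              .append(i).append("</item>");
        }
        sb.append("</root>");
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    // Parses the document 'rounds' times with a no-op handler and
    // returns the elapsed wall-clock time in nanoseconds.
    public static long timeParse(byte[] doc, int rounds) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        DefaultHandler handler = new DefaultHandler();
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            parser.parse(new ByteArrayInputStream(doc), handler);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        for (String enc : new String[] {"UTF-8", "US-ASCII"}) {
            byte[] doc = buildDocument(enc, 10_000);
            timeParse(doc, 20);               // warm-up, result discarded
            long nanos = timeParse(doc, 100); // measured run
            System.out.printf("%-8s %d ms%n", enc, nanos / 1_000_000);
        }
    }
}
```

The document is held in memory so that disk I/O does not mask the decoding
cost, and a warm-up pass runs first so that JIT compilation is less likely
to skew the measured run.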



