Posted to j-dev@xerces.apache.org by Toby H Ferguson <to...@sun.com> on 2001/01/02 15:35:51 UTC

RE: About XML encoding

Suk Tae Kyung,

If all your languages are going to be in the same file, you *must* use an
encoding in which every one of them can be represented. Alternatively, you
can split the content into separate files, each with its own encoding, and
that way avoid using UTF-8.
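
To illustrate the single-file option, here is a minimal sketch, assuming a
JAXP-compliant parser (such as Xerces) is on the classpath. The class name,
the <greeting> element, and the sample text are made up for illustration;
the sketch only shows that one UTF-8 document can carry Korean, Japanese,
and Chinese text and come back out of the parser unchanged.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class MixedScriptXml {
    // Made-up sample text: Korean, Japanese and Chinese words in one string.
    // Unicode escapes keep this source file itself plain ASCII.
    static final String TEXT =
        "\uD55C\uAD6D\uC5B4 "      // Korean
        + "\u65E5\u672C\u8A9E "    // Japanese
        + "\u4E2D\u6587";          // Chinese

    // Serializes a small document as UTF-8 bytes, parses it, and returns
    // the text content of the root element.
    static String parseGreeting() {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<greeting>" + TEXT + "</greeting>";
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new ByteArrayInputStream(bytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        // Compares the parsed text with the original string.
        System.out.println(TEXT.equals(parseGreeting()));
    }
}
```

The parser reads the encoding declaration and decodes the bytes itself, so
the application only ever sees ordinary Java strings.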

Why do you think the UTF-8 data is broken? There are not many editors that
understand UTF-8 - I normally read UTF-8 documents using browsers, which are
UTF-8 aware.
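
A likely reason the data "seems to be broken" is that the viewing tool
decodes the UTF-8 bytes with some other charset. The sketch below is an
assumption about what is happening, not a diagnosis of your setup: it
decodes the same UTF-8 bytes once as ISO-8859-1 (standing in for any
non-UTF-8 editor) and once as UTF-8. The class and method names are
hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class WrongDecoderDemo {
    // Made-up sample: three Hangul syllables, written as Unicode escapes.
    static final String TEXT = "\uD55C\uAD6D\uC5B4";

    // What a UTF-8-unaware tool effectively does: one character per byte.
    static String decodeAsLatin1(byte[] utf8) {
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // What a UTF-8-aware tool does: reassembles multi-byte sequences.
    static String decodeAsUtf8(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] utf8 = TEXT.getBytes(StandardCharsets.UTF_8);
        // Each of these syllables takes three bytes in UTF-8.
        System.out.println("bytes: " + utf8.length);
        // The wrong decoder yields one junk character per byte.
        System.out.println("as ISO-8859-1: " + decodeAsLatin1(utf8).length() + " chars");
        // The right decoder recovers the original text.
        System.out.println("as UTF-8 intact: " + TEXT.equals(decodeAsUtf8(utf8)));
    }
}
```

The bytes are the same in both cases; only the interpretation differs, so
data that looks wrong in an editor can still be perfectly valid UTF-8.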

Toby
  -----Original Message-----
  From: Suk Tae Kyung [mailto:tkstone@penta.co.kr]
  Sent: Tuesday, December 26, 2000 10:50 PM
  To: Xerces
  Subject: Q:About XML encoding


  Hi, Xerces Developers. I have a question about XML encoding.
  Although this question is not about Xerces itself, it is related to Xerces.


  To specify the encoding, one should write XML like this...

     <?xml version="1.0" encoding="some encoding" ?>
     <~~~>
     </~~~>

  If I use Korean data, I should specify the "euc-kr" encoding. But
  if I use Korean, Chinese, and Japanese in one XML document, which
  encoding should I set?
  It would be possible to convert all the data to UTF-8 and specify
  "UTF-8" as the encoding. But this method is not good, because all the
  data must be converted to UTF-8 first, and the result is hard to read
  (the data seems to be broken).

  Does anyone have a good idea?
  Thanks in advance.

  Regards.

  Suk Tae Kyung

Re: About XML encoding

Posted by Andy Clark <an...@apache.org>.
It's nice that Notepad in Win2K now can write its output as UTF-8,
as well as UTF-16.

-- 
Andy Clark * IBM, TRL - Japan * andyc@apache.org