Posted to general@xml.apache.org by Joseph Shraibman <jk...@selectacast.net> on 2001/11/06 02:16:52 UTC

Accented characters and Xerces-J

I'm using Xerces 1.3.1.

I have a file that contains 'ö' (byte 246, i.e. 0xF6 in ISO-8859-1).


When I try to parse the file using Xerces I get:
: 151, 6: An invalid XML character (Unicode: 0x1b6803) was found in the element content of 
the document.

Presumably when Java reads the file, before it gets to Xerces, it converts 246 to that 
Unicode value, but why?  I'm using the default (US) locale.
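
One possible explanation, sketched under stated assumptions rather than confirmed 
against the actual file: if the document has no encoding declaration, an XML parser 
reads it as UTF-8, not according to the locale. In UTF-8, byte 0xF6 looks like the lead 
byte of a four-byte sequence, so a lenient decoder could fold the next three bytes into 
one code point. The class name, method names, and the three bytes following 0xF6 
('v', ' ', 'C') below are hypothetical, chosen only because they reproduce the reported 
value.

```java
import java.nio.charset.StandardCharsets;

public class Utf8MisreadDemo {

    // Decode four bytes as one UTF-8 sequence WITHOUT checking that the
    // trailing bytes are valid continuation bytes. This leniency is an
    // assumption about the decoder, not documented Xerces behavior.
    public static int lenientFourByteDecode(int b0, int b1, int b2, int b3) {
        return ((b0 & 0x07) << 18)   // low 3 bits of the lead byte
             | ((b1 & 0x3F) << 12)   // low 6 bits of each following byte
             | ((b2 & 0x3F) << 6)
             |  (b3 & 0x3F);
    }

    // Decode a single byte as ISO-8859-1, where byte 246 maps to U+00F6.
    public static String latin1(int b) {
        return new String(new byte[] { (byte) b }, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // 0xF6 followed by the hypothetical bytes 'v', ' ', 'C'
        int codePoint = lenientFourByteDecode(0xF6, 'v', ' ', 'C');
        System.out.println("0x" + Integer.toHexString(codePoint)); // prints 0x1b6803

        // The same byte read as ISO-8859-1 is the intended character.
        System.out.println(latin1(0xF6).equals("\u00F6")); // prints true
    }
}
```

If that is what is happening, starting the file with an XML declaration that names the 
real encoding, for example <?xml version="1.0" encoding="ISO-8859-1"?>, should let the 
parser read byte 246 as 'ö'.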

You can get the files involved from:
http://www.selectacast.net/~jks/xml/pr2.xml
http://www.selectacast.net/~jks/xml/pr2.txt is the original text file.


-- 
Joseph Shraibman
jks@selectacast.net
Increase signal to noise ratio.  http://www.targabot.com

