Posted to j-users@xerces.apache.org by Joseph Kesselman <ke...@us.ibm.com> on 2003/01/03 05:27:28 UTC
Re: Valid XML characters
On Thursday, 12/26/2002 at 07:23 ZE2, "Dima Gutzeit" <di...@mailvision.net>
wrote:
> Sometimes when parsing XML files I get an error message (exception) about
> "invalid Unicode characters"; is there any way to filter those before
> parsing?
There's no way to do that within the parser. "If it contains illegal
characters, it isn't XML" and the error messages are entirely correct.
You could, of course, write your own stream filter and pass the data
through that, then use its output as the input to the parser. That's
fairly straightforward Java coding. The problem would be deciding what
you're going to do with those characters when you see them -- if you just
discard them you may be changing the meaning of the document, and if you
turn them into some sort of private escape sequence only applications
which understand that convention will be able to do anything with them.
Fixing the source documents really is the cleanest answer.
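As a rough illustration of the stream-filter idea above, here is a minimal Java sketch; it is not anything shipped with Xerces. The class name XmlCharFilterReader, the clean helper, and the choice of U+FFFD as the replacement character are all assumptions made for the example. It checks each character against the XML 1.0 Char production and replaces anything outside it, treating unpaired surrogates as illegal.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical helper: replaces characters that XML 1.0 forbids. */
class XmlCharFilterReader extends FilterReader {
    private final char replacement;
    private int pending = -1; // low half of a valid surrogate pair, held for the next read()

    XmlCharFilterReader(Reader in, char replacement) {
        super(new PushbackReader(in, 1)); // one char of lookahead for surrogate pairs
        this.replacement = replacement;
    }

    /** XML 1.0 Char production for a single UTF-16 unit; surrogates are handled in read(). */
    static boolean isXmlChar(int c) {
        return c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD);
    }

    @Override
    public int read() throws IOException {
        if (pending >= 0) { int c = pending; pending = -1; return c; }
        int c = in.read();
        if (c < 0) return c; // end of stream
        if (Character.isHighSurrogate((char) c)) {
            int next = in.read();
            if (next >= 0 && Character.isLowSurrogate((char) next)) {
                pending = next; // valid pair: pass both halves through
                return c;
            }
            if (next >= 0) ((PushbackReader) in).unread(next); // re-examine it on the next call
            return replacement; // unpaired high surrogate
        }
        return isXmlChar(c) ? c : replacement; // also catches unpaired low surrogates
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        // Simple, not fast: fills the buffer one character at a time.
        int n = 0;
        while (n < len) {
            int c = read();
            if (c < 0) break;
            buf[off + n++] = (char) c;
        }
        return (n == 0 && len > 0) ? -1 : n;
    }

    /** Convenience for trying the filter on a String (assumed helper, for illustration). */
    static String clean(String s) {
        StringBuilder out = new StringBuilder();
        try (Reader r = new XmlCharFilterReader(new StringReader(s), '\uFFFD')) {
            for (int c; (c = r.read()) >= 0; ) out.append((char) c);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

The filtered reader could then be wrapped in an org.xml.sax.InputSource and handed to the parser. Because this works on a Reader, the caller has to decode the bytes with the right encoding first, and replacing characters still alters the document, so the caveats above apply.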
For what it's worth: It has been proposed that future versions of XML
*may* relax the forbidden-character restrictions, but there's still no
firm consensus on whether that change would be desirable or what version
of XML it might find its way into.
______________________________________
Joe Kesselman / IBM Research