Posted to j-users@xalan.apache.org by "Ushakov, Sergey N" <us...@int.com.ru> on 2002/10/01 17:51:03 UTC

Re: Character set ignorance

Richard, it is still difficult to draw any conclusions about your issue,
because the information you have posted is missing important parts such as
the XML and XSL prologues/declarations. If you post something more complete,
it would be much easier... The same goes for the verbatim complaints from Xalan.

Regards,
Sergey

----- Original Message -----
From: "Richard Rowell" <ri...@bowmansystems.com>
To: "xalan-j-users" <xa...@xml.apache.org>
Sent: Monday, September 30, 2002 6:44 PM
Subject: Re: Character set ignorance

On Thu, 2002-09-26 at 19:05, Ushakov, Sergey N wrote:
> Richard, could you post some minimal XML and XSL input and corresponding
> output? It would be easier to judge then...
>

The offending record looked like this:
<case_assessment date_added="2001-06-26T00:00:00"
date_updated="2001-06-26T00:00:00">
             <notes>I don~Rt believe he is clean an...</notes>
</case_assessment>

The ~R is how vim displays the forward tick (my keyboard does not have a
key for that character).  I am surprised that Xerces did not reject the
file, as I pre-process with a simple class that is just a validating
Xerces SAX parser (to ensure the input is valid XML before it ever gets
into the import process). I had hoped it would have thrown an
exception, since this is clearly not UTF-8.

Is there any way to verify that the input is UTF-8 before it gets to
Xalan?  If Xerces cannot/will not point out problems with character
sets, perhaps someone knows of some other tool that will?
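
One possible way to do such a check, independent of Xerces and Xalan, is to
run the raw bytes through a strict UTF-8 decoder from java.nio.charset and
see whether it reports an error. The sketch below is only an illustration:
the class name Utf8Check and the method isValidUtf8 are made up for this
example, it assumes a JDK recent enough to have StandardCharsets, and it
reads the whole file into memory. It also assumes that ~R in vim stands for
the single byte 0x92, which may or may not match the actual file.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Xerces or Xalan.
public class Utf8Check {

    // Returns true if the bytes form a well-formed UTF-8 sequence.
    public static boolean isValidUtf8(byte[] data) {
        // REPORT makes the decoder throw instead of silently
        // substituting a replacement character for bad input.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java Utf8Check <file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isValidUtf8(data) ? "valid UTF-8" : "NOT valid UTF-8");
    }
}
```

Run as a pre-processing step (java Utf8Check input.xml), this would flag a
stray 0x92 byte before the file reaches the import. Note that it checks the
byte encoding only; it says nothing about whether the file is well-formed or
valid XML, and it ignores whatever encoding the XML declaration names.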