Posted to p-dev@xerces.apache.org by Derek Wueppelmann <dw...@newswire.ca> on 2002/01/08 19:48:33 UTC

UTF-8 encodings.

I seem to be having problems reading an XML file that contains accented
characters. The file declares its encoding as UTF-8, which works
just fine for all the other files without accents. Here is a brief
summary of what I am running.

Xerces-c 1.3
Xerces-p 1.3.2

These were the only versions that would compile under Red Hat 6.2. I
can't upgrade Perl since we are using a lot of modules that were compiled
to work with Perl 5.005.

The problem is that any field containing accented characters
comes up blank, with no data, after I parse the file
with validation on.

I have tried character entities, the ISO-8859-1 encoding of
the characters, and two different ways of UTF-8 encoding the characters;
all of these methods gave the same result. Any help would be
greatly appreciated.
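
A minimal sketch of the representations tried above, written in Python rather than Perl because its standard-library (expat-based) parser is convenient; it does not use or imitate the XML::Xerces API. It writes é (U+00E9) as a numeric character reference, as UTF-8 bytes, and as ISO-8859-1 bytes. The sample documents and the `parse_text` helper are hypothetical. A Unicode-aware parser should return the same text for the first three, and reject ISO-8859-1 bytes when the declaration claims UTF-8.

```python
import xml.etree.ElementTree as ET

UTF8_DECL = b'<?xml version="1.0" encoding="UTF-8"?>'
LATIN1_DECL = b'<?xml version="1.0" encoding="ISO-8859-1"?>'

# Hypothetical sample documents: the word "café" written four ways.
samples = {
    # Numeric character reference: pure ASCII bytes, valid under any declaration.
    "char_ref": UTF8_DECL + b"<name>caf&#233;</name>",
    # UTF-8: e-acute is the two-byte sequence C3 A9.
    "utf8_bytes": UTF8_DECL + "<name>café</name>".encode("utf-8"),
    # ISO-8859-1: e-acute is the single byte E9, and the declaration says so.
    "latin1_bytes": LATIN1_DECL + "<name>café</name>".encode("iso-8859-1"),
    # ISO-8859-1 bytes under a UTF-8 declaration: malformed input.
    "latin1_mislabelled": UTF8_DECL + "<name>café</name>".encode("iso-8859-1"),
}


def parse_text(doc: bytes):
    """Return the <name> element's text, or None if the parser rejects the bytes."""
    try:
        return ET.fromstring(doc).text
    except ET.ParseError:
        return None


results = {label: parse_text(doc) for label, doc in samples.items()}
for label, text in results.items():
    print(f"{label:20} -> {text!r}")
```

This only shows which byte sequences a Unicode-aware parser should accept; it does not reproduce the blank-field behaviour seen with Xerces-P 1.3.2.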

-- 
---------------------------------------------------------------------
(0       Derek Wueppelmann             derek.wueppelmann@newswire.ca
 D--      Canada NewsWire Ltd.        http://www.newswire.ca 
/ )                     Work: (416) 863-2107
---------------------------------------------------------------------


---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-p-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-p-dev-help@xml.apache.org


Re: UTF-8 encodings.

Posted by "Jason E. Stewart" <ja...@openinformatics.com>.
"Derek Wueppelmann" <dw...@newswire.ca> writes:

> I seem to be having problems reading an XML file that contains accented
> characters. The file says that the encoding should be UTF-8, which works
> just fine for all the other files without accents. Here is a brief
> summary of what I am running.
> 
> Xerces-c 1.3
> Xerces-p 1.3.2
> 
> These were the only versions that would compile under Redhat 6.2. I
> can't upgrade Perl since we are using a lot of modules that were compiled
> to work with perl 5.005.

Perl didn't have UTF-8 support until 5.6.0, so you *must* upgrade if
you plan to use Unicode. You can compile Perl in a compatibility mode
so that it will still run your existing 5.005 modules.
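
To illustrate the byte-versus-character distinction behind this upgrade, here is a short sketch in Python (not Perl; the variable names are illustrative only). The same word is four characters but five UTF-8 bytes, and code that treats those bytes under the wrong encoding either garbles the accent or rejects it outright.

```python
# "café" as text (code points) versus its UTF-8 encoding (octets).
text = "café"
encoded = text.encode("utf-8")

print(len(text))         # 4 characters
print(len(encoded))      # 5 bytes: e-acute becomes C3 A9
print(encoded.hex(" "))  # 63 61 66 c3 a9

# Reading the UTF-8 bytes as ISO-8859-1 turns one accented letter into two.
garbled = encoded.decode("iso-8859-1")
print(garbled)           # cafÃ©

# Reading them as ASCII fails at the first non-ASCII byte.
try:
    encoded.decode("ascii")
except UnicodeDecodeError as err:
    print("ascii decode failed at byte", err.start)  # byte 3
```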

You must upgrade to Xerces-C-1.5.2 (*not* 1.6.0) and
XML::Xerces-1.5.2_0. This is the first release that supports any
encoding besides ASCII.

jas.
