Posted to c-users@xalan.apache.org by Graeme Ing <gr...@inclue.com> on 2006/03/09 00:29:58 UTC

FW: Xerces remapping &#xxxx;

Hello all,

I’m using Xerces 2.7 and I’m trying to parse the following snippet from my XML file:

<title>Junk Mail - just how &#8220;heavy&#8221; a problem is it?</title>

The xml header/encoding on the file is:

<?xml version="1.0" encoding="UTF-8"?>

When I parse this and walk the DOM and extract the contents of this title node, I get back:

Junk Mail - just how â€œheavyâ€ a problem is it?

The special characters come back as the decimal values 30,128,100 and 30,128,99.

Why is Xerces interpreting the &#xxxx; codes, and, more importantly, how do I stop it? :-)

Here is my Xerces setup code:

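      // Xerces-C 2.x DOM parser setup (m_parser and m_errorHandler are presumably class members)
      m_parser = new XercesDOMParser();
      m_parser->setValidationScheme( XercesDOMParser::Val_Never );  // never validate
      m_parser->setDoNamespaces( false );                           // no namespace processing
      m_parser->setDoSchema( false );                               // no schema processing

      // HandlerBase already derives from ErrorHandler, so no cast is needed
      m_errorHandler = new HandlerBase();
      m_parser->setErrorHandler( m_errorHandler );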

Hope someone can help, thanks a lot!!

Graeme Ing

Re: FW: Xerces remapping &#xxxx;

Posted by David Bertoni <db...@apache.org>.

Graeme Ing wrote:
> Hello all,
>
> I’m using Xerces 2.7 and I’m trying to parse the following snippet from
> my XML file:
>
> <title>Junk Mail - just how &#8220;heavy&#8221; a problem is it?</title>
>
> The xml header/encoding on the file is:
>
> <?xml version="1.0" encoding="UTF-8"?>
>
> When I parse this and walk the DOM and extract the contents of this
> title node, I get back:
>
> Junk Mail - just how â€œheavyâ€ a problem is it?

This is a question for the Xerces-C list, not for the Xalan-C list.

Dave