Posted to users@camel.apache.org by Arjan Moraal <na...@ajmoraal.fastmail.net> on 2008/03/06 12:32:17 UTC
XMLConverter and default charset
The org.apache.camel.converter.jaxp.XMLConverter class has a method to
convert a String to a DOM Document. This method is called automatically when,
for instance, an XPath expression is evaluated on a TextMessage received over
JMS.
    @Converter
    public Document toDOMDocument(String text) throws IOException,
            SAXException, ParserConfigurationException {
        return toDOMDocument(text.getBytes());
    }
The problem is that the String is converted to a byte[] using the platform's
default character encoding (in my case CP-1252 on Windows XP). But the XML in
the text message might declare a different encoding in its XML declaration
(<?xml version="1.0" encoding="UTF-8"?>), which can cause SAX parser
exceptions (for example: Invalid byte 1 of 1-byte UTF-8 sequence).
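To make the mismatch concrete, here is a minimal sketch that reproduces it outside Camel. It is only an illustration under assumptions: the class and method names are made up, and ISO-8859-1 stands in for the platform default (it encodes "é" as the single byte 0xE9, just as CP-1252 does). The exact exception type and message depend on the parser and on the offending bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of Camel.
class EncodingMismatchDemo {

    // The declaration says UTF-8; the content contains one non-ASCII character.
    static final String XML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>Ren\u00e9</name>";

    /** Encodes the text with the given charset and reports whether the bytes parse. */
    static boolean parses(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
            return true;
        } catch (SAXException | IOException e) {
            // The parser trusted the declared UTF-8 and hit an invalid byte sequence.
            return false;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Bytes that match the declaration parse fine.
        System.out.println("UTF-8 bytes parse: " + parses(XML, StandardCharsets.UTF_8));
        // Bytes in a single-byte encoding contradict the declaration.
        System.out.println("ISO-8859-1 bytes parse: " + parses(XML, StandardCharsets.ISO_8859_1));
    }
}
```

With text.getBytes() and no argument, which of the two cases you get depends entirely on the JVM's default charset, so the same message can parse on one machine and fail on another.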
So shouldn't this toDOMDocument() method either use the encoding declared in
the XML to convert the String to byte[], or change the encoding attribute in
the XML declaration to the character encoding used to generate the byte[]?
Thanks,
Arjan
--
View this message in context: http://www.nabble.com/XMLConverter-and-default-charset-tp15871372s22882p15871372.html
Sent from the Camel - Users mailing list archive at Nabble.com.
Re: XMLConverter and default charset
Posted by Arjan Moraal <na...@ajmoraal.fastmail.net>.
James.Strachan wrote:
>
> I've modified the code so that we parse the XML in the String using a
> StringReader/InputSource instead to avoid converting to/from bytes. Do
> you think that should help?
>
Thanks James, using the StringReader solved the problem indeed.
Arjan
--
View this message in context: http://www.nabble.com/XMLConverter-and-default-charset-tp15871372s22882p15891097.html
Re: XMLConverter and default charset
Posted by James Strachan <ja...@gmail.com>.
On 06/03/2008, Arjan Moraal <na...@ajmoraal.fastmail.net> wrote:
>
> The org.apache.camel.converter.jaxp.XMLConverter class has a method to
> convert a String to a DOM Document. This method is automatically called when
> for instance an XPath expression is run on a TextMessage received from the
> JMS.
I guess that when sending XML that is not UTF-encoded, folks should send a
BytesMessage instead?
>
> @Converter
> public Document toDOMDocument(String text) throws IOException,
> SAXException, ParserConfigurationException {
> return toDOMDocument(text.getBytes());
> }
>
> The problem with this is that the String is converted to a byte[] using the
> default character encoding of the platform (in my case CP-1252 on
> WindowsXP). But the XML in the text message might have a different encoding
> attribute in the header (<?xml version="1.0" encoding="UTF-8"?>), which can
> cause SAXParser exceptions (Like: Invalid byte 1 of 1-byte UTF-8 sequence).
>
> So shouldn't this toDOMDocument() method use either the encoding defined in
> the XML to convert the String to byte[]?
> Or change the encoding attribute in the XML header to the character encoding
> used to generate the byte[]?
Great catch!
I've modified the code so that we parse the XML in the String using a
StringReader/InputSource instead to avoid converting to/from bytes. Do
you think that should help?
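
A minimal sketch of that approach, for illustration only (this is not the actual Camel change; the class name and the rootText helper are hypothetical). Because the parser is handed a character stream, it reads the characters directly and disregards the encoding declaration in the text, so the platform default charset never comes into play.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical sketch class, not the real XMLConverter.
class StringReaderParseSketch {

    /** Parses XML held in a String without any String-to-byte[] round trip. */
    static Document toDOMDocument(String text)
            throws IOException, SAXException, ParserConfigurationException {
        InputSource source = new InputSource(new StringReader(text));
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
    }

    /** Convenience helper for the example: text content of the root element. */
    static String rootText(String xml) {
        try {
            return toDOMDocument(xml).getDocumentElement().getTextContent();
        } catch (IOException | SAXException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>Ren\u00e9</name>";
        // The non-ASCII character survives regardless of the default charset.
        System.out.println(rootText(xml));
    }
}
```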
--
James
-------
http://macstrac.blogspot.com/
Open Source Integration
http://open.iona.com