Posted to users@camel.apache.org by jonathanq <jq...@abebooks.com> on 2011/01/24 22:33:05 UTC

Re: XStream and forcing ISO-8859-1 Encoding

I am sorry to bring this back from the dead, but I was just trying out
the unmarshal().xstream("ISO-8859-1") method that was introduced because of
this thread. Unfortunately, it still does not solve the problem (as of Camel
2.5.0).

From non-Camel routes, we have been publishing JMS messages and serializing
the message to XML as follows:

XStream xstream = new XStream(new DomDriver("ISO-8859-1"));
String messageXml = xstream.toXML(someObject);

We then use a ProducerTemplate to publish it to our messaging system.

When we used a route like this:

from(someIncomingEndpoint)
                .unmarshal().xstream("ISO-8859-1")
                .process(myUpdateProcessor);

Our processor received a deserialized message, but the content was not
correct. Strings that had been serialized as ISO-8859-1 were deserialized
as UTF-8.
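
To make the symptom concrete, here is a minimal JDK-only sketch of what
happens when ISO-8859-1 bytes are decoded as UTF-8. It does not use Camel or
XStream at all; the class and method names are made up for illustration, and
the sample string is an assumption, not data from our messages.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows a charset mismatch between writer and reader.
final class EncodingMismatchDemo {

    // Encode text the way the publishing side is assumed to (ISO-8859-1).
    static byte[] encodeLatin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Decode raw bytes with whichever charset the consumer assumes.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        byte[] wire = encodeLatin1("caf\u00e9"); // 'e' with acute accent is one byte, 0xE9

        String right = decode(wire, StandardCharsets.ISO_8859_1);
        String wrong = decode(wire, StandardCharsets.UTF_8);

        // The ISO-8859-1 decode round-trips; the UTF-8 decode does not,
        // because a lone 0xE9 is not a valid UTF-8 sequence.
        System.out.println("ISO-8859-1 round-trips: " + right.equals("caf\u00e9"));
        System.out.println("UTF-8 round-trips:      " + wrong.equals("caf\u00e9"));
    }
}
```

Any character above U+007F is affected in the same way, which matches the
garbled text we saw in the processor.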

I modified our route to introduce a new Processor (instead of the in-line
unmarshal) that did the following:

String messageBody = exchange.getIn().getBody(String.class);
XStream xstream = new XStream(new DomDriver("ISO-8859-1"));
Object myObject = xstream.fromXML(messageBody);
exchange.getIn().setBody(myObject);

This works fine: the text our processor receives is correct ISO-8859-1 and
nothing is garbled.

I set a breakpoint and stepped through the Camel code with the in-line
unmarshal. It does pass down the encoding specified (ISO-8859-1). However,
it constructs the XStream object using the default XppDriver (on which you
can't specify an encoding).

According to the XStream documentation, the XppDriver (and the other drivers,
apart from DomDriver) relies on the underlying InputStream/OutputStream passed
to the XStream object to determine the encoding.
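
If the driver leaves the encoding to the stream, one way to take it out of
the parser's hands is to decode the bytes yourself before parsing. The sketch
below is JDK-only and wraps an InputStream in a Reader with an explicit
charset; the class and method names are hypothetical. The resulting Reader
could then be handed to a Reader-based entry point such as XStream's
fromXML(Reader), although I have not tested that combination inside Camel.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: fixes the charset before any XML parser sees the data.
final class ExplicitCharsetReader {

    // Wrap the raw stream so that all downstream code only ever sees chars.
    static Reader openAs(InputStream in, Charset charset) {
        return new InputStreamReader(in, charset);
    }

    // Drain a Reader into a String (stands in for the parser in this sketch).
    static String readAll(Reader reader) {
        try (Reader r = reader) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Sample payload (an assumption), encoded as ISO-8859-1.
        byte[] wire = "<name>caf\u00e9</name>".getBytes(StandardCharsets.ISO_8859_1);
        Reader reader = openAs(new ByteArrayInputStream(wire), StandardCharsets.ISO_8859_1);
        System.out.println(readAll(reader).equals("<name>caf\u00e9</name>"));
    }
}
```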

I found the following method in AbstractXStreamWrapper.java:

    public Object unmarshal(Exchange exchange, InputStream stream) throws Exception {
        HierarchicalStreamReader reader = createHierarchicalStreamReader(exchange, stream);
        try {
            return getXStream(exchange.getContext().getClassResolver()).unmarshal(reader);
        } finally {
            reader.close();
        }
    }

The "HierarchicalStreamReader" that is created is of type:
com.thoughtworks.xstream.io.xml.StaxReader

When I stepped into the "unmarshal" method of the XStream class, I saw that
the reader passed in (the same StaxReader) has a property called "in" that
was of type: com.ctc.wstx.sr.ValidatingStreamReader

This, in turn, had 2 properties:

mDocInputEncoding = {java.lang.String@4784}"ISO-8859-1"
mDocXmlEncoding = {java.lang.String@4785}"UTF-8"

I can't say for certain that this is why the text is coming out as UTF-8, but
it does seem suspicious that although the InputEncoding is set to ISO-8859-1,
the XmlEncoding is still "UTF-8".
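
For anyone who wants to probe this outside Camel, here is a rough sketch using
the StAX API that ships with the JDK (not Woodstox, so the behaviour may not
match what I saw in the debugger). It passes the encoding explicitly to
createXMLStreamReader and prints the two standard accessors, getEncoding() and
getCharacterEncodingScheme(). My guess that these correspond to
mDocInputEncoding and mDocXmlEncoding is an assumption. The class name, method
name and sample payload are made up.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical probe: reads the first text node with an explicit encoding.
final class StaxEncodingProbe {

    static String firstText(byte[] xml, String encoding) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Deliver each text node as a single CHARACTERS event.
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);

            InputStream in = new ByteArrayInputStream(xml);
            XMLStreamReader reader = factory.createXMLStreamReader(in, encoding);
            try {
                // Report what the parser believes; values are implementation-specific.
                System.out.println("getEncoding() = " + reader.getEncoding());
                System.out.println("getCharacterEncodingScheme() = "
                        + reader.getCharacterEncodingScheme());

                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.CHARACTERS) {
                        return reader.getText();
                    }
                }
                return null; // no text node found
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        // No XML declaration, like the output of xstream.toXML(...).
        byte[] wire = "<name>caf\u00e9</name>".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("decoded correctly: "
                + "caf\u00e9".equals(firstText(wire, "ISO-8859-1")));
    }
}
```

Running the same probe with Woodstox on the classpath would be a reasonable
starting point for a test case.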


In any event - for our own purposes we have created 2 Processor classes to
serialize/deserialize our XML.  We can't rely on the unmarshal/marshal
methods when it comes to encoding and our XML. 

Just wanted to pass along the news that the fix doesn't seem to have solved
the problem.

-- 
View this message in context: http://camel.465427.n5.nabble.com/XStream-and-forcing-ISO-8859-1-Encoding-tp478220p3355313.html
Sent from the Camel - Users mailing list archive at Nabble.com.

Re: XStream and forcing ISO-8859-1 Encoding

Posted by Claus Ibsen <cl...@gmail.com>.
You can open a ticket in JIRA
http://camel.apache.org/support.html

If possible, a test case that demonstrates your issue is a great
start. It can be used to track down the issue and help solve it.

You are welcome to dig into the source code and provide a patch.



-- 
Claus Ibsen
-----------------
FuseSource
Email: cibsen@fusesource.com
Web: http://fusesource.com
Twitter: davsclaus
Blog: http://davsclaus.blogspot.com/
Author of Camel in Action: http://www.manning.com/ibsen/