Posted to issues@camel.apache.org by "Tadayoshi Sato (JIRA)" <ji...@apache.org> on 2018/09/03 08:23:00 UTC

[jira] [Created] (CAMEL-12769) Combination of File consumer with charset and Split DSL with XPath doesn't parse XML correctly

Tadayoshi Sato created CAMEL-12769:
--------------------------------------

             Summary: Combination of File consumer with charset and Split DSL with XPath doesn't parse XML correctly
                 Key: CAMEL-12769
                 URL: https://issues.apache.org/jira/browse/CAMEL-12769
             Project: Camel
          Issue Type: Bug
          Components: camel-core
    Affects Versions: 2.22.0
            Reporter: Tadayoshi Sato
            Assignee: Tadayoshi Sato


This route:
{code:java}
from("file:/...?charset=iso-8859-1&include=.*\\.xml")
    .split(xpath("/foo/bar"))
        ...
{code}
does not read and split XML like the following with the correct encoding:
{code:xml}
<?xml version="1.0" encoding="ISO-8859-1"?>
<foo>
	<bar>abc</bar>
	<bar>xyz</bar>
	<bar>åäö</bar>
</foo>
{code}

The root cause lies in the specified behaviour of {{IOConverter.toInputStream(File, String)}}:
https://github.com/apache/camel/blob/camel-2.22.1/camel-core/src/main/java/org/apache/camel/converter/IOConverter.java#L84-L119
which was clarified in CAMEL-8346 and CAMEL-8356.

This method reads a {{File}} using the given charset and converts it into an {{InputStream}} encoded in the *JVM default charset*, regardless of the file's format. However, [XmlConverter.toDOMDocument(...)|https://github.com/apache/camel/blob/camel-2.22.1/camel-core/src/main/java/org/apache/camel/converter/jaxp/XmlConverter.java#L870-L872] then uses {{DocumentBuilder}} to parse that input stream into a DOM {{Document}}, and {{DocumentBuilder}} relies on the XML declaration:
{code:xml}
<?xml version="1.0" encoding="ISO-8859-1"?>
{code}
to detect the encoding. As a result, the actual encoding of the input stream (the JVM default) does not match the encoding declared in the XML (ISO-8859-1), so whenever the JVM default is not ISO-8859-1, non-ASCII characters such as {{åäö}} are decoded incorrectly.
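To make the mismatch concrete, here is a minimal JDK-only sketch that does not use Camel at all. It approximates the described conversion by decoding ISO-8859-1 bytes and re-encoding them as UTF-8 (standing in for a JVM whose default charset is UTF-8), then parses the result with {{DocumentBuilder}}, which decodes the bytes according to the {{encoding="ISO-8859-1"}} declaration. The class and method names ({{EncodingMismatchDemo}}, {{parseReencodedBytes}}, {{parseAsCharacters}}) are hypothetical, and this is not Camel's actual {{IOConverter}}/{{XmlConverter}} code. The second method shows one possible way around the mismatch, handing the parser characters through a {{Reader}}-backed {{InputSource}} so the declared encoding is not used for decoding; whether Camel should take that approach is not established here.

{code:java}
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EncodingMismatchDemo {

    // Same content as the sample file above; \u00e5\u00e4\u00f6 is "åäö".
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"
        + "<foo>\n"
        + "  <bar>abc</bar>\n"
        + "  <bar>xyz</bar>\n"
        + "  <bar>\u00e5\u00e4\u00f6</bar>\n"
        + "</foo>\n";

    // Parses the input and returns the text of the third <bar> element.
    static String thirdBar(InputSource source) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(source);
        return doc.getElementsByTagName("bar").item(2).getTextContent();
    }

    // Approximation of the described behaviour (assumption, not Camel code):
    // decode the file bytes with the configured charset, re-encode them with
    // another charset standing in for the JVM default, then parse as bytes.
    public static String parseReencodedBytes(Charset jvmDefault) throws Exception {
        byte[] fileBytes = XML.getBytes(StandardCharsets.ISO_8859_1);
        String decoded = new String(fileBytes, StandardCharsets.ISO_8859_1);
        byte[] reencoded = decoded.getBytes(jvmDefault);
        // The parser trusts encoding="ISO-8859-1" in the declaration, not jvmDefault.
        return thirdBar(new InputSource(new ByteArrayInputStream(reencoded)));
    }

    // Possible workaround sketch: pass characters, so the declaration's
    // encoding is not used to decode anything.
    public static String parseAsCharacters() throws Exception {
        byte[] fileBytes = XML.getBytes(StandardCharsets.ISO_8859_1);
        String decoded = new String(fileBytes, StandardCharsets.ISO_8859_1);
        return thirdBar(new InputSource(new StringReader(decoded)));
    }

    public static void main(String[] args) throws Exception {
        // Each of å, ä, ö comes back as two characters (UTF-8 bytes read as ISO-8859-1).
        System.out.println(parseReencodedBytes(StandardCharsets.UTF_8));
        // The characters survive intact.
        System.out.println(parseAsCharacters());
    }
}
{code}

Passing {{StandardCharsets.ISO_8859_1}} as the stand-in default makes the round trip lossless and the parse succeeds, which is consistent with the problem only surfacing when the JVM default charset differs from the encoding declared in the XML.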



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)