Posted to issues@camel.apache.org by "Claus Ibsen (JIRA)" <ji...@apache.org> on 2018/09/04 14:38:00 UTC

[jira] [Resolved] (CAMEL-12769) Combination of File consumer with charset and Split DSL with XPath doesn't parse XML correctly

     [ https://issues.apache.org/jira/browse/CAMEL-12769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Claus Ibsen resolved CAMEL-12769.
---------------------------------
    Resolution: Fixed

Thanks for reporting and for the PR.

> Combination of File consumer with charset and Split DSL with XPath doesn't parse XML correctly
> ----------------------------------------------------------------------------------------------
>
>                 Key: CAMEL-12769
>                 URL: https://issues.apache.org/jira/browse/CAMEL-12769
>             Project: Camel
>          Issue Type: Bug
>          Components: camel-core
>    Affects Versions: 2.22.0
>            Reporter: Tadayoshi Sato
>            Assignee: Tadayoshi Sato
>            Priority: Major
>             Fix For: 2.21.3, 2.23.0, 2.22.2
>
>
> This route:
> {code:java}
> from("file:/...?charset=iso-8859-1&include=.*\\.xml")
>     .split(xpath("/foo/bar"))
>         ...
> {code}
> does not read and split XML such as the following using the correct encoding:
> {code:xml}
> <?xml version="1.0" encoding="ISO-8859-1"?>
> <foo>
> 	<bar>abc</bar>
> 	<bar>xyz</bar>
> 	<bar>åäö</bar>
> </foo>
> {code}
> The root cause lies in the contract of {{IOConverter.toInputStream(File, String)}}:
>  [https://github.com/apache/camel/blob/camel-2.22.1/camel-core/src/main/java/org/apache/camel/converter/IOConverter.java#L84-L119]
>  which was clarified in CAMEL-8346 and CAMEL-8356.
> This method reads a {{File}} with the given charset and converts it to an {{InputStream}} encoded in the *JVM default charset*, regardless of the file's format. However, [XmlConverter.toDOMDocument(...)|https://github.com/apache/camel/blob/camel-2.22.1/camel-core/src/main/java/org/apache/camel/converter/jaxp/XmlConverter.java#L870-L872] then uses {{DocumentBuilder}} to parse that input stream into a DOM {{Document}}, and {{DocumentBuilder}} relies on the XML declaration:
> {code:xml}
> <?xml version="1.0" encoding="ISO-8859-1"?>
> {code}
> to detect the encoding. As a result, the actual encoding of the input stream (the JVM default) does not match the encoding declared in the XML, so unless the JVM default happens to be ISO-8859-1, non-ASCII characters such as åäö are decoded incorrectly.
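
The mismatch can be reproduced with nothing but the JDK. The following is a minimal, hypothetical sketch (the class and method names are made up for illustration and are not Camel API). It decodes the ISO-8859-1 sample and re-encodes it as UTF-8, which stands in for the JVM default charset (an assumption; if the default were ISO-8859-1 the bytes would happen to agree), then hands the bytes to a {{DocumentBuilder}}, which trusts the declaration and mis-decodes the third <bar>. For comparison it also parses the original bytes and a character stream (a {{Reader}} wrapped in an {{InputSource}}), for which the parser disregards the declared encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of Camel.
class EncodingMismatchDemo {

    // The sample XML from the description, with a well-formed </foo> closing tag.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"
        + "<foo>\n"
        + "\t<bar>abc</bar>\n"
        + "\t<bar>xyz</bar>\n"
        + "\t<bar>\u00e5\u00e4\u00f6</bar>\n"
        + "</foo>\n";

    // Parses the source with a default DocumentBuilder and returns the text of the third <bar>.
    static String thirdBar(InputSource source) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(source);
            return doc.getElementsByTagName("bar").item(2).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Bytes as stored on disk: ISO-8859-1, agreeing with the declaration.
    static byte[] fileBytes() {
        return XML.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Mimics the described converter contract: decode with the file charset,
    // then re-encode with another charset (the JVM default in the real converter).
    static byte[] reencode(Charset target) {
        String decoded = new String(fileBytes(), StandardCharsets.ISO_8859_1);
        return decoded.getBytes(target);
    }

    // UTF-8 bytes that still declare ISO-8859-1: the parser mis-decodes them.
    static String parseReencodedBytes() {
        return thirdBar(new InputSource(new ByteArrayInputStream(reencode(StandardCharsets.UTF_8))));
    }

    // Bytes whose encoding matches the declaration: parsed correctly.
    static String parseOriginalBytes() {
        return thirdBar(new InputSource(new ByteArrayInputStream(fileBytes())));
    }

    // Character stream: the parser ignores the encoding declaration.
    static String parseCharacters() {
        String decoded = new String(fileBytes(), StandardCharsets.ISO_8859_1);
        return thirdBar(new InputSource(new StringReader(decoded)));
    }

    public static void main(String[] args) {
        System.out.println(parseReencodedBytes()); // mis-decoded text
        System.out.println(parseOriginalBytes());
        System.out.println(parseCharacters());
    }
}
```

Each UTF-8 two-byte sequence for å, ä and ö is read as two separate ISO-8859-1 characters, which is the kind of corruption reported here. Passing characters instead of bytes is only one way to sidestep the conflict; this sketch does not claim to reflect what the committed fix does.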



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)