Posted to issues@camel.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/09/03 08:29:00 UTC
[jira] [Commented] (CAMEL-12769) Combination of File consumer with
charset and Split DSL with XPath doesn't parse XML correctly
[ https://issues.apache.org/jira/browse/CAMEL-12769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601889#comment-16601889 ]
ASF GitHub Bot commented on CAMEL-12769:
----------------------------------------
GitHub user tadayosi opened a pull request:
https://github.com/apache/camel/pull/2505
CAMEL-12769: Combination of File consumer with charset and Split DSL …
…with XPath doesn't parse XML correctly
https://issues.apache.org/jira/browse/CAMEL-12769
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tadayosi/camel CAMEL-12769
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/camel/pull/2505.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2505
----
commit b8dd9d2c9a4f9a0616ed3016b91fa547b138ae0f
Author: Tadayoshi Sato <sa...@...>
Date: 2018-09-03T08:24:05Z
CAMEL-12769: Combination of File consumer with charset and Split DSL with XPath doesn't parse XML correctly
----
> Combination of File consumer with charset and Split DSL with XPath doesn't parse XML correctly
> ----------------------------------------------------------------------------------------------
>
> Key: CAMEL-12769
> URL: https://issues.apache.org/jira/browse/CAMEL-12769
> Project: Camel
> Issue Type: Bug
> Components: camel-core
> Affects Versions: 2.22.0
> Reporter: Tadayoshi Sato
> Assignee: Tadayoshi Sato
> Priority: Major
>
> This route:
> {code:java}
> from("file:/...?charset=iso-8859-1&include=.*\\.xml")
> .split(xpath("/foo/bar"))
> ...
> {code}
> does not read and split XML like the following with the correct encoding:
> {code:xml}
> <?xml version="1.0" encoding="ISO-8859-1"?>
> <foo>
> <bar>abc</bar>
> <bar>xyz</bar>
> <bar>åäö</bar>
> </foo>
> {code}
> The root cause lies in the contract of {{IOConverter.toInputStream(File, String)}}:
> [https://github.com/apache/camel/blob/camel-2.22.1/camel-core/src/main/java/org/apache/camel/converter/IOConverter.java#L84-L119]
> which was clarified at CAMEL-8346 and CAMEL-8356.
> This method reads a {{File}} using the given charset and returns an {{InputStream}} encoded in the *JVM default charset*, regardless of the file's format. However, [XmlConverter.toDOMDocument(...)|https://github.com/apache/camel/blob/camel-2.22.1/camel-core/src/main/java/org/apache/camel/converter/jaxp/XmlConverter.java#L870-L872] then uses {{DocumentBuilder}} to convert the input stream to a DOM {{Document}}, and {{DocumentBuilder}} relies on the XML declaration:
> {code:xml}
> <?xml version="1.0" encoding="ISO-8859-1"?>
> {code}
> to detect the file encoding. As a result, the actual encoding of the input stream (JVM default) can differ from the encoding declared in the XML, and non-ASCII characters such as åäö are then decoded incorrectly.
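
The mismatch described above can be reproduced without Camel. The following is only a minimal sketch using plain JDK classes: it assumes the JVM default charset is UTF-8 (hard-coded here so the result is deterministic) and simulates the transcoding step by hand instead of calling IOConverter. The class and method names are hypothetical and do not come from the Camel code base.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class CharsetMismatchDemo {

    // The file content from the issue; the last <bar> holds the non-ASCII text.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
        + "<foo><bar>abc</bar><bar>xyz</bar><bar>\u00e5\u00e4\u00f6</bar></foo>";

    // Assumption: stands in for the JVM default charset.
    static final Charset ASSUMED_DEFAULT = StandardCharsets.UTF_8;

    // Parses the stream with DocumentBuilder and returns the text of the last <bar>.
    static String lastBar(InputStream in) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(in);
            NodeList bars = doc.getElementsByTagName("bar");
            return bars.item(bars.getLength() - 1).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Case 1: the parser sees the bytes exactly as stored on disk (ISO-8859-1).
    static String parseOriginalBytes() {
        byte[] onDisk = XML.getBytes(StandardCharsets.ISO_8859_1);
        return lastBar(new ByteArrayInputStream(onDisk));
    }

    // Case 2: the bytes are decoded with the file charset and re-encoded in the
    // assumed default charset, while the XML declaration still says ISO-8859-1.
    static String parseTranscodedBytes() {
        byte[] onDisk = XML.getBytes(StandardCharsets.ISO_8859_1);
        String decoded = new String(onDisk, StandardCharsets.ISO_8859_1);
        byte[] reencoded = decoded.getBytes(ASSUMED_DEFAULT);
        return lastBar(new ByteArrayInputStream(reencoded));
    }

    public static void main(String[] args) {
        String expected = "\u00e5\u00e4\u00f6";
        System.out.println("original bytes parsed correctly:   "
            + expected.equals(parseOriginalBytes()));
        System.out.println("transcoded bytes parsed correctly: "
            + expected.equals(parseTranscodedBytes()));
    }
}
```

In the second case the parser trusts the declaration and decodes each UTF-8 byte as a separate ISO-8859-1 character, so the three-character text comes back as six characters.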