Posted to issues@camel.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/09/03 08:29:00 UTC

[jira] [Commented] (CAMEL-12769) Combination of File consumer with charset and Split DSL with XPath doesn't parse XML correctly

    [ https://issues.apache.org/jira/browse/CAMEL-12769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601889#comment-16601889 ] 

ASF GitHub Bot commented on CAMEL-12769:
----------------------------------------

GitHub user tadayosi opened a pull request:

    https://github.com/apache/camel/pull/2505

    CAMEL-12769: Combination of File consumer with charset and Split DSL with XPath doesn't parse XML correctly
    
    https://issues.apache.org/jira/browse/CAMEL-12769

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tadayosi/camel CAMEL-12769

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/camel/pull/2505.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2505
    
----
commit b8dd9d2c9a4f9a0616ed3016b91fa547b138ae0f
Author: Tadayoshi Sato <sa...@...>
Date:   2018-09-03T08:24:05Z

    CAMEL-12769: Combination of File consumer with charset and Split DSL with XPath doesn't parse XML correctly

----


> Combination of File consumer with charset and Split DSL with XPath doesn't parse XML correctly
> ----------------------------------------------------------------------------------------------
>
>                 Key: CAMEL-12769
>                 URL: https://issues.apache.org/jira/browse/CAMEL-12769
>             Project: Camel
>          Issue Type: Bug
>          Components: camel-core
>    Affects Versions: 2.22.0
>            Reporter: Tadayoshi Sato
>            Assignee: Tadayoshi Sato
>            Priority: Major
>
> This route:
> {code:java}
> from("file:/...?charset=iso-8859-1&include=.*\\.xml")
>     .split(xpath("/foo/bar"))
>         ...
> {code}
> does not read and split XML such as the following with the correct encoding:
> {code:xml}
> <?xml version="1.0" encoding="ISO-8859-1"?>
> <foo>
> 	<bar>abc</bar>
> 	<bar>xyz</bar>
> 	<bar>åäö</bar>
> </foo>
> {code}
> The root cause is the specified behavior of {{IOConverter.toInputStream(File, String)}}:
>  [https://github.com/apache/camel/blob/camel-2.22.1/camel-core/src/main/java/org/apache/camel/converter/IOConverter.java#L84-L119]
>  which was clarified in CAMEL-8346 and CAMEL-8356.
> This method reads a {{File}} using the given charset and returns an {{InputStream}} encoded in the *JVM default charset*, whatever the format of the file. However, [XmlConverter.toDOMDocument(...)|https://github.com/apache/camel/blob/camel-2.22.1/camel-core/src/main/java/org/apache/camel/converter/jaxp/XmlConverter.java#L870-L872] then uses {{DocumentBuilder}} to parse that input stream into a DOM {{Document}}, and {{DocumentBuilder}} honors the XML declaration:
> {code:xml}
> <?xml version="1.0" encoding="ISO-8859-1"?>
> {code}
> to detect the file encoding. As a result, the actual encoding of the input stream (the JVM default) no longer matches the encoding declared in the XML, and non-ASCII characters such as {{åäö}} are decoded incorrectly.
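To make the mismatch concrete, here is a minimal JDK-only sketch (no Camel classes involved). It assumes UTF-8 as the JVM default charset and emulates the IOConverter step by decoding and re-encoding in memory; the class and method names (CharsetMismatchDemo, reencoded, splitFromBytes, splitFromChars) are hypothetical and not part of Camel. Parsing the re-encoded bytes should turn the third bar into mojibake (Ã¥Ã¤Ã¶), while handing the parser a Reader makes it disregard the encoding declaration. The Reader approach is shown only as one possible workaround, not necessarily the change made in the pull request.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CharsetMismatchDemo {

    // The sample document from the issue (closing tag corrected to </foo>).
    static final String XML = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"
            + "<foo><bar>abc</bar><bar>xyz</bar><bar>\u00e5\u00e4\u00f6</bar></foo>";

    // The file as it sits on disk: encoded in ISO-8859-1, as its prolog claims.
    static byte[] fileBytes() {
        return XML.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Emulates the conversion described in the issue: decode the file with the
    // endpoint charset, then re-encode the text with the "JVM default" charset.
    // The default is passed in explicitly (assumption) so the demo is machine-independent.
    static InputStream reencoded(Charset fileCharset, Charset jvmDefault) {
        String text = new String(fileBytes(), fileCharset);
        return new ByteArrayInputStream(text.getBytes(jvmDefault));
    }

    // Parses the source and evaluates /foo/bar, roughly the node list that
    // split(xpath("/foo/bar")) iterates over, returning each match's text.
    static List<String> barTexts(InputSource source) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("/foo/bar", doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    // Failing path: the parser receives bytes and decodes them with the charset
    // named in the XML declaration (ISO-8859-1), not the one they were written in.
    static List<String> splitFromBytes(Charset jvmDefault) {
        return barTexts(new InputSource(reencoded(StandardCharsets.ISO_8859_1, jvmDefault)));
    }

    // Workaround sketch: decode the bytes with the charset they were really
    // written in and give the parser a character stream; a parser reading a
    // character stream disregards the encoding declaration.
    static List<String> splitFromChars(Charset jvmDefault) {
        Reader reader = new InputStreamReader(
                reencoded(StandardCharsets.ISO_8859_1, jvmDefault), jvmDefault);
        return barTexts(new InputSource(reader));
    }

    public static void main(String[] args) {
        Charset assumedDefault = StandardCharsets.UTF_8; // assumption: UTF-8 JVM default
        System.out.println("bytes: " + splitFromBytes(assumedDefault));
        System.out.println("chars: " + splitFromChars(assumedDefault));
    }
}
```

Calling splitFromBytes(StandardCharsets.ISO_8859_1) parses correctly, which fits the description above: the bug only shows up when the JVM default charset differs from the charset declared in the file.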



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)