Posted to dev@servicemix.apache.org by "Juergen Mayrbaeurl (JIRA)" <ji...@apache.org> on 2006/04/27 18:19:37 UTC

[jira] Updated: (SM-414) SourceTransformer can't transform to DOM with non-US-ASCII characters like 'ä' or 'ü'

     [ https://issues.apache.org/activemq/browse/SM-414?page=all ]

Juergen Mayrbaeurl updated SM-414:
----------------------------------

    Attachment: SampleInMessage.xml

Sample In Message with non US-ASCII characters

> SourceTransformer can't transform to DOM with non-US-ASCII characters like 'ä' or 'ü'
> -------------------------------------------------------------------------------------
>
>          Key: SM-414
>          URL: https://issues.apache.org/activemq/browse/SM-414
>      Project: ServiceMix
>         Type: Bug
>   Components: servicemix-core
>     Versions: 3.0-M1, 3.0-M2, 3.0, incubation
>  Environment: W2K, J2SE 1.4.2, Xerces 2.7.1, default locale of OS with character set 'windows-1252'
>     Reporter: Juergen Mayrbaeurl
>     Priority: Blocker
>      Fix For: 3.0, incubation
>  Attachments: SampleInMessage.xml, SourceTransformer-sources.zip
>
>
> The class org.apache.servicemix.jbi.jaxp.SourceTransformer, one of the core ServiceMix classes and used very often, has major problems transforming a Source into a DOM structure when the source contains non-US-ASCII characters like 'ä' or 'ü'.
> The class uses a DocumentBuilder for the transformation (see the method 'public DOMSource toDOMSourceFromStream(StreamSource source) throws ParserConfigurationException, IOException, SAXException') and calls 'public Document parse(InputStream is, String systemId) throws SAXException, IOException' without explicitly telling the DocumentBuilder which character encoding to use. As a result, the DocumentBuilder (Xerces 2.7.1) throws fatal errors (exceptions) because it encounters invalid character code sequences (especially with UTF-8 and multi-byte characters like 'ä' or 'ö'). This means that you can't use non-US-ASCII characters in messages as soon as ServiceMix uses an instance of SourceTransformer to do any transformation to DOM. This is the case when tracing messages in the DeliveryChannel or when evaluating an XPath expression, e.g. for content-based routing.
> The solution to this problem is straightforward: tell the DocumentBuilder which character encoding it has to use. It looks like this:
>     public DOMSource toDOMSourceFromStream(StreamSource source) throws ParserConfigurationException, IOException,
>             SAXException {
>         DocumentBuilder builder = createDocumentBuilder();
>         String systemId = source.getSystemId();
>         Document document = null;
>         InputStream inputStream = source.getInputStream();
>         if (inputStream != null) {
>             InputSource inputsource = new InputSource(inputStream);
>             inputsource.setSystemId(systemId);
>             inputsource.setEncoding(defaultCharEncodingName);  // <-- Very important
>             
>             document = builder.parse(inputsource);
>         }
>         else {
>             Reader reader = source.getReader();
>             if (reader != null) {
>                 document = builder.parse(new InputSource(reader));
>             }
>             else {
>                 throw new IOException("No input stream or reader available");
>             }
>         }
>         return new DOMSource(document, systemId);
>     }
> I've attached the original source file of SourceTransformer (3.0 SNAPSHOT, 2006-04-20) and the changed version (unfortunately, I can't create a real patch).
> Kind regards
> Juergen
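
The failure and the fix described in the issue can be reproduced outside ServiceMix with a small, self-contained sketch. It is not taken from the attached sources: the class name EncodingDemo, the sample document, and the helper names are hypothetical, and it runs against whatever JAXP parser the JDK provides rather than Xerces 2.7.1 specifically. ISO-8859-1 stands in for 'windows-1252' here because every JVM is required to support it; both encode 'ä' as the single byte 0xE4, which is not a valid UTF-8 sequence on its own. The sketch also uses multi-catch, so it needs a newer Java version than the J2SE 1.4.2 named in the report.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class; not part of ServiceMix.
public class EncodingDemo {

    // Sample message without an XML declaration, so the parser cannot
    // learn the encoding from the document itself. \u00e4 is 'ä'.
    static final String XML = "<msg>\u00e4</msg>";

    // Parses the bytes; if encoding is non-null, it is passed to the
    // parser via InputSource.setEncoding, as in the proposed fix.
    static Document parse(byte[] bytes, String encoding)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        InputSource source = new InputSource(new ByteArrayInputStream(bytes));
        if (encoding != null) {
            source.setEncoding(encoding);
        }
        return builder.parse(source);
    }

    // Returns true if parsing without an explicit encoding throws.
    static boolean failsWithoutEncoding(byte[] bytes) throws ParserConfigurationException {
        try {
            parse(bytes, null);
            return false;
        } catch (SAXException | IOException e) {
            // Depending on the parser, malformed byte sequences surface
            // as either a SAXException or an IOException.
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        // Bytes as a single-byte Western European encoding would produce them.
        byte[] latin1 = XML.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println("fails without encoding: " + failsWithoutEncoding(latin1));

        Document doc = parse(latin1, "ISO-8859-1");
        String text = doc.getDocumentElement().getTextContent();
        System.out.println("parsed with encoding, text equals 'ä': " + "\u00e4".equals(text));
    }
}
```

Without an explicit encoding the parser assumes UTF-8 for a stream that carries no declaration or byte order mark, so the lone 0xE4 byte is rejected; with the encoding set on the InputSource, the same bytes parse into a DOM document. The parser may print a "[Fatal Error]" line to standard error for the failing case, since no custom ErrorHandler is installed.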

-- 
This message is automatically generated by JIRA.