Posted to users@ws.apache.org by Rune Froysa <ru...@usit.uio.no> on 2002/11/29 12:50:36 UTC

Bug: MinML silently ignores encoding

Unless sax.driver is set, XmlRpc will default to the MinML SAX driver.
This driver silently ignores the encoding specification in the first
line of the xml file, and the default character encoding does not seem
to be utf-8.

If this driver is supposed to still be the default driver for XmlRpc,
then I suggest that it should detect utf-8 usage and emit a warning to
save developers from having to do a lot of debugging to figure out why
their utf-8 based code does not work.

Python's xmlrpclib seems to default to utf-8.  As far as I can see, the
XML-RPC spec does not specify any character set.

Regards,
Rune Frøysa

Re: MinML silently ignores encoding

Posted by John Wilson <tu...@wilson.co.uk>.
----- Original Message -----
From: "Rune Froysa" <ru...@usit.uio.no>
To: <rp...@xml.apache.org>
Sent: Friday, November 29, 2002 11:50 AM
Subject: Bug: MinML silently ignores encoding


> Unless sax.driver is set, XmlRpc will default to the MinML sax driver.
> This driver silently ignores the encoding specification in the first
> line of the xml file, and the default character encoding does not seem
> to be utf-8.
>
> If this driver is supposed to still be the default driver for XmlRpc,
> then I suggest that it should detect utf-8 usage and emit a warning to
> save developers from having to do a lot of debugging to figure out why
> their utf-8 based code does not work.
>
> Python's xmlrpclib seems to default to utf-8.  As far as I can see, the
> XML-RPC spec does not specify any character set.

The XML-RPC spec says that the contents of the message must be
ASCII characters. The Apache XML-RPC implementation extends this spec by
supporting ISO-8859-1 encoding. Note that the encoding of ASCII characters is
identical in UTF-8 and ISO-8859-1.
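
To make the point about identical ASCII encodings concrete, here is a minimal Java sketch. It is not code from XmlRpc or MinML; the class name EncodingDemo and its helper methods are made up for illustration. It shows that a pure ASCII string produces the same bytes in both encodings, and what happens to a non-ASCII character when UTF-8 bytes are decoded as ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of Apache XML-RPC or MinML.
public class EncodingDemo {

    // Bytes of the string when encoded as UTF-8.
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Bytes of the string when encoded as ISO-8859-1.
    static byte[] latin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Simulates a reader that ignores the declared encoding:
    // the text is written as UTF-8 but decoded as ISO-8859-1.
    static String misread(String s) {
        return new String(utf8(s), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // ASCII-only text: both encodings give identical bytes.
        System.out.println(Arrays.equals(utf8("Rune"), latin1("Rune")));

        // "Fr\u00f8ysa" contains U+00F8, which UTF-8 encodes as two bytes.
        // Decoding those bytes as ISO-8859-1 yields two characters instead of one.
        String original = "Fr\u00f8ysa";
        System.out.println(original.length() + " -> " + misread(original).length());
    }
}
```

The non-ASCII character is written as a \u escape so that the example does not depend on the encoding of the source file itself.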

If you want to use non-ASCII characters in a message, then the best and
safest way of doing so is to escape characters with Unicode values >
127 as &#nnnn; This will maximise your chance of interoperating between XML
implementations. Even so, some implementations will fail when encountering
these entities.
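
The escaping described above could be sketched roughly as follows. This is an illustration only, not the code committed to the XmlRpc XML writer; the class name NumericEscaper and the method escape are hypothetical. It replaces every code point above 127 with a decimal &#nnnn; reference and also escapes the XML markup characters &, < and >.

```java
// Hypothetical helper, not part of Apache XML-RPC.
public class NumericEscaper {

    // Returns text with markup characters escaped and every code point
    // above 127 written as a decimal numeric character reference.
    public static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            // Work on code points so surrogate pairs become one reference.
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:
                    if (cp > 127) {
                        out.append("&#").append(cp).append(';');
                    } else {
                        out.append((char) cp);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+00F8 is decimal 248.
        System.out.println(escape("Fr\u00f8ysa")); // prints Fr&#248;ysa
    }
}
```

The output of escape contains only ASCII characters, so it reads the same whether the receiver assumes ASCII, ISO-8859-1 or UTF-8.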

I believe that there has been code committed to generate the &#nnnn;
escaping in some circumstances but I'm not sure that the XML writer
currently escapes all non-ASCII characters.

The next version of MinML will recognise and use the encoding declaration.

John Wilson
The Wilson Partnership
http://www.wilson.co.uk


Re: MinML silently ignores encoding

Posted by Praveen Udawat <pr...@gmo.jp>.
Hi Rune,

I also had the same problem with UTF-8 encoding. I needed to change
XmlRpc.java for my project implementation.

Praveen
----- Original Message -----
From: "Rune Froysa" <ru...@usit.uio.no>
To: <rp...@xml.apache.org>
Sent: Friday, November 29, 2002 8:50 PM
Subject: Bug: MinML silently ignores encoding


> Unless sax.driver is set, XmlRpc will default to the MinML sax driver.
> This driver silently ignores the encoding specification in the first
> line of the xml file, and the default character encoding does not seem
> to be utf-8.
>
> If this driver is supposed to still be the default driver for XmlRpc,
> then I suggest that it should detect utf-8 usage and emit a warning to
> save developers from having to do a lot of debugging to figure out why
> their utf-8 based code does not work.
>
> Python's xmlrpclib seems to default to utf-8.  As far as I can see, the
> XML-RPC spec does not specify any character set.
>
> Regards,
> Rune Frøysa
>

