Posted to java-dev@axis.apache.org by Kristian Barek <ba...@gmail.com> on 2009/02/10 19:15:50 UTC
anyURI MalformedURIException with UTF-8 characters - bug or feature?
Is Apache Axis correct in disallowing international (UTF-8) characters in
anyURI tags when processing responses to web services requests?
I've looked at the specification at
http://www.w3.org/TR/xmlschema-2/#anyURI, and as far as I can see,
anyURIs can contain any character, so long as the
result of URL-encoding the URL is valid. This simple test case
illustrates the problem:
class Test {
    public static void main(String[] args) {
        try {
            // The path segment "TM_Læreplan.aspx" contains the non-ASCII letter æ
            org.apache.axis.types.URI uri = new org.apache.axis.types.URI(
                "http://www.utdanningsdirektoratet.no/templates/udir/TM_Læreplan.aspx?id=2100&laereplanid=707207");
            System.out.println("Parsed: " + uri);
        } catch (Exception e) {
            // Axis 1.x reports a MalformedURIException here
            System.out.println(e);
        }
    }
}
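One possible client-side workaround, sketched under the assumption that the only problem is the unescaped non-ASCII character: the standard java.net.URI class tolerates such characters when parsing, and its toASCIIString() method percent-encodes them as UTF-8 octets, turning TM_Læreplan.aspx into TM_L%C3%A6replan.aspx. The class and method names are made up for illustration, and whether org.apache.axis.types.URI then accepts the escaped string has not been verified here.

```java
import java.net.URI;

// Hypothetical helper, not part of Axis: turns an anyURI value that contains
// non-ASCII characters into an ASCII-only URI string.
class AnyUriToAscii {

    static String toAscii(String anyUri) {
        // URI.create accepts non-ASCII "other" characters in most components;
        // toASCIIString() re-encodes them as %HH escapes of their UTF-8 bytes.
        return URI.create(anyUri).toASCIIString();
    }

    public static void main(String[] args) {
        // The escape sequence below is U+00E6 (the Norwegian letter ae), written
        // this way so the file compiles the same under any default encoding.
        String value = "http://www.utdanningsdirektoratet.no/templates/udir/"
                + "TM_L\u00e6replan.aspx?id=2100&laereplanid=707207";
        // Prints .../templates/udir/TM_L%C3%A6replan.aspx?id=2100&laereplanid=707207
        System.out.println(toAscii(value));
    }
}
```

The escaped form is what the assumed workaround would hand to the Axis URI class instead of the raw value.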
If anyone can give me some background on why Axis is correct in
rejecting this URI, I would be very grateful.
(If I can point to the standards our web services vendor is breaking, I
will have a much better case for getting them to stop putting Norwegian
characters in their anyURIs. :)
Best regards,
Kristian Barek
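Kristian's reading of the specification can be made concrete. As we read the anyURI definition linked above, XML Schema 1.0 defers the mapping from anyURI values to URIs to the escaping procedure in section 5.4 of XLink 1.0: each disallowed character (every non-ASCII character, plus the ASCII characters that RFC 2396 excludes other than #, %, [ and ]) is encoded as UTF-8 and each resulting byte is written as %HH. The sketch below implements that procedure under this reading; the class name is hypothetical and the code is not taken from Axis or Xerces.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the XLink 1.0 section 5.4 escaping procedure that
// XML Schema 1.0 refers to for anyURI; not taken from Axis or Xerces.
class AnyUriEscaper {

    // ASCII characters excluded by RFC 2396 section 2.4.3 that still get
    // escaped; '#', '%', '[' and ']' are deliberately absent from this list.
    private static final String DISALLOWED_ASCII = " <>\"{}|\\^`";

    static String escape(String value) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < value.length()) {
            int cp = value.codePointAt(i);
            boolean disallowed = cp < 0x20 || cp > 0x7E   // controls, DEL, non-ASCII
                    || DISALLOWED_ASCII.indexOf(cp) >= 0;
            if (disallowed) {
                // Encode the character as UTF-8, then write every byte as %HH.
                byte[] utf8 = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                for (byte b : utf8) {
                    out.append(String.format("%%%02X", b & 0xFF));
                }
            } else {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+00E6 ("ae") becomes the two UTF-8 bytes C3 A6.
        System.out.println(escape("TM_L\u00e6replan.aspx?id=2100&laereplanid=707207"));
    }
}
```

Iterating by code point rather than by char keeps characters outside the Basic Multilingual Plane together, so their four UTF-8 bytes are escaped as one unit.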
RE: anyURI MalformedURIException with UTF-8 characters - bug or feature?
Posted by Tom Jordahl <tj...@adobe.com>.
Kristian,
Looking at the source for the Axis 1.x URI class, it seems there are two relevant facts here:
1. We borrowed this code from the Xerces 2 source tree
2. It jumps through a lot of hoops to make sure the right characters are in the URI, and the comment header in the file references the following RFCs:
RFC 2396 - http://www.ietf.org/rfc/rfc2396.txt?number=2396
RFC 2732 - http://www.ietf.org/rfc/rfc2732.txt?number=2732
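To illustrate why a parser built on these RFCs rejects the URI: RFC 2396 only permits letters, digits, the marks -_.!~*'(), the reserved characters ;/?:@&=+$, (RFC 2732 adds [ and ]), the fragment separator # and %HH escapes, so an unescaped æ is simply outside the grammar. The sketch below checks only those character classes. It is a hypothetical helper, not the Axis/Xerces code, which additionally validates the scheme, authority and path structure.

```java
// Hypothetical character-level check against RFC 2396 (plus RFC 2732's '[' and ']').
// It is not the Axis/Xerces implementation and ignores the URI's overall structure.
class Rfc2396Chars {

    private static final String MARK = "-_.!~*'()";
    private static final String RESERVED = ";/?:@&=+$,[]";

    private static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    private static boolean isAlphanum(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    /** Returns the index of the first character RFC 2396 does not allow, or -1. */
    static int firstIllegalChar(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (isAlphanum(c) || MARK.indexOf(c) >= 0 || RESERVED.indexOf(c) >= 0 || c == '#') {
                continue; // unreserved, reserved, or the fragment separator
            }
            if (c == '%' && i + 2 < s.length() && isHex(s.charAt(i + 1)) && isHex(s.charAt(i + 2))) {
                i += 2; // a valid "escaped" triplet such as %C3
                continue;
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        String raw = "http://www.utdanningsdirektoratet.no/templates/udir/"
                + "TM_L\u00e6replan.aspx?id=2100&laereplanid=707207";
        int bad = firstIllegalChar(raw);
        // Reports U+00E6, the unescaped Norwegian letter.
        System.out.println(bad < 0 ? "all characters allowed"
                : String.format("illegal character U+%04X at index %d", (int) raw.charAt(bad), bad));
    }
}
```

Running the same check on the percent-escaped form (TM_L%C3%A6replan.aspx) finds no illegal characters, which is consistent with the vendor's raw value, rather than the characters it represents, being the problem for an RFC 2396 parser.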
Hope that helps.
Tom Jordahl
From: Kristian Barek [mailto:barexx@gmail.com]
Sent: Tuesday, February 10, 2009 1:16 PM
To: axis-dev@ws.apache.org
Subject: anyURI MalformedURIException with UTF-8 characters - bug or feature?