Posted to java-dev@axis.apache.org by Kristian Barek <ba...@gmail.com> on 2009/02/10 19:15:50 UTC

anyURI MalformedURIException with UTF-8 characters - bug or feature?

Is Apache Axis correct in disallowing international (UTF-8) characters in
anyURI tags when processing responses to web services requests?

I've looked at the specification at
http://www.w3.org/TR/xmlschema-2/#anyURI, and as far as I can see,
an anyURI can contain any character, so long as the
result of URL-encoding it is a valid URI. This simple test case
illustrates the problem:

class Test {
  public static void main(String[] args) {
    try {
      // The unescaped "æ" in the path is what the Axis URI class rejects.
      org.apache.axis.types.URI uri = new org.apache.axis.types.URI(
          "http://www.utdanningsdirektoratet.no/templates/udir/TM_Læreplan.aspx?id=2100&laereplanid=707207");
    } catch (Exception e) {
      System.out.println(e);
    }
  }
}

If anyone can give me any background or reasons why Axis is in fact
correct to reject this URI, I would be very grateful.
(If I can point to the standards our web services vendor is breaking, then
I have a much better case for getting them to stop putting Norwegian characters
in their anyURIs. :)

Best regards,
Kristian Barek

RE: anyURI MalformedURIException with UTF-8 characters - bug or feature?

Posted by Tom Jordahl <tj...@adobe.com>.
Kristian,

Looking at the source for the Axis 1.x URI class, it seems there are two relevant facts here:

1. We borrowed this code from the Xerces 2 source tree.

2. It jumps through a lot of hoops to make sure only the right characters are in the URI, and the comment header in the file references the following RFCs:
   RFC 2396 - http://www.ietf.org/rfc/rfc2396.txt?number=2396
   RFC 2732 - http://www.ietf.org/rfc/rfc2732.txt?number=2732
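
For illustration only: RFC 2396 restricts URI characters to a subset of US-ASCII, so a character such as "æ" has to be percent-escaped before a parser that follows that RFC will accept it. The following is a minimal sketch of such an escaping step, assuming the bytes are meant to be UTF-8. The class UriEscaper and the method escapeNonAscii are hypothetical names, not part of Axis or Xerces, and the sketch leaves every ASCII character untouched (including ones RFC 2396 would itself reject, such as spaces).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Axis: escapes only non-ASCII characters.
class UriEscaper {

    // Replaces every non-ASCII character with the %XX escapes of its
    // UTF-8 octets. ASCII characters are copied through unchanged, so
    // existing escapes and reserved characters (?, &, =) are preserved.
    static String escapeNonAscii(String raw) {
        StringBuilder out = new StringBuilder(raw.length() + 8);
        for (byte b : raw.getBytes(StandardCharsets.UTF_8)) {
            int octet = b & 0xFF;
            if (octet < 0x80) {
                // Single-byte UTF-8 sequence, i.e. a plain ASCII character.
                out.append((char) octet);
            } else {
                // Part of a multi-byte UTF-8 sequence: emit it as %XX.
                out.append('%').append(String.format("%02X", octet));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // \u00e6 is "æ"; written as an escape so the source file stays ASCII.
        String raw = "http://www.utdanningsdirektoratet.no/templates/udir/"
                + "TM_L\u00e6replan.aspx?id=2100&laereplanid=707207";
        System.out.println(escapeNonAscii(raw));
        // prints http://www.utdanningsdirektoratet.no/templates/udir/TM_L%C3%A6replan.aspx?id=2100&laereplanid=707207
    }
}
```

The escaped string contains only US-ASCII characters and %XX escapes, which is the form RFC 2396 describes. Whether the Axis URI constructor accepts it has not been verified here.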

Hope that helps.

Tom Jordahl
