Posted to j-dev@xerces.apache.org by Neil Graham <ne...@ca.ibm.com> on 2003/07/21 22:45:20 UTC

recent tightening of the URI implementation and the JAXP TCK

Hi all,

Recently, Michael's been working hard to make our formerly rather woeful
URI implementation conform more closely to the relevant RFCs.  I just
noticed some JAXP TCK tests that try to test the Schema anyURI type that
have started to fail as a result.  Now I have no tremendous expertise in
the area of URI validation, but from what I've gleaned so far, what
Michael's done looks quite correct.

I know there are lots of Sun folks on the list; I wonder if anyone would be
willing to run the TCK and bring to our attention any areas in which the
new code doesn't appear to conform to the Schema specs?  With a release
scheduled for the end of next week, it seems pretty important to straighten
this out as soon as possible; it would certainly be unfortunate if any
correct changes had to be pulled back because of a TCK-compliance issue.

Cheers!
Neil
Neil Graham
XML Parser Development
IBM Toronto Lab
Phone:  905-413-3519, T/L 969-3519
E-mail:  neilg@ca.ibm.com



---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-dev-help@xml.apache.org


Re: recent tightening of the URI implementation and the JAXP TCK

Posted by Michael Glavassevich <mr...@apache.org>.
Hi everyone,

I would be curious to know which URI tests are failing.

Here's a summary of changes that I've made:

1. '[' and ']', added in RFC 2732, are not allowed in path segments.
2. No URI can begin with a ':'.
3. The scheme specific part of a URI cannot be empty, so any URIs of the
form scheme: or scheme:#fragment are not valid according to the BNF in RFC
2396.
4. Fixed relative URI resolution in the case where the base URI has a null
path. (This shouldn't show up in schema validation.)
5. Whitespace (even escaped as %20) is not permitted in the authority
portion of a URI.
6. IPv4 addresses must match 1*3DIGIT "." 1*3DIGIT "." 1*3DIGIT "."
1*3DIGIT, as of RFC 2732.
7. IPv4 addresses are 32-bit, therefore no segment may be larger than 255.
This isn't expressed by the grammar.
8. Hostnames cannot end with a '-'.
9. Labels in a hostname must be 63 bytes or less [RFC 1034].
10. Hostnames may be no longer than 255 bytes [RFC 1034]. (That
restriction was already there. I just moved it inwards.)
11. Added support for the IPv6 references introduced in RFC 2732. URIs such
as http://[::ffff:1.2.3.4] are valid. The BNF in RFC 2373 isn't correct, so
IPv6 addresses are read according to section 2.2 of RFC 2373.
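
To make the host checks in changes 6-10 concrete, here is a minimal sketch
in Java. It is not the Xerces code: the class and method names (HostChecks,
isValidIPv4, isPlausibleHostname) are hypothetical, it assumes the host is
plain ASCII so that characters and bytes coincide, and it checks only the
limits listed above, not the full hostname grammar.

```java
public final class HostChecks {
    private HostChecks() {}

    /** Changes 6 and 7: four dot-separated groups of 1-3 digits, each at most 255. */
    public static boolean isValidIPv4(String s) {
        String[] parts = s.split("\\.", -1); // -1 keeps trailing empty groups
        if (parts.length != 4) {
            return false;
        }
        for (String p : parts) {
            if (p.isEmpty() || p.length() > 3) {
                return false; // violates 1*3DIGIT
            }
            for (int i = 0; i < p.length(); i++) {
                char c = p.charAt(i);
                if (c < '0' || c > '9') {
                    return false;
                }
            }
            if (Integer.parseInt(p) > 255) {
                return false; // not expressed by the grammar, but required for 32 bits
            }
        }
        return true;
    }

    /** Changes 8-10 only: length limits and no trailing '-'. Assumes ASCII input. */
    public static boolean isPlausibleHostname(String host) {
        if (host.isEmpty() || host.length() > 255) {
            return false; // change 10
        }
        if (host.endsWith("-")) {
            return false; // change 8
        }
        for (String label : host.split("\\.", -1)) {
            if (label.length() > 63) {
                return false; // change 9
            }
        }
        return true;
    }
}
```

Under these assumptions, isValidIPv4 accepts 1.2.3.4 and rejects 256.1.1.1
and 1.2.3, and isPlausibleHostname rejects any name that ends with '-'.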

Changes 6-10 tightened the checking of the host portion of the authority.
Adding support for registry-based authority [RFC 2396 - section 3.2.1]
will permit the URIs that would be rejected by changes 6-10.

The BNF in RFC 2396 is ambiguous in terms of the path and authority
components, meaning a path component can start with '//', which
usually introduces the authority. The ambiguity is resolved in section 4.3 of
RFC 2396. Currently the URI implementation will only try to match
authority if it sees a URI beginning with scheme://, instead of trying to
match the path portion if it cannot be an authority. Fixing this would
permit URIs that would be rejected by change #5. For example,
scheme://%20whitespace%20 is valid, where //%20whitespace%20 is the path
portion.

Perhaps the problem might be with change #3. Apparently 'DAV:' is a valid
URI, though the grammar doesn't permit it. See discussion at:
http://www.apache.org/~fielding/uri/rev-2002/issues.html#014-empty-opaque_part.
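
For anyone who wants to check whether a failing test value falls under
change #3, the following Java sketch flags strings of the form scheme: or
scheme:#fragment. It is only an illustration, not the Xerces code: the class
and method names (EmptyOpaquePart, hasEmptySchemeSpecificPart) are
hypothetical, and it takes no position on whether such URIs ought to be
accepted.

```java
public final class EmptyOpaquePart {
    private EmptyOpaquePart() {}

    private static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    /** RFC 2396 scheme: a letter followed by letters, digits, '+', '-' or '.'. */
    static boolean isScheme(String s) {
        if (s.isEmpty() || !isAsciiLetter(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean ok = isAsciiLetter(c) || (c >= '0' && c <= '9')
                    || c == '+' || c == '-' || c == '.';
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    /** True for "scheme:" and "scheme:#fragment", i.e. nothing between ':' and '#'. */
    public static boolean hasEmptySchemeSpecificPart(String uri) {
        int colon = uri.indexOf(':');
        if (colon < 0 || !isScheme(uri.substring(0, colon))) {
            return false; // no scheme, so change #3 does not apply
        }
        int hash = uri.indexOf('#', colon + 1);
        int end = (hash < 0) ? uri.length() : hash;
        return end == colon + 1;
    }
}
```

With this sketch, 'DAV:' and 'DAV:#fragment' are both flagged, while a URI
such as http://xerces.apache.org/ is not.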


--------------------
Michael Glavassevich
mrglavas@apache.org
