Posted to dev@subversion.apache.org by Philip Martin <ph...@wandisco.com> on 2012/01/20 11:59:50 UTC

URL canonicalization with 1.6 clients and 1.7 servers

1.7 has stricter rules for canonical URLs (from RFC 3986) than 1.6:

 - no default port:

    "http://host/repo" not "http://host:80/repo"

 - no lowercase % encoding:

    "http://host/repo/%C3%A5" not "http://host/repo/%c3%a5"

 - no unnecessary % encoding:

    "http://host/repo/A" not "http://host/repo/%41"
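The three rules can be illustrated with a minimal Python sketch. It assumes that only the http/https default ports matter and that only the path component needs rewriting. The function name canonicalize_url is hypothetical, and Subversion's own svn_uri_canonicalize() in its C API applies further rules that are not modelled here. Decoding %41 to A is safe because RFC 3986 (section 2.3) treats a percent-encoded unreserved character as equivalent to the character itself.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Unreserved characters from RFC 3986 section 2.3; encoding these is never needed.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
# Assumption: only these schemes' default ports are considered.
DEFAULT_PORTS = {"http": 80, "https": 443}
PCT = re.compile(r"%([0-9A-Fa-f]{2})")


def _fix_escape(match):
    ch = chr(int(match.group(1), 16))
    if ch in UNRESERVED:
        return ch  # rule 3: drop unnecessary escapes, e.g. %41 -> A
    return "%" + match.group(1).upper()  # rule 2: uppercase hex digits


def canonicalize_url(url):
    """Hypothetical sketch applying only the three rules listed above."""
    parts = urlsplit(url)
    netloc = parts.netloc
    # Rule 1: remove an explicit port equal to the scheme's default.
    if parts.port is not None and parts.port == DEFAULT_PORTS.get(parts.scheme):
        netloc = netloc.rsplit(":", 1)[0]
    path = PCT.sub(_fix_escape, parts.path)
    return urlunsplit((parts.scheme, netloc, path, parts.query, parts.fragment))
```

For example, canonicalize_url("http://localhost:80/repo") returns "http://localhost/repo", while a non-default port such as 8080 is left in place.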

All the above URLs can be used with a 1.7 client because the client
converts them to canonical form, but a 1.6 client will not do the
conversion and will pass the non-canonical form to the server.  A 1.7
server will sometimes reject such URLs:

   $ svn-1.6 co http://localhost:80/repo
   svn: Path 'http://localhost:80/repo' is not canonicalized; there is a problem with the client.

although some commands work:

   $ svn-1.6 ls http://localhost:80/repo
   A

So this is a break in compatibility, which we justify by saying that the
URLs are not canonical.  Could we do better?  We could make the server
canonicalize the URL.  That would probably allow the non-canonical URLs
to work, but might introduce problems like issue 3601:
http://subversion.tigris.org/issues/show_bug.cgi?id=3601
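The two server-side options under discussion, rejecting non-canonical input versus canonicalizing it, can be contrasted in a short Python sketch. It is a simplification: the hypothetical canonicalize_path helper only uppercases %-escape hex digits (one of the three rules), and the names resolve_strict and resolve_lenient are invented for illustration. Neither function is Subversion's actual server code; the error text merely mimics the message shown in the transcript.

```python
import re


class NotCanonicalError(ValueError):
    """Raised by the strict variant when the path is not canonical."""


_PCT = re.compile(r"%[0-9A-Fa-f]{2}")


def canonicalize_path(path):
    # Hypothetical helper: uppercase the hex digits of every %-escape.
    return _PCT.sub(lambda m: m.group(0).upper(), path)


def resolve_strict(path):
    # Simplified model of the strict behaviour: reject anything that
    # canonicalization would change.
    if canonicalize_path(path) != path:
        raise NotCanonicalError(f"Path '{path}' is not canonicalized")
    return path


def resolve_lenient(path):
    # The alternative: canonicalize on the server and continue with the result.
    return canonicalize_path(path)
```

With the lenient variant, a path such as "/repo/%c3%a5" from an older client would be handled as "/repo/%C3%A5"; whether rewriting paths like this is safe everywhere on the server is the open question raised in the message.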

Should we attempt backwards compatibility or should we simply require
clients to upgrade or avoid the URLs?

-- 
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com

Re: URL canonicalization with 1.6 clients and 1.7 servers

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Garret Wilson wrote on Fri, Jan 20, 2012 at 06:27:07 -0800:
> On 1/20/2012 2:59 AM, Philip Martin wrote:
> >1.7 has stricter rules for canonical URLs (from RFC 3986) than 1.6:
> ...
> >...
> >  - no lowercase % encoding:
> >
> >     "http://host/repo/%C3%A5" not "http://host/repo/%c3%a5"
> ...
> >All the above URLs can be used with a 1.7 client because the client
> >converts them to canonical form, but a 1.6 client will not do the
> >conversion and will pass the non-canonical form to the server.  A 1.7
> >server will sometimes reject such URLs:
> 
> If I could throw my opinion in here even though I'm new...
> canonicalization is a good thing in line with RFC 3986, which says,
> "For consistency, URI producers and normalizers should use uppercase
> hexadecimal digits for all percent-encodings." But rejecting
> lowercase percent-encoded strings seems like a direct contradiction
> of RFC 3986, which also says, "If two URIs differ only in the case
> of hexadecimal digits used in percent-encoded octets, they are
> equivalent." Rejecting a string that is a valid URI according to the
> specification is going to bring interoperability headaches at the

We accept lowercase %-escapes in our interface to the outer world; but
we're free to define our internal notion of 'canonical' however we want.

We could easily decide that 'canonical' URLs may not use an uppercase E
in %-escapes.  We'd still be RFC compliant, since our public API
requires all URI arguments to be passed via svn_uri_canonicalize()
before being passed to any other function, and svn_uri_canonicalize()
would accept uppercase E in %-escapes (and change them into lowercase).
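The point that the canonical form is an internal convention can be checked with a small Python sketch. Both hypothetical canonicalizers below, one uppercasing and one lowercasing %-escape hex digits, yield the RFC 3986 equivalence for comparisons, provided every URI passes through the same canonicalizer first. The function names are assumptions for illustration, not Subversion API.

```python
import re

_PCT = re.compile(r"%[0-9A-Fa-f]{2}")


def canonicalize_upper(uri):
    # Convention A: uppercase hex digits in %-escapes.
    return _PCT.sub(lambda m: m.group(0).upper(), uri)


def canonicalize_lower(uri):
    # Convention B (hypothetical alternative): lowercase hex digits.
    return _PCT.sub(lambda m: m.group(0).lower(), uri)


def equivalent(a, b, canonicalize):
    # URIs differing only in %-escape case compare equal once both
    # have gone through the same canonicalizer.
    return canonicalize(a) == canonicalize(b)
```

Problems appear only when the two ends disagree about the convention, for instance when a strict check of the form canonicalize(x) == x runs against a string produced under looser rules.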

The problem we have is that the 1.7 server has a stricter notion of
'canonical' than the 1.6 client.

> least. I personally don't like the lenience of the percent-encoding
> case either, but that's what the spec says.
> 
> Garret

Re: URL canonicalization with 1.6 clients and 1.7 servers

Posted by Hyrum K Wright <hy...@wandisco.com>.
On Fri, Jan 20, 2012 at 8:27 AM, Garret Wilson <ga...@globalmentor.com> wrote:
> [snip]
> If I could throw my opinion in here even though I'm new...

Just a meta-comment: of course you are welcome to throw in your
opinion, no matter how old or new you might be to this community!  As
long as it's relevant and honest, insights are welcomed no matter the
source.  I'd like to think that we haven't gotten so curmudgeonly as
to scare off new participants.  :)

> [snip]

-Hyrum



Re: URL canonicalization with 1.6 clients and 1.7 servers

Posted by Branko Čibej <br...@xbc.nu>.
On 20.01.2012 15:27, Garret Wilson wrote:
> On 1/20/2012 2:59 AM, Philip Martin wrote:
>> 1.7 has stricter rules for canonical URLs (from RFC 3986) than 1.6:
> ...
>> ...
>>   - no lowercase % encoding:
>>
>>      "http://host/repo/%C3%A5" not "http://host/repo/%c3%a5"
> ...
>> All the above URLs can be used with a 1.7 client because the client
>> converts them to canonical form, but a 1.6 client will not do the
>> conversion and will pass the non-canonical form to the server.  A 1.7
>> server will sometimes reject such URLs:
>
> If I could throw my opinion in here even though I'm new...
> canonicalization is a good thing in line with RFC 3986, which says,
> "For consistency, URI producers and normalizers should use uppercase
> hexadecimal digits for all percent-encodings."

"Normalizers should use" does not imply "consumers should reject
anything else". In other words, the 1.7 server is just a bit too
aggressive in its interpretation of the RFC.

I consider this incompatibility to be a bug.

-- Brane

Re: URL canonicalization with 1.6 clients and 1.7 servers

Posted by Garret Wilson <ga...@globalmentor.com>.
On 1/20/2012 2:59 AM, Philip Martin wrote:
> 1.7 has stricter rules for canonical URLs (from RFC 3986) than 1.6:
...
> ...
>   - no lowercase % encoding:
>
>      "http://host/repo/%C3%A5" not "http://host/repo/%c3%a5"
...
> All the above URLs can be used with a 1.7 client because the client
> converts them to canonical form, but a 1.6 client will not do the
> conversion and will pass the non-canonical form to the server.  A 1.7
> server will sometimes reject such URLs:

If I could throw my opinion in here even though I'm new... 
canonicalization is a good thing in line with RFC 3986, which says, "For 
consistency, URI producers and normalizers should use uppercase 
hexadecimal digits for all percent-encodings." But rejecting lowercase 
percent-encoded strings seems like a direct contradiction of RFC 3986, 
which also says, "If two URIs differ only in the case of hexadecimal 
digits used in percent-encoded octets, they are equivalent." Rejecting a 
string that is a valid URI according to the specification is going to 
bring interoperability headaches at the least. I personally don't like 
the lenience of the percent-encoding case either, but that's what the 
spec says.

Garret