You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Lieven Govaerts <sv...@mobsol.be> on 2010/02/24 22:39:21 UTC

Re: bug report: ra_serf gets PROPFIND failure on certain non-ASCII paths

On Tue, Feb 23, 2010 at 3:37 PM, C. Michael Pilato <cm...@collab.net> wrote:
> Got the following bug report from one of our CollabNet Asia peers.  I've not
>  attempted to reproduce the problem myself.  This bug was experienced using
> a 1.6.x client, but I don't know exactly which 'x'.  Anybody else seen similar?

Yeah, it's easy to reproduce with trunk.
>
> --------------------------------------------------------------------
>
> serf looks having problem on handling double byte directory name. For
> example, if you have the following structure in your repository:

s/serf/ra_serf/

..

>
> "svn list http://path_to_repo/日" returns an error like:
> svn: The PROPFIND response did not include the requested resourcetype value

This is what happens:

On the PROPFIND request "PROPFIND
/svn/tst/!svn/rvr/285/trunk/%E6%97%A5 HTTP/1.1", the server responds
with the propstat xml response. The interesting parts of that response
are:
..
<D:href>/svn/tst/!svn/rvr/285/trunk/%e6%97%a5/</D:href>
..
<lp1:checked-in><D:href>/svn/tst/!svn/ver/285/trunk/%E6%97%A5</D:href></lp1:checked-in>
..
<D:href>/svn/tst/!svn/rvr/285/trunk/%e6%97%a5/abcdefg</D:href>
..
<lp1:checked-in><D:href>/svn/tst/!svn/ver/285/trunk/%E6%97%A5/abcdefg</D:href></lp1:checked-in>


As you can see, the 日 character is encoded as either %E6%97%A5
(capital E) or %e6%97%a5 (lowercase e). This shouldn't be a problem,
but ra_serf uses these paths as keys in a hash table to later retrieve
them. It turns out they are first stored as /path/%E6%97%A5, and then
later retrieved as /path/%e6%97%A5, which obviously doesn't work as
paths are handled case sensitive.

These paths are canonicalized before being used, but
svn_uri_canonicalize doesn't touch the encoded characters. Maybe it
should just convert those letters to lowercase? I don't really see a
better fix.

Lieven

Re: bug report: ra_serf gets PROPFIND failure on certain non-ASCII paths

Posted by "C. Michael Pilato" <cm...@collab.net>.
Just a note to say that I've filed issue #3601 to track this:
http://subversion.tigris.org/issues/show_bug.cgi?id=3601

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand


Re: bug report: ra_serf gets PROPFIND failure on certain non-ASCII paths

Posted by Lieven Govaerts <sv...@mobsol.be>.
On Thu, Feb 25, 2010 at 2:06 PM, Julian Foad <ju...@wandisco.com> wrote:
> Lieven Govaerts wrote:
>> [[[
>> ra_serf: Fix support for international characters in paths.
[..]
>
>> [[[
>> Index: subversion/libsvn_ra_serf/merge.c
>> ===================================================================
>> --- subversion/libsvn_ra_serf/merge.c   (revision 911289)
>> +++ subversion/libsvn_ra_serf/merge.c   (working copy)
>> @@ -364,7 +364,7 @@
>>        info->prop_val = apr_pstrmemdup(info->pool, info->prop_val,
>>                                        info->prop_val_len);
>>        if (strcmp(info->prop_name, "href") == 0)
>> -        info->prop_val = svn_uri_canonicalize(info->prop_val,
>> info->pool);
>> +        info->prop_val = svn_ra_serf__uri_to_internal(info->prop_val,
>> info->pool);
>
> As I mentioned on IRC, you canonicalize here when the URI is being put
> in to the hash. But when it gets read out from the hash (about 70 lines
> higher up) you also need to canonicalize the URL that it is compared
> with.  Here:
>
> [line 299]
>    href = apr_hash_get(info->props, "href", APR_HASH_KEY_STRING);
>    if (! svn_uri_is_ancestor(ctx->merge_url, href))
>
> ... ctx->merge_url comes from session->repos_url, and that comes
> from ... somewhere higher up.
>
> You could canonicalize it right here, but really we need to fix the
> global svn_uri_canonicalize() to do this kind of canonicalization
> throughout Subversion, not just in ra_serf. Currently,
> svn_uri_canonicalize() simply isn't producing a canonical URI.
>

Digging a bit further...

Looking at chapter 6 Normalization and Comparison in RFC 3986 there
are a few things svn_uri_canonicalize does, and a few things it
doesn't.
It does:
- Case normalization, partially:
  - scheme and hostname are converted to lowercase
  - file:/// components are converted to lowercase on Windows
  - other case sensitive components are left untouched
- Path segment Normalization
  removal of "." and ".." path segments and adjacent separator
characters where possible
- Scheme-based normalization, partially
  http://example.com vs http://example.com/

It does not:
- case normalization of percent-encoding triplets, which should use
uppercase letters A-F.
- scheme based normalization: removal of the default port for a scheme
(http://example.com:80 vs http://example.com)
- Percent-encoding normalization: decoding of any percent-encoded
character that shouldn't have been encoded

What about making svn_uri_canonicalize support all these
normalizations? It will solve the issue at hand, and a few others that
are easy to run into.

On Thu, Feb 25, 2010 at 12:11 AM, Bert Huijben <be...@qqmail.nl> wrote:
>
> svn_uri_canonicalize is still used for more than urls/uris, so it doesn't handle any encoding itself.

Hm, why is it called svn_uri_canonicalize if it takes other things
than uri's? I think it should take already encoded uri's only (and we
should make that more explicit in the documentation).

>
> But besides that, I also think this is not the right way to fix this. For several characters in uris it is optional if they are escaped or not. We need a better fix than just fixing the casing of the escaped characters...

-> Decode this optional-escaped characters during canonicalization.

>
> Unescaping the paths would be an option that resolves it in the generic case, or unescape followed by a specific standard escaping. (This last method is used in some class libraries to avoid similar issues, but to provide a useful uri anyway)

Unescaping followed by escaping is probably the easiest way to
implement this, but not really the fastest. We do have to do this only
in places where svn gets uri's from outside though.

Lieven

Re: bug report: ra_serf gets PROPFIND failure on certain non-ASCII paths

Posted by Julian Foad <ju...@wandisco.com>.
Lieven Govaerts wrote:
> [[[
> ra_serf: Fix support for international characters in paths.
> 
> Found by: cmpilato
> 
> * subversion/libsvn_ra_serf/property.c
>   (end_propfind): Replace call to svn_uri_canonicalize with a call to
>    svn_ra_serf__uri_to_internal.
> * subversion/libsvn_ra_serf/merge.c
>   (end_merge): Here too.
> * subversion/libsvn_ra_serf/ra_serf.h
>   (svn_ra_serf__uri_to_internal): New function declaration.
> * subversion/libsvn_ra_serf/util.c
>   (svn_ra_serf__uri_to_internal): New function definition.
> ]]]


> [[[ 
> Index: subversion/libsvn_ra_serf/merge.c
> ===================================================================
> --- subversion/libsvn_ra_serf/merge.c   (revision 911289)
> +++ subversion/libsvn_ra_serf/merge.c   (working copy)
> @@ -364,7 +364,7 @@
>        info->prop_val = apr_pstrmemdup(info->pool, info->prop_val,
>                                        info->prop_val_len);
>        if (strcmp(info->prop_name, "href") == 0)
> -        info->prop_val = svn_uri_canonicalize(info->prop_val,
> info->pool);
> +        info->prop_val = svn_ra_serf__uri_to_internal(info->prop_val,
> info->pool);

As I mentioned on IRC, you canonicalize here when the URI is being put
in to the hash. But when it gets read out from the hash (about 70 lines
higher up) you also need to canonicalize the URL that it is compared
with.  Here:

[line 299]
    href = apr_hash_get(info->props, "href", APR_HASH_KEY_STRING);
    if (! svn_uri_is_ancestor(ctx->merge_url, href))

... ctx->merge_url comes from session->repos_url, and that comes
from ... somewhere higher up.

You could canonicalize it right here, but really we need to fix the
global svn_uri_canonicalize() to do this kind of canonicalization
throughout Subversion, not just in ra_serf. Currently,
svn_uri_canonicalize() simply isn't producing a canonical URI.

- Julian


>        /* Set our property. */
>        apr_hash_set(info->props, info->prop_name, APR_HASH_KEY_STRING,
> Index: subversion/libsvn_ra_serf/util.c
> ===================================================================
> --- subversion/libsvn_ra_serf/util.c    (revision 911289)
> +++ subversion/libsvn_ra_serf/util.c    (working copy)
> @@ -1772,3 +1772,20 @@
>  
>    return SVN_NO_ERROR;
>  }
> +
> +const char *
> +svn_ra_serf__uri_to_internal(const char *uri_in, apr_pool_t *pool)
> +{
> +  const char *target;
> +
> +  /* Convert to URI, unescaping all internatonal characters. */
> +  target = svn_path_uri_decode(uri_in, pool);
> +
> +  /* Now escape the international characters again. */
> +  target = svn_path_uri_from_iri(target, pool);
> +
> +  /* Strip any trailing '/' and collapse other redundant elements. */
> +  target = svn_uri_canonicalize(target, pool);
> +
> +  return target;
> +}
> Index: subversion/libsvn_ra_serf/property.c
> ===================================================================
> --- subversion/libsvn_ra_serf/property.c        (revision 911289)
> +++ subversion/libsvn_ra_serf/property.c        (working copy)
> @@ -29,6 +29,7 @@
>  #include "svn_xml.h"
>  #include "svn_props.h"
>  #include "svn_dirent_uri.h"
> +#include "svn_path.h"
>  
>  #include "private/svn_dav_protocol.h"
>  #include "svn_private_config.h"
> @@ -332,7 +333,7 @@
>          {
>            if (strcmp(ctx->depth, "1") == 0)
>              {
> -              ctx->current_path = svn_uri_canonicalize(info->val,
> ctx->pool);
> +              ctx->current_path =
> svn_ra_serf__uri_to_internal(info->val, ctx->pool);
>              }
>            else
>              {
> Index: subversion/libsvn_ra_serf/ra_serf.h
> ===================================================================
> --- subversion/libsvn_ra_serf/ra_serf.h (revision 911289)
> +++ subversion/libsvn_ra_serf/ra_serf.h (working copy)
> @@ -1521,6 +1521,13 @@
>  svn_error_t *
>  svn_ra_serf__error_on_status(int status_code, const char *path);
>  
> +/**
> + * Handle an external uri so that it can be compared with other
> uri's.
> + * (canonicalize + re-encode international characters).
> + */
> +const char *
> +svn_ra_serf__uri_to_internal(const char *uri_in, apr_pool_t *pool);
> +
>  #ifdef __cplusplus
>  }
>  #endif /* __cplusplus */
> ]]]
> 

Re: bug report: ra_serf gets PROPFIND failure on certain non-ASCII paths

Posted by Lieven Govaerts <sv...@mobsol.be>.
On Thu, Feb 25, 2010 at 12:11 AM, Bert Huijben <be...@qqmail.nl> wrote:
>
>
>> -----Original Message-----
>> From: lieven.govaerts@gmail.com [mailto:lieven.govaerts@gmail.com] On
>> Behalf Of Lieven Govaerts
>> Sent: woensdag 24 februari 2010 23:39
>> To: C. Michael Pilato
>> Cc: Subversion Development
>> Subject: Re: bug report: ra_serf gets PROPFIND failure on certain non-
>> ASCII paths
>
>
>> These paths are canonicalized before being used, but
>> svn_uri_canonicalize doesn't touch the encoded characters. Maybe it
>> should just convert those letters to lowercase? I don't really see a
>> better fix.
>
> svn_uri_canonicalize is still used for more than urls/uris, so it doesn't handle any encoding itself.
>
> But besides that, I also think this is not the right way to fix this. For several characters in uris it is optional if they are escaped or not. We need a better fix than just fixing the casing of the escaped characters...
>
> Unescaping the paths would be an option that resolves it in the generic case, or unescape followed by a specific standard escaping. (This last method is used in some class libraries to avoid similar issues, but to provide a useful uri anyway)

Attached patch fixes the problem. Better suggestions are welcome, I'll
commit tonight.

Lieven

[[[
ra_serf: Fix support for international characters in paths.

Found by: cmpilato

* subversion/libsvn_ra_serf/property.c
  (end_propfind): Replace call to svn_uri_canonicalize with a call to
   svn_ra_serf__uri_to_internal.
* subversion/libsvn_ra_serf/merge.c
  (end_merge): Here too.
* subversion/libsvn_ra_serf/ra_serf.h
  (svn_ra_serf__uri_to_internal): New function declaration.
* subversion/libsvn_ra_serf/util.c
  (svn_ra_serf__uri_to_internal): New function definition.
]]]

RE: bug report: ra_serf gets PROPFIND failure on certain non-ASCII paths

Posted by Bert Huijben <be...@qqmail.nl>.

> -----Original Message-----
> From: lieven.govaerts@gmail.com [mailto:lieven.govaerts@gmail.com] On
> Behalf Of Lieven Govaerts
> Sent: woensdag 24 februari 2010 23:39
> To: C. Michael Pilato
> Cc: Subversion Development
> Subject: Re: bug report: ra_serf gets PROPFIND failure on certain non-
> ASCII paths


> These paths are canonicalized before being used, but
> svn_uri_canonicalize doesn't touch the encoded characters. Maybe it
> should just convert those letters to lowercase? I don't really see a
> better fix.

svn_uri_canonicalize is still used for more than urls/uris, so it doesn't handle any encoding itself.

But besides that, I also think this is not the right way to fix this. For several characters in uris it is optional if they are escaped or not. We need a better fix than just fixing the casing of the escaped characters...

Unescaping the paths would be an option that resolves it in the generic case, or unescape followed by a specific standard escaping. (This last method is used in some class libraries to avoid similar issues, but to provide a useful uri anyway)

	Bert