You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@subversion.apache.org by ka...@apache.org on 2010/03/01 14:48:01 UTC

svn commit: r917523 - /subversion/trunk/subversion/mod_dav_svn/mirror.c

Author: kameshj
Date: Mon Mar  1 13:48:01 2010
New Revision: 917523

URL: http://svn.apache.org/viewvc?rev=917523&view=rev
Log:
With the below apache configuration(See the <space> character "/svn 1/").

<Location "/svn 1/">
  DAV svn
  SVNParentPath /repositories
</Location>
<Location "/svn 2/">
  DAV svn
  SVNParentPath /repositories-slave
  SVNMasterURI "http://localhost/svn 1"
</Location>

Write through proxy is *not* happening and commit happens *directly* inside the slave.

* subversion/mod_dav_svn/mirror.c
(proxy_request_fixup): URI encode the to be proxied file name.
(dav_svn__proxy_request_fixup): r->unparsed_uri is in url encoded form while
root_dir is not in encoded form. So use r->uri to compare with root_dir.
(dav_svn__location_in_filter): URL Encode the 'find & replace' urls as
the request body has it in url encoded format.
(dav_svn__location_header_filter): Encode the master_uri as the response from
master has the Location header url encoded already. Set the outgoing Location
header url encoded.
(dav_svn__location_body_filter): URL Encode the 'find & replace' urls as
the response body has it in url encoded format.

Modified:
    subversion/trunk/subversion/mod_dav_svn/mirror.c

Modified: subversion/trunk/subversion/mod_dav_svn/mirror.c
URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/mod_dav_svn/mirror.c?rev=917523&r1=917522&r2=917523&view=diff
==============================================================================
--- subversion/trunk/subversion/mod_dav_svn/mirror.c (original)
+++ subversion/trunk/subversion/mod_dav_svn/mirror.c Mon Mar  1 13:48:01 2010
@@ -45,8 +45,10 @@
 
     r->proxyreq = PROXYREQ_REVERSE;
     r->uri = r->unparsed_uri;
-    r->filename = apr_pstrcat(r->pool, "proxy:", master_uri,
-                              uri_segment, NULL);
+    r->filename = (char *) svn_path_uri_encode(apr_pstrcat(r->pool, "proxy:",
+                                                           master_uri,
+                                                           uri_segment,
+                                                           NULL), r->pool);
     r->handler = "proxy-server";
     ap_add_output_filter("LocationRewrite", NULL, r, r->connection);
     ap_add_output_filter("ReposRewrite", NULL, r, r->connection);
@@ -78,7 +80,7 @@
            transaction tree resouces. */
         if (r->method_number == M_PROPFIND ||
             r->method_number == M_GET) {
-            if ((seg = ap_strstr(r->unparsed_uri, root_dir))) {
+            if ((seg = ap_strstr(r->uri, root_dir))) {
                 if (ap_strstr_c(seg, apr_pstrcat(r->pool, special_uri,
                                                  "/wrk/", NULL))
                     || ap_strstr_c(seg, apr_pstrcat(r->pool, special_uri,
@@ -95,7 +97,7 @@
         /* If this is a write request aimed at a public URI (such as
            MERGE, LOCK, UNLOCK, etc.) or any as-yet-unhandled request
            using a "special URI", we have to doctor it a bit for proxying. */
-        seg = ap_strstr(r->unparsed_uri, root_dir);
+        seg = ap_strstr(r->uri, root_dir);
         if (seg && (r->method_number == M_MERGE ||
                     r->method_number == M_LOCK ||
                     r->method_number == M_UNLOCK ||
@@ -158,6 +160,11 @@
        ### PUT requests and properties in PROPPATCH requests.
        ### See issue #3445 for details. */
 
+    /* We are url encoding the current url and the master url
+       as incoming(from client) request body has it encoded already. */
+    canonicalized_uri = (char *) svn_path_uri_encode(canonicalized_uri,
+                                                     r->pool);
+    root_dir = (char *) svn_path_uri_encode(root_dir, r->pool);
     if (!f->ctx) {
         ctx = f->ctx = apr_pcalloc(r->pool, sizeof(*ctx));
         ctx->remotepath = canonicalized_uri;
@@ -216,6 +223,7 @@
     /* Don't filter if we're in a subrequest or we aren't setup to
        proxy anything. */
     master_uri = dav_svn__get_master_uri(r);
+    master_uri = (char *) svn_path_uri_encode(master_uri, r->pool);
     if (r->main || !master_uri) {
         ap_remove_output_filter(f);
         return ap_pass_brigade(f->next, bb);
@@ -233,6 +241,7 @@
                                                dav_svn__get_root_dir(r), "/",
                                                start_foo, NULL),
                                    r);
+        new_uri = (char *) svn_path_uri_encode(new_uri, r->pool);
         apr_table_set(r->headers_out, "Location", new_uri);
     }
     return ap_pass_brigade(f->next, bb);
@@ -274,6 +283,11 @@
        ### they return in the process of trying to do URI fix-ups.
        ### See issue #3445 for details. */
 
+    /* We are url encoding the current url and the master url
+       as incoming(from master) request body has it encoded already. */
+    canonicalized_uri = (char *) svn_path_uri_encode(canonicalized_uri,
+                                                     r->pool);
+    root_dir = (char *) svn_path_uri_encode(root_dir, r->pool);
     if (!f->ctx) {
         ctx = f->ctx = apr_pcalloc(r->pool, sizeof(*ctx));
         ctx->remotepath = canonicalized_uri;



Re: svn commit: r917523 - mod_dav_svn/mirror.c - Are the URIs URI-encoded?

Posted by "C. Michael Pilato" <cm...@collab.net>.
C. Michael Pilato wrote:
> Julian Foad wrote:
>> On Mon, 2010-03-01, kameshj@apache.org wrote:
>>> Author: kameshj
>>> Date: Mon Mar  1 13:48:01 2010
>>> New Revision: 917523
>>>
>>> URL: http://svn.apache.org/viewvc?rev=917523&view=rev
>>> Log:
>>> With the below apache configuration(See the <space> character "/svn 1/").
>>>
>>> <Location "/svn 1/">
>> [...]
>>> Write through proxy is *not* happening and commit happens *directly* inside the slave.
>> Hi Kamesh.
>>
>> Today I have studied this change thoroughly, and it appears to me that
>> it changes the interpretation of the URL given in the <Location>
>> directive (and the SVNMasterURI directive) in a way that is probably
>> undesirable.
>>
>> Previously, the URL was assumed to be URI-encoded, and thus this one
>> with a space in it did not work because it did not match the canonical
>> equivalent ("/svn%201/").  That caused the "bug" that you were trying to
>> fix, I believe.
>>
>> After this patch, it appears to me that the URL given in the <Location>
>> directive (and the one in the SVNMasterURI directive) is assumed to be
>> NOT encoded, and mod_dav_svn will encode it internally.
>>
>> The new behaviour will be wrong for any Location URL that is already
>> written with %XX encoding, because it will encode it again.
>>
>> If we agree that the URL should be assumed already to be URI-encoded (at
>> least to the extent necessary to make "%" characters unambiguous), I
>> think there are two possible solutions:
>>
>>   1) Tell the user to use proper (canonically URI-encoded) URIs.
>>
>>   2) Assume the URI is already in (partly) URI-encoded form but
>> re-encode it into canonical URI-encoded form, encoding characters such
>> as SPACE but not re-encoding the "%" character, and also unencode any
>> characters that should not be encoded in canonical URI-encoded form.
>>
>> (1) is rather unfriendly.
> 
> This agreement isn't something "we" need to have.  <Location> is a directive
> fully owned by the Apache HTTP Server software, and there are many years of
> precedent for its proper usage.  We need only consult the Apache docs to
> determine whether the path provided to the <Location> directive is defined
> to be URI-encoded or not (then consult the Apache devs if the docs are
> insufficient) and then we need to make sure that our code behaves according
> to those expectation.
> 
> The SVNMasterURI directive is ours, and we can define its encoding level any
> way we wish to do so.  (I would suggest that it is most wise to define it
> similarly to mod_proxy's directives.)  Whatever we decide to go with, our
> code must behave accordingly.  That might mean extra work performing our own
> internal canonicalization (beyond whatever rules core Apache might have in
> place for its directives).  But it's the only sane way to go about things.
> 
> Solution (2) is a recipe for disaster.  The URI is or it isn't URI-encoded,
> period.  Choose one, document it, enforce it, and move on.
> 

FWIW, the Apache docs have this to say about <Location>:

   For all origin (non-proxy) requests, the URL to be matched is a URL-path
   of the form /path/. No scheme, hostname, port, or query string may be
   included. For proxy requests, the URL to be matched is of the form
   scheme://servername/path, and you must include the prefix.

Not the clearest thing in the world.  Some testing seems to indicate that
the path to Location is *not* expected to be URI-encoded, though.

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand


Re: svn commit: r917523 - mod_dav_svn/mirror.c - Are the URIs URI-encoded?

Posted by "C. Michael Pilato" <cm...@collab.net>.
Julian Foad wrote:
> On Mon, 2010-03-01, kameshj@apache.org wrote:
>> Author: kameshj
>> Date: Mon Mar  1 13:48:01 2010
>> New Revision: 917523
>>
>> URL: http://svn.apache.org/viewvc?rev=917523&view=rev
>> Log:
>> With the below apache configuration(See the <space> character "/svn 1/").
>>
>> <Location "/svn 1/">
> [...]
>> Write through proxy is *not* happening and commit happens *directly* inside the slave.
> 
> Hi Kamesh.
> 
> Today I have studied this change thoroughly, and it appears to me that
> it changes the interpretation of the URL given in the <Location>
> directive (and the SVNMasterURI directive) in a way that is probably
> undesirable.
> 
> Previously, the URL was assumed to be URI-encoded, and thus this one
> with a space in it did not work because it did not match the canonical
> equivalent ("/svn%201/").  That caused the "bug" that you were trying to
> fix, I believe.
> 
> After this patch, it appears to me that the URL given in the <Location>
> directive (and the one in the SVNMasterURI directive) is assumed to be
> NOT encoded, and mod_dav_svn will encode it internally.
> 
> The new behaviour will be wrong for any Location URL that is already
> written with %XX encoding, because it will encode it again.
> 
> If we agree that the URL should be assumed already to be URI-encoded (at
> least to the extent necessary to make "%" characters unambiguous), I
> think there are two possible solutions:
> 
>   1) Tell the user to use proper (canonically URI-encoded) URIs.
> 
>   2) Assume the URI is already in (partly) URI-encoded form but
> re-encode it into canonical URI-encoded form, encoding characters such
> as SPACE but not re-encoding the "%" character, and also unencode any
> characters that should not be encoded in canonical URI-encoded form.
> 
> (1) is rather unfriendly.

This agreement isn't something "we" need to have.  <Location> is a directive
fully owned by the Apache HTTP Server software, and there are many years of
precedent for its proper usage.  We need only consult the Apache docs to
determine whether the path provided to the <Location> directive is defined
to be URI-encoded or not (then consult the Apache devs if the docs are
insufficient) and then we need to make sure that our code behaves according
to those expectation.

The SVNMasterURI directive is ours, and we can define its encoding level any
way we wish to do so.  (I would suggest that it is most wise to define it
similarly to mod_proxy's directives.)  Whatever we decide to go with, our
code must behave accordingly.  That might mean extra work performing our own
internal canonicalization (beyond whatever rules core Apache might have in
place for its directives).  But it's the only sane way to go about things.

Solution (2) is a recipe for disaster.  The URI is or it isn't URI-encoded,
period.  Choose one, document it, enforce it, and move on.

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand


Re: svn commit: r917523 - mod_dav_svn/mirror.c - Are the URIs URI-encoded?

Posted by Julian Foad <ju...@wandisco.com>.
On Mon, 2010-03-01, kameshj@apache.org wrote:
> Author: kameshj
> Date: Mon Mar  1 13:48:01 2010
> New Revision: 917523
> 
> URL: http://svn.apache.org/viewvc?rev=917523&view=rev
> Log:
> With the below apache configuration(See the <space> character "/svn 1/").
> 
> <Location "/svn 1/">
[...]
> Write through proxy is *not* happening and commit happens *directly* inside the slave.

Hi Kamesh.

Today I have studied this change thoroughly, and it appears to me that
it changes the interpretation of the URL given in the <Location>
directive (and the SVNMasterURI directive) in a way that is probably
undesirable.

Previously, the URL was assumed to be URI-encoded, and thus this one
with a space in it did not work because it did not match the canonical
equivalent ("/svn%201/").  That caused the "bug" that you were trying to
fix, I believe.

After this patch, it appears to me that the URL given in the <Location>
directive (and the one in the SVNMasterURI directive) is assumed to be
NOT encoded, and mod_dav_svn will encode it internally.

The new behaviour will be wrong for any Location URL that is already
written with %XX encoding, because it will encode it again.

If we agree that the URL should be assumed already to be URI-encoded (at
least to the extent necessary to make "%" characters unambiguous), I
think there are two possible solutions:

  1) Tell the user to use proper (canonically URI-encoded) URIs.

  2) Assume the URI is already in (partly) URI-encoded form but
re-encode it into canonical URI-encoded form, encoding characters such
as SPACE but not re-encoding the "%" character, and also unencode any
characters that should not be encoded in canonical URI-encoded form.

(1) is rather unfriendly.

More follows ...

> * subversion/mod_dav_svn/mirror.c
> (proxy_request_fixup): URI encode the to be proxied file name.
> (dav_svn__proxy_request_fixup): r->unparsed_uri is in url encoded form while
> root_dir is not in encoded form. So use r->uri to compare with root_dir.
> (dav_svn__location_in_filter): URL Encode the 'find & replace' urls as
> the request body has it in url encoded format.
> (dav_svn__location_header_filter): Encode the master_uri as the response from
> master has the Location header url encoded already. Set the outgoing Location
> header url encoded.
> (dav_svn__location_body_filter): URL Encode the 'find & replace' urls as
> the response body has it in url encoded format.
> 
> Modified:
>     subversion/trunk/subversion/mod_dav_svn/mirror.c
> 
> Modified: subversion/trunk/subversion/mod_dav_svn/mirror.c
> URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/mod_dav_svn/mirror.c?rev=917523&r1=917522&r2=917523&view=diff
> ==============================================================================
> --- subversion/trunk/subversion/mod_dav_svn/mirror.c (original)
> +++ subversion/trunk/subversion/mod_dav_svn/mirror.c Mon Mar  1 13:48:01 2010
> @@ -45,8 +45,10 @@
>  
>      r->proxyreq = PROXYREQ_REVERSE;
>      r->uri = r->unparsed_uri;
> -    r->filename = apr_pstrcat(r->pool, "proxy:", master_uri,
> -                              uri_segment, NULL);
> +    r->filename = (char *) svn_path_uri_encode(apr_pstrcat(r->pool, "proxy:",
> +                                                           master_uri,
> +                                                           uri_segment,
> +                                                           NULL), r->pool);

That looks odd.  It may be functionally correct, but normally I would
assume that any parameters with names like "master_uri" and
"uri_segment" would be already URI-encoded.  (If so, then encoding them
again would be wrong.)  Don't we want to keep URI data in URI-encoded
form most of the time?

In this case, it appears that you have modified this function and the
caller so that the arguments passed are NOT URI-encoded.  Looking at
where the "uri_segment" comes from ...

>      r->handler = "proxy-server";
>      ap_add_output_filter("LocationRewrite", NULL, r, r->connection);
>      ap_add_output_filter("ReposRewrite", NULL, r, r->connection);
> @@ -78,7 +80,7 @@
>             transaction tree resouces. */
>          if (r->method_number == M_PROPFIND ||
>              r->method_number == M_GET) {
> -            if ((seg = ap_strstr(r->unparsed_uri, root_dir))) {
> +            if ((seg = ap_strstr(r->uri, root_dir))) {

"r->uri" is documented as "The path portion of the URI."  It appears
from this commit that you have discovererd that "r->uri" is not
URI-encoded, whereas r->unparsed_uri (which is the full URL) is
URI-encoded.  OK so far.

I went looking to see where the "master_uri" comes from, and it comes
from dav_svn__get_master_uri() which gets its value directly from the
Apache configuration directive "SVNMasterURI".  It is not clear to me
whether we should expect that to be URI-encoded, but I would have
thought so.

To clarify that, we need first to decide what it is meant to be, and
then to document dav_svn__get_master_uri() and the other functions in
that group.  I attach a starting-point patch for the latter.

>                  if (ap_strstr_c(seg, apr_pstrcat(r->pool, special_uri,
>                                                   "/wrk/", NULL))
>                      || ap_strstr_c(seg, apr_pstrcat(r->pool, special_uri,
> @@ -95,7 +97,7 @@
>          /* If this is a write request aimed at a public URI (such as
>             MERGE, LOCK, UNLOCK, etc.) or any as-yet-unhandled request
>             using a "special URI", we have to doctor it a bit for proxying. */
> -        seg = ap_strstr(r->unparsed_uri, root_dir);
> +        seg = ap_strstr(r->uri, root_dir);
>          if (seg && (r->method_number == M_MERGE ||
>                      r->method_number == M_LOCK ||
>                      r->method_number == M_UNLOCK ||
> @@ -158,6 +160,11 @@
>         ### PUT requests and properties in PROPPATCH requests.
>         ### See issue #3445 for details. */
>  
> +    /* We are url encoding the current url and the master url
> +       as incoming(from client) request body has it encoded already. */
> +    canonicalized_uri = (char *) svn_path_uri_encode(canonicalized_uri,
> +                                                     r->pool);
> +    root_dir = (char *) svn_path_uri_encode(root_dir, r->pool);

In a separate discussion I think we decided that we will need to define
"canonical" as including "in a canonical URI-encoded form".  When we do
that, uri_canonicalize() will (I think) require that its input is
already URI-encoded, so we will need to call uri_encode() before
uri_canonicalize().  It would be good to make the calls in that order
now, but we will have to audit all calls when we make that change anyway
so it is not essential right now.

Are the (char *) type casts necessary?  If not, please remove them.
(Six of them in total.)

Thanks.
- Julian