You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by SteveKing <st...@gmx.ch> on 2003/05/28 17:55:59 UTC

svn ls shows escaped paths

Hi,

unlike other commands

svn ls url

does not unescape the paths it shows as output.
So if a file with e.g. a space is in the repository
it isn't shown correctly but with the %20 escape
sequence in it.

Shouldn't this be changed?

Stefan

Re: svn ls shows escaped paths

Posted by SteveKing <st...@gmx.ch>.
----- Original Message ----- 
From: <cm...@collab.net>
> > Try this patch:
> > 
> > * subversion/libsvn_ra_dav/fetch.c
> >   (svn_ra_dav__get_dir): URI-decode the dirent names before using them
> >     as hash keys.
> 
> Or just update to HEAD.  I committed the patch just now.

Thanks a lot! That was really fast!

Stefan


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: svn ls shows escaped paths

Posted by cm...@collab.net.
cmpilato@collab.net writes:

> Try this patch:
> 
> * subversion/libsvn_ra_dav/fetch.c
>   (svn_ra_dav__get_dir): URI-decode the dirent names before using them
>     as hash keys.

Or just update to HEAD.  I committed the patch just now.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: URL encoding/decoding (was: svn ls shows escaped paths)

Posted by Greg Hudson <gh...@MIT.EDU>.
On Mon, 2003-06-02 at 23:23, Greg Stein wrote:
> On Mon, Jun 02, 2003 at 08:11:33AM -0500, cmpilato@collab.net wrote:
> >...
> > I think Greg H. knows what you're talking about.

That wasn't a great assumption, but I think I've caught up.

> Hunh. Mike pointed out that ra_local decodes the URL pretty early. I just
> looked at ra_svn, and it ships the encoded URL over to the server, which
> then decodes it [to look for the repos].

If we were doing things properly, we would never URI-decode a URL; that
operation doesn't really make sense.  You're supposed to decode after
you've split the URL into scheme and scheme-specific part, and (if
appropriate) into all the other subparts like hostname and path.

I played with doing things properly in ra_svn, but found that it got
irritating to deal with copyfrom history.  So I went back to doing it
the broken way ra_local does it.

The brokenness doesn't actually hurt us, because none of the separator
characters should appear in any of the parts we care about (we have no
schemes containing ":" characters, hostnames cannot contain "/"
characters, and we don't use usernames and passwords encoded into
URIs.).  It's just conceptually unclear.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

URL encoding/decoding (was: svn ls shows escaped paths)

Posted by Greg Stein <gs...@lyra.org>.
On Mon, Jun 02, 2003 at 08:11:33AM -0500, cmpilato@collab.net wrote:
>...
> I think Greg H. knows what you're talking about.  What he's saying,
> though, is that ra_svn and ra_local don't pass around URIs internally
> -- they don't need to, because they're not talking to a web server
> that so requires.  So for them to implement the __get_dir() interface
> that you proposed means that they have to actually URI-*en*code their
> results before handing them back to the user (who will then
> immediately URI-*de*code them.

Yah, but he talked about how far out to push the encoding, which befoozled
me. Whatever...

> Only ra_dav plays with URIs internally.

Right. But our libraries' interfaces, including the RA interface all speak
in terms of

  1) canonicalized filesystem paths
  2) relative (canonical) paths
  3) URLs

I don't even think the RA interface deals with (1).

> So, what was that about the ra-layer defining the interfaces? :-)  I'm
> kidding, of course, because the destructiveness of a URI-decode
> operation is important to note.

It isn't that. Look at all of our interfaces. You just revamped
svn_client_mkdir(). It takes an array of paths or URLs. We *do* use URLs.
Further, nothing above the RA layer knows anything about the URLs except
for the hierarchical /-based modeling of resources (e.g. collections of
resources, and /-separated, and whatnot). Those upper layers don't know if
the RA layer will be talking DAV, C-based APIs, or the custom svn protocol.
But they *do* know that they are passing a URL. And URLs have specific
restrictions imposed upon them by RFC 2396.

IOW, this isn't about ra_dav, but that we are nominally passing around URLs,
but we are sloppy about them. Most of the time, we don't bother with the
notion of URL-encoding or -decoding. That's because we've patched it all up
inside of ra_dav to compensate for the higher levels mucking it up :-)

> That said, where *do* we draw the line on these encodings?  For
> example, we drive editors all the time against the RA library, and we
> don't URI-encode the input to those editors -- ra_dav does that
> internally.  The ra->check_path() interface takes non-URI-encoded
> strings.  The reporter interface -- the same.  The MERGE response
> bump_handler() code does URI-decoding before calling a client
> callback, yet other callbacks seem to use relative URLs.

Yup. My first reaction to your checkin was, "woah. that's not right." But
after thinking on it some more, I arrived at the same point as you:

> My points: a) There are no well-defined rules about the URI "lines" in
> the RA interface. b) As a result, there's world of inconsistency going
> on in there, and c) perhaps we should have an intelligent discussion
> about this situation.

For (b) and (c), yes. For (a), I don't entirely agree: we have parameters
named URL (or whatever, but they are intended to pass actual URLs), but we
don't always treat them properly.

> > The change to __get_dir *decodes* a URI and returns that from a library.
> > That doesn't make sense. That decoding can also lose information, which
> > means that the code which is calling RA cannot turn the URI around and use
> > it with the SVN libraries.
> 
> ...and I have all too recent memories of the pain that this kind of
> early decoding can cause...

Heh. No doubt.

[ for those unfamiliar with the background here: Apache decodes the incoming
  URL, then passes that to its various module. unfortunately, sometimes the
  modules need an *encoded* URL, but Apache has already destroyed it. thus,
  the module can only make a best guess. ]


----------------------------

So to start the discussion...

Note that editors are defined to pass paths that are relative to some root.
I don't think we necessarily have to do anything to the paths, but if
somebody joins those paths to a URL, then some action will need to be taken.
Thankfully, it isn't too hard since we don't allow '/' in our path segments.
We can just encode all other must-be-escaped characters (and ignore the
slashes), and slam the encoded value onto the URL base.

The reporter interface is similar. All of the paths are specified as
relative to the base URL.

Note that ra_local and ra_svn have no requirements for passing a composed
URL over the wire, like ra_dav does. They are perfectly free to pass the
base URL and then the relative path in SVN-canonical form. Thus, they may be
able to avoid some URL-encoding on the relative paths.

Much of the RA interface seems to be path-based, relative to the base URL
used to open the RA session.

Hmm. I wonder, then, if we do have a problem with the dir entries stuff. If
we classify those as relative paths in the SVN-canon style, then decoding
*is* just fine.

And the decoding isn't actually destructive in this case. The thing that
really kills you with decoding is when '/', '?', and '&' appears in the URL.
But SVN-canon doesn't allow '/' within a segment, and '?' and '&' have no
special meaning in an SVN path. Thus, we can always safely encode them
without fear of changing the meaning of the URL.

Well... so it would seem the question comes down to "where do we join an SVN
(relative) path to something we know is a URL?" The primary places to look
are the callers of RA->open, to see where they got the repos_URL. Whether it
came straight from the the client input, or whether they joined a base URL
and an unencoded path. I'd be leery of the diff stuff when it goes to open
new RA sessions (does it still do that?). A quick look also shows us using
entry->url a bit, so any time URLs are joined there could be an issue.

Hunh. Mike pointed out that ra_local decodes the URL pretty early. I just
looked at ra_svn, and it ships the encoded URL over to the server, which
then decodes it [to look for the repos].

So I'm not sure that we've got a huge problem here. But I'm not entirely
confident that we've joined things properly everywhere.

Thoughts?

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: svn ls shows escaped paths

Posted by cm...@collab.net.
"Sander Striker" <st...@apache.org> writes:

> > From: cmpilato@collab.net [mailto:cmpilato@collab.net]
> > Sent: Monday, June 02, 2003 3:12 PM
> 
> > I think Greg H. knows what you're talking about.  What he's saying,
> > though, is that ra_svn and ra_local don't pass around URIs internally
> > -- they don't need to, because they're not talking to a web server
> > that so requires.  So for them to implement the __get_dir() interface
> > that you proposed means that they have to actually URI-*en*code their
> > results before handing them back to the user (who will then
> > immediately URI-*de*code them.
> > 
> > Only ra_dav plays with URIs internally.
> 
> Hmmm, so when the user does:
> 
>   svn co file:///home/svn/myrepos
> 
> file:///home/svn/myrepos, or parts of it, is not passed on as a uri?
> Where is it decoded?  What happens when you have a repository at
> a location in the filesystem with a space in the path?

See ra_local:split_url.c.  The first thing that ra_local does is
URI-decode in the incoming URL, and it uses filesystem paths from
there on out.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

RE: svn ls shows escaped paths

Posted by Sander Striker <st...@apache.org>.
> From: cmpilato@collab.net [mailto:cmpilato@collab.net]
> Sent: Monday, June 02, 2003 3:12 PM

> I think Greg H. knows what you're talking about.  What he's saying,
> though, is that ra_svn and ra_local don't pass around URIs internally
> -- they don't need to, because they're not talking to a web server
> that so requires.  So for them to implement the __get_dir() interface
> that you proposed means that they have to actually URI-*en*code their
> results before handing them back to the user (who will then
> immediately URI-*de*code them.
> 
> Only ra_dav plays with URIs internally.

Hmmm, so when the user does:

  svn co file:///home/svn/myrepos

file:///home/svn/myrepos, or parts of it, is not passed on as a uri?
Where is it decoded?  What happens when you have a repository at
a location in the filesystem with a space in the path?


Sander

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: svn ls shows escaped paths

Posted by cm...@collab.net.
Greg Stein <gs...@lyra.org> writes:

> > Perhaps I'm confused as to what you're suggesting.  How far towards the
> > edge do you want to push this encoding?  If you're not using ra_dav, I
> > don't think there's any need to URI-encode dirent names at all.
> 
> *de*coding. We already require that incoming URIs are properly
> encoded.  Thus, there is also the notion that all URIs within the
> SVN libraries should be encoded already. Thus, we want to keep
> things in an encoded state.

I think Greg H. knows what you're talking about.  What he's saying,
though, is that ra_svn and ra_local don't pass around URIs internally
-- they don't need to, because they're not talking to a web server
that so requires.  So for them to implement the __get_dir() interface
that you proposed means that they have to actually URI-*en*code their
results before handing them back to the user (who will then
immediately URI-*de*code them.

Only ra_dav plays with URIs internally.

So, what was that about the ra-layer defining the interfaces? :-)  I'm
kidding, of course, because the destructiveness of a URI-decode
operation is important to note.

That said, where *do* we draw the line on these encodings?  For
example, we drive editors all the time against the RA library, and we
don't URI-encode the input to those editors -- ra_dav does that
internally.  The ra->check_path() interface takes non-URI-encoded
strings.  The reporter interface -- the same.  The MERGE response
bump_handler() code does URI-decoding before calling a client
callback, yet other callbacks seem to use relative URLs.

My points: a) There are no well-defined rules about the URI "lines" in
the RA interface. b) As a result, there's world of inconsistency going
on in there, and c) perhaps we should have an intelligent discussion
about this situation.

> The change to __get_dir *decodes* a URI and returns that from a library.
> That doesn't make sense. That decoding can also lose information, which
> means that the code which is calling RA cannot turn the URI around and use
> it with the SVN libraries.

...and I have all too recent memories of the pain that this kind of
early decoding can cause...

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: svn ls shows escaped paths

Posted by Greg Stein <gs...@lyra.org>.
On Mon, Jun 02, 2003 at 01:14:28AM -0400, Greg Hudson wrote:
> On Mon, 2003-06-02 at 01:12, Greg Stein wrote:
> > > * subversion/libsvn_ra_dav/fetch.c
> > >   (svn_ra_dav__get_dir): URI-decode the dirent names before using them
> > >     as hash keys.
> > 
> > I don't think that is correct. URI decoding is a destructive operation. The
> > RA user will not be able to use the returned values for further operation.
> > 
> > The URI decoding should happen at the edge of the system. Not down in the
> > middle of it.
> 
> Perhaps I'm confused as to what you're suggesting.  How far towards the
> edge do you want to push this encoding?  If you're not using ra_dav, I
> don't think there's any need to URI-encode dirent names at all.

*de*coding. We already require that incoming URIs are properly encoded.
Thus, there is also the notion that all URIs within the SVN libraries should
be encoded already. Thus, we want to keep things in an encoded state.

The change to __get_dir *decodes* a URI and returns that from a library.
That doesn't make sense. That decoding can also lose information, which
means that the code which is calling RA cannot turn the URI around and use
it with the SVN libraries.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: svn ls shows escaped paths

Posted by Greg Hudson <gh...@MIT.EDU>.
On Mon, 2003-06-02 at 01:12, Greg Stein wrote:
> > * subversion/libsvn_ra_dav/fetch.c
> >   (svn_ra_dav__get_dir): URI-decode the dirent names before using them
> >     as hash keys.
> 
> I don't think that is correct. URI decoding is a destructive operation. The
> RA user will not be able to use the returned values for further operation.
> 
> The URI decoding should happen at the edge of the system. Not down in the
> middle of it.

Perhaps I'm confused as to what you're suggesting.  How far towards the
edge do you want to push this encoding?  If you're not using ra_dav, I
don't think there's any need to URI-encode dirent names at all.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: svn ls shows escaped paths

Posted by Greg Stein <gs...@lyra.org>.
On Wed, May 28, 2003 at 01:24:30PM -0500, cmpilato@collab.net wrote:
> "SteveKing" <st...@gmx.ch> writes:
>...
> > svn ls url
> > 
> > does not unescape the paths it shows as output.
> > So if a file with e.g. a space is in the repository
> > it isn't shown correctly but with the %20 escape
> > sequence in it.
> > 
> > Shouldn't this be changed?
> 
> Try this patch:
> 
> * subversion/libsvn_ra_dav/fetch.c
>   (svn_ra_dav__get_dir): URI-decode the dirent names before using them
>     as hash keys.

I don't think that is correct. URI decoding is a destructive operation. The
RA user will not be able to use the returned values for further operation.

The URI decoding should happen at the edge of the system. Not down in the
middle of it.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: svn ls shows escaped paths

Posted by cm...@collab.net.
"SteveKing" <st...@gmx.ch> writes:

> Hi,
> 
> unlike other commands
> 
> svn ls url
> 
> does not unescape the paths it shows as output.
> So if a file with e.g. a space is in the repository
> it isn't shown correctly but with the %20 escape
> sequence in it.
> 
> Shouldn't this be changed?

Try this patch:

* subversion/libsvn_ra_dav/fetch.c
  (svn_ra_dav__get_dir): URI-decode the dirent names before using them
    as hash keys.

Index: subversion/libsvn_ra_dav/fetch.c
===================================================================
--- subversion/libsvn_ra_dav/fetch.c	(revision 6063)
+++ subversion/libsvn_ra_dav/fetch.c	(working copy)
@@ -946,7 +946,9 @@
           if (propval != NULL)
             entry->last_author = propval->data;
           
-          apr_hash_set(*dirents, svn_path_basename(childname, pool),
+          apr_hash_set(*dirents, 
+                       svn_path_uri_decode(svn_path_basename(childname, pool),
+                                           pool),
                        APR_HASH_KEY_STRING, entry);
         }
     }

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org