You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Shun-ichi GOTO <go...@taiyo.co.jp> on 2004/08/03 13:49:52 UTC

Unnecessary path encoding in svnserve

Hi,

I'm using svnserve of 1.1.0 RC1 on BSD/OS and use it from Japanese
Windows client (svn.exe and TortoiseSVN). It works fine except treating
japanese kanji named directory.

When failed, we (client) get strange error message:

> [c:\temp\test]svn ci -m "" .
> svn: Commit failed (details follow):
> svn: Can't recode string

I inspected svnserve source code and found the fail point in
serve.c::find_repos(). It encodes requested repository path (utf-8) into
local encoding (apr-encoding) only for using apr_filepath_merge(). So it
limits directory name in repository with server side charset.

Is it intended?

For example, on our environment (BSD/OS 4.0.1), apr_os_locale_charset()
returns "iso-8859-1", thus it cannot handle japanese characters for
directory name.

--- Regards,
 Shun-ichi Goto  <go...@taiyo.co.jp>
   R&D Group, TAIYO Corp., Tokyo, JAPAN

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Unnecessary path encoding in svnserve

Posted by Hiroharu Tamaru <ta...@myn.rcast.u-tokyo.ac.jp>.
At Tue, 03 Aug 2004 11:54:10 -0500,
Ben Collins-Sussman wrote:
> 
> On Tue, 2004-08-03 at 11:33, Hiroharu Tamaru wrote:
> 
> > On a side note, one thing that doesn't work yet is that I
> > cannot use kanji directories when I have to specify URLs
> > (and URLs only) on the command line. 
> 
> This is something that should work in the upcoming 1.1.0-rc2 release:
> the ability to put spaces and i18n characters into URLs.  The
> commandline client will properly escape them for you.

Yes, just after I sent out my last message did I realize your post.

At Mon, 02 Aug 2004 16:24:58 -0500, Ben Collins-Sussman wrote:
> (Yes, breser, we'll be careful not to call it "IRI" support, but
> rather, "commandline client support for spaces and non-ascii chars in
> URLs".)

So brilliant!  I'll try it out.
Thank you guys all for your continued efforts.
-- 
Hiroharu Tamaru

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Unnecessary path encoding in svnserve

Posted by Ben Collins-Sussman <su...@collab.net>.
On Tue, 2004-08-03 at 11:33, Hiroharu Tamaru wrote:

> On a side note, one thing that doesn't work yet is that I
> cannot use kanji directories when I have to specify URLs
> (and URLs only) on the command line. 

This is something that should work in the upcoming 1.1.0-rc2 release:
the ability to put spaces and i18n characters into URLs.  The
commandline client will properly escape them for you.



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Unnecessary path encoding in svnserve

Posted by Hiroharu Tamaru <ta...@myn.rcast.u-tokyo.ac.jp>.
Hi.

At Tue, 03 Aug 2004 22:49:52 +0900 (JST), Shun-ichi GOTO wrote:

> I'm using svnserve of 1.1.0 RC1 on BSD/OS and use it from Japanese
> Windows client (svn.exe and TortoiseSVN). It works fine except treating
> japanese kanji named directory.

Just to let you know that there is at least SOME setup that
is working, in case you are looking for anything that may work...

Kanji directories are working fine on a
mod_dav_svn(1.0.6)/apache(2.0.50)/FreeBSD(4.10-stable)
server we run here.  My clients are FreeBSD's and Windows
2000's (/w TortoiseSVN).  A kanji dir commited by a Windows
client can be checked out as an EUC (or SJIS) encoded kanji
directory by a FreeBSD client that has LANG=ja_JP.eucJP (or
ja_JP.SJIS) set, and vice versa.

I had to make sure that apr library had iconv enabled (The
apache2 port in FreeBSD used to have iconv disabled up until
when 2.0.50 was brought in), but I'm not tweeking any LANG
or LC variables on the server side.

I didn't look into the code you mentioned, and I'm new to
subversion, so I don't really know if it is because this is
a FreeBSD server or because I use mod_dav_svn.
# Could someone clarify?

I'm interested to know the result of the LC_CHARSET
workaround proposed by Greg, so please keep me informed.


On a side note, one thing that doesn't work yet is that I
cannot use kanji directories when I have to specify URLs
(and URLs only) on the command line.  An example is when I
want to checkout a part of the repository that contains a
kanji directory.  I can do
  svn co http://host/repo/trunk/
and have trunk/KANJIDIR/... checked out, but cannot directly do
  svn co http://host/repo/trunk/KANJIDIR/
I get
svn: URL 'http://host/repo/trunk/KANJIDIR/' is not properly URI-encoded
I assume I have to convert KANJIDIR into utf-8 and URI
encode it manually, but that I can live with at the moment.
This is only for FreeBSD client, and TortoiseSVN seems to
handle it fine, by the way.
-- 
Hiroharu Tamaru

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Re: Unnecessary path encoding in svnserve

Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Sat, 7 Aug 2004, Greg Hudson wrote:

> On Wed, 2004-08-04 at 17:33, Peter N. Lundblad wrote:
> > I've attached a patch that does this. This can be a security hole if not
> > done correctly, so would someone please review it. I couldn't juse
> > svn_path_is_backpath_present because it only checks for / as segment
> > separator, which isn't enough on Windows ofcourse.
>
> Hm.  Would it be better to use svn_path_internal_style() to convert any
> local separators (there shouldn't be any, but of course we can't trust
> the client to honor that) and then use svn_path_is_backpath_present on
> the result?
>
Yes, that's better and don't introduce code duplication.

Thanks,
//Peter

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Re: Unnecessary path encoding in svnserve

Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Sat, 7 Aug 2004, Greg Hudson wrote:

> On Wed, 2004-08-04 at 17:33, Peter N. Lundblad wrote:
> > I've attached a patch that does this. This can be a security hole if not
> > done correctly, so would someone please review it. I couldn't juse
> > svn_path_is_backpath_present because it only checks for / as segment
> > separator, which isn't enough on Windows ofcourse.
>
> Hm.  Would it be better to use svn_path_internal_style() to convert any
> local separators (there shouldn't be any, but of course we can't trust
> the client to honor that) and then use svn_path_is_backpath_present on
> the result?
>
I have been thinking more about this. The problem with your approach is
that it wouldn't allow \ in paths inside the repository on a Windows
serve, since they would be converted to slashes. This would be an
arbitrary limitation for a UNIX client. The problem is that we take the
rest of the repository path as the path inside the repository using
strlen. I don't think svn_path_internal_style just replaces characters
without insertion/deletion. As I maybe said earlier, I don't like this
nearly-duplication of code either...

Regards,
//Peter

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Re: Unnecessary path encoding in svnserve

Posted by Greg Hudson <gh...@MIT.EDU>.
(Sorry for the two-day latency on responding; work has been busy and I
wanted to think about this one carefully.)

On Wed, 2004-08-04 at 17:33, Peter N. Lundblad wrote:
> I've attached a patch that does this. This can be a security hole if not
> done correctly, so would someone please review it. I couldn't juse
> svn_path_is_backpath_present because it only checks for / as segment
> separator, which isn't enough on Windows ofcourse.

Hm.  Would it be better to use svn_path_internal_style() to convert any
local separators (there shouldn't be any, but of course we can't trust
the client to honor that) and then use svn_path_is_backpath_present on
the result?


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

[PATCH] Re: Unnecessary path encoding in svnserve

Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.

On Tue, 3 Aug 2004, Greg Hudson wrote:

> On Tue, 2004-08-03 at 18:17, Peter N. Lundblad wrote:
> > >   * Making svnserve's find_repos() use svn_path_join() instead of
> > > apr_filepath_merge().  To prevent the client from escaping from the
> > > repository root, we'd have to check for ".." path elements separately.
>
> > Would it be enough to check the URL path for "too many .. segments", i.e.
> > "foo/../bar" would be ok, but "foo/../.." would fail?
>
> I don't think we're currently allowing any .. segments in URLs, so it
> seems simplest to just blow out if you find any path segment equal to
> "..".
>
I've attached a patch that does this. This can be a security hole if not
done correctly, so would someone please review it. I couldn't juse
svn_path_is_backpath_present because it only checks for / as segment
separator, which isn't enough on Windows ofcourse. Still, I'm not sure
about irregular UTF8 sequences (I asked about that in an earlier mail).
Maybe we should add validation to svn_utf_cstring_from_utf8 and friends
even if the native charset is UTF-8.

Comments?

Re: Unnecessary path encoding in svnserve

Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Tue, 3 Aug 2004, Greg Hudson wrote:

> On Tue, 2004-08-03 at 18:17, Peter N. Lundblad wrote:
> > I was looking at this. The problem is how to detect encoding errors.
> > Currently, we just return the APR error wrapped on recoding errors. Should
> > we introduce a special error code for recoding errors and wrap the APR
> > status in such an error?
>
> An alternative is to make svn_repos_find_root_path() perform a trial
> encoding of the path, chopping off components until it succeeds, and
> only then start calling check_repos_path().
>
Seems a little hackish to me, but it is a simpler change.

> > Would it be enough to check the URL path for "too many .. segments", i.e.
> > "foo/../bar" would be ok, but "foo/../.." would fail?
>
> I don't think we're currently allowing any .. segments in URLs, so it
> seems simplest to just blow out if you find any path segment equal to
> "..".
>
Yep.

Thanks,
//Peter

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Unnecessary path encoding in svnserve

Posted by Greg Hudson <gh...@MIT.EDU>.
On Tue, 2004-08-03 at 18:17, Peter N. Lundblad wrote:
> I was looking at this. The problem is how to detect encoding errors.
> Currently, we just return the APR error wrapped on recoding errors. Should
> we introduce a special error code for recoding errors and wrap the APR
> status in such an error?

Maybe.  That would have ramifications for essentially every
failure-to-encode situation; we might start producing overly-verbose
error messages for those cases.

An alternative is to make svn_repos_find_root_path() perform a trial
encoding of the path, chopping off components until it succeeds, and
only then start calling check_repos_path().

> >   * Making svnserve's find_repos() use svn_path_join() instead of
> > apr_filepath_merge().  To prevent the client from escaping from the
> > repository root, we'd have to check for ".." path elements separately.

> Would it be enough to check the URL path for "too many .. segments", i.e.
> "foo/../bar" would be ok, but "foo/../.." would fail?

I don't think we're currently allowing any .. segments in URLs, so it
seems simplest to just blow out if you find any path segment equal to
"..".


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Unnecessary path encoding in svnserve

Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Tue, 3 Aug 2004, Greg Hudson wrote:

> On Tue, 2004-08-03 at 14:15, Shun-ichi GOTO wrote:
> > 1. When the encoding is failed, coninue checking with shorter path
> >    removing trailing element instead of giving up whole of searching.
>
> Yeah, so it would be possible to solve the problem by:
>
>   * Making the svn_repos_find_root_path() detect encoding failures and
> continue, and:
>
I was looking at this. The problem is how to detect encoding errors.
Currently, we just return the APR error wrapped on recoding errors. Should
we introduce a special error code for recoding errors and wrap the APR
status in such an error?

>   * Making svnserve's find_repos() use svn_path_join() instead of
> apr_filepath_merge().  To prevent the client from escaping from the
> repository root, we'd have to check for ".." path elements separately.
>
Would it be enough to check the URL path for "too many .. segments", i.e.
"foo/../bar" would be ok, but "foo/../.." would fail?

//Peter

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Unnecessary path encoding in svnserve

Posted by Greg Hudson <gh...@MIT.EDU>.
On Tue, 2004-08-03 at 14:15, Shun-ichi GOTO wrote:
> 1. When the encoding is failed, coninue checking with shorter path
>    removing trailing element instead of giving up whole of searching.

Yeah, so it would be possible to solve the problem by:

  * Making the svn_repos_find_root_path() detect encoding failures and
continue, and:

  * Making svnserve's find_repos() use svn_path_join() instead of
apr_filepath_merge().  To prevent the client from escaping from the
repository root, we'd have to check for ".." path elements separately.

If someone wants to file an issue about this, go for it.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Unnecessary path encoding in svnserve

Posted by Shun-ichi GOTO <go...@taiyo.co.jp>.
>>>>> On Tue, 03 Aug 2004 10:11:52 -0400, Greg Hudson wrote,

> Pretty much.  Although the encoding step which failed could be avoided
> in a more perfect world, a later step would fail when we tried to find
> the repository root of the URL.  We do this search by starting with the
> full path portion of the URL, and then removing trailing path elements
> until we find a repository.  So we need to be able to encode the full
> URL path in the local charset.

I understand that svn_repos_find_root_path() requires local charset
encoding and also it will causes my problem.  But I imagine the
following two idea may save the problem.

1. When the encoding is failed, coninue checking with shorter path
   removing trailing element instead of giving up whole of searching.

2. Current svn_repos_find_root_path() checks with longest match way.
   But we can take shortest match strategy, I guess.
   Ex. url = /svnroot/project/trunk/dir1/dir2
       check /svnroot ... not repository
       check /svnroot/project  ... found! it's repository
   With this, the encoding restriction covers only for "/svnroot/project"
   and dir1 and dir2 can be allowed as any utf-8 characters.


> As a workaround, you could set LC_CHARSET to UTF8 when running svnserve.

Thanks.

But, unfortunately, on BSD/OS 4.0.1 (old system),
apr_os_locale_charset() simply calls apr_os_default_charset()
because it doesn't have HAVE_NL_LANGINFO,
and apr_os_defalut_charset() simply returns "iso-8859-1".
So I modified apr_os_default_charset() to return "utf-8" instead of
"iso-8859-1" for workaround. I can handle kanji directory with this.

--- Regards,
 Shun-ichi Goto  <go...@taiyo.co.jp>
   R&D Group, TAIYO Corp., Tokyo, JAPAN

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Unnecessary path encoding in svnserve

Posted by Greg Hudson <gh...@MIT.EDU>.
On Tue, 2004-08-03 at 09:49, Shun-ichi GOTO wrote:
> I inspected svnserve source code and found the fail point in
> serve.c::find_repos(). It encodes requested repository path (utf-8) into
> local encoding (apr-encoding) only for using apr_filepath_merge(). So it
> limits directory name in repository with server side charset.
> 
> Is it intended?

Pretty much.  Although the encoding step which failed could be avoided
in a more perfect world, a later step would fail when we tried to find
the repository root of the URL.  We do this search by starting with the
full path portion of the URL, and then removing trailing path elements
until we find a repository.  So we need to be able to encode the full
URL path in the local charset.

As a workaround, you could set LC_CHARSET to UTF8 when running svnserve.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org