Posted to dev@subversion.apache.org by SteveKing <st...@gmx.ch> on 2003/12/13 12:40:10 UTC

svn blame and filenames with non-ascii chars

Hi,

Yesterday I got some error reports that TSVN
can't do the "blame" command on files with
non-ascii chars in the filename. I could reproduce
the described behaviour, but failed to find the bug
in the TSVN code. So I tried the command line
client version 0.34 - and that one didn't work
either.
The blame command works fine for both TSVN
and the command line client if the filename has
only "normal" ascii chars in it, but fails for both
if the filename has non-ascii chars.

I tried to step through the subversion code,
but soon got lost somewhere in the apr hash
part - sorry for not being more helpful.

What I found was that:
- the filename/path is encoded correctly in
  utf8 before passing it to svn_client_blame()
- the "entries" file in the .svn directory does
  contain that file too, correctly encoded in utf8.

Steps to reproduce:

- add a file with a special name to wc (e.g. "äöüÄÖÜ.txt")
  and commit
- change the file and commit (not sure if that's really needed to
  reproduce)
- try 'svn blame <filepath_to_that_file>'
svn: Error string not specified yet
svn: Missing changed-path information for revision 3 of '%C3%A4%C3%B6%C3%BC%C3%84%C3%96%C3%9C.txt'


Stefan


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: svn blame and filenames with non-ascii chars

Posted by "Jostein Chr. Andersen" <jo...@josander.net>.
On Thursday 18 December 2003 11.48, John Szakmeister wrote:

> I think r8023 (fixes path comparison) and r8030 (updates test
> blame_tests.py test script) should be included in the 0.35.0 release
> and 1.0.  Votes?

In that case, I suggest that requests for merging into branch 0.35.0 are 
done from a top level posting so everyone can notice it.

Jostein

-- 
http://www.josander.net/kontakt/ ||
http://www.josander.net/en/contact/



Re: svn blame and filenames with non-ascii chars

Posted by kf...@collab.net.
John Szakmeister <jo...@szakmeister.net> writes:
> I think r8023 (fixes path comparison) and r8030 (updates test blame_tests.py 
> test script) should be included in the 0.35.0 release and 1.0.  Votes?

Can you please file an issue for this as a 1.0 candidate revision,
instead?  (I know we put one fix into 0.35.0 already, but that was a
crash bug in the bindings, not a non-crash bug in core code).



Re: svn blame and filenames with non-ascii chars

Posted by John Szakmeister <jo...@szakmeister.net>.
On Wednesday 17 December 2003 04:12, John Szakmeister wrote:
> [snip]
> > The examples you raise certainly warrant further consideration, but
> > IIUC, you don't disagree with this particular change?  It looks fine
> > to me; I think it should be committed.
>
> I agree. :-)  Committed in r8023 with a more descriptive comment.

I think r8023 (fixes path comparison) and r8030 (updates test blame_tests.py 
test script) should be included in the 0.35.0 release and 1.0.  Votes?

-John



Re: svn blame and filenames with non-ascii chars

Posted by John Szakmeister <jo...@szakmeister.net>.
On Sunday 14 December 2003 21:17, mark benedetto king wrote:
> On Sat, Dec 13, 2003 at 05:27:58PM +0000, Philip Martin wrote:
> > John Szakmeister <jo...@szakmeister.net> writes:
> > > Index: subversion/libsvn_client/blame.c
> > > ===================================================================
> > > --- subversion/libsvn_client/blame.c	(revision 7978)
> > > +++ subversion/libsvn_client/blame.c	(working copy)
> > > @@ -378,7 +378,9 @@
> > >
> > >    SVN_ERR (ra_lib->get_repos_root (session, &reposURL, pool));
> > >
> > > -  lmb.path = url + strlen (reposURL);
> > > +  /* Convert path from URI to UTF-8 before placing it in the baton */
> > > +  lmb.path = svn_path_uri_decode (url + strlen (reposURL), pool);
> >
> > Hmmm...
> >
> > My first instinct was "the comment is redundant" since it duplicates
> > the documented behaviour of the function.  Then I realised that the
> > documentation doesn't mention UTF-8, so I started thinking about the
> > URL.  It's already UTF-8 isn't it?  Whether the decoded URL is UTF-8
> > depends on what characters are URI encoded in the URL, where do we
> > guarantee that the URL contains URI encoded UTF-8?
> >
> > Next I tried a few things
>
> [examples snipped]
>
> The examples you raise certainly warrant further consideration, but
> IIUC, you don't disagree with this particular change?  It looks fine
> to me; I think it should be committed.

I agree. :-)  Committed in r8023 with a more descriptive comment.

-John



Re: svn blame and filenames with non-ascii chars

Posted by mark benedetto king <mb...@lowlatency.com>.
On Sat, Dec 13, 2003 at 05:27:58PM +0000, Philip Martin wrote:
> John Szakmeister <jo...@szakmeister.net> writes:
> 
> > Index: subversion/libsvn_client/blame.c
> > ===================================================================
> > --- subversion/libsvn_client/blame.c	(revision 7978)
> > +++ subversion/libsvn_client/blame.c	(working copy)
> > @@ -378,7 +378,9 @@
> >  
> >    SVN_ERR (ra_lib->get_repos_root (session, &reposURL, pool));
> >  
> > -  lmb.path = url + strlen (reposURL);
> > +  /* Convert path from URI to UTF-8 before placing it in the baton */
> > +  lmb.path = svn_path_uri_decode (url + strlen (reposURL), pool);
> 
> Hmmm...
> 
> My first instinct was "the comment is redundant" since it duplicates
> the documented behaviour of the function.  Then I realised that the
> documentation doesn't mention UTF-8, so I started thinking about the
> URL.  It's already UTF-8 isn't it?  Whether the decoded URL is UTF-8
> depends on what characters are URI encoded in the URL, where do we
> guarantee that the URL contains URI encoded UTF-8?
> 
> Next I tried a few things
> 

[examples snipped]

The examples you raise certainly warrant further consideration, but
IIUC, you don't disagree with this particular change?  It looks fine
to me; I think it should be committed.

--ben



Re: svn blame and filenames with non-ascii chars

Posted by John Szakmeister <jo...@szakmeister.net>.
On Saturday 13 December 2003 16:44, John Szakmeister wrote:
> On Saturday 13 December 2003 12:27, Philip Martin wrote:
> > John Szakmeister <jo...@szakmeister.net> writes:
> > > Index: subversion/libsvn_client/blame.c
> > > ===================================================================
> > > --- subversion/libsvn_client/blame.c	(revision 7978)
> > > +++ subversion/libsvn_client/blame.c	(working copy)
> > > @@ -378,7 +378,9 @@
> > >
> > >    SVN_ERR (ra_lib->get_repos_root (session, &reposURL, pool));
> > >
> > > -  lmb.path = url + strlen (reposURL);
> > > +  /* Convert path from URI to UTF-8 before placing it in the baton */
> > > +  lmb.path = svn_path_uri_decode (url + strlen (reposURL), pool);
> >
> > Hmmm...
> >
> > My first instinct was "the comment is redundant" since it duplicates
> > the documented behaviour of the function.  Then I realised that the
> > documentation doesn't mention UTF-8, so I started thinking about the
> > URL.  It's already UTF-8 isn't it?  Whether the decoded URL is UTF-8
> > depends on what characters are URI encoded in the URL, where do we
> > guarantee that the URL contains URI encoded UTF-8?
>
> Philip, I have to say you have an amazing ability to go through and
> validate *everything*. :-)  Didn't even cross my mind to even try the
> things you did.
>
> > Next I tried a few things
> >
> > $ svnadmin create repo
> > $ svn import Makefile http://localhost:8888/obj/repo/%c3%a9 -m ""
> > $ LANG=en_GB svn ls http://localhost:8888/obj/repo
> > é
> > $ svn blame http://localhost:8888/obj/repo/%c3%a9
> > ../svn/subversion/libsvn_client/blame.c:308: (apr_err=20014)
> > svn: Missing changed-path information for revision 1 of '%a9'
> > $ LANG=en_GB svn ls file://`pwd`/repo/%c3%a9
> > ../svn/subversion/libsvn_client/ls.c:144: (apr_err=160013)
> > svn: URL non-existent in that revision.
>
> The problem above is due to the fact that log_message_receiver() is
> comparing URI-encoded paths against non-URI-encoded ones.  The keys for the
> changed_paths hash were inserted in a non-URI-encoded format.  The patch
> will fix this problem, but not the following ones. :-)
>
> > Rather alarmingly, I can get non-URI encoded paths into the repository
>
> Oof, I don't like the sound of that.
>
> > $ svn import Makefile http://localhost:8888/obj/repo/%e9 -m ""
> > $ svnadmin dump -q repo | grep Node-path
> > Node-path: é
> >
> > which cause ra_dav to produce an error
> >
> > LANG=en_GB svn ls http://localhost:8888/obj/repo/
> > ../svn/subversion/libsvn_ra_dav/util.c:661: (apr_err=175002)
> > svn: PROPFIND request failed on '/obj/repo/!svn/bc/2'
> > ../svn/subversion/libsvn_ra_dav/util.c:647: (apr_err=175002)
> > svn: The PROPFIND request returned invalid XML in the response: XML parse
> > error at line 28: Bytes: 0xE9 0x22 0x3C 0x2F .. (/obj/repo/!svn/bc/2)
> >
> > and ra_local to enter an infinite loop
> >
> > LANG=en_GB svn ls file://`pwd`/repo
>
> Wow, can't say I like this either.  Any recommendations on how we should
> solve this problem?  I saw the discussion about performing UTF-8 encoding,
> and *then* URI encoding.  But how are we to validate something like
> 'http://localhost:8888/obj/repo/%e9'?  Do we need to URI decode it, UTF-8
> encode it, and then URI encode it again?  Who should be responsible for
> doing this?  The command line client, or the client library?

Well, I made a patch to try and fix this problem and discovered something 
interesting, but not too surprising.  I modified 
svn_opt_args_to_target_array() to URI-decode, UTF-8 convert, and URI-encode 
the strings... yeah, that didn't work.  We need a way to just verify that the 
string is UTF-8.  Trying to convert a string that is already UTF-8 just 
resulted in it being converted a second time, based on my current locale
(which is not UTF-8), and this is not what we want.  Any ideas on how to
write such a function?
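
One possible shape for such a check, as a minimal sketch only: the function
below inspects the raw bytes and never consults the locale, so it cannot
re-encode anything.  The name is_valid_utf8 is hypothetical (it is not an
existing Subversion or APR function), and the rules it applies are the
general UTF-8 well-formedness rules, not anything specific to Subversion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a Subversion API.
   Returns 1 if the LEN bytes at DATA are well-formed UTF-8, else 0. */
int
is_valid_utf8 (const unsigned char *data, size_t len)
{
  size_t i = 0;
  size_t k;

  assert (data != NULL || len == 0);

  while (i < len)
    {
      unsigned char c = data[i];
      size_t extra;        /* continuation bytes expected after C */
      unsigned long cp;    /* code point being assembled */
      unsigned long min;   /* smallest code point legal for this length */

      if (c < 0x80)
        { extra = 0; cp = c; min = 0; }
      else if ((c & 0xE0) == 0xC0)
        { extra = 1; cp = c & 0x1F; min = 0x80; }
      else if ((c & 0xF0) == 0xE0)
        { extra = 2; cp = c & 0x0F; min = 0x800; }
      else if ((c & 0xF8) == 0xF0)
        { extra = 3; cp = c & 0x07; min = 0x10000; }
      else
        return 0;          /* stray continuation byte or invalid lead byte */

      if (extra > len - i - 1)
        return 0;          /* sequence is cut off by the end of the input */

      for (k = 1; k <= extra; k++)
        {
          unsigned char cc = data[i + k];
          if ((cc & 0xC0) != 0x80)
            return 0;      /* not a continuation byte */
          cp = (cp << 6) | (cc & 0x3F);
        }

      /* Reject overlong forms, UTF-16 surrogates and values past U+10FFFF. */
      if (cp < min || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        return 0;

      i += extra + 1;
    }

  return 1;
}
```

With this, the decoded form of '%c3%a9' (bytes 0xC3 0xA9) passes, while the
decoded form of '%e9' (the single byte 0xE9) is rejected because the
sequence it starts is never completed.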

-John





Re: svn blame and filenames with non-ascii chars

Posted by Julian Reschke <ju...@gmx.de>.
SteveKing wrote:
> I heard that there will soon be domain names with
> special characters in it, not just ASCII chars. How
> is the definition of that encoding? Does someone know?
> All I found about that is
> ftp://ftp.rfc-editor.org/in-notes/rfc3490.txt
> ftp://ftp.rfc-editor.org/in-notes/rfc3491.txt
> ftp://ftp.rfc-editor.org/in-notes/rfc3492.txt
> http://www.ietf.org/html.charters/OLD/idn-charter.html
> http://www.ietf.org/proceedings/00jul/SLIDES/idn-race/

I think that's a separate issue. It's about encoding non-ASCII 
characters into domain names (so a DNS protocol issue), not about 
mapping non-ASCII characters into HTTP path segments (an issue that's 
not handled by a public spec, but for which there's simply only one 
interoperable solution that indeed seems to have wide support).

Julian

-- 
<green/>bytes GmbH -- http://www.greenbytes.de -- tel:+492512807760


Re: svn blame and filenames with non-ascii chars

Posted by Joe Orton <jo...@manyfish.co.uk>.
On Sat, Dec 13, 2003 at 11:15:46PM +0100, SteveKing wrote:
> 
> ----- Original Message ----- 
> From: "John Szakmeister" <jo...@szakmeister.net>
> 
> [snip]
> > Wow, can't say I like this either.  Any recommendations 
> > on how we should solve this problem?  I saw the 
> > discussion about performing UTF-8 encoding, and 
> > *then* URI encoding.  But how are we to validate 
> > something like 'http://localhost:8888/obj/repo/%e9'.  
> > Do we need to URI decode it, UTF8 encode, and 
> > the URI encode it again?  
> 
> I heard that there will soon be domain names with
> special characters in it, not just ASCII chars. How
> is the definition of that encoding? Does someone know?
> All I found about that is
> ftp://ftp.rfc-editor.org/in-notes/rfc3490.txt
> ftp://ftp.rfc-editor.org/in-notes/rfc3491.txt
> ftp://ftp.rfc-editor.org/in-notes/rfc3492.txt
> http://www.ietf.org/html.charters/OLD/idn-charter.html
> http://www.ietf.org/proceedings/00jul/SLIDES/idn-race/

IDNA (RFC3490) will be supported by the next release of neon using GNU
libidn, and will allow hostnames to be passed in as valid UTF-8-encoded
strings (I don't know if ra_dav does that currently).

joe


Re: svn blame and filenames with non-ascii chars

Posted by Martin Furter <mf...@rola.ch>.

On Sat, 13 Dec 2003, SteveKing wrote:
> I heard that there will soon be domain names with
> special characters in it, not just ASCII chars. How
> is the definition of that encoding? Does someone know?
> All I found about that is
> ftp://ftp.rfc-editor.org/in-notes/rfc3490.txt
> ftp://ftp.rfc-editor.org/in-notes/rfc3491.txt
> ftp://ftp.rfc-editor.org/in-notes/rfc3492.txt
> http://www.ietf.org/html.charters/OLD/idn-charter.html
> http://www.ietf.org/proceedings/00jul/SLIDES/idn-race/

I found two more URLs:

http://www.i-d-n.net/
http://www.switch.ch/id/idn/

And I heard that IE has some support for IDN.

Martin



Re: svn blame and filenames with non-ascii chars

Posted by SteveKing <st...@gmx.ch>.
----- Original Message ----- 
From: "John Szakmeister" <jo...@szakmeister.net>

[snip]
> Wow, can't say I like this either.  Any recommendations 
> on how we should solve this problem?  I saw the 
> discussion about performing UTF-8 encoding, and 
> *then* URI encoding.  But how are we to validate 
> something like 'http://localhost:8888/obj/repo/%e9'.  
> Do we need to URI decode it, UTF8 encode, and 
> the URI encode it again?  

I heard that there will soon be domain names with
special characters in them, not just ASCII chars. How
is that encoding defined? Does someone know?
All I found about that is
ftp://ftp.rfc-editor.org/in-notes/rfc3490.txt
ftp://ftp.rfc-editor.org/in-notes/rfc3491.txt
ftp://ftp.rfc-editor.org/in-notes/rfc3492.txt
http://www.ietf.org/html.charters/OLD/idn-charter.html
http://www.ietf.org/proceedings/00jul/SLIDES/idn-race/


> Who should be responsible 
> for doing this?  The command line client, or client library?

I guess that depends on the solution for the problem.
But I'd prefer to handle that kind of conversion inside
the library so that clients only have to convert their
internal data to UTF8 - such a conversion is possible
from all programming languages. More special conversions
may not be available in some scripting languages.

Stefan



Re: svn blame and filenames with non-ascii chars

Posted by John Szakmeister <jo...@szakmeister.net>.
On Saturday 13 December 2003 12:27, Philip Martin wrote:
> John Szakmeister <jo...@szakmeister.net> writes:
> > Index: subversion/libsvn_client/blame.c
> > ===================================================================
> > --- subversion/libsvn_client/blame.c	(revision 7978)
> > +++ subversion/libsvn_client/blame.c	(working copy)
> > @@ -378,7 +378,9 @@
> >
> >    SVN_ERR (ra_lib->get_repos_root (session, &reposURL, pool));
> >
> > -  lmb.path = url + strlen (reposURL);
> > +  /* Convert path from URI to UTF-8 before placing it in the baton */
> > +  lmb.path = svn_path_uri_decode (url + strlen (reposURL), pool);
>
> Hmmm...
>
> My first instinct was "the comment is redundant" since it duplicates
> the documented behaviour of the function.  Then I realised that the
> documentation doesn't mention UTF-8, so I started thinking about the
> URL.  It's already UTF-8 isn't it?  Whether the decoded URL is UTF-8
> depends on what characters are URI encoded in the URL, where do we
> guarantee that the URL contains URI encoded UTF-8?

Philip, I have to say you have an amazing ability to go through and validate 
*everything*. :-)  Didn't even cross my mind to even try the things you did.

> Next I tried a few things
>
> $ svnadmin create repo
> $ svn import Makefile http://localhost:8888/obj/repo/%c3%a9 -m ""
> $ LANG=en_GB svn ls http://localhost:8888/obj/repo
> é
> $ svn blame http://localhost:8888/obj/repo/%c3%a9
> ../svn/subversion/libsvn_client/blame.c:308: (apr_err=20014)
> svn: Missing changed-path information for revision 1 of '%a9'
> $ LANG=en_GB svn ls file://`pwd`/repo/%c3%a9
> ../svn/subversion/libsvn_client/ls.c:144: (apr_err=160013)
> svn: URL non-existent in that revision.

The problem above is due to the fact that log_message_receiver() is comparing 
URI-encoded paths against non-URI-encoded ones.  The keys for the 
changed_paths hash were inserted in a non-URI-encoded format.  The patch will
fix this problem, but not the following ones. :-)

> Rather alarmingly, I can get non-URI encoded paths into the repository 

Oof, I don't like the sound of that.

> $ svn import Makefile http://localhost:8888/obj/repo/%e9 -m ""
> $ svnadmin dump -q repo | grep Node-path
> Node-path: é
>
> which cause ra_dav to produce an error
>
> LANG=en_GB svn ls http://localhost:8888/obj/repo/
> ../svn/subversion/libsvn_ra_dav/util.c:661: (apr_err=175002)
> svn: PROPFIND request failed on '/obj/repo/!svn/bc/2'
> ../svn/subversion/libsvn_ra_dav/util.c:647: (apr_err=175002)
> svn: The PROPFIND request returned invalid XML in the response: XML parse
> error at line 28: Bytes: 0xE9 0x22 0x3C 0x2F .. (/obj/repo/!svn/bc/2)
>
> and ra_local to enter an infinite loop
>
> LANG=en_GB svn ls file://`pwd`/repo

Wow, can't say I like this either.  Any recommendations on how we should solve 
this problem?  I saw the discussion about performing UTF-8 encoding, and 
*then* URI encoding.  But how are we to validate something like
'http://localhost:8888/obj/repo/%e9'?  Do we need to URI decode it, UTF-8
encode it, and then URI encode it again?  Who should be responsible for doing
this?  The command line client, or the client library?

-John




Re: svn blame and filenames with non-ascii chars

Posted by SteveKing <st...@gmx.ch>.
----- Original Message ----- 
From: "Philip Martin" <ph...@codematters.co.uk>

> >> What I do in TSVN is for paths to encode them in UTF-8 before
> >> passing to the subversion functions, and URL's first URI encoded 
> >> and then in UTF-8 before passing to the subversion functions. 
> >
> > Are you sure this the correct order? it means that if you decode the
> > URL you can be stuck with a non-UTF-8 encoded URL.
> 
> Exactly.  I don't think the command line client gets it right either.

So that's where the whole problem is? In that case I suggest
that (to make it easier for all the clients out there and the
several language bindings) the subversion library should do
the URI encoding and only leave the UTF8 encoding to
the clients. After all, the subversion libs know best if
a path is local/svn/http/... and can do the URI encoding
where necessary.

Stefan

(ok, I admit, it would also be easier for me :-) )


Re: svn blame and filenames with non-ascii chars

Posted by Philip Martin <ph...@codematters.co.uk>.
"Erik Huelsmann" <e....@gmx.net> writes:

> Hi Stefan,
>
>> > depends on what characters are URI encoded in the URL, where do we
>> > guarantee that the URL contains URI encoded UTF-8?
>> 
>> What I do in TSVN is for paths to encode them in UTF-8 before
>> passing to the subversion functions, and URL's first URI encoded 
>> and then in UTF-8 before passing to the subversion functions. 
>
> Are you sure this the correct order? it means that if you decode the
> URL you can be stuck with a non-UTF-8 encoded URL.

Exactly.  I don't think the command line client gets it right either.

-- 
Philip Martin


Re: svn blame and filenames with non-ascii chars

Posted by Erik Huelsmann <e....@gmx.net>.
> The process of mapping the *characters* for instance -- inside a 
> filename -- containing non-ASCII characters should always be:
> 
> - encode using UTF-8, resulting in a byte sequence
> - apply "hex"-escaping to those bytes that are outside ASCII or need to 
> be escaped inside a URI (such as a space character)

Which is what I meant of course. Yes, terminology is an important thing!

As pointed out by Stefan: do we need to fix the command line client for this
too?

bye,

Erik.


Re: svn blame and filenames with non-ascii chars

Posted by Julian Reschke <ju...@gmx.de>.
Erik Huelsmann wrote:

> Hi Stefan,
> Are you sure this the correct order? it means that if you decode the URL you
> can be stuck with a non-UTF-8 encoded URL. I think Subversion libraries
 > ...

I don't have an opinion on this particular issue, but it certainly would 
be good to keep the terminology straight...:

There is no such thing as a "non-encoded" or "encoded" URI. RFC2396 
describes legal URIs, and legal URIs never ever contain non-ASCII 
characters.

The process of mapping the *characters* for instance -- inside a 
filename -- containing non-ASCII characters should always be:

- encode using UTF-8, resulting in a byte sequence
- apply "hex"-escaping to those bytes that are outside ASCII or need to 
be escaped inside a URI (such as a space character)
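
The two steps above can be sketched as follows.  This is an illustration
under stated assumptions, not code from Subversion or neon: the name
escape_utf8_path is hypothetical, the input is assumed to be UTF-8 already
(step one done by the caller), and the set of characters left unescaped is
a deliberately small one picked for the example; RFC 2396 permits a few
more unescaped marks.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a Subversion API.
   SRC is a path that is ALREADY UTF-8 encoded.  Write its hex-escaped
   form into DST (capacity DST_LEN, including the terminating NUL).
   Returns 0 on success, -1 if DST is too small. */
int
escape_utf8_path (const char *src, char *dst, size_t dst_len)
{
  static const char hex[] = "0123456789ABCDEF";
  size_t out = 0;

  assert (src != NULL && dst != NULL);

  if (dst_len == 0)
    return -1;

  for (; *src != '\0'; src++)
    {
      unsigned char c = (unsigned char) *src;

      /* Assumption for this sketch: only ASCII letters, digits and
         "-_.~/" are passed through untouched. */
      int safe = c < 0x80 && (isalnum (c) || strchr ("-_.~/", c) != NULL);

      if (safe)
        {
          if (out + 1 >= dst_len)
            return -1;
          dst[out++] = (char) c;
        }
      else
        {
          /* Every other byte, including each byte of a multi-byte
             UTF-8 sequence, becomes %XX. */
          if (out + 3 >= dst_len)
            return -1;
          dst[out++] = '%';
          dst[out++] = hex[c >> 4];
          dst[out++] = hex[c & 0x0F];
        }
    }

  dst[out] = '\0';
  return 0;
}
```

For the one-character name é (U+00E9), UTF-8 yields the bytes 0xC3 0xA9, so
the escaped form is %C3%A9; a space becomes %20.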


Julian


-- 
<green/>bytes GmbH -- http://www.greenbytes.de -- tel:+492512807760


Re: svn blame and filenames with non-ascii chars

Posted by SteveKing <st...@gmx.ch>.
----- Original Message ----- 
From: "Erik Huelsmann" <e....@gmx.net>
> > What I do in TSVN is for paths to encode them in UTF-8 before
> > passing to the subversion functions, and URL's first URI encoded
> > and then in UTF-8 before passing to the subversion functions.
>
> Are you sure this is the correct order? It means that if you decode the URL
> you can be stuck with a non-UTF-8 encoded URL. I think Subversion libraries
> assume they always operate on UTF-8 encoded strings. This means that you
> should first encode your URL to UTF-8 and then URI-encode it. Reversing the
> operation will give a UTF-8 encoded URL.

If you have to first do the UTF-8 conversion and _then_ do the URI encoding,
then the command line client wouldn't work with most commands. The
command line client expects that the user enters URLs already URI encoded
and then does the UTF-8 encoding.
(Or does the shell already do the UTF-8 encoding, so that the client just reads
user input as UTF-8? That would really surprise me.)

Stefan



Re: svn blame and filenames with non-ascii chars

Posted by Erik Huelsmann <e....@gmx.net>.
Hi Stefan,

> > depends on what characters are URI encoded in the URL, where do we
> > guarantee that the URL contains URI encoded UTF-8?
> 
> What I do in TSVN is for paths to encode them in UTF-8 before
> passing to the subversion functions, and URL's first URI encoded 
> and then in UTF-8 before passing to the subversion functions. 

Are you sure this is the correct order? It means that if you decode the URL you
can be stuck with a non-UTF-8 encoded URL. I think the Subversion libraries
assume they always operate on UTF-8 encoded strings. This means that you should
first encode your URL to UTF-8 and then URI-encode it. Reversing the operation
(URI-decoding) will then give a UTF-8 encoded URL.
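
As a small illustration of why the order matters, here is a sketch of the
first step only.  It assumes the user's input arrives as ISO-8859-1; that
locale is an assumption made for the example, and latin1_to_utf8 is a
hypothetical helper rather than anything in Subversion or APR.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a Subversion API.
   Convert the ISO-8859-1 string SRC to UTF-8 in DST (capacity DST_LEN).
   Each input byte needs at most two output bytes.
   Returns 0 on success, -1 if DST is too small. */
int
latin1_to_utf8 (const char *src, char *dst, size_t dst_len)
{
  assert (src != NULL && dst != NULL);

  if (dst_len < 2 * strlen (src) + 1)
    return -1;

  for (; *src != '\0'; src++)
    {
      unsigned char c = (unsigned char) *src;

      if (c < 0x80)
        *dst++ = (char) c;                      /* ASCII is unchanged */
      else
        {
          *dst++ = (char) (0xC0 | (c >> 6));    /* lead byte */
          *dst++ = (char) (0x80 | (c & 0x3F));  /* continuation byte */
        }
    }

  *dst = '\0';
  return 0;
}
```

The Latin-1 byte 0xE9 (é) becomes 0xC3 0xA9, which would then be escaped as
%C3%A9.  Escaping the raw byte first would give %E9 instead, and decoding
that does not produce UTF-8.  Note also that feeding the function text that
is already UTF-8 encodes it a second time (0xC3 0xA9 turns into
0xC3 0x83 0xC2 0xA9), so the conversion has to happen exactly once.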

[ snip ]

> That worked in the past with all functions I use in TSVN - at least
> I haven't found something that doesn't until now (except the blame
> command). 
> Am I wrong here with my assumptions?

Maybe in the order of encoding?

bye,

Erik.


Re: svn blame and filenames with non-ascii chars

Posted by SteveKing <st...@gmx.ch>.
----- Original Message ----- 
From: "Philip Martin" <ph...@codematters.co.uk>

> My first instinct was "the comment is redundant" since it duplicates
> the documented behaviour of the function.  Then I realised that the
> documentation doesn't mention UTF-8, so I started thinking about the
> URL.  It's already UTF-8 isn't it?  Whether the decoded URL is UTF-8

AFAIK yes. The whole subversion library expects paths and 
urls in UTF-8 format.

> depends on what characters are URI encoded in the URL, where do we
> guarantee that the URL contains URI encoded UTF-8?

What I do in TSVN is for paths to encode them in UTF-8 before
passing to the subversion functions, and URL's first URI encoded 
and then in UTF-8 before passing to the subversion functions. 

URLs/paths returned by the subversion libraries are usually also
encoded that way (UTF-8 and for URL's additionally URI).
I guess the only exception to that is the svn_client_ls() command
which provides the filelist already de-URI encoded.

That worked in the past with all functions I use in TSVN - at least
I haven't found something that doesn't until now (except the blame
command). 
Am I wrong here with my assumptions?

Stefan



Re: svn blame and filenames with non-ascii chars

Posted by Julian Foad <ju...@btopenworld.com>.
kfogel@collab.net wrote:
> Julian Foad <ju...@btopenworld.com> writes:
> 
>>Does anyone else want me to commit it?  It is not clever or of much
>>use on its own, so it could just hang around here on the mailing
>>list until there is a patch to go with it, at which time the test
>>may well be extended or just re-created from scratch by someone
>>else.
> 
> It's always good to have a test in the tree, XFail or not.  One less
> thing for someone to have to dig out of a mailing list archive later.
> 
> Go for it!

Committed in r8008.  Thanks.

- Julian




Re: svn blame and filenames with non-ascii chars

Posted by kf...@collab.net.
Julian Foad <ju...@btopenworld.com> writes:
> Does anyone else want me to commit it?  It is not clever or of much
> use on its own, so it could just hang around here on the mailing
> list until there is a patch to go with it, at which time the test
> may well be extended or just re-created from scratch by someone
> else.

It's always good to have a test in the tree, XFail or not.  One less
thing for someone to have to dig out of a mailing list archive later.

Go for it!


Re: svn blame and filenames with non-ascii chars

Posted by Julian Foad <ju...@btopenworld.com>.
mark benedetto king wrote:
> On Sun, Dec 14, 2003 at 04:21:59PM -0500, mark benedetto king wrote:
> 
>>On Sun, Dec 14, 2003 at 05:46:31PM +0000, Julian Foad wrote:
>>
>>>Even a space in a file name makes "svn blame" fail.  Here is an initial 
>>>test.
>>
>>The test looks good.  Can you make it XFAIL and commit it?
> 
> I lied; the log message says "Property change" instead of "Added file
> with space in name".

Here it is, changed to XFail, and with the test repository's log message empty (since most tests use a log message of "log msg" or "fooogle" or some such, which are just empty messages in disguise).

Does anyone else want me to commit it?  It is not clever or of much use on its own, so it could just hang around here on the mailing list until there is a patch to go with it, at which time the test may well be extended or just re-created from scratch by someone else.

- Julian


Re: svn blame and filenames with non-ascii chars

Posted by mark benedetto king <mb...@lowlatency.com>.
On Sun, Dec 14, 2003 at 04:21:59PM -0500, mark benedetto king wrote:
> On Sun, Dec 14, 2003 at 05:46:31PM +0000, Julian Foad wrote:
> > Even a space in a file name makes "svn blame" fail.  Here is an initial 
> > test.
> > 
> > - Julian
> 
> The test looks good.  Can you make it XFAIL and commit it?
> 

I lied; the log message says "Property change" instead of "Added file
with space in name".

--ben



Re: svn blame and filenames with non-ascii chars

Posted by mark benedetto king <mb...@lowlatency.com>.
On Sun, Dec 14, 2003 at 05:46:31PM +0000, Julian Foad wrote:
> Even a space in a file name makes "svn blame" fail.  Here is an initial 
> test.
> 
> - Julian

The test looks good.  Can you make it XFAIL and commit it?

--ben



Re: svn blame and filenames with non-ascii chars

Posted by Julian Foad <ju...@btopenworld.com>.
Even a space in a file name makes "svn blame" fail.  Here is an initial test.

- Julian

Re: svn blame and filenames with non-ascii chars

Posted by Philip Martin <ph...@codematters.co.uk>.
John Szakmeister <jo...@szakmeister.net> writes:

> Index: subversion/libsvn_client/blame.c
> ===================================================================
> --- subversion/libsvn_client/blame.c	(revision 7978)
> +++ subversion/libsvn_client/blame.c	(working copy)
> @@ -378,7 +378,9 @@
>  
>    SVN_ERR (ra_lib->get_repos_root (session, &reposURL, pool));
>  
> -  lmb.path = url + strlen (reposURL);
> +  /* Convert path from URI to UTF-8 before placing it in the baton */
> +  lmb.path = svn_path_uri_decode (url + strlen (reposURL), pool);

Hmmm...

My first instinct was "the comment is redundant" since it duplicates
the documented behaviour of the function.  Then I realised that the
documentation doesn't mention UTF-8, so I started thinking about the
URL.  It's already UTF-8 isn't it?  Whether the decoded URL is UTF-8
depends on what characters are URI encoded in the URL, where do we
guarantee that the URL contains URI encoded UTF-8?

Next I tried a few things

$ svnadmin create repo
$ svn import Makefile http://localhost:8888/obj/repo/%c3%a9 -m ""
$ LANG=en_GB svn ls http://localhost:8888/obj/repo
é
$ svn blame http://localhost:8888/obj/repo/%c3%a9
../svn/subversion/libsvn_client/blame.c:308: (apr_err=20014)
svn: Missing changed-path information for revision 1 of '%a9'
$ LANG=en_GB svn ls file://`pwd`/repo/%c3%a9
../svn/subversion/libsvn_client/ls.c:144: (apr_err=160013)
svn: URL non-existent in that revision.

Rather alarmingly, I can get non-UTF-8 names into the repository

$ svn import Makefile http://localhost:8888/obj/repo/%e9 -m ""
$ svnadmin dump -q repo | grep Node-path
Node-path: é

which cause ra_dav to produce an error

LANG=en_GB svn ls http://localhost:8888/obj/repo/
../svn/subversion/libsvn_ra_dav/util.c:661: (apr_err=175002)
svn: PROPFIND request failed on '/obj/repo/!svn/bc/2'
../svn/subversion/libsvn_ra_dav/util.c:647: (apr_err=175002)
svn: The PROPFIND request returned invalid XML in the response: XML parse error at line 28: Bytes: 0xE9 0x22 0x3C 0x2F
.. (/obj/repo/!svn/bc/2)

and ra_local to enter an infinite loop

LANG=en_GB svn ls file://`pwd`/repo

-- 
Philip Martin


Re: svn blame and filenames with non-ascii chars

Posted by John Szakmeister <jo...@szakmeister.net>.
On Saturday 13 December 2003 07:40, SteveKing wrote:
> [snip]
> Steps to reproduce:
>
> - add a file with a special name to wc (e.g. "äöüÄÖÜ.txt")
>   and commit
> - change the file and commit  (not sure if that's really needed to
> reproduce)
> - try 'svn blame <filepath_to_that_file>'
> svn: Error string not specified yet
> svn: Missing changed-path information for revision 3 of '%C3%A4%C3%B6%C3%BC%C3%84%C3%96%C3%9C.txt'

Thanks Stefan!  I was able to reproduce this and track down the problem.  It 
turns out that the log message baton had the path stored in URI-encoded form,
while the changed-path keys were plain UTF-8.
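
To make the mismatch concrete, here is a minimal sketch.  The functions
percent_decode and same_path are hypothetical stand-ins that only mimic the
effect of URI-decoding a path before comparing it; they are not the
implementation of svn_path_uri_decode(), and the sample path in the note
below is made up for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Value of the hex digit C; the caller guarantees C is a hex digit. */
static int
hex_value (int c)
{
  if (c >= '0' && c <= '9')
    return c - '0';
  return tolower (c) - 'a' + 10;
}

/* Hypothetical stand-in for URI-decoding: return a malloc'd copy of SRC
   with every %XX escape replaced by the byte it names, or NULL if out of
   memory.  Malformed escapes are copied through unchanged. */
char *
percent_decode (const char *src)
{
  char *dst;
  char *out;

  assert (src != NULL);

  dst = malloc (strlen (src) + 1);   /* decoding never grows the string */
  if (dst == NULL)
    return NULL;
  out = dst;

  while (*src != '\0')
    {
      if (src[0] == '%'
          && isxdigit ((unsigned char) src[1])
          && isxdigit ((unsigned char) src[2]))
        {
          *out++ = (char) (hex_value (src[1]) * 16 + hex_value (src[2]));
          src += 3;
        }
      else
        *out++ = *src++;
    }

  *out = '\0';
  return dst;
}

/* Return 1 if the URI-encoded path ENCODED names the same path as the
   UTF-8 key KEY, 0 if not, -1 on allocation failure. */
int
same_path (const char *encoded, const char *key)
{
  char *decoded = percent_decode (encoded);
  int equal;

  if (decoded == NULL)
    return -1;
  equal = strcmp (decoded, key) == 0;
  free (decoded);
  return equal;
}
```

Compared byte for byte, '/%C3%A4.txt' and the UTF-8 key '/ä.txt' differ, so
a direct lookup finds nothing; after decoding, the two strings are
identical and same_path() reports a match.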

Here is the patch and the log message.  Since we're getting strict on commits, 
and I've never been a part of this process before (at least not for a project 
this big), I'll wait for approval before committing this change.

-John

Log:
Fix the case where paths containing non-ascii characters would cause the blame
command to fail.

* subversion/libsvn_client/blame.c
  (svn_client_blame): Convert the path stored in the message baton from a URI
  encoding to UTF-8.