Posted to dev@subversion.apache.org by Nicolás Lichtmaier <jn...@synapsis-sa.com.ar> on 2004/02/19 15:58:10 UTC

Problem with accented characters in checkout

Hi. We are trying to implement subversion (0.37) in my company. We have 
configured a server, but when I try to do a checkout I'm getting:

svn: REPORT request failed on '/sda/!svn/vcc/default'
svn: The REPORT request returned invalid XML in the response: XML parse 
error at line 4745: Bytes: 0xF3 0x6E 0x20 0x64
. (/sda/!svn/vcc/default)

I have captured the conversation with Ethereal and I see this:

<S:add-file name="Descripción de cálculos eléctricos.doc">
<D:checked-in><D:href>/sda/!svn/ver/752/sac/tronco/doc/Descripci%F3n%20de%20c%E1lculos%20el%E9ctricos.doc</D:href></D:checked-in>
<S:set-prop name="svn:entry:committed-rev">348</S:set-prop>
<S:set-prop name="svn:entry:committed-date">2003-12-29T22:48:50.000000Z</S:set-prop>
<S:set-prop name="svn:entry:last-author">syajnl</S:set-prop>

May it be that there's a problem with the accented character?
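
For illustration (a sketch, not part of the original post): the bytes
quoted in the error can be inspected directly. This assumes the failing
bytes come from the file name in the captured response; the variable
names are made up for the example.

```python
from urllib.parse import unquote_to_bytes

# Bytes reported by the XML parser in the error message.
reported = bytes([0xF3, 0x6E, 0x20, 0x64])

# Read as ISO-8859-1 they look like a fragment of "Descripción de".
as_latin1 = reported.decode("iso-8859-1")

# Read as UTF-8 they are malformed: 0xF3 opens a four-byte sequence,
# but 0x6E is not a continuation byte.
try:
    reported.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# The percent-escapes in the captured href carry the same single byte.
href_bytes = unquote_to_bytes("Descripci%F3n")

print(repr(as_latin1), valid_utf8, href_bytes)
```

A single 0xF3 byte for "ó" is what ISO-8859-1 would produce; UTF-8
would use the two bytes 0xC3 0xB3.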

Thanks!

-- 
Nicolás Lichtmaier.-
Synapsis Argentina
+54(11)4314-3000 (int. 231)


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Problem with accented characters in checkout

Posted by Philip Martin <ph...@codematters.co.uk>.
Ben Collins-Sussman <su...@collab.net> writes:

> On Thu, 2004-02-19 at 11:59, Philip Martin wrote:
>
>> The only caveat is that I don't really use anything other than ASCII,
>> and I don't usually use UTF-8 either. It could do with some real-world
>> testing. (The reason I wrote it is because I wanted to find out about
>> UTF-8 :)
>
> Hm, does that mean you don't think it's stable enough to go into 1.0.1? 
> If you're not confident about it, then the risk of releasing the change
> might be just as bad as the risk of not doing so.

It's simple, stable code with tests and I believe it does what I
intended it to do, which is to trap all input that is not "well formed
UTF-8".  What I don't know is whether those who use UTF-8 in the real
world follow those rules.
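
As a rough illustration of what "well formed UTF-8" means in practice,
here is a minimal Python sketch. It is not the r8581 code; it simply
leans on Python's strict UTF-8 decoder, and the function name
is_well_formed_utf8 is hypothetical.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if data decodes under the strict UTF-8 rules."""
    try:
        # The strict decoder rejects stray lead or continuation bytes,
        # overlong encodings and encoded surrogates.
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


samples = {
    "utf8_name": "Descripción".encode("utf-8"),
    "latin1_name": "Descripción".encode("iso-8859-1"),
    "overlong_slash": b"\xc0\xaf",
    "encoded_surrogate": b"\xed\xa0\x80",
}
results = {label: is_well_formed_utf8(raw) for label, raw in samples.items()}
print(results)
```

Only the first sample passes; the other three are the kinds of input a
strict check is meant to trap.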

-- 
Philip Martin


Re: Problem with accented characters in checkout

Posted by Ben Collins-Sussman <su...@collab.net>.
On Thu, 2004-02-19 at 11:59, Philip Martin wrote:

> The only caveat is that I don't really use anything other than ASCII,
> and I don't usually use UTF-8 either. It could do with some real-world
> testing. (The reason I wrote it is because I wanted to find out about
> UTF-8 :)

Hm, does that mean you don't think it's stable enough to go into 1.0.1? 
If you're not confident about it, then the risk of releasing the change
might be just as bad as the risk of not doing so.




Re: Problem with accented characters in checkout

Posted by Philip Martin <ph...@codematters.co.uk>.
Ben Collins-Sussman <su...@collab.net> writes:

> On Thu, 2004-02-19 at 11:32, Philip Martin wrote:
>
>> This is exactly what happens if one uses a UTF-8 locale and then one
>> imports a directory tree containing non-UTF-8 names.  There is no
>> validation of the names, they go straight into the repository and then
>> checkouts can fail.
>
> Yikes, so your r8581 change fixes this, right?

I believe it does.

> I wonder if we shouldn't get it into 1.0.1 ASAP.  This could really bite
> a whole lot of people.  For example, my freshly installed RH9 box uses a
> locale of "en_US.UTF-8" by default.  

The only caveat is that I don't really use anything other than ASCII,
and I don't usually use UTF-8 either. It could do with some real-world
testing. (The reason I wrote it is because I wanted to find out about
UTF-8 :)

-- 
Philip Martin


Re: Problem with accented characters in checkout

Posted by Ben Collins-Sussman <su...@collab.net>.
On Thu, 2004-02-19 at 11:32, Philip Martin wrote:

> This is exactly what happens if one uses a UTF-8 locale and then one
> imports a directory tree containing non-UTF-8 names.  There is no
> validation of the names, they go straight into the repository and then
> checkouts can fail.

Yikes, so your r8581 change fixes this, right?

I wonder if we shouldn't get it into 1.0.1 ASAP.  This could really bite
a whole lot of people.  For example, my freshly installed RH9 box uses a
locale of "en_US.UTF-8" by default.  




Re: Problem with accented characters in checkout

Posted by Nicolás Lichtmaier <jn...@synapsis-sa.com.ar>.
Philip Martin wrote:

>This is exactly what happens if one uses a UTF-8 locale and then one
>imports a directory tree containing non-UTF-8 names.  There is no
>validation of the names, they go straight into the repository and then
>checkouts can fail.
>
>You either need to manually convert the names to UTF-8 before
>importing them, or use a non-UTF-8 locale and the client will convert
>on the fly and send UTF-8 to the repository.
>
>The trunk has code to detect the invalid name at import time and abort
>the import.
>  
>

I've used cvs2svn.py, but yes, LANG was wrongly set to "en_US.UTF-8". 
I'll need to re-run cvs2svn.py with an ISO-8859-1 locale to fix this, right?

I think that Subversion should not trust LANG so much and check that the 
input is valid UTF-8 before importing it in the db.

-- 
Nicolás Lichtmaier.-
Synapsis Argentina
+54(11)4314-3000 (int. 231)



Re: Problem with accented characters in checkout

Posted by Philip Martin <ph...@codematters.co.uk>.
Nicolás Lichtmaier <jn...@synapsis-sa.com.ar> writes:

> Hi. We are trying to implement subversion (0.37) in my company. We
> have configured a server, but when I try to do a checkout I'm getting:
>
> svn: REPORT request failed on '/sda/!svn/vcc/default'
> svn: The REPORT request returned invalid XML in the response: XML
> parse error at line 4745: Bytes: 0xF3 0x6E 0x20 0x64
> . (/sda/!svn/vcc/default)
>
> I have captured the conversation with Ethereal and I see this:
>
> <S:add-file name="Descripción de cálculos eléctricos.doc">

This is exactly what happens if one uses a UTF-8 locale and then one
imports a directory tree containing non-UTF-8 names.  There is no
validation of the names, they go straight into the repository and then
checkouts can fail.

You either need to manually convert the names to UTF-8 before
importing them, or use a non-UTF-8 locale and the client will convert
on the fly and send UTF-8 to the repository.

The trunk has code to detect the invalid name at import time and abort
the import.
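
The "convert the names manually" option could be sketched in Python as
below. This is an illustration only: it assumes the existing names are
ISO-8859-1, the helper names to_utf8_name and planned_renames are
hypothetical, and it only lists the renames rather than performing them.

```python
import os


def to_utf8_name(raw: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Return raw unchanged if it is already UTF-8, else transcode it."""
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        # Assumption: anything that is not UTF-8 is in source_encoding.
        return raw.decode(source_encoding).encode("utf-8")


def planned_renames(root: str):
    """Yield (old_path, new_path) byte pairs for names needing conversion."""
    # Walk with byte paths so undecodable names are seen as-is, and go
    # bottom-up so children are listed before their parent directory.
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root), topdown=False):
        for name in dirnames + filenames:
            fixed = to_utf8_name(name)
            if fixed != name:
                yield os.path.join(dirpath, name), os.path.join(dirpath, fixed)


print(to_utf8_name(b"Descripci\xf3n.doc"))
```

Applying each pair with os.rename in the order yielded would convert a
tree before import. A name whose ISO-8859-1 bytes happen to form valid
UTF-8 would be left alone, so the result is worth reviewing by hand.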

-- 
Philip Martin


Re: Problem with accented characters in checkout

Posted by Nicolás Lichtmaier <jn...@synapsis-sa.com.ar>.
Ben Collins-Sussman wrote:

>Did you use cvs2svn.py to create this repository?  And if so, what
>version?
>  
>

Yes, I did. It says "$LastChangedRevision: 7921 $".

And now that I check, LANG is (wrongly) set to a UTF-8 locale. That 
might be the problem...

-- 
Nicolás Lichtmaier.-
Synapsis Argentina
+54(11)4314-3000 (int. 231)



Re: Problem with accented characters in checkout

Posted by Ben Collins-Sussman <su...@collab.net>.
On Thu, 2004-02-19 at 09:58, Nicolás Lichtmaier wrote:

> <S:add-file name="Descripción de cálculos eléctricos.doc">

Nicolás, 

Did you use cvs2svn.py to create this repository?  And if so, what
version?




Re: Problem with accented characters in checkout

Posted by Florian Weimer <fw...@deneb.enyo.de>.
Nicolás Lichtmaier wrote:

> May it be that there's a problem with the accented character?

I think so.  What's your locale?  Is it UTF-8 based?  In this case, your
file names should be encoded in UTF-8, too.


Re: Problem with accented characters in checkout

Posted by Joe Orton <jo...@redhat.com>.
On Thu, Feb 19, 2004 at 10:44:05AM -0600, Mike Pilato wrote:
> Nicolás Lichtmaier <jn...@synapsis-sa.com.ar> writes:
> 
> > I have captured the conversation with Ethereal and I see this:
> > 
> > <S:add-file name="Descripción de cálculos eléctricos.doc">
> > <D:checked-in><D:href>/sda/!svn/ver/752/sac/tronco/doc/Descripci%F3n%20de%20c%E1lculos%20el%E9ctricos.doc</D:href></D:checked-in>
> > <S:set-prop name="svn:entry:committed-rev">348</S:set-prop>
> > <S:set-prop
> > name="svn:entry:committed-date">2003-12-29T22:48:50.000000Z</S:set-prop>
> > <S:set-prop name="svn:entry:last-author">syajnl</S:set-prop>
> > 
> > May it be that there's a problem with the accented character?
> 
> Hm.  I just did a checkout over HTTP with a file named "Descripción".
> My ethereal showed the unescaped, accented characters, just like
> yours.  But I didn't get any errors -- my checkout completed
> successfully.

The issue is that an ISO-8859-1 (or whatever) filename is getting
included verbatim in a REPORT response body, which is a UTF-8 XML
document.  This triggers an XML parse error, of course.  There have been
similar reports before:

http://www.contactor.se/~dast/svnusers/archive-2003-12/0075.shtml
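
The effect can be reproduced in miniature with Python's standard XML
parser (expat-based). This is only a sketch: the element is a
simplified stand-in for the real REPORT body, and report_fragment and
parses are hypothetical helper names.

```python
import xml.etree.ElementTree as ET


def report_fragment(name_bytes: bytes) -> bytes:
    """Build a tiny UTF-8-declared document with name_bytes inlined verbatim."""
    return (b'<?xml version="1.0" encoding="utf-8"?>'
            b'<add-file name="' + name_bytes + b'"/>')


def parses(doc: bytes) -> bool:
    """Return True if the parser accepts doc, False on a parse error."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


name = "Descripción de cálculos eléctricos.doc"
utf8_ok = parses(report_fragment(name.encode("utf-8")))
latin1_ok = parses(report_fragment(name.encode("iso-8859-1")))
print(utf8_ok, latin1_ok)
```

The UTF-8 name parses; the ISO-8859-1 bytes make the parser reject the
document, much like the error in the original report.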

joe


Re: Problem with accented characters in checkout

Posted by "C. Michael Pilato" <cm...@collab.net>.
Nicolás Lichtmaier <jn...@synapsis-sa.com.ar> writes:

> I have captured the conversation with Ethereal and I see this:
> 
> <S:add-file name="Descripción de cálculos eléctricos.doc">
> <D:checked-in><D:href>/sda/!svn/ver/752/sac/tronco/doc/Descripci%F3n%20de%20c%E1lculos%20el%E9ctricos.doc</D:href></D:checked-in>
> <S:set-prop name="svn:entry:committed-rev">348</S:set-prop>
> <S:set-prop
> name="svn:entry:committed-date">2003-12-29T22:48:50.000000Z</S:set-prop>
> <S:set-prop name="svn:entry:last-author">syajnl</S:set-prop>
> 
> May it be that there's a problem with the accented character?

Hm.  I just did a checkout over HTTP with a file named "Descripción".
My ethereal showed the unescaped, accented characters, just like
yours.  But I didn't get any errors -- my checkout completed
successfully.

Questions I have:

Is it possible that our respective Neon libraries are using different XML
parsers that behave differently with respect to accented characters
(mine is libxml2) ?  

Could it be a problem with LOCALE (mine is set to UTF8, I think)?

Why in the world doesn't Subversion's XML encoder convert accented
characters to numeric entities?!
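
To make the last question concrete, here is a minimal Python sketch of
writing accented characters as numeric character references. It is not
Subversion's encoder; to_ascii_xml is a hypothetical helper, and it
assumes the name is already available as correctly decoded text.

```python
from xml.sax.saxutils import escape


def to_ascii_xml(text: str) -> str:
    """Escape XML metacharacters, then turn non-ASCII into &#NNN; references."""
    # escape() handles &, < and >; the extra map also covers double quotes
    # so the result can sit inside a double-quoted attribute value.
    escaped = escape(text, {'"': "&quot;"})
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


encoded = to_ascii_xml("Descripción de cálculos eléctricos.doc")
print(encoded)
```

This only works once the characters are known. If the stored bytes are
ISO-8859-1 but are assumed to be UTF-8, they cannot be decoded in the
first place, so there is nothing reliable to turn into entities.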
