Posted to users@subversion.apache.org by Etienne Miret <et...@ens-lyon.fr> on 2008/03/19 16:40:48 UTC

Issue with UTF-8 filenames

Hello,

I’ve made an import on a svn repository with my locale incorrectly set 
to 'fr_FR', which led it to interpret my filenames as ISO-Latin-1, 
although they were UTF-8. Hence, the names are currently stored in my 
repository in double UTF-8.

After (correctly) setting the locale to 'fr_FR.UTF-8', I ran 'svn 
status' on my working directory, and got exactly the result I expected:
   $ svn status
   ?      Impérialisme
   !      Impérialisme
The file with the wrong name is reported missing, and the one with the 
correct name is reported not to be versioned.

Now I intended to delete my file, and correct the name by a 'svn update' 
followed by a 'svn move'. However:
   $ svn update
   A    Impérialisme

   $ svn status
   ?      Impérialisme
   ?      Impérialisme
   !      Impérialisme

   $ rm Impérialisme

   $ svn mv Impérialisme Impérialisme
   A         Impérialisme
   svn: Working copy 'Impérialisme' locked
   svn: run 'svn cleanup' to remove locks (type 'svn help cleanup' for
   details)

   $ svn status
   ?      Impérialisme
   ?      Impérialisme
   !  +   Impérialisme
   !      Impérialisme
Obviously 'svn' doesn’t correctly compare UTF-8 strings. The issue 
seems to be that there are several codes for the same character. For 
example 'é' can be 0xC3A9 (LATIN SMALL LETTER E WITH ACUTE) or 0x65CC81 
(LATIN SMALL LETTER E + COMBINING ACUTE ACCENT). Unfortunately, I wasn’t 
lucky enough for subversion and my OS to always use the same form.
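
To make the two forms concrete, here is a minimal sketch in Python (not 
anything Subversion itself runs). It assumes the two spellings are the 
Unicode normalization forms NFC and NFD, and the helper name same_name 
is hypothetical, introduced only for illustration:

```python
import unicodedata

# The same visible name, written with an escape so the source file's own
# normalization cannot change the result.
name = "Imp\u00e9rialisme"

# NFC keeps the precomposed U+00E9; NFD splits it into 'e' + U+0301.
nfc = unicodedata.normalize("NFC", name)
nfd = unicodedata.normalize("NFD", name)

# Show the UTF-8 bytes of each form as hex.
print(nfc.encode("utf-8").hex(" "))
print(nfd.encode("utf-8").hex(" "))

# A plain comparison sees two different names.
print(nfc == nfd)


def same_name(a: str, b: str) -> bool:
    """Hypothetical helper: compare names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Comparing after normalization treats both spellings as the same name.
print(same_name(nfc, nfd))
```

A tool that compares the raw bytes, as in the first comparison, would 
list the two spellings as two distinct entries.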

I’m running subversion 1.4.4 on Mac OS X 10.5.2.

Is this a known bug, and is there any workaround?

Regards,

-- 
Etienne Miret
Ne m'envoyez pas de fichier Word SVP, je ne peux pas les lire !
Don't send me Word attachments please, I can't read them!
http://perso.ens-lyon.fr/etienne.miret/Netiquette/no_MS_Office

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: Issue with UTF-8 filenames

Posted by Erik Huelsmann <eh...@gmail.com>.
On 3/20/08, Etienne Miret <et...@ens-lyon.fr> wrote:
> Thank you both for your answers.
>
> Actually, I think I had both bugs :-) First, the one explained by Erik
> Huelsmann, which is why my files ended up in double UTF-8 at first. Then the
> one mentioned by Ryan Schmidt, which prevents me from correcting the original
> problem by hand.
>
> Thank you again, I'll have a look at the patch.

For now, I'd like to point out that issue #2464 isn't a problem in
environments where only Mac OSX is used. So, if you will be working in
such an environment, there's no need to use the patch.

Beyond that, if you have pre-existing mixed environments, or a mix of
such environments and Mac-only ones, the patch may cause you more
headaches than it solves, because it doesn't eliminate the root cause.
It only alleviates some problems for that particular patch submitter.

So, before you decide to use the patch, please be sure to have it
tested in *your* environments: there are other issues with it than
lack of time.


HTH,


Erik.


Re: Issue with UTF-8 filenames

Posted by Erik Huelsmann <eh...@gmail.com>.
On 3/20/08, Ryan Schmidt <su...@ryandesign.com> wrote:
> On Mar 19, 2008, at 11:40, Etienne Miret wrote:
>
> > I've made an import on a svn repository with my locale incorrectly
> > set to 'fr_FR', which led it to interpret my filenames as ISO-
> > Latin-1, although they were UTF-8. Hence, the names are currently
> > stored in my repository in double UTF-8.
> >
> > After (correctly) setting the locale to 'fr_FR.UTF-8', I ran 'svn
> > status' on my working directory, and got exactly the result I
> > expected:
> >   $ svn status
> >   ?      Impérialisme
> >   !      ImpeÌ rialisme
> > The file with the wrong name is reported missing, and the one with
> > the correct name is reported not to be versioned.
> >
> > Now I intended to delete my file, and correct the name by a 'svn
> > update' followed by a 'svn move'. However:
> >   $ svn update
> >   A    ImpeÌ rialisme
> >
> >   $ svn status
> >   ?      ImpeÌ rialisme
> >   ?      Impérialisme
> >   !      ImpeÌ rialisme
> >
> >   $ rm Impérialisme
> >
> >   $ svn mv ImpeÌ rialisme Impérialisme
> >   A         Impérialisme
> >   svn: Working copy 'ImpeÌ rialisme' locked
> >   svn: run 'svn cleanup' to remove locks (type 'svn help cleanup' for
> >   details)
> >
> >   $ svn status
> >   ?      ImpeÌ rialisme
> >   ?      Impérialisme
> >   !  +   Impérialisme
> >   !      ImpeÌ rialisme
> > Obviously 'svn' doesn't correctly compare UTF-8 strings. The
> > issue seems to be that there are several codes for the same
> > character. For example 'é' can be 0xC3A9 (LATIN SMALL LETTER E WITH
> > ACUTE) or 0x65CC81 (LATIN SMALL LETTER E + COMBINING ACUTE ACCENT).
> > Unfortunately, I wasn't lucky enough for subversion and my OS to
> > always use the same form.
> >
> > I'm running subversion 1.4.4 on Mac OS X 10.5.2.
> >
> > Is this a known bug, and is there any workaround?
>
> Sounds like this bug, which is indeed a bigger problem for Mac users
> (specifically users of the Mac OS Extended filesystem):
>
> http://subversion.tigris.org/issues/show_bug.cgi?id=2464
>
> There even appears to be a patch.

The problem Etienne describes is - although related - a bit different
than the one described above.

What's happening is this:
* Subversion asks APR in what encoding it can expect FS input to be
* APR answers (on all *nixy systems): Look at the locale
* Subversion sees fr_FR (which uses ISO-8859-1 as its default)
* Subversion uses ISO-8859-1

What should happen:
* Subversion asks APR in what encoding it can expect FS input
* APR answers: UTF-8 (because that's what Mac OSX FS api defines)
* Subversion uses UTF-8
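
The two sequences above can be sketched in Python. This only imitates
the effect of the encoding choice; it is not how Subversion or APR are
implemented, and the function name undo_double_utf8 is hypothetical.
The sample name uses the decomposed form (e + combining accent), on the
assumption that this is what the Mac filesystem hands back:

```python
# Filename bytes as the filesystem returns them: UTF-8, decomposed form.
raw = "Impe\u0301rialisme".encode("utf-8")

# What's happening: the bytes are assumed to be ISO-8859-1, so every byte
# becomes one character (0xCC turns into 'Ì', 0x81 into a control character).
misread = raw.decode("iso-8859-1")
# Storing that string as UTF-8 gives "double UTF-8".
stored = misread.encode("utf-8")

# What should happen: the bytes are taken as UTF-8 from the start.
correct = raw.decode("utf-8").encode("utf-8")


def undo_double_utf8(data: bytes) -> bytes:
    """Hypothetical repair: reverse one round of mistaken re-encoding."""
    return data.decode("utf-8").encode("iso-8859-1")


print(stored.hex(" "))
print(correct.hex(" "))
print(undo_double_utf8(stored) == raw)
```

The misread string is what shows up as 'ImpeÌ rialisme' in the quoted
'svn status' output. The repair function only works when the stored
bytes really went through exactly one such mistaken round trip.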

This issue is actually fixed in recent APR versions (0.9.x as well as
1.2.x), so if you got a pre-built binary, please ask its provider to
start building against the newest APR patch release of their
preferred minor version.


HTH,

Erik.

Re: Issue with UTF-8 filenames

Posted by Ryan Schmidt <su...@ryandesign.com>.
On Mar 19, 2008, at 11:40, Etienne Miret wrote:

> I’ve made an import on a svn repository with my locale incorrectly  
> set to 'fr_FR', which led it to interpret my filenames as ISO- 
> Latin-1, although they were UTF-8. Hence, the names are currently  
> stored in my repository in double UTF-8.
>
> After (correctly) setting the locale to 'fr_FR.UTF-8', I ran 'svn  
> status' on my working directory, and got exactly the result I  
> expected:
>   $ svn status
>   ?      Impérialisme
>   !      Impérialisme
> The file with the wrong name is reported missing, and the one with  
> the correct name is reported not to be versioned.
>
> Now I intended to delete my file, and correct the name by a 'svn  
> update' followed by a 'svn move'. However:
>   $ svn update
>   A    Impérialisme
>
>   $ svn status
>   ?      Impérialisme
>   ?      Impérialisme
>   !      Impérialisme
>
>   $ rm Impérialisme
>
>   $ svn mv Impérialisme Impérialisme
>   A         Impérialisme
>   svn: Working copy 'Impérialisme' locked
>   svn: run 'svn cleanup' to remove locks (type 'svn help cleanup' for
>   details)
>
>   $ svn status
>   ?      Impérialisme
>   ?      Impérialisme
>   !  +   Impérialisme
>   !      Impérialisme
> Obviously 'svn' doesn’t correctly compare UTF-8 strings. The  
> issue seems to be that there are several codes for the same  
> character. For example 'é' can be 0xC3A9 (LATIN SMALL LETTER E WITH  
> ACUTE) or 0x65CC81 (LATIN SMALL LETTER E + COMBINING ACUTE ACCENT).  
> Unfortunately, I wasn’t lucky enough for subversion and my OS to  
> always use the same form.
>
> I’m running subversion 1.4.4 on Mac OS X 10.5.2.
>
> Is this a known bug, and is there any workaround?

Sounds like this bug, which is indeed a bigger problem for Mac users  
(specifically users of the Mac OS Extended filesystem):

http://subversion.tigris.org/issues/show_bug.cgi?id=2464

There even appears to be a patch.

