Posted to users@subversion.apache.org by Erich Enke <ep...@ruffdogs.com> on 2004/10/04 17:14:57 UTC

International Characters & Subversion 1.1.0 Problems

I've upgraded from 1.0.8 to 1.1.0 and this hasn't solved my problems.

Problems:
* I can't check in a filename such as 'G%E4steBuch' and expect to be 
able to 'svn merge' and 'svn commit' it later (G?\228steBuch not found 
in repo)  (yes, I made a new merge now that I've upgraded.  Same deal.)

* If I try to 'svn commit --encoding UTF-8' the merge, that doesn't work 
(GästeBuch not found in repo)

* If I try to 'svn add GästeBuch; svn ci --encoding UTF-8', that doesn't 
work (invalid UTF-8 sequence (hex: e4 73 74 65))

Note that any time I have done operations with UTF above, I have done:
export LANG=UTF-8
export LC_CLANG=UTF-8
export LC_CTYPE=UTF-8

I need at least a work-around.  The desired behavior is to check in 
'G%E4steBuch', merge into another branch, be able to commit my changes, 
and have 'G%E4steBuch' show up in the new branch.  Please keep in mind 
that I have only done a little with international characters and unicode.

Explanations of the internal workings that are causing my problems would 
also be welcome.

TIA,
Erich
Ruffdogs.com


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: International Characters & Subversion 1.1.0 Problems

Posted by Erich Enke <ep...@ruffdogs.com>.

Well, some of it is solved.  There are just so many variables to keep 
track of when getting this right.  One of the iterations (that I had 
previously done) had produced a working result, but I got a false 
negative because I hadn't removed the original G%E4steBuch, thinking 
that svn was trying to find the repository's GästeBuch for this very 
file.  Upon removing G%E4steBuch and doing a fresh merge, using just 
en_US (which BTW is the only one I could actually see the ä with.  UTF-8 
didn't work), it was able to svn cp the GästeBuch file for the merge 
with no problem.  I still don't know whether it was the 0xe4 or 
the 0xc3a4 version that was actually committable.  But hope is on the 
horizon.

Apparently the %E4 is just a text-ish way of saying 0x00e4, which I 
guess svn is trying to interpret (in this instance wrongly) as a unicode 
character.  svn is making a many-to-one-to-many conversion of 
filename-to-unicode-to-repo_filename, which is breaking for the %E4 version 
of the filename (since it eventually maps to the repo version of GästeBuch).
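
To make the mapping described above concrete, here is a minimal Python sketch. It assumes (nothing in this thread confirms it) that the name gets percent-decoded somewhere along the way; it only shows what such a decoding would produce, not what svn does internally.

```python
from urllib.parse import unquote_to_bytes

literal_name = "G%E4steBuch"   # the 11-character name phpwiki expects on disk

# Hypothetical step: percent-decoding collapses "%E4" into the single byte 0xE4.
decoded = unquote_to_bytes(literal_name)
print(decoded)                  # b'G\xe4steBuch'

# Read as ISO-8859-1, byte 0xE4 is U+00E4 (a-umlaut) ...
as_text = decoded.decode("iso-8859-1")

# ... and U+00E4 stored as UTF-8 is the two bytes C3 A4.
as_utf8 = as_text.encode("utf-8")
print(as_utf8)                  # b'G\xc3\xa4steBuch'

# Three different byte strings, one intended character.
print(len(literal_name.encode("ascii")), len(decoded), len(as_utf8))  # 11 9 10
```

If the literal 11-character name and the UTF-8 name both end up pointing at the same repository entry, the two files can no longer be told apart, which would match the symptoms above.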

Let me stress again: The option to let alternative encodings, such as 
this '%E4', be escaped somehow should be added because these filenames 
are not necessarily meant for subversion.  In my case, it is phpwiki 
that is expecting these specific filenames, and subversion should just 
turn a blind eye.  I now have my workaround, but it will be VERY 
time-consuming for me.  I'll have to rename every file of this type and all 
of their references.  Maybe then phpwiki won't work with unicoded 
filenames.  Maybe that's why they were %'ed in the first place. 

I'm not trying to blame anyone.  I know how that sounds to volunteer 
ears.  I'm just trying to stress the importance of allowing escaped 
characters in filenames, such as letting a literal 'G%E4steBuch' pass as 
'G%E4steBuch'.

Thank you Patrick Smears for being the only one to attempt to help me. 

I'd still welcome better solutions from anyone.  If anyone knows 
whether this escaped solution is indeed implemented, even in current 
versions, I would be very happy to hear any news along those lines.

Erich
Ruffdogs.com

Re: International Characters & Subversion 1.1.0 Problems

Posted by Patrick Smears <pa...@ensoft.co.uk>.

On Mon, 4 Oct 2004, Erich Enke wrote:

> Thinking that all might now be running smoothly, I try operations on 
> percent'ed filenames.  I can `svn add G%E4steBuch` just fine.  I can 
> `svn ci` it.  If I `svn ls` it directly (in the repository, specifying 
> even the filename), though, I get 'non-existent in that revision'.  
> Trying the merge-commit, I get 'File not found' for the hex sequence 47 
> e4 73 74 65 (I had to run it through hexdump to see what that character 
> was (it interpreted 'e4 73' as a box of some sort)) ... which, of 
> course, is the 0xe4 version of G%E4steBuch.
> 
> That is to say, filenames with percents still seem to be buggy.  This 
> should be reproducible.  Just touch a file as 'G%E4steBuch', add it, 
> commit it, then try ls'ing it, mv'ing it, rm'ing it, merging it into 
> another branch.  See if you have any problems.

I've tried this (svn ls <url-to-file-with-accents>) with Subversion 1.0.8
on FC2, and I get the same results as you. I agree that this is a bug. I 
believe the command

   svn ls $REPOS/G%E4steBuch

should list the file G{a-umlaut}steBuch, rather than giving a "can't 
recode string" error.
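
For reference, in a URL the sequence "%E4" is itself an escape, so a literal percent sign has to be written as %25. The following Python sketch shows the standard URI escaping for the two candidate names; whether svn 1.0.8 or 1.1.0 accepts either form is an assumption that has not been tested here.

```python
from urllib.parse import quote, unquote_to_bytes

# A file literally named G%E4steBuch: the "%" itself is escaped as %25.
literal_in_url = quote("G%E4steBuch")
print(literal_in_url)            # G%25E4steBuch

# A file named with a-umlaut (U+00E4), stored as UTF-8: the character
# becomes the two escaped bytes %C3%A4.
umlaut_in_url = quote("G\u00e4steBuch", encoding="utf-8")
print(umlaut_in_url)             # G%C3%A4steBuch

# Left unescaped, "%E4" in a URL decodes to the lone byte 0xE4.
raw = unquote_to_bytes("G%E4steBuch")
print(raw)                       # b'G\xe4steBuch'
```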

Patrick
-- 
The easy way to type accents in Windows: http://www.frkeys.com/


Re: International Characters & Subversion 1.1.0 Problems

Posted by Erich Enke <ep...@ruffdogs.com>.

>>However, even though `locale charmap` says 'UTF-8', if I do:
>>echo ab | tr 'a' '\303' | tr 'b' '\244'
>>I get Ã¤    (Cap. A + superscript tilde,  and then something that looks 
>>like a misfigured pound sign).  That's not right.  I should get a 
>>lower-case a with umlaut, I would think.
>>    
>>
>
>"locale charmap" shows what the environment variables in your shell are 
>telling your programs to use - i.e. how the programs that you run will 
>interpret and produce byte sequences. That needn't (sadly!) correspond to 
>the way your terminal window interprets those sequences when the programs 
>output them!
>
>The symbols that you're seeing correspond (possibly among other encodings)  
>to the characters mapped to 'c3' and 'a4' in the iso-8859-1 encoding. This
>would suggest that your terminal is interpreting the characters as
>iso-8859-1 (the default encoding in many situations).
>
>You may be able to start a UTF8 xterm with 'xterm -u8'.
>
>  
>
That was a good suggestion.  It gave me different results anyhow.

So, I start up an 'xterm -u8' and set LANG, LC_CLANG, LC_CTYPE, and 
LC_ALL to en_US.UTF-8.  After cleaning up the files left over from the 
other terminal, I can now print out 0xc3a4 and see an 
a-with-umlaut!  That's correct.  Yay!

Thinking that all might now be running smoothly, I try operations on 
percent'ed filenames.  I can `svn add G%E4steBuch` just fine.  I can 
`svn ci` it.  If I `svn ls` it directly (in the repository, specifying 
even the filename), though, I get 'non-existent in that revision'.  
Trying the merge-commit, I get 'File not found' for the hex sequence 47 
e4 73 74 65 (I had to run it through hexdump to see what that character 
was (it interpreted 'e4 73' as a box of some sort)) ... which, of 
course, is the 0xe4 version of G%E4steBuch.

That is to say, filenames with percents still seem to be buggy.  This 
should be reproducible.  Just touch a file as 'G%E4steBuch', add it, 
commit it, then try ls'ing it, mv'ing it, rm'ing it, merging it into 
another branch.  See if you have any problems.

>As I'm sure you've realised, that's 'e4' (our troublesome friend :-), 
>  
>
Oh yes.  :-)  

>followed by "ste". So clearly the 'e4' is being taken as UTF-8 for some 
>reason.
>
>  
>
I'm glad you think so too.  It's nice to have some confirmation of my 
line of thought.  ;-)

>Another possibility is that since the terminal seems to be in iso-8859-1 
>mode, but the environment variables suggest you're using UTF8, that the 
>character isn't being affected at all, when it should in fact get 
>converted from iso-8859-1 to utf-8. (There may still be a bug here, by 
>this stage my head is spinning!)
>  
>
Yeah, but I just did an strace on the commit (within the old terminal, 
using en_US.UTF-8 variables), and I wouldn't expect to see the following 
if that were the case:

open("/usr/share/locale/en_US.UTF-8/LC_MEASUREMENT", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=23, ...}) = 0
mmap2(NULL, 23, PROT_READ, MAP_PRIVATE, 3, 0) = 0x4010e000
close(3)                                = 0
open("/usr/share/locale/en_US.UTF-8/LC_TELEPHONE", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=59, ...}) = 0
mmap2(NULL, 59, PROT_READ, MAP_PRIVATE, 3, 0) = 0x4010f000
close(3)                                = 0
open("/usr/share/locale/en_US.UTF-8/LC_ADDRESS", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=155, ...}) = 0
mmap2(NULL, 155, PROT_READ, MAP_PRIVATE, 3, 0) = 0x40110000
close(3)                                = 0
open("/usr/share/locale/en_US.UTF-8/LC_NAME", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=77, ...}) = 0
mmap2(NULL, 77, PROT_READ, MAP_PRIVATE, 3, 0) = 0x40111000
close(3)                                = 0
open("/usr/share/locale/en_US.UTF-8/LC_PAPER", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=34, ...}) = 0
mmap2(NULL, 34, PROT_READ, MAP_PRIVATE, 3, 0) = 0x40112000
close(3)                                = 0
open("/usr/share/locale/en_US.UTF-8/LC_MESSAGES", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFDIR|0755, st_size=80, ...}) = 0
close(3)                                = 0
open("/usr/share/locale/en_US.UTF-8/LC_MESSAGES/SYS_LC_MESSAGES", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=52, ...}) = 0
mmap2(NULL, 52, PROT_READ, MAP_PRIVATE, 3, 0) = 0x40113000
close(3)                                = 0
open("/usr/share/locale/en_US.UTF-8/LC_MONETARY", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=286, ...}) = 0
mmap2(NULL, 286, PROT_READ, MAP_PRIVATE, 3, 0) = 0x40114000
close(3)                                = 0
open("/usr/share/locale/en_US.UTF-8/LC_COLLATE", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=882134, ...}) = 0
mmap2(NULL, 882134, PROT_READ, MAP_PRIVATE, 3, 0) = 0x405fe000
close(3)                                = 0
open("/usr/share/locale/en_US.UTF-8/LC_TIME", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=2451, ...}) = 0
mmap2(NULL, 2451, PROT_READ, MAP_PRIVATE, 3, 0) = 0x40115000
close(3)                                = 0
open("/usr/share/locale/en_US.UTF-8/LC_NUMERIC", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=54, ...}) = 0
mmap2(NULL, 54, PROT_READ, MAP_PRIVATE, 3, 0) = 0x40116000
close(3)                                = 0
open("/usr/share/locale/en_US.UTF-8/LC_CTYPE", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=208464, ...}) = 0
mmap2(NULL, 208464, PROT_READ, MAP_PRIVATE, 3, 0) = 0x406d6000



If the file's contents are:

Date: Sat, 3 Feb 2002 17:28:00 -0500
Mime-Version: 1.0 (Produced by PhpWiki 1.3.3-jeffs-hacks)
X-Rcs-Id: $Id: G%E4steBuch,v 1.6 2002/03/02 22:32:06 carstenklapp Exp $
Content-Type: application/x-phpwiki;
  pagename=G%E4steBuch;
  flags="";
  charset=iso-8859-1
Content-Transfer-Encoding: binary

Would that make a difference?  Would the 'charset=iso-8859-1' be messing 
things up?

Here are some possibly-relevant portions of the stack trace:

\nle\0\0\1p\4\0\0011hhap\24\0\1(svn:executable 1 
*)c\0\0\1b\4\0\0011hh9bf\3\1((PhpWiki eky.22a.g0) (G\303\244steBuch 
hoj.22a.th) (VolltextSuche el9.22a.g0) (PhpWikiSystemverwalten 
el0.22a.g0) (G%E4steBuch hom.22a.tq) (BackLinks eki.22a.g0) 
(EditiereText eko.22a.g0)

e/de/G%E4steBuch hof.22a.sp modify 1 1 0 ).s[\0\1(change 54 
/SocialMPN/branches/wiki/phpwiki/locale/de/G%E4steBuch hof.22a.sp modify 
0  1 1)  W\0\1(change 54 
/SocialMPN/branches/wiki/phpwiki/locale/de/G%E4steBuch hof.22a.sp add 0  
0 )\2\0\2\0\1sp\0\0\0]\0\1(change 60 
/SocialMPN/branches/wiki/phpwiki/locale/de/pgsrc/G%E4steBuch eks.241.sn 
add 0  0 )_\0\1(change 59 
/SocialMPN/branches/wiki/phpwiki/locale/de/pgsrc/G\303\244steBuch 
eks.23r.sj delete 0  0 )\2\0\2\0\1sn\0\0\0`\0\1(change 60 
/SocialMPN/branches/wiki/phpwiki/locale/de/pgsrc/G%E4steBuch eks.22a.g0 
delete 0  0 )y\\\0\1(change 59 
/SocialMPN/branches/wiki/phpwiki/locale/de/pgsrc/G\303\244steBuch 
eks.23r.sj add 0  0 )\0\2\0\1sj\0\0\0R\0\1

(copy 60 /SocialMPN/branches/wiki/phpwiki/locale/de/pgsrc/G%E4steBuch tb 
hoi.251.tc).\3\0\001251).E\0\1(soft-copy 35 
/trunk/SocialMPN/admin/settings.php 2 3z 10 
2q5.24c.sw)\3\0\00124ca.J\0\1(soft-copy 40 
/trunk/SocialMPN/admin/original/main.php 2 86 10 
2r0.24b.sw)cg.\3\0\00124b).D\0\1(copy 48 
/branches/RuffDogs/Oscar/mods/client/benefit.php r 
w0.2e.s)e\2\0\0012e\nDRF\0\1(copy 50 
/branches/RuffDogs/Oscar/mods/client/incidents.php r 
g9.2d.s)t(2\2\0\0012dT NE\0\1(copy 49 
/branches/RuffDogs/Oscar/mods/client/commserv.php r fy.2c.s)\2\0\0012c 
NOE\0\1(copy 49 /branches/RuffDogs/Oscar/mods/client/contacts.php r 
es.2b.s)\2\0\0012b\f<\fE\0\1(copy 49 
/branches/RuffDogs/Oscar/mods/client/learnhrs.php r 
d9.2a.s)\2\0\0012ase.B\0\1(copy 46 
/branches/RuffDogs/Oscar/lib/libaccess.inc.php r td.29.s)m 
I\2\0\00129resA\0\1(copy 45 
/branches/RuffDogs/Oscar/lib/libmysql.inc.php r 
n6.28.s)\2\0\00128k\223N@\0\1(copy 44 
/branches/RuffDogs/Oscar/lib/libmisc.inc.php r 
m6.27.s)\10\2\0\00127=\5\201C\0\1(copy 47 
/branches/RuffDogs/Oscar/config.inc.php.default r 
w8.26.s)\201^\2\0\00126\201\0WF\0\1(copy 50 
/branches/RuffDogs/Oscar/lib/menus/clients.inc.php r w6.25.s)) 
_\2\0\00125phpS\0\1(copy 59 
/SocialMPN/branches/wiki/phpwiki/locale/de/pgsrc/G\303\244steBuch sj 
eks.241.sn)g.\3\0\001241).C\0\1(copy 47 
/branches/RuffDogs/Oscar/lib/libdisplay.inc.php r 
vy.24.s)\4\207\2\0\00124\202\34\202T\0\1(copy 60 
/SocialMPN/branches/wiki/phpwiki/locale/de/pgsrc/G%E4steBuch sd 
eks.23r.sj).\3\0\00123r).B\0\1

.j \1\0\1u_\0\1(change 59 
/SocialMPN/branches/wiki/phpwiki/locale/de/pgsrc/G\303\244steBuch 
hoj.22a.th delete 0  0 )an\2\0\1tr

p\3\0\001242b S\0\1(copy 59 
/SocialMPN/branches/wiki/phpwiki/locale/de/pgsrc/G\303\244steBuch sj 
eks.241.sn)g.\3\0\001241).T\0\1(copy 60 
/SocialMPN/branches/wiki/phpwiki/locale/de/pgsrc/G%E4steBuch sd 
eks.23r.sj).\3\0\00123r).B\0\1

And others like them...

>It may be worth setting your environment variable to an iso-8859-1 locale
>- in that case the character you're typing *should* get converted to utf8;
>if not there's either a bug somewhere or a problem with the character 
>conversion libraries.
>  
>
Well, that's essentially what I started with.  `locale charmap` on en_US 
gives iso-8859-1.

Erich
Ruffdogs.com

Re: International Characters & Subversion 1.1.0 Problems

Posted by Patrick Smears <pa...@ensoft.co.uk>.

On Mon, 4 Oct 2004, Erich Enke wrote:

> I have a little more information now than previously.
> 
> The hex character that svn is complaining is a bad UTF sequence is the 
> e4.  That would make some sense, since the Unicode value for the a with 
> umlaut is indeed 0x00e4. 
> 
> 0x00e4 in octal is \344, the one that the en_US commit says is missing.
> 
> When manually encoding 0x00e4 into UTF-8, I come up with:
> 0x00e4 = 000 1110 0100    ==>      110 00011  10 100100 = 0xc3a4 = 
> \303 \244
> with the standard 110 and 10 prefixes.

Your conversion is correct:

printf "\xe4" | iconv -f iso-8859-1 -t UTF-8 | od -tx1
0000000 c3 a4
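
The hand conversion quoted above can also be written out as code. This is a small Python sketch of the two-byte case of UTF-8 only (code points U+0080 to U+07FF); the function name `utf8_two_byte` is made up for this illustration.

```python
def utf8_two_byte(code_point: int) -> bytes:
    """Encode a code point in U+0080..U+07FF as two UTF-8 bytes."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("two-byte UTF-8 covers U+0080..U+07FF only")
    lead = 0b11000000 | (code_point >> 6)           # 110xxxxx: top 5 of 11 bits
    cont = 0b10000000 | (code_point & 0b00111111)   # 10xxxxxx: low 6 bits
    return bytes([lead, cont])

encoded = utf8_two_byte(0x00E4)
print(encoded.hex(" "))                         # c3 a4
print(" ".join(f"\\{b:o}" for b in encoded))    # \303 \244
print(f"\\{0xE4:o}")                            # \344

# Cross-check against Python's own encoder.
assert encoded == "\u00e4".encode("utf-8")
```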

> However, even though `locale charmap` says 'UTF-8', if I do:
> echo ab | tr 'a' '\303' | tr 'b' '\244'
> I get Ã¤    (Cap. A + superscript tilde,  and then something that looks 
> like a misfigured pound sign).  That's not right.  I should get a 
> lower-case a with umlaut, I would think.

"locale charmap" shows what the environment variables in your shell are 
telling your programs to use - i.e. how the programs that you run will 
interpret and produce byte sequences. That needn't (sadly!) correspond to 
the way your terminal window interprets those sequences when the programs 
output them!

The symbols that you're seeing correspond (possibly among other encodings)  
to the characters mapped to 'c3' and 'a4' in the iso-8859-1 encoding. This
would suggest that your terminal is interpreting the characters as
iso-8859-1 (the default encoding in many situations).
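
The effect described above can be reproduced without a terminal. The Python sketch below decodes the same two bytes under both encodings and prints code points rather than glyphs, so the result does not depend on how the output is displayed; the helper `code_points` is made up for this illustration.

```python
utf8_bytes = bytes([0o303, 0o244])   # the two bytes produced by the `tr` pipeline

def code_points(text: str) -> list:
    """List the characters of a string as U+XXXX labels."""
    return [f"U+{ord(ch):04X}" for ch in text]

# Decoded the way a UTF-8 terminal would: one character, a-umlaut.
print(code_points(utf8_bytes.decode("utf-8")))        # ['U+00E4']

# Decoded the way an ISO-8859-1 terminal would: two characters,
# A-tilde (U+00C3) and the currency sign (U+00A4).
print(code_points(utf8_bytes.decode("iso-8859-1")))   # ['U+00C3', 'U+00A4']
```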

You may be able to start a UTF8 xterm with 'xterm -u8'.

> I tried checking in a file with that name, but when committing the merge, 
> it doesn't recognize it as an a-with-umlaut, even though I'm pretty 
> sure I got the octal right.  However, now I can't even remove that extra 
> file!  It says:
> 
> followed by invalid UTF-8 sequence
> (hex: e4 73 74 65)

As I'm sure you've realised, that's 'e4' (our troublesome friend :-), 
followed by "ste". So clearly the 'e4' is being taken as UTF-8 for some 
reason.
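
Why those four bytes are rejected can be checked directly. The Python sketch below takes the bytes from the error message and tests them against the UTF-8 rule that a lead byte of the form 1110xxxx must be followed by two continuation bytes of the form 10xxxxxx; the helper `is_continuation` is made up for this illustration.

```python
reported = bytes.fromhex("e4 73 74 65")   # the bytes from the svn error message

# As ISO-8859-1 every byte is one character: U+00E4 followed by "ste".
as_latin1 = reported.decode("iso-8859-1")
print([f"U+{ord(ch):04X}" for ch in as_latin1])

# As UTF-8, 0xE4 (binary 1110 0100) announces a three-byte sequence, so the
# next two bytes must look like 10xxxxxx. 0x73 and 0x74 do not.
def is_continuation(byte: int) -> bool:
    return (byte & 0b11000000) == 0b10000000

print([is_continuation(b) for b in reported[1:3]])   # [False, False]

# Python's decoder rejects the sequence at the very first byte.
try:
    reported.decode("utf-8")
    failed_at = None
except UnicodeDecodeError as err:
    failed_at = err.start
print(failed_at)    # 0
```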

> It seems like I should have enough information to piece together what's 
> going on if I could just put it all together...
> 
> Trying svn remove on (with cosmetic spaces) 'G 0xe4 steBuch' and 'G 
> 0xc3a4 steBuch' both (I can hexdump the contents of the variables I am 
> using to hold these characters and confirm that I am indeed holding 0xe4 
> and 0xc3a4) yield the above 'invalid UTF-8 sequence', including the 'e4' 
> character.  So both UTF-16 (I think??) and UTF-8 are being converted to 
> UTF-16 (?) somewhere along the way, but that UTF-16 (?) char is being 
> interpreted as UTF-8 (0xe4 is indeed invalid UTF-8), which shouldn't be 
> happening.  This is sounding more and more like a bug to me.

Another possibility is that since the terminal seems to be in iso-8859-1 
mode, but the environment variables suggest you're using UTF8, that the 
character isn't being affected at all, when it should in fact get 
converted from iso-8859-1 to utf-8. (There may still be a bug here, by 
this stage my head is spinning!)

It may be worth setting your environment variable to an iso-8859-1 locale
- in that case the character you're typing *should* get converted to utf8;
if not there's either a bug somewhere or a problem with the character 
conversion libraries.

Patrick

Re: International Characters & Subversion 1.1.0 Problems

Posted by Erich Enke <ep...@ruffdogs.com>.

I have a little more information now than previously.

The hex character that svn is complaining is a bad UTF sequence is the 
e4.  That would make some sense, since the Unicode value for the a with 
umlaut is indeed 0x00e4. 

0x00e4 in octal is \344, the one that the en_US commit says is missing.

When manually encoding 0x00e4 into UTF-8, I come up with:
0x00e4 = 000 1110 0100    ==>      110 00011  10 100100 = 0xc3a4 = 
\303 \244
with the standard 110 and 10 prefixes.

However, even though `locale charmap` says 'UTF-8', if I do:
echo ab | tr 'a' '\303' | tr 'b' '\244'
I get Ã¤    (Cap. A + superscript tilde,  and then something that looks 
like a misfigured pound sign).  That's not right.  I should get a 
lower-case a with umlaut, I would think.

I tried checking in a file with that name, but when committing the merge, 
it doesn't recognize it as an a-with-umlaut, even though I'm pretty 
sure I got the octal right.  However, now I can't even remove that extra 
file!  It says:

followed by invalid UTF-8 sequence
(hex: e4 73 74 65)

It seems like I should have enough information to piece together what's 
going on if I could just put it all together...

Trying svn remove on (with cosmetic spaces) 'G 0xe4 steBuch' and 'G 
0xc3a4 steBuch' both (I can hexdump the contents of the variables I am 
using to hold these characters and confirm that I am indeed holding 0xe4 
and 0xc3a4) yield the above 'invalid UTF-8 sequence', including the 'e4' 
character.  So both UTF-16 (I think??) and UTF-8 are being converted to 
UTF-16 (?) somewhere along the way, but that UTF-16 (?) char is being 
interpreted as UTF-8 (0xe4 is indeed invalid UTF-8), which shouldn't be 
happening.  This is sounding more and more like a bug to me.

Rats.  I was hoping figuring this out would give me ideas for a 
workaround.  Oh well.

Erich
Ruffdogs.com

Erich Enke wrote:

> Patrick Smears wrote:
>
>> On Mon, 4 Oct 2004, Erich Enke wrote:
>>
>>  
>>
>>> Note that any time I have done operations with UTF above, I have done:
>>> export LANG=UTF-8
>>> export LC_CLANG=UTF-8
>>> export LC_CTYPE=UTF-8
>>>   
>>
>>
>> On my system at least, I have to set "LANG" etc to something ending 
>> in ".utf8", for example....
>>
>>  export LANG=en_GB.utf8      # United Kingdom
>>  export LANG=de_DE.utf8      # Germany
>>
>>  
>>
> Thanks for the hint.  I wasn't quite doing that correctly.
>
> However, exporting LANG, LC_CLANG, LC_CTYPE, and LC_ALL as en_US.UTF-8 
> gives the same results as I had previously.
>
> Erich
> Ruffdogs.com
>

Re: International Characters & Subversion 1.1.0 Problems

Posted by Erich Enke <ep...@ruffdogs.com>.

Patrick Smears wrote:

>On Mon, 4 Oct 2004, Erich Enke wrote:
>
>  
>
>>Note that any time I have done operations with UTF above, I have done:
>>export LANG=UTF-8
>>export LC_CLANG=UTF-8
>>export LC_CTYPE=UTF-8
>>    
>>
>
>On my system at least, I have to set "LANG" etc to something ending in 
>".utf8", for example....
>
>  export LANG=en_GB.utf8      # United Kingdom
>  export LANG=de_DE.utf8      # Germany
>
>  
>
Thanks for the hint.  I wasn't quite doing that correctly.

However, exporting LANG, LC_CLANG, LC_CTYPE, and LC_ALL as en_US.UTF-8 
gives the same results as I had previously.

Erich
Ruffdogs.com

Re: International Characters & Subversion 1.1.0 Problems

Posted by Patrick Smears <pa...@ensoft.co.uk>.

On Mon, 4 Oct 2004, Erich Enke wrote:

> Note that any time I have done operations with UTF above, I have done:
> export LANG=UTF-8
> export LC_CLANG=UTF-8
> export LC_CTYPE=UTF-8

On my system at least, I have to set "LANG" etc to something ending in 
".utf8", for example....

  export LANG=en_GB.utf8      # United Kingdom
  export LANG=de_DE.utf8      # Germany

... in order for the UTF8 encoding to be active. You can check which 
character encoding is active with the command:

  locale charmap

(of course, your distribution may be different...)
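
To illustrate why a bare "UTF-8" is not a usable value for LANG, here is a rough Python sketch that splits a locale name into the usual language[_territory][.codeset][@modifier] parts. It is only string parsing: the real lookup in the C library also involves aliases and name normalization, which are not modelled here, and the helper `codeset_of` is made up for this illustration.

```python
import re

# language[_territory][.codeset][@modifier], the usual POSIX/XPG naming scheme
LOCALE_RE = re.compile(
    r"^(?P<language>[A-Za-z]+)"
    r"(?:_(?P<territory>[A-Za-z]+))?"
    r"(?:\.(?P<codeset>[^@]+))?"
    r"(?:@(?P<modifier>.+))?$"
)

def codeset_of(locale_name: str):
    """Return the codeset part of a locale name, or None if it has none."""
    match = LOCALE_RE.match(locale_name)
    return match.group("codeset") if match else None

for name in ("UTF-8", "en_US", "en_US.UTF-8", "en_GB.utf8", "de_DE.utf8"):
    print(f"{name!r:16} -> codeset {codeset_of(name)!r}")
```

Under this reading, "UTF-8" on its own names no language and carries no codeset part, while "en_US" has a language and territory but leaves the codeset to the system default.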

Patrick

-- 
The easy way to type accents in Windows: http://www.frkeys.com/

