Posted to users@subversion.apache.org by Tim Armes <ta...@fr.imaje.com> on 2004/02/02 10:36:52 UTC

Character encodings

Hi,

I'm not sure if this is the right place to pose my question, but I'm hoping
that someone will be able to shed some light.

I use TortoiseSVN on Windows to commit my changes.  The log messages that I
write often include accented characters such as the acute e.  When I use the
Command prompt to retrieve the log, everything is as expected.  There is,
therefore, some sort of consistency of character encodings.

Now, when I try to view the repository using ViewCVS, my accented characters
don't come out correctly at all. Looking into the problem a little further,
I discovered that the character code returned for the acute e is 0x82, which
is in fact a "low quote" under the Windows character encoding.

This is strange since the Windows encodings place this character at 0xE9.
Indeed, had it been E9, ViewCVS would have displayed it correctly since the
acute e is also at E9 under ISO 8859-1.
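
The byte values above can be checked with a short Python sketch. The codec
names are assumptions on my part: "cp1252" stands in for "the Windows
character encoding" and "latin-1" for ISO 8859-1. The helper names are
made up for illustration.

```python
# Assumption: "the Windows character encoding" means code page 1252,
# and ISO 8859-1 is Python's "latin-1" codec.
E_ACUTE = "\u00e9"  # e with acute accent


def byte_for(char, codec):
    """Return the single byte value that `codec` uses for `char`."""
    encoded = char.encode(codec)
    if len(encoded) != 1:
        raise ValueError("not a single-byte character in " + codec)
    return encoded[0]


def char_at(byte_value, codec):
    """Return the character that `codec` assigns to one byte value."""
    return bytes([byte_value]).decode(codec)


if __name__ == "__main__":
    print(hex(byte_for(E_ACUTE, "cp1252")))
    print(hex(byte_for(E_ACUTE, "latin-1")))
    # U+201A is SINGLE LOW-9 QUOTATION MARK, the "low quote".
    print(ascii(char_at(0x82, "cp1252")))
```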

Can anyone explain what's going on?

Tim

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: Character encodings

Posted by David Ripton <dr...@ripton.net>.
Tim Armes wrote:

> I use TortoiseSVN on Windows to commit my changes.  The log messages that I
> write often include accented characters such as the acute e.  When I use the
> Command prompt to retrieve the log, everything is as expected.  There is,
> therefore, some sort of consistency of character encodings.
> 
> Now, when I try to view the repository using ViewCVS, my accented characters
> don't come out correctly at all. Looking into the problem a little further,
> I discovered that the character code returned for the acute e is 0x82, which
> is in fact a "low quote" under the Windows character encoding.
> 
> This is strange since the Windows encodings place this character at 0xE9.
> Indeed, had it been E9, ViewCVS would have displayed it correctly since the
> acute e is also at E9 under ISO 8859-1.
> 
> Can anyone explain what's going on?

On checkin, Subversion converts from your local encoding to UTF-8.  On 
checkout, it does the reverse.
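
As a rough illustration of that round trip (not Subversion's actual code),
here is a minimal Python sketch. The local encoding "cp1252" and the
function names to_utf8 and from_utf8 are assumptions made for the example.

```python
# Assumption: the client's local encoding is Windows code page 1252.
LOCAL_ENCODING = "cp1252"


def to_utf8(local_bytes, local_encoding=LOCAL_ENCODING):
    """Checkin direction: local encoding -> UTF-8."""
    return local_bytes.decode(local_encoding).encode("utf-8")


def from_utf8(utf8_bytes, local_encoding=LOCAL_ENCODING):
    """Checkout direction: UTF-8 -> local encoding."""
    return utf8_bytes.decode("utf-8").encode(local_encoding)


if __name__ == "__main__":
    message = "r\u00e9sum\u00e9".encode(LOCAL_ENCODING)  # acute e is one byte, 0xE9
    stored = to_utf8(message)                            # acute e becomes 0xC3 0xA9
    print(message, stored, from_utf8(stored) == message)
```

The point of the sketch is only that the stored bytes differ from the local
bytes for accented characters, while the round trip gives back the original.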

ViewCVS is displaying the file as UTF-8, rather than translating it to 
your local encoding like the Subversion client does.

Unfortunately, your browser is not detecting that the file is UTF-8, and 
is trying to display it using the wrong encoding.  Plain ASCII 
characters work anyway because both encodings use the same values to 
represent them; accented characters do not.
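
A small Python sketch of that effect, assuming the browser falls back to
ISO 8859-1 (the actual fallback depends on the browser and its settings);
the function name misread is made up for illustration.

```python
def misread(text, actual="utf-8", assumed="latin-1"):
    """Encode text in its actual encoding, then decode it with the wrong one."""
    return text.encode(actual).decode(assumed)


if __name__ == "__main__":
    # Plain ASCII survives because both encodings use the same byte values.
    print(ascii(misread("hello")))
    # The acute e becomes two bytes in UTF-8 (0xC3 0xA9), and each byte is
    # then shown as a separate ISO 8859-1 character.
    print(ascii(misread("\u00e9")))
```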

You can work around this by telling your browser to use the UTF-8 
encoding before viewing a Subversion repository with ViewCVS.  If that's 
not good enough, and you control the repository, you can probably coerce 
either your web server or ViewCVS to set the content-type to "text/html; 
charset=utf-8" instead of just "text/html".
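
How you set the header depends on your web server and ViewCVS setup, which
I can't show here. As a purely hypothetical sketch of the header change
itself, in Python (with_utf8_charset is a made-up helper, not part of
ViewCVS or any web server):

```python
def with_utf8_charset(content_type):
    """Append charset=utf-8 unless a charset parameter is already present."""
    if "charset=" in content_type.lower():
        return content_type
    return content_type + "; charset=utf-8"


if __name__ == "__main__":
    print(with_utf8_charset("text/html"))
    print(with_utf8_charset("text/html; charset=iso-8859-1"))
```

An existing charset is left alone on purpose, so the helper never silently
overrides a value that was set deliberately.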

-- 
David Ripton    dripton@ripton.net
