Posted to dev@subversion.apache.org by Erik Huelsmann <eh...@gmail.com> on 2008/04/01 22:51:45 UTC

[RFC] Unicode character encoding for other-than-filenames?

Working on the NFC/NFD awareness of the Subversion client, I soon
realised this issue also affects URLs.

However, many of the textual identifiers we use in the client (such as
changelists) are also UTF-8 encoded. How should Subversion behave if a
user sets a changelist with NFC encoded characters and later (somehow)
tries to retrieve that same changelist using NFD encoded characters?
Giving an error message with the changelist name will look strange to
the user: the changelist identifier looks exactly the same to the
user.

So, do we have to do Unicode-aware string comparison for
other-than-filename-identifiers? If so, which ones?
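
To make the question concrete, here is a small sketch of the mismatch. It is
only an illustration in Python using the standard unicodedata module, not
Subversion code; the name same_identifier and the sample changelist name are
hypothetical.

```python
import unicodedata


def same_identifier(a: str, b: str) -> bool:
    """Compare two identifiers after normalizing both to NFC.

    Hypothetical helper, for illustration only.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The same visible changelist name in two normalization forms.
nfc_name = "caf\u00e9"   # 'é' as a single precomposed code point (NFC)
nfd_name = "cafe\u0301"  # 'e' followed by COMBINING ACUTE ACCENT (NFD)

# A plain comparison sees two different strings, and their UTF-8
# encodings differ as well.
print(nfc_name == nfd_name)
print(nfc_name.encode("utf-8") == nfd_name.encode("utf-8"))

# A normalization-aware comparison treats them as the same identifier.
print(same_identifier(nfc_name, nfd_name))
```

A byte-wise comparison of the two UTF-8 strings fails even though both render
identically, which is exactly the confusing case described above.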

Bye,


Erik.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [RFC] Unicode character encoding for other-than-filenames?

Posted by Branko Čibej <br...@xbc.nu>.
David Glasser wrote:
> On Wed, Apr 2, 2008 at 3:54 PM, Branko Čibej <br...@xbc.nu> wrote:
>   
>> Erik Huelsmann wrote:
>>
>>     
>>> Working on the NFC/NFD awareness of the Subversion client, I soon
>>> realised this issue also affects URLs.
>>>
>>> However, many of the textual identifiers we use in the client (such as
>>> changelists) are also UTF-8 encoded. How should Subversion behave if a
>>> user sets a changelist with NFC encoded characters and later (somehow)
>>> tries to retrieve that same changelist using NFD encoded characters?
>>> Giving an error message with the changelist name will look strange to
>>> the user: the changelist identifier looks exactly the same to the
>>> user.
>>>
>>> So, do we have to do Unicode-aware string comparison for
>>> other-than-filename-identifiers? If so, which ones?
>>>
>>>
>>>       
>>  We only have to normalize keys in the repository, which means filenames.
>> Everything else is either not indexed, or stored only locally in the WC. If
>> it's local, and the user or OS magically changes the normalization during
>> the lifetime of the WC ...
>>     
>
> Or, well, to expand: we only have to normalize keys in the repository
> (because they are shared by multiple clients), or things that we let
> the OS reinterpret for us (like filenames).
>   

Ah, indeed. I just conflated the two since we happen to use filenames in 
both contexts.

-- Brane



Re: [RFC] Unicode character encoding for other-than-filenames?

Posted by David Glasser <gl...@davidglasser.net>.
On Wed, Apr 2, 2008 at 3:54 PM, Branko Čibej <br...@xbc.nu> wrote:
> Erik Huelsmann wrote:
>
> > Working on the NFC/NFD awareness of the Subversion client, I soon
> > realised this issue also affects URLs.
> >
> > However, many of the textual identifiers we use in the client (such as
> > changelists) are also UTF-8 encoded. How should Subversion behave if a
> > user sets a changelist with NFC encoded characters and later (somehow)
> > tries to retrieve that same changelist using NFD encoded characters?
> > Giving an error message with the changelist name will look strange to
> > the user: the changelist identifier looks exactly the same to the
> > user.
> >
> > So, do we have to do Unicode-aware string comparison for
> > other-than-filename-identifiers? If so, which ones?
> >
> >
>
>  We only have to normalize keys in the repository, which means filenames.
> Everything else is either not indexed, or stored only locally in the WC. If
> it's local, and the user or OS magically changes the normalization during
> the lifetime of the WC ...

Or, well, to expand: we only have to normalize keys in the repository
(because they are shared by multiple clients), or things that we let
the OS reinterpret for us (like filenames).
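
As a rough sketch of what "normalize keys that are shared" could look like,
assuming NFC as the canonical form (the thread does not settle on one): the
class NormalizedKeyStore below is hypothetical, written in Python for
illustration, and is not how the repository actually stores paths.

```python
import unicodedata


class NormalizedKeyStore:
    """Toy key/value store that normalizes every key to NFC.

    Hypothetical stand-in for a shared index; any client can look up
    an entry regardless of the normalization form it sends.
    """

    def __init__(self) -> None:
        self._entries: dict[str, str] = {}

    @staticmethod
    def _canonical(key: str) -> str:
        # Assumption: NFC is the canonical form for stored keys.
        return unicodedata.normalize("NFC", key)

    def put(self, key: str, value: str) -> None:
        self._entries[self._canonical(key)] = value

    def get(self, key: str) -> str:
        return self._entries[self._canonical(key)]

    def __len__(self) -> int:
        return len(self._entries)


store = NormalizedKeyStore()
store.put("trunk/caf\u00e9.txt", "added by a client sending NFC")

# A second client that sends the NFD form of the same path finds the entry.
print(store.get("trunk/cafe\u0301.txt"))
```

Identifiers that never leave one working copy would not need this treatment
under the reasoning above, since only one client ever writes and reads them.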

--dave


-- 
David Glasser | glasser@davidglasser.net | http://www.davidglasser.net/

Re: [RFC] Unicode character encoding for other-than-filenames?

Posted by Branko Čibej <br...@xbc.nu>.
Erik Huelsmann wrote:
> Working on the NFC/NFD awareness of the Subversion client, I soon
> realised this issue also affects URLs.
>
> However, many of the textual identifiers we use in the client (such as
> changelists) are also UTF-8 encoded. How should Subversion behave if a
> user sets a changelist with NFC encoded characters and later (somehow)
> tries to retrieve that same changelist using NFD encoded characters?
> Giving an error message with the changelist name will look strange to
> the user: the changelist identifier looks exactly the same to the
> user.
>
> So, do we have to do Unicode-aware string comparison for
> other-than-filename-identifiers? If so, which ones?
>   

We only have to normalize keys in the repository, which means filenames. 
Everything else is either not indexed, or stored only locally in the WC. 
If it's local, and the user or OS magically changes the normalization 
during the lifetime of the WC ...

-- Brane
