Posted to dev@subversion.apache.org by Marcin Kasperski <Ma...@softax.com.pl> on 2004/02/19 14:25:28 UTC

Client properties, checkout/update hook, encoding...

It seems my remarks about the Subversion book went to this list, so
let's continue with some more general thoughts. As a short intro: I
currently use and co-administer a CVS repository (I am the person
who got my organization to adopt CVS 4 or 5 years ago). I am now
watching Subversion and slowly considering whether it would be a
good move for us...

Let me first describe the problem we currently have. I am Polish,
which means in particular that the texts I write contain some
national characters. Those characters are defined in the
iso-8859-2 encoding and used that way on all Unix, Linux, VMS and
Mac platforms. But our friends from Microsoft decided to create
and use the so-called win-1250 encoding, where several of those
national characters are placed somewhere else. Now, we are
developing some cross-platform libraries and software,
co-developed by people working on Windows and Linux/Unix/VMS. And
this results in a mess: whoever writes a comment, a readme,
whatever, uses his/her natural character encoding, and people
working on the other platform see strange characters. We
routinely convert everything to iso-8859-2, but this means Windows
people always see it wrong.
BTW: this problem has some similarity to the famous LF vs CR/LF
issue.
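
To make the mismatch concrete, here is a minimal Python sketch that
lists which Polish letters get different byte values in the two
encodings. It assumes Python's standard codec names "iso-8859-2" and
"cp1250" (the latter being Python's name for win-1250); the helper
name differing_chars is purely illustrative.

```python
def differing_chars(chars, enc_a="iso-8859-2", enc_b="cp1250"):
    """Map each character whose byte value differs to (bytes_a, bytes_b)."""
    result = {}
    for ch in chars:
        a, b = ch.encode(enc_a), ch.encode(enc_b)
        if a != b:
            result[ch] = (a, b)
    return result


POLISH = "ąćęłńóśźżĄĆĘŁŃÓŚŹŻ"

for ch, (a, b) in differing_chars(POLISH).items():
    # Show the character and its byte value (hex) in each encoding.
    print(ch, a.hex(), b.hex())

# What a Windows user sees when a file holding an iso-8859-2 "ą"
# is displayed as win-1250:
print(repr("ą".encode("iso-8859-2").decode("cp1250")))
```

Only the letters that actually differ are reported, which is why a
mixed file looks mostly right with a few wrong characters scattered
through it.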

I think I see a fairly natural solution which Subversion could
implement to help solve such a problem. What I need is the
ability for a pre-commit hook to change the text being committed,
an additional checkout/update hook, and the ability to bind some
kind of property to the client (sandbox) and make it available to
the hooks. I imagine it like this:
a) For every text file I define some kind of property (say
'natural-encoding') which tells what the natural (repository)
encoding of the file should be. Maybe some commit hook verifies
that it is set, but this is not so important. Maybe the natural
encoding is always UTF-8 and the property is not needed.
b) In the pre-commit hook I convert the file between encodings
whenever the client encoding differs from the natural encoding of
the file (of course only for files which have the property that
activates the whole mechanism; it is not a good idea to do this
for Word docs). Two Subversion features are needed here: the
hook must know which encoding the sandbox is using (some kind of
sandbox property forwarded to the server while committing), and
the pre-commit hook must be able to modify the changes being
committed.
c) Similarly, some update/checkout hook would convert the opposite
way. Here one needs such a hook to exist at all, to pass the
client property to it, and to let it change the file body.
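
The conversion in steps b) and c) could be sketched roughly as
follows, in Python. This is only an illustration of the idea, not
anything Subversion provides: the functions on_commit and on_update
and the parameters client_encoding and natural_encoding are
hypothetical stand-ins for the proposed hooks and properties.

```python
from typing import Optional


def transcode(data: bytes, source: str, target: str) -> bytes:
    """Re-encode data from source to target; no-op if they are the same."""
    if source.lower() == target.lower():
        return data
    # Strict decoding: bytes that are invalid in `source` raise an error
    # instead of being silently damaged.
    return data.decode(source).encode(target)


def on_commit(data: bytes, client_encoding: str,
              natural_encoding: Optional[str]) -> bytes:
    """Hypothetical step b): sandbox encoding -> repository encoding."""
    if natural_encoding is None:
        # Property not set (e.g. a Word doc): leave the bytes untouched.
        return data
    return transcode(data, client_encoding, natural_encoding)


def on_update(data: bytes, client_encoding: str,
              natural_encoding: Optional[str]) -> bytes:
    """Hypothetical step c): repository encoding -> sandbox encoding."""
    if natural_encoding is None:
        return data
    return transcode(data, natural_encoding, client_encoding)


# A Windows sandbox commits a file; the repository keeps iso-8859-2.
text = "zażółć gęślą jaźń"
windows_bytes = text.encode("cp1250")
stored = on_commit(windows_bytes, "cp1250", "iso-8859-2")
restored = on_update(stored, "cp1250", "iso-8859-2")
```

Files without the property pass through unchanged, so binary content
is never reinterpreted as text.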

This way it seems possible for each sandbox to use its natural
character encoding, much as it already uses its own end-of-line
marker.

What do you think about such an idea? Or is there something else?

By the way: I think that sandbox properties, a checkout/update hook
and data modification in hooks could have more uses than
character conversion. As a quick example for the first two, a
sandbox marked with an 'official build' property could be allowed
to check out only from the tags directory...




---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Client properties, checkout/update hook, encoding...

Posted by Ben Collins-Sussman <su...@collab.net>.
On Thu, 2004-02-19 at 09:05, Tobias Ringström wrote:

> No it doesn't. The commit log messages are handled that way, but not the 
> file contents.

Correct.  The repository stores all paths and commit logs in UTF-8, but
doesn't *ever* change file contents.  The repository treats file
contents as a pure bytestream.  The only thing that ever changes file
contents is the working copy, which might do EOL translation or keyword
substitution.

It sounds like Marcin is asking for a 3rd type of working-copy
translation, one which does charset translation.




Re: Client properties, checkout/update hook, encoding...

Posted by Tobias Ringström <to...@ringstrom.mine.nu>.
Francois Beausoleil wrote:

>Hi !
>
>Subversion already does everything you wrote about, without needing
>pre-commit hooks or anything.
>
>The files on the server are always encoded as UTF-8, and are transported
>this way on the wire.  When the WC is updated, Subversion decodes the
>UTF-8 and translates it to the currently selected platform encoding.  The
>reverse is done when the file is committed.
>  
>
No it doesn't. The commit log messages are handled that way, but not the 
file contents.

/Tobias



Re: Client properties, checkout/update hook, encoding...

Posted by Francois Beausoleil <fb...@users.sourceforge.net>.
Hi!

Subversion already does everything you wrote about, without needing
pre-commit hooks or anything.

The files on the server are always encoded as UTF-8, and are transported
this way on the wire.  When the WC is updated, Subversion decodes the
UTF-8 and translates it to the currently selected platform encoding.  The
reverse is done when the file is committed.

Hope that helps!
François

Developer of Java Gui Builder
http://jgb.sourceforge.net/
