Posted to dev@subversion.apache.org by "William A. Rowe, Jr." <wr...@rowe-clan.net> on 2007/08/27 04:32:28 UTC

Quirky utf-8, BOM and svn:eol-style

I'd like to float an idea for a new feature to svn:eol-style native.
We understand 'native' to mean a local representation of text, and
many repositories do not retain the BOM (they might use an svnmailer,
or svn:mime-type to represent the encoding).

If the local utf-8 (or someday, utf-16le or utf-16be) file adds a
byte order mark, and the svn:eol-style is native, and the repository
copy contains no such mark, and they use a (new optional) config flag:

[miscellany]
bom-style-native = yes

could we kindly keep their BOM data to their local machine, and strip
it from the committed file?

If the remote file includes the BOM, of course we would want to retain
it.  (Does this mean re-adding it if a local edit strips it out?  Good
question to discuss.)
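
To make the proposed rule concrete, here is a minimal sketch of the
commit-side decision. It is not Subversion code: the function name and
its parameters are hypothetical, and the bom_style_native argument
simply stands in for the proposed (not yet existing) config flag.

```python
import codecs

UTF8_BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def strip_bom_for_commit(local: bytes, repo_has_bom: bool,
                         bom_style_native: bool) -> bytes:
    """Return the bytes that would be committed (hypothetical helper).

    The local BOM is dropped only when the proposed option is on and the
    repository copy carries no BOM. If the repository copy already has
    one, the local content is passed through untouched.
    """
    if bom_style_native and not repo_has_bom and local.startswith(UTF8_BOM):
        return local[len(UTF8_BOM):]
    return local
```

The open question above (re-adding a BOM that a local edit stripped) is
deliberately left out of the sketch.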

We've had a lot of trouble with the same file maintained across various
platforms, and (for example) notepad/windows users pollute our repos
with their local byte order marks.  We'd like to see this behavior die :)

Comments?

Bill


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Quirky utf-8, BOM and svn:eol-style

Posted by Jack Repenning <jr...@collab.net>.
On Aug 26, 2007, at 9:32 PM, William A. Rowe, Jr. wrote:

> If the local utf-8 (or someday, utf-16le or utf-16be) file adds a
> byte order mark, and the svn:eol-style is native, and the repository
> copy contains no such mark, and they use a (new optional) config flag:
>
> [miscellany]
> bom-style-native = yes
>
> could we kindly keep their BOM data to their local machine, and strip
> it from the committed file?


Sounds good, but incomplete.  I agree that the industry situation
(conflicting standards, popular non-compliant implementations, enough
history to squelch any hope of convergence) is strikingly similar for
BOMs as for line endings.  But it's not enough to protect non-BOM
users from the BOM; we must also provide a BOM for those who expect
it: not only strip it from the commit, but add it to the
checkout/update.  I'm not clear why there should be a new config flag,
but there might ought to be a new property: BOM behavior is not
"eol-style," being not at the eol, even though the disagreeing
platforms happen to line up in largely the same camps.  Perhaps
renaming "svn:eol-style" to "svn:text-style" would cover both....
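
The symmetric behavior (strip on commit, add on checkout/update) could
be sketched roughly as follows. This only illustrates the round trip
and is not Subversion's implementation; both function names and the
client_wants_bom parameter are made up for the example.

```python
import codecs

UTF8_BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def to_repository(working: bytes) -> bytes:
    """Commit direction: drop a leading UTF-8 BOM if one is present."""
    if working.startswith(UTF8_BOM):
        return working[len(UTF8_BOM):]
    return working


def to_working_copy(stored: bytes, client_wants_bom: bool) -> bytes:
    """Checkout/update direction: add a BOM only for clients expecting one."""
    if client_wants_bom and not stored.startswith(UTF8_BOM):
        return UTF8_BOM + stored
    return stored
```

Under these assumptions a BOM-expecting client and a BOM-free client
both see their preferred form locally, while the stored copy stays the
same for everyone, much as eol-style native does for line endings.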

Tangentially, I think this is solely a UTF-8 problem, isn't it?  ISTR  
the 16le/be specs explicitly require the use of the BOM.  An  
implementation that does not do so would be non-compliant, and I  
don't really expect that out of, say, the Linux community.  The UTF-8  
problem is that the standards do not specify the use of a BOM, but MS  
does it anyway.
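
For reference, telling the marks apart is mechanical whatever the
standards turn out to require. A small sketch, limited to the three
encodings named in this thread (the detect_bom name is hypothetical,
and other encodings such as UTF-32 are deliberately ignored):

```python
import codecs
from typing import Optional

# BOM byte sequences for the encodings mentioned in this thread only.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),         # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16be"),  # FE FF
)


def detect_bom(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading BOM, or None if absent."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

This says nothing about whether a given file ought to carry a mark; it
only reports which one, if any, is there.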



-==-
Jack Repenning
Chief Technology Officer
CollabNet, Inc.
8000 Marina Boulevard, Suite 600
Brisbane, California 94005
office: +1 650.228.2562
mobile: +1 408.835.8090
raindance: +1 877.326.2337, x844.7461
aim: jackrepenning
skype: jrepenning



