Posted to dev@subversion.apache.org by pe...@arm.com on 2001/08/01 09:13:29 UTC

Re: Ascii/binary detection.

On 2001-08-01 00:33:21 Branko Čibej wrote:
>>`svn:line-ending'
>>
>>    If this property is present on a given non-binary file, its value
>>    is used to determine how line-endings should be translated.
>>
>>    Values for this can be:
>>
>>        'native'                - Use the line ending mechanism native
>>                                  to the user's operating system.
>>
>>        'dos', 'unix', or 'mac' - Use CRLF, LF, or LFCR, respectively.
>>
>I'm not sure what the correct 'mac' line ending is. Have to check that.

It's CR.

>There are (used to be?) systems where lines are delimited from both
>ends. On VMS, a line started with a LF and ended with a CR, IIRC. How
>about a more generic approach: the value of this property is a pair of
>strings, one for the BOL and one for the EOL marker. 'native' would
>still have the same meaning, while 'dos', 'unix' and 'mac' would be
>aliases for ':\r\n', ':\n' and ':\n\r' (or whatever), respectively. A
>VMS guy would make 'native' an alias for '\n:\r'.

It's probably best not to use "\n" and "\r" because "\n" is ambiguous.
To a Mac programmer, for instance, it means a CR, and to a Windows
programmer it means CRLF - maybe not in C, but certainly in Perl.
Stick to numeric values.
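
To make that concrete, here is a rough sketch in Python of what a
numeric alias table might look like. The table name, the helper names
and the normalise-then-expand approach are my own assumptions for
illustration, not anything Subversion defines:

```python
# Hypothetical alias table: names from this thread, values given as
# numeric byte values so that nothing depends on what "\n" means locally.
EOL_ALIASES = {
    "dos": (0x0D, 0x0A),   # CR LF
    "unix": (0x0A,),       # LF
    "mac": (0x0D,),        # CR
}


def eol_bytes(name):
    """Return the byte sequence for a named line ending."""
    return bytes(EOL_ALIASES[name])


def translate(data, name):
    """Rewrite every CRLF, CR or LF in data as the named line ending."""
    # Normalise to a single LF first, then expand to the target marker.
    normalised = data.replace(b"\x0d\x0a", b"\x0a").replace(b"\x0d", b"\x0a")
    return normalised.replace(b"\x0a", eol_bytes(name))
```

For example, translate(b"a\x0d\x0ab\x0a", "mac") would rewrite both
line endings as a single 0x0D.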

>(And someone porting SVN to the ZX Spectrum will define 'native' as
>':\r' -- then run out of memory when compiling neon :-)
>
>
>>    Absence of this property means that no line-ending substitution
>>    should occur at all.
>>
>Um. I'd rather use 'none' (':', if you accept the idea outlined above),
>and make 'native' the default for text files

Agreed - it's better to be explicit.

Another thought: don't assume a file is binary just because it doesn't
have any CR or LF characters! It might use the Unicode line separator
LS (2028) or paragraph separator PS (2029), or even EBCDIC NEL,
which is in Unicode as 0085. This is all discussed at:

<http://www.unicode.org/unicode/reports/tr13/>

May I suggest LS as the repository's native newline character?
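
As an illustration of the point about detection, a minimal sketch in
Python that looks for all five separators in already-decoded text. The
table and function names are hypothetical, and it assumes the file has
been decoded to Unicode text first:

```python
# Hypothetical table of line-separating characters discussed above.
LINE_SEPARATORS = {
    "\u000A": "LF",
    "\u000D": "CR",
    "\u0085": "NEL",
    "\u2028": "LS",
    "\u2029": "PS",
}


def separators_found(text):
    """Return the names of the line separators that occur in text."""
    return {name for ch, name in LINE_SEPARATORS.items() if ch in text}
```

A file for which this returns only {"LS"} has line structure even
though it contains no CR or LF at all.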


Peter.



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Ascii/binary detection.

Posted by Branko Čibej <br...@xbc.nu>.
peter.westlake@arm.com wrote:

>>There are (used to be?) systems where lines are delimited from both
>>ends. On VMS, a line started with a LF and ended with a CR, IIRC. How
>>about a more generic approach: the value of this property is a pair of
>>strings, one for the BOL and one for the EOL marker. 'native' would
>>still have the same meaning, while 'dos', 'unix' and 'mac' would be
>>aliases for ':\r\n', ':\n' and ':\n\r' (or whatever), respectively. A
>>VMS guy would make 'native' an alias for '\n:\r'.
>>
>
>It's probably best not to use "\n" and "\r" because "\n" is ambiguous.
>To a Mac programmer, for instance, it means a CR, and to a Windows
>programmer it means CRLF - maybe not in C, but certainly in Perl.
>Stick to numeric values.
>
These are Subversion properties, not string constants in your favourite 
programming language. We can define "\n" to always mean "\x0A", and it's 
a good mnemonic.
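
A rough sketch of how such a property value could be decoded, in Python
for brevity. The 'BOL:EOL' syntax is the one proposed earlier in this
thread; the function names and the choice to reject anything other than
the two escapes are assumptions of this sketch, not a settled design:

```python
# The two-character escapes are defined to mean fixed bytes, regardless
# of what the host language or platform thinks "\n" is.
ESCAPES = {"\\n": b"\x0a", "\\r": b"\x0d"}


def decode_marker(text):
    """Decode a run of \\n and \\r escapes into bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair not in ESCAPES:
            raise ValueError("unknown escape at offset %d: %r" % (i, pair))
        out += ESCAPES[pair]
        i += 2
    return bytes(out)


def parse_line_ending(value):
    """Split a 'BOL:EOL' property value into (bol_bytes, eol_bytes)."""
    bol, sep, eol = value.partition(":")
    if not sep:
        raise ValueError("expected 'BOL:EOL', got %r" % value)
    return decode_marker(bol), decode_marker(eol)
```

With this, the value ':' on its own decodes to a pair of empty markers,
which matches the 'none' case.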


>Another thought: don't assume a file is binary just because it doesn't
>have any CR or LF characters! It might use the Unicode line separator
>LS (2028) or paragraph separator PS (2029), or even EBCDIC NEL,
>which is in Unicode as 0085. This is all discussed at:
>
Until we can handle Unicode, EBCDIC, et al. natively, we'll have to 
treat them as binary.

><http://www.unicode.org/unicode/reports/tr13/>
>
>May I suggest LS as the repository's native newline character?
>
That would only make sense for Unicode, and we don't handle Unicode 
natively (yet), see above.

Right now, we'll only handle ASCII derivatives (that includes UTF-8). 
Recognizing EBCDIC would be nice, but I don't think any kind of 
heuristic will help us here: the user will have to say charset=EBCDIC 
(whereupon we ask: which dialect? :-). Or we could make that the default 
character set for text files where EBCDIC is the native single-byte 
encoding.
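
To sketch the "user has to say the charset" idea (Python, purely
illustrative): the function name, the UTF-8 default and the NUL-byte
check are assumptions of this sketch, and cp037 is just one of the
EBCDIC dialects that Python's codecs happen to ship:

```python
def decode_text(data, charset="utf-8"):
    """Return decoded text, or None if data should be treated as binary.

    No guessing is attempted: the caller names the charset, and anything
    that fails to decode under it is considered binary.
    """
    try:
        text = data.decode(charset)
    except UnicodeDecodeError:
        return None
    if "\x00" in text:
        # Assumption: a NUL character marks the content as binary.
        return None
    return text
```

EBCDIC bytes passed without charset="cp037" (or another dialect) fail
the UTF-8 decode here and fall back to binary, which is the behaviour
described above.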

Whatever; all of this is post-M3, IMHO.

    Brane

-- 
Brane Čibej
    home:   <br...@xbc.nu>             http://www.xbc.nu/brane/
    work:   <br...@hermes.si>   http://www.hermes-softlab.com/
     ACM :   <br...@acm.org>            http://www.acm.org/
