Posted to dev@subversion.apache.org by Julian Foad <ju...@btopenworld.com> on 2006/02/09 23:40:46 UTC

Convert cmdline args to UTF-8 all at once at program start?

Paul Burba has just sent a patch to do native-to-UTF-8 conversion of 
command-line arguments right at the start of the program, before argument 
processing.  That's because our argument processing assumes an ASCII subset for 
all the strings it needs to match, but he's converting from EBCDIC.

So, is there any reason we can't do the native-to-UTF-8 conversion once and for 
all at program start, on all systems, and remove all the bits and pieces of 
conversion that we are presently applying at later stages?

It seems to me that this would make the program neater, because it would just 
be a single call, and more robust, because we wouldn't be able, as we are now, 
to forget to do the conversion for some arguments or parts thereof.
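A rough C sketch of the "convert everything once at program start" idea, using POSIX iconv(3). This is not Paul's patch and not Subversion or APR code. The helper names to_utf8() and args_to_utf8() are hypothetical. In a real program the native encoding name would presumably come from nl_langinfo(CODESET) after setlocale(LC_ALL, ""); on the EBCDIC platform it would be whatever EBCDIC code page iconv reports there (an assumption).

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert one NUL-terminated string from FROM_CODE
   to UTF-8.  Returns a malloc'd string, or NULL if the input is not
   valid in FROM_CODE or iconv does not support that encoding. */
static char *to_utf8(const char *s, const char *from_code)
{
  assert(s != NULL && from_code != NULL);
  iconv_t cd = iconv_open("UTF-8", from_code);
  if (cd == (iconv_t)-1)
    return NULL;

  size_t in_left = strlen(s);
  /* Each input byte yields at most one character, and a character
     needs at most 4 bytes in UTF-8, so this buffer is always enough. */
  size_t out_size = 4 * in_left + 1;
  char *out = malloc(out_size);
  if (out == NULL) { iconv_close(cd); return NULL; }

  char *in_p = (char *)s;          /* iconv() takes char **, not const */
  char *out_p = out;
  size_t out_left = out_size - 1;  /* keep one byte for the terminator */
  size_t rc = iconv(cd, &in_p, &in_left, &out_p, &out_left);
  if (rc != (size_t)-1)            /* flush any pending shift state */
    rc = iconv(cd, NULL, NULL, &out_p, &out_left);
  iconv_close(cd);
  if (rc == (size_t)-1) { free(out); return NULL; }
  *out_p = '\0';
  return out;
}

/* Hypothetical program-start step: convert every argument at once, so
   that option parsing only ever sees UTF-8.  Returns a new NULL-terminated
   array, or NULL as soon as any argument fails to convert. */
static char **args_to_utf8(int argc, char *const argv[], const char *native)
{
  char **out = calloc((size_t)argc + 1, sizeof *out);  /* out[argc] = NULL */
  if (out == NULL)
    return NULL;
  for (int i = 0; i < argc; i++) {
    out[i] = to_utf8(argv[i], native);
    if (out[i] == NULL) {
      for (int j = 0; j < i; j++)
        free(out[j]);
      free(out);
      return NULL;
    }
  }
  return out;
}
```

Option parsing would then run on the returned array instead of the original argv. Note that every argument is converted from the same source encoding, which is the assumption questioned further down the thread for -m combined with --encoding.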

- Julian

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Convert cmdline args to UTF-8 all at once at program start?

Posted by Branko Čibej <br...@xbc.nu>.
Julian Foad wrote:
> Paul Burba has just sent a patch to do native-to-UTF-8 conversion of 
> command-line arguments right at the start of the program, before 
> argument processing.  That's because our argument processing assumes 
> an ASCII subset for all the strings it needs to match, but he's 
> converting from EBCDIC.
>
> So, is there any reason we can't do the native-to-UTF-8 conversion 
> once and for all at program start, on all systems, and remove all the 
> bits and pieces of conversion that we are presently applying at later 
> stages?
>
> It seems to me that this would make the program neater, because it 
> would just be a single call, and more robust, because we wouldn't be 
> able, as we are now, to forget to do the conversion for some arguments 
> or parts thereof.
+1.

And if we use apr_app_initialize instead of apr_initialize, we can even 
do away with this conversion on (NT-class) Windows, because APR will do 
that for us. :)

-- Brane


Re: Convert cmdline args to UTF-8 all at once at program start?

Posted by "C. Michael Pilato" <cm...@collab.net>.
Branko Čibej wrote:
> C. Michael Pilato wrote:
> 
>> Julian Foad wrote:
>>  
>>
>>> Paul Burba has just sent a patch to do native-to-UTF-8 conversion of
>>> command-line arguments right at the start of the program, before
>>> argument processing.  That's because our argument processing assumes an
>>> ASCII subset for all the strings it needs to match, but he's converting
>>> from EBCDIC.
>>>     
>>
>>
>> I've not seen Paul's change, but does this cause any misinteraction with
>> a commandline like:
>>
>>    svn commit -m <Shift-JIS log message> --encoding Shift-JIS ...
>>
>> Seems a native-to-UTF-8 conversion of that log message is likely to fail
>> unless Shift-JIS (in this case) is the native encoding.
>>   
> 
> Doesn't --encoding only affect log messages pulled from files with -F? I
> can't imagine how mixed encodings on the command line would work in
> general.

How would they work?  As expected, of course:

$ svnadmin create repos
$ svn mkdir file://`pwd`/repos/foo \
            -m `cat ~/misc/i18n-data/shift-jis.txt`
subversion/libsvn_subr/utf.c:555: (apr_err=22)
svn: Valid UTF-8 data
(hex: 73 68 69 66 74 2d 6a 69 73 2d)
followed by invalid UTF-8 sequence
(hex: 83 63 81 5b)
$ svn mkdir file://`pwd`/repos/foo \
            -m `cat ~/misc/i18n-data/shift-jis.txt` \
             --encoding Shift-JIS
Committed revision 1.
$ svn log file://`pwd`/repos
------------------------------------------------------------------------
r1 | cmpilato | 2006-02-10 09:50:40 -0500 (Fri, 10 Feb 2006) | 1 line

shift-jis-ツールバー
------------------------------------------------------------------------
$
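The first error above comes from Subversion's UTF-8 check in libsvn_subr/utf.c. That code is not reproduced here. The following is an independent C sketch of the standard UTF-8 well-formedness rules, and the function names are hypothetical. It shows why the check stops right after the 10 ASCII bytes of "shift-jis-": 0x83 is a continuation byte and can never start a UTF-8 sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the byte offset of the first ill-formed UTF-8 sequence in
   BUF[0..LEN), or -1 if the whole buffer is well-formed.  The ranges
   follow the Unicode well-formed byte sequence table: no overlong
   forms, no surrogates, nothing above U+10FFFF. */
static long first_invalid_utf8(const unsigned char *buf, size_t len)
{
  assert(buf != NULL || len == 0);
  size_t i = 0;
  while (i < len) {
    unsigned char c = buf[i];
    size_t n = 0;                        /* continuation bytes expected */
    unsigned char lo = 0x80, hi = 0xBF;  /* allowed range of 2nd byte */

    if (c < 0x80) { i++; continue; }     /* plain ASCII */
    else if (c >= 0xC2 && c <= 0xDF) n = 1;
    else if (c == 0xE0) { n = 2; lo = 0xA0; }  /* reject overlongs */
    else if (c == 0xED) { n = 2; hi = 0x9F; }  /* reject surrogates */
    else if (c >= 0xE1 && c <= 0xEF) n = 2;
    else if (c == 0xF0) { n = 3; lo = 0x90; }  /* reject overlongs */
    else if (c == 0xF4) { n = 3; hi = 0x8F; }  /* cap at U+10FFFF */
    else if (c >= 0xF1 && c <= 0xF3) n = 3;
    else return (long)i;   /* 0x80-0xC1 and 0xF5-0xFF never start one */

    if (len - i <= n) return (long)i;          /* truncated sequence */
    if (buf[i + 1] < lo || buf[i + 1] > hi) return (long)i;
    for (size_t k = 2; k <= n; k++)
      if (buf[i + k] < 0x80 || buf[i + k] > 0xBF) return (long)i;
    i += n + 1;
  }
  return -1;
}

/* Convenience wrapper for NUL-terminated strings. */
static long first_invalid_utf8_str(const char *s)
{
  return first_invalid_utf8((const unsigned char *)s, strlen(s));
}
```

Applied to the message bytes, first_invalid_utf8_str() reports offset 10, which matches the split between the "valid" and "invalid" hex dumps in the error. Read as Shift-JIS instead, 83 63 and 81 5B are the characters ツ and ー that begin the logged message, which is why the commit succeeds once --encoding Shift-JIS tells svn how to convert it.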


-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand


Re: Convert cmdline args to UTF-8 all at once at program start?

Posted by Branko Čibej <br...@xbc.nu>.
C. Michael Pilato wrote:
> Julian Foad wrote:
>   
>> Paul Burba has just sent a patch to do native-to-UTF-8 conversion of
>> command-line arguments right at the start of the program, before
>> argument processing.  That's because our argument processing assumes an
>> ASCII subset for all the strings it needs to match, but he's converting
>> from EBCDIC.
>>     
>
> I've not seen Paul's change, but does this cause any misinteraction with
> a commandline like:
>
>    svn commit -m <Shift-JIS log message> --encoding Shift-JIS ...
>
> Seems a native-to-UTF-8 conversion of that log message is likely to fail
> unless Shift-JIS (in this case) is the native encoding.
>   
Doesn't --encoding only affect log messages pulled from files with -F? I 
can't imagine how mixed encodings on the command line would work in general.

-- Brane


Re: Convert cmdline args to UTF-8 all at once at program start?

Posted by "C. Michael Pilato" <cm...@collab.net>.
Julian Foad wrote:
> Paul Burba has just sent a patch to do native-to-UTF-8 conversion of
> command-line arguments right at the start of the program, before
> argument processing.  That's because our argument processing assumes an
> ASCII subset for all the strings it needs to match, but he's converting
> from EBCDIC.

I've not seen Paul's change, but does this cause any misinteraction with
a commandline like:

   svn commit -m <Shift-JIS log message> --encoding Shift-JIS ...

Seems a native-to-UTF-8 conversion of that log message is likely to fail
unless Shift-JIS (in this case) is the native encoding.

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand