You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Yasuhito FUTATSUKI <fu...@yf.bsdclub.org> on 2020/09/18 20:26:30 UTC

svn_cmdline__edit_file_externally() may not be able to open the target file in locale other than UTF-8

Hi,

While I tried to make an example that escape_path() does not work as
expected in specific locale such as ja_JP.SJIS or CP932 (suggested by
Jun), I found it seems svn_cmdline__edit_file_externally() always 
passes the file name as UTF-8 even if the LC_CTYPE is other than
UTF-8.

I think it is also need svn_path_cstring_from_utf8() conversion for
file_name in svn_cmdline__edit_file_externally().

Here is a reproucing script. (Using ja_JP.UTF8 and ja_JP.SJIS locale).
[[[
#!/bin/sh
# assuming UTF-8 encoding in this file

testdir=/tmp/svn-conflict-edit-filename-test

if [ ! -d ${testdir} ]; then
  mkdir -p ${testdir}
fi

reposdir=${testdir}/testrepo
reposurl=file://${reposdir}

svnadmin create ${reposdir}
cat > ${testdir}/record_filename.sh <<EOF
#!/bin/sh
LC_CTYPE=C; export LC_CTYPE
echo \$* > ${testdir}/svn-conflict-edit-file-name.txt
exit 0
EOF

LC_CTYPE=ja_JP.UTF-8 ; export LC_CTYPE

# add a file "予定表.txt" (it means schedule in Japanese)
# in UTF-8 working copy.
# "予定表.txt" represented in hex are followings:
#    e4 ba 88 e5 ae 9a e8 a1 a8 2e 74 78 74 (UTF-8)
#    97 5c 92 e8 95 5c 2e 74 78 74          (SJIS; contains two '\'== 0x5c)
#    cd bd c4 ea 9c bd 2e 74 78 74          (EUC-JP)
schedfn_utf8="予定表.txt"
schedfn_sjis=`echo ${schedfn_utf8} | iconv -f utf-8 -t sjis`
schedfn_eucjp=`echo ${schedfn_utf8} | iconv -f utf-8 -t euc-jp`

svn checkout ${reposurl} ${testdir}/wc-utf-8
cd ${testdir}/wc-utf-8

cat > ${schedfn_utf8} <<EOF
2020/09/19 foo
EOF

svn add ${schedfn_utf8}
svn commit -m 'add schedule memo.'

# prepare SJIS locale wc.
(LC_CTYPE=ja_JP.SJIS; export LC_CTYPE ; \
    svn checkout $reposurl ${testdir}/wc-sjis)

# update the file in UTF-8 wc and commit it
cat >> ${schedfn_utf8} <<EOF
2020/09/20 bar
EOF
svn commit -m 'add schedule at 2020/09/20' 

# add local modification in SJIS wc
LC_CTYPE=ja_JP.SJIS ; export LC_CTYPE
cd ${testdir}/wc-sjis
cat >> ${schedfn_sjis} <<EOF
2020/09/21 baz
EOF

svn update --force-interactive --accept edit \
  --editor-cmd "/bin/sh ${testdir}/record_filename.sh"

LC_CTYPE=C ; export LC_CTYPE
ls | od -t x1
od -t x1 ${testdir}/svn-conflict-edit-file-name.txt
]]]

The last 2 lines in this script makes hexdump of conflict file names,
actual and passed to the editor command. Out put of those lines
should be same, however, I got below:

[[[
Checked out revision 0.
A         予定表.txt
Adding         予定表.txt
Transmitting file data .done
Committing transaction...
Committed revision 1.
A    /tmp/svn-conflict-edit-filename-test/wc-sjis/�\��\.txt
Checked out revision 1.
Sending        予定表.txt
Transmitting file data .done
Committing transaction...
Committed revision 2.
Updating '.':
C    �\��\.txt
Updated to revision 2.
Merge conflicts in '�\��\.txt' marked as resolved.
Summary of conflicts:
  Text conflicts: 0 remaining (and 1 already resolved)
0000000    97  5c  92  e8  95  5c  2e  74  78  74  0a                    
0000013
0000000    e4  ba  88  e5  ae  9a  e8  a1  a8  2e  74  78  74  0a        
0000016
]]]

Cheers,
-- 
Yasuhito FUTATSUKI <fu...@yf.bsclub.org>

Re: [PATCH] svn_cmdline__edit_file_externally() may not be able to open the target file in locale other than UTF-8

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Yasuhito FUTATSUKI wrote on Wed, 14 Oct 2020 13:30 +0900:
> On 2020/10/08 13:58, Yasuhito FUTATSUKI wrote:
> > Then I'll try to resolve it by the way you suggest.  
> 
⋮
> 
> log message:
> [[[
> Follow up to r1882234,r1882235,r1882313: Fix file name encoding issue
> when invoking editor on Windows.
> 
⋮
> Suggested by jun66j5 (editor return value in find_editor_binary)
> ]]]
> 
> However, I still have some considerations:
⋮
> Thanks,

Sorry, I don't have anything to say about the actual code change.

Just one syntax nitpick: contribulyzer parentheticals should be on
a separate line, otherwise they get treated as part of the author's name,
as in https://www.red-bean.com/svnproject/contribulyzer/detail/brane%28indentation%29.html.
(I made the same mistake myself crediting astieger.)

That tends to matter more when crediting partial committers or
non-committers, since then it splits their credits into multiple
contribulyzer entries.  (E.g., Doug Robinson has three entries.)

Cheers,

Daniel

Re: [PATCH] svn_cmdline__edit_file_externally() may not be able to open the target file in locale other than UTF-8

Posted by Yasuhito FUTATSUKI <fu...@yf.bsdclub.org>.
On 2020/10/08 13:58, Yasuhito FUTATSUKI wrote:
> Hi,
> 
> On 2020/10/08 13:12, Jun Omae wrote:
>> Hi all,
>>
>> On Sun, Oct 4, 2020 at 11:33 PM Yasuhito FUTATSUKI <fu...@yf.bsdclub.org> wrote:
>>> I commited the v3 patch in 1882234.
>>>
>>> I confirmed that on Windows in CP932, both of with and without
>>> the v3 patch, it cannot pass the file name to edit.
>>>
>>> For Windows I confirmed that it can be fixed by using _wsystem()
>>> with WCHAR string, so I commited it in 1882235.
>>
>> I found a minor issue on Windows after r1882235.
>>
>> When the editor command (via SVN_EDITOR environment and --editor-cmd option) has
>> non-ascii characters, it is unable to launch the editor.
>>
>> I think we should convert the editor command to UTF-16 encoding before passing it
>> to _wsystem(), otherwise r1882235 should be reverted.
> 
> I confirmed it is a regression that introduced in r1882235. So I reverted it
> in 1882313.
> 
> Then I'll try to resolve it by the way you suggest.

With the patch attached, I could confirm that I can invoke editor command
even if it contains non-ascii character and/or ascii space in path via
SVN_EDITOR environment variable, --editor-cmd option, and config file
setting (saved in active sytem code page). Especially with environment
variable, Unicode path which cannot represent with active code page can
be used. (I confirmed it on Windows, with VS2019 build).

log message:
[[[
Follow up to r1882234,r1882235,r1882313: Fix file name encoding issue
when invoking editor on Windows.

* subversion/libsvn_subr/cmdline.c
  (): include apr_env.h for apr_env_get
  (find_editor_binary):
    - Change the encoding to set to EDITOR, from active code page to
      UTF-8 on Windows.
      - Transcode editor_cmd to UTF-8.
      - Use apr_env_get() instead of getenv().
      - Transcode config setting for SVN_CONFIG_OPTION_EDITOR_CMD.
    - Add pool argument for transcoding and getting environment variables.
  (svn_cmdline_edit_file_externally, svn_cmdline_edit_string_externally):
    - Use _wsystem() instead of system() on Windows environment.
    - Enclose whole command strings with double quotes[1] in _wsystem().

[1] https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/cmd


Suggested by jun66j5 (editor return value in find_editor_binary)
]]]

However, I still have some considerations:

* Compatibility on use of _wsystem(). I used it for all Win32 environment,
  but isn't it compatibility issue here ?

* Consistency. It seems we use active system code page as encoding of
  native C string, but I use UTF-8 not to break file path.

* There is no way to pass those paths which cannot be represented in
  active codepage, from command line. It is because we use main()
  for program entry point, and its argv is transcoded to active code page
  by C run time library before main() is called. If we use _wmain(),
  we can obtain native UTF-16 argv, or if we use APR's libaprapp,
  we can use argv, transcoded from UTF-16 to UTF-8.

* Embeding native C string in error message. If system() or _wsystem()
  causes error, we make error message from "cmd" argument passed to
  system without transcode, and then we'll see the message that
  incorrectly transcoded as UTF-8 to native C encoding.

Thanks,
-- 
Yasuhito FUTATSUKI <fu...@yf.bsclub.org>

Re: [PATCH] svn_cmdline__edit_file_externally() may not be able to open the target file in locale other than UTF-8

Posted by Yasuhito FUTATSUKI <fu...@yf.bsdclub.org>.
Hi,

On 2020/10/08 13:12, Jun Omae wrote:
> Hi all,
> 
> On Sun, Oct 4, 2020 at 11:33 PM Yasuhito FUTATSUKI <fu...@yf.bsdclub.org> wrote:
>> I commited the v3 patch in 1882234.
>>
>> I confirmed that on Windows in CP932, both of with and without
>> the v3 patch, it cannot pass the file name to edit.
>>
>> For Windows I confirmed that it can be fixed by using _wsystem()
>> with WCHAR string, so I commited it in 1882235.
> 
> I found a minor issue on Windows after r1882235.
> 
> When the editor command (via SVN_EDITOR environment and --editor-cmd option) has
> non-ascii characters, it is unable to launch the editor.
> 
> I think we should convert the editor command to UTF-16 encoding before passing it
> to _wsystem(), otherwise r1882235 should be reverted.

I confirmed it is a regression that introduced in r1882235. So I reverted it
in 1882313.

Then I'll try to resolve it by the way you suggest.

Thanks,
-- 
Yasuhito FUTATSUKI <fu...@yf.bsclub.org>

Re: [PATCH] svn_cmdline__edit_file_externally() may not be able to open the target file in locale other than UTF-8

Posted by Jun Omae <ju...@gmail.com>.
Hi all,

On Sun, Oct 4, 2020 at 11:33 PM Yasuhito FUTATSUKI <fu...@yf.bsdclub.org> wrote:
> I commited the v3 patch in 1882234.
>
> I confirmed that on Windows in CP932, both of with and without
> the v3 patch, it cannot pass the file name to edit.
>
> For Windows I confirmed that it can be fixed by using _wsystem()
> with WCHAR string, so I commited it in 1882235.

I found a minor issue on Windows after r1882235.

When the editor command (via SVN_EDITOR environment and --editor-cmd option) has
non-ascii characters, it is unable to launch the editor.

I think we should convert the editor command to UTF-16 encoding before passing it
to _wsystem(), otherwise r1882235 should be reverted.


[[[
C> subversion\svnadmin\svnadmin.exe create C:\usr\tmp\svnrepos

C> mkdir C:\usr\tmp\エディタ

C> copy nul C:\usr\tmp\エディタ\editor-stub.cmd

C> set SVN_EDITOR=C:\usr\tmp\エディタ\editor-stub.cmd

C> subversion\svn\svn.exe propedit svn:ignore file:///C:/usr/tmp/svnrepos
The system cannot find the path specified.
svn: E200012: system('C:\usr\tmp\�G�f�B�^\editor-stub.cmd "svn-prop.tmp"') returned 1

C> subversion\svn\svn.exe propedit --editor-cmd=C:\usr\tmp\エディタ\editor-stub.cmd svn:ignore file:///C:/usr/tmp/svnrepos
The system cannot find the path specified.
svn: E200012: system('C:\usr\tmp\�G�f�B�^\editor-stub.cmd "svn-prop.tmp"') returned 1

C> set SVN_EDITOR=

C> subversion\svn\svn.exe propedit --editor-cmd=C:\usr\tmp\エディタ\editor-stub.cmd svn:ignore file:///C:/usr/tmp/svnrepos
The system cannot find the path specified.
svn: E200012: system('C:\usr\tmp\�G�f�B�^\editor-stub.cmd "svn-prop.tmp"') returned 1

C> subversion\svn\svn.exe propedit --editor-cmd=C:\usr\tmp\エディタ\editor-stubx.cmd svn:ignore file:///C:/usr/tmp/svnrepos
The system cannot find the path specified.
svn: E200012: system('C:\usr\tmp\�G�f�B�^\editor-stubx.cmd "svn-prop.tmp"') returned 1

C> subversion\svn\svn.exe propedit --editor-cmd=C:\usr\tmp\エディタ\editor-stub.cmd svn:ignore file:///C:/usr/tmp/svnrepos
The system cannot find the path specified.
svn: E200012: system('C:\usr\tmp\�G�f�B�^\editor-stub.cmd "svn-prop.tmp"') returned 1
]]]

-- 
Jun Omae <ju...@gmail.com> (大前 潤)

Re: [PATCH] svn_cmdline__edit_file_externally() may not be able to open the target file in locale other than UTF-8

Posted by Yasuhito FUTATSUKI <fu...@yf.bsdclub.org>.
On 2020/09/22 7:55, Yasuhito FUTATSUKI wrote:
> On 2020/09/21 5:27, Stefan Sperling wrote:
>> On Mon, Sep 21, 2020 at 01:18:54AM +0900, Yasuhito FUTATSUKI wrote:
>>> On 2020/09/20 23:53, Yasuhito FUTATSUKI wrote:

>>> Parhaps even with this patch, it may not work on Windows system other than
>>> UTF-8 code page, because svn_path_cstring_from_utf8() returns just same
>>> string as input and system() will interpret with system active code page.
>>> (I don't know Windows so much, so I leave it.)
>>
>> I don't know Windows either.
>>
>> Perhaps someone else reading this could test the patch?
> 
> The patch itself affect nothing on Windows and macOS environment
> because svn_path_cstring_from_utf8() returns just same string as a
> input string. So if there is the same issue on Windows environment,
> it is need another patch to fix it. So I'll commit this patch if
> it is OK on Linux and Unix like OS.

I commited the v3 patch in 1882234.

I confirmed that on Windows in CP932, both of with and without
the v3 patch, it cannot pass the file name to edit.

For Windows I confirmed that it can be fixed by using _wsystem()
with WCHAR string, so I commited it in 1882235.

Cheers,
-- 
Yasuhito FUTATSUKI <fu...@yf.bsclub.org>

Re: [PATCH] svn_cmdline__edit_file_externally() may not be able to open the target file in locale other than UTF-8

Posted by Yasuhito FUTATSUKI <fu...@yf.bsdclub.org>.
On 2020/09/21 5:27, Stefan Sperling wrote:
> On Mon, Sep 21, 2020 at 01:18:54AM +0900, Yasuhito FUTATSUKI wrote:
>> On 2020/09/20 23:53, Yasuhito FUTATSUKI wrote:
>>> On 2020/09/20 23:41, Stefan Sperling wrote:
>>>> On Sun, Sep 20, 2020 at 11:06:21PM +0900, Yasuhito FUTATSUKI wrote:
>>>>> On 2020/09/19 5:26, Yasuhito FUTATSUKI wrote:
>>
>>>> It looks like apr_escape_shell() will escape 0x5c (backslash, \) when
>>>> looking for ASCII character bytes. When 0x5c is escaped this breaks the
>>>> SJIS-encoded byte sequence.
>>>
>>> Yes, it is right.
>>>
>>>> Have you already tried always using escape_path() on the UTF-8 version of
>>>> the string, and then converting the escaped path to the locale's encoding?
>>>> In other words: First use escape_path, then use svn_path_cstring_from_utf8?
>>>> Perhaps that will make SJIS work?
>>>
>>> Ah, I didn't make sense. I'll try and then post a new patch.
>>> Thank you very much!
>>
>> I've tried and it works on SJIS environment. Thanks.
>>
>> I attached an updated patch. (fix-edit-file-externally-patch-v2.txt)
>> [[[
>> Fix file name to edit from utf8 to local style.
>>
>> * subversion/libsvn_subr/cmdline.c (svn_cmdline__edit_file_externally):
>>   Apply svn_path_cstring_from_utf8() to target file name after escape_shell().
>>
>> Suggested by: stsp (applying order of file name conversion)  
>> ]]]
>>
> 
> Yes, looks good to me. Thank you :)

I looked subversion/libsvn_subr/cmdline.c over again, then I was aware
that svn_cmdline__edit_string_externally() has also the issue of
applying order potentially.

Current use of svn_cmdline__edit_string_externally() is by
subversion/svn/util.c, subversion/svn/propedit-cmd.c, and 
subversion/svnmucc/svnmucc.c. They pass "svn-commit", "svn-prop" and
"svnmucc-commit" for prefix(filename), and svn_io_open_uniquely_named
add only ".tmp" as suffix and numbers, so it is not need to apply
escape_path(). However, the description of 
svn_cmdline__edit_string_externally in svn_cmdline_prvate.h does not
restrict character class for PREFIX, so I think it is good to fix
the issue.

So I updated the patch again. (fix-edit-file-externally-patch-v3.txt)

[[[
Fix file name to edit from utf8 to local style.

* subversion/libsvn_subr/cmdline.c 
  (svn_cmdline__edit_file_externally):
  Apply svn_path_cstring_from_utf8() to target file name after escape_shell().
  (svn_cmdline__edit_string_externally):
  Fix order to apply svn_path_cstring_from_utf8() and escape_shell().

Suggested by: stsp (applying order of file name conversion)  
]]]
 
>> Parhaps even with this patch, it may not work on Windows system other than
>> UTF-8 code page, because svn_path_cstring_from_utf8() returns just same
>> string as input and system() will interpret with system active code page.
>> (I don't know Windows so much, so I leave it.)
> 
> I don't know Windows either.
> 
> Perhaps someone else reading this could test the patch?

The patch itself affect nothing on Windows and macOS environment
because svn_path_cstring_from_utf8() returns just same string as a
input string. So if there is the same issue on Windows environment,
it is need another patch to fix it. So I'll commit this patch if
it is OK on Linux and Unix like OS.

My reproduction script was only for Linux/Unix, but it is not
difficult to reproduce the issue, just make textual context in file
which file name has different representation in UTF-8 and in system
active code page, and select resolution action 'edit'.
If the editor can open the file of right name, there is no problem,
else there is a problem.

Thanks,
-- 
Yasuhito FUTATSUKI <fu...@yf.bsclub.org>

Re: [PATCH] svn_cmdline__edit_file_externally() may not be able to open the target file in locale other than UTF-8

Posted by Stefan Sperling <st...@elego.de>.
On Mon, Sep 21, 2020 at 01:18:54AM +0900, Yasuhito FUTATSUKI wrote:
> On 2020/09/20 23:53, Yasuhito FUTATSUKI wrote:
> > On 2020/09/20 23:41, Stefan Sperling wrote:
> >> On Sun, Sep 20, 2020 at 11:06:21PM +0900, Yasuhito FUTATSUKI wrote:
> >>> On 2020/09/19 5:26, Yasuhito FUTATSUKI wrote:
> 
> >> It looks like apr_escape_shell() will escape 0x5c (backslash, \) when
> >> looking for ASCII character bytes. When 0x5c is escaped this breaks the
> >> SJIS-encoded byte sequence.
> > 
> > Yes, it is right.
> > 
> >> Have you already tried always using escape_path() on the UTF-8 version of
> >> the string, and then converting the escaped path to the locale's encoding?
> >> In other words: First use escape_path, then use svn_path_cstring_from_utf8?
> >> Perhaps that will make SJIS work?
> > 
> > Ah, I didn't make sense. I'll try and then post a new patch.
> > Thank you very much!
> 
> I've tried and it works on SJIS environment. Thanks.
> 
> I attached an updated patch. (fix-edit-file-externally-patch-v2.txt)
> [[[
> Fix file name to edit from utf8 to local style.
> 
> * subversion/libsvn_subr/cmdline.c (svn_cmdline__edit_file_externally):
>   Apply svn_path_cstring_from_utf8() to target file name after escape_shell().
> 
> Suggested by: stsp (applying order of file name conversion)  
> ]]]
> 

Yes, looks good to me. Thank you :)

> Parhaps even with this patch, it may not work on Windows system other than
> UTF-8 code page, because svn_path_cstring_from_utf8() returns just same
> string as input and system() will interpret with system active code page.
> (I don't know Windows so much, so I leave it.)

I don't know Windows either.

Perhaps someone else reading this could test the patch?

> Index: subversion/libsvn_subr/cmdline.c
> ===================================================================
> --- subversion/libsvn_subr/cmdline.c	(revision 1881848)
> +++ subversion/libsvn_subr/cmdline.c	(working copy)
> @@ -1400,6 +1400,7 @@
>                                    apr_pool_t *pool)
>  {
>    const char *editor, *cmd, *base_dir, *file_name, *base_dir_apr;
> +  const char *file_name_local;
>    char *old_cwd;
>    int sys_err;
>    apr_status_t apr_err;
> @@ -1423,9 +1424,11 @@
>      return svn_error_wrap_apr
>        (apr_err, _("Can't change working directory to '%s'"), base_dir);
>  
> +  SVN_ERR(svn_path_cstring_from_utf8(&file_name_local, 
> +                                     escape_path(pool, file_name), pool));
>    /* editor is explicitly documented as being interpreted by the user's shell,
>       and as such should already be quoted/escaped as needed. */
> -  cmd = apr_psprintf(pool, "%s %s", editor, escape_path(pool, file_name));
> +  cmd = apr_psprintf(pool, "%s %s", editor, file_name_local);
>    sys_err = system(cmd);
>  
>    apr_err = apr_filepath_set(old_cwd, pool);


Re: [PATCH] svn_cmdline__edit_file_externally() may not be able to open the target file in locale other than UTF-8

Posted by Yasuhito FUTATSUKI <fu...@yf.bsdclub.org>.
On 2020/09/20 23:53, Yasuhito FUTATSUKI wrote:
> On 2020/09/20 23:41, Stefan Sperling wrote:
>> On Sun, Sep 20, 2020 at 11:06:21PM +0900, Yasuhito FUTATSUKI wrote:
>>> On 2020/09/19 5:26, Yasuhito FUTATSUKI wrote:

>> It looks like apr_escape_shell() will escape 0x5c (backslash, \) when
>> looking for ASCII character bytes. When 0x5c is escaped this breaks the
>> SJIS-encoded byte sequence.
> 
> Yes, it is right.
> 
>> Have you already tried always using escape_path() on the UTF-8 version of
>> the string, and then converting the escaped path to the locale's encoding?
>> In other words: First use escape_path, then use svn_path_cstring_from_utf8?
>> Perhaps that will make SJIS work?
> 
> Ah, I didn't make sense. I'll try and then post a new patch.
> Thank you very much!

I've tried and it works on SJIS environment. Thanks.

I attached an updated patch. (fix-edit-file-externally-patch-v2.txt)
[[[
Fix file name to edit from utf8 to local style.

* subversion/libsvn_subr/cmdline.c (svn_cmdline__edit_file_externally):
  Apply svn_path_cstring_from_utf8() to target file name after escape_shell().

Suggested by: stsp (applying order of file name conversion)  
]]]

Parhaps even with this patch, it may not work on Windows system other than
UTF-8 code page, because svn_path_cstring_from_utf8() returns just same
string as input and system() will interpret with system active code page.
(I don't know Windows so much, so I leave it.)

Cheers,
-- 
Yasuhito FUTATSUKI <fu...@yf.bsclub.org>

Re: [PATCH] svn_cmdline__edit_file_externally() may not be able to open the target file in locale other than UTF-8

Posted by Yasuhito FUTATSUKI <fu...@yf.bsdclub.org>.
On 2020/09/20 23:41, Stefan Sperling wrote:
> On Sun, Sep 20, 2020 at 11:06:21PM +0900, Yasuhito FUTATSUKI wrote:
>> On 2020/09/19 5:26, Yasuhito FUTATSUKI wrote:
>>> Hi,
>>>
>>> While I tried to make an example that escape_path() does not work as
>>> expected in specific locale such as ja_JP.SJIS or CP932 (suggested by
>>> Jun), I found it seems svn_cmdline__edit_file_externally() always 
>>> passes the file name as UTF-8 even if the LC_CTYPE is other than
>>> UTF-8.
>>>
>>> I think it is also need svn_path_cstring_from_utf8() conversion for
>>> file_name in svn_cmdline__edit_file_externally().
>>
>> As far as I read the code, svn_cmdline__edit_file_externally() is
>> called with paths come from svn_io_open_unique_file3() or
>> "local_abspath" from svn_client_conflict_t, so I believe it is
>> needed. 
> 
> Your patch looks good. I have one question below:
>  
>> I wrote a patch address it.
>> (attached fix-edit-file-externally-patch.txt)
>> [[[
>> Fix file name to edit from utf8 to local style.
>>
>> * subversion/libsvn_subr/cmdline.c (svn_cmdline__edit_file_externally):
>>   Apply svn_path_cstring_from_utf8() to target file name.
>> ]]]
>>
>>> Here is a reproucing script. (Using ja_JP.UTF8 and ja_JP.SJIS locale).
>>
>> As it is quite few system can use ja_JP.SJIS locale (I only know
>> FreeBSD and macOS) and there is also escape_path issue in SJIS locale,
>> I modified this script to use ja_JP.eucJP locale.
> 
> It looks like apr_escape_shell() will escape 0x5c (backslash, \) when
> looking for ASCII character bytes. When 0x5c is escaped this breaks the
> SJIS-encoded byte sequence.

Yes, it is right.

> Have you already tried always using escape_path() on the UTF-8 version of
> the string, and then converting the escaped path to the locale's encoding?
> In other words: First use escape_path, then use svn_path_cstring_from_utf8?
> Perhaps that will make SJIS work?

Ah, I didn't make sense. I'll try and then post a new patch.
Thank you very much!

Cheers,
-- 
Yasuhito FUTATSUKI <fu...@yf.bsclub.org>

Re: [PATCH] svn_cmdline__edit_file_externally() may not be able to open the target file in locale other than UTF-8

Posted by Stefan Sperling <st...@elego.de>.
On Sun, Sep 20, 2020 at 11:06:21PM +0900, Yasuhito FUTATSUKI wrote:
> On 2020/09/19 5:26, Yasuhito FUTATSUKI wrote:
> > Hi,
> > 
> > While I tried to make an example that escape_path() does not work as
> > expected in specific locale such as ja_JP.SJIS or CP932 (suggested by
> > Jun), I found it seems svn_cmdline__edit_file_externally() always 
> > passes the file name as UTF-8 even if the LC_CTYPE is other than
> > UTF-8.
> > 
> > I think it is also need svn_path_cstring_from_utf8() conversion for
> > file_name in svn_cmdline__edit_file_externally().
> 
> As far as I read the code, svn_cmdline__edit_file_externally() is
> called with paths come from svn_io_open_unique_file3() or
> "local_abspath" from svn_client_conflict_t, so I believe it is
> needed. 

Your patch looks good. I have one question below:
 
> I wrote a patch address it.
> (attached fix-edit-file-externally-patch.txt)
> [[[
> Fix file name to edit from utf8 to local style.
> 
> * subversion/libsvn_subr/cmdline.c (svn_cmdline__edit_file_externally):
>   Apply svn_path_cstring_from_utf8() to target file name.
> ]]]
> 
> > Here is a reproucing script. (Using ja_JP.UTF8 and ja_JP.SJIS locale).
> 
> As it is quite few system can use ja_JP.SJIS locale (I only know
> FreeBSD and macOS) and there is also escape_path issue in SJIS locale,
> I modified this script to use ja_JP.eucJP locale.

It looks like apr_escape_shell() will escape 0x5c (backslash, \) when
looking for ASCII character bytes. When 0x5c is escaped this breaks the
SJIS-encoded byte sequence.

Have you already tried always using escape_path() on the UTF-8 version of
the string, and then converting the escaped path to the locale's encoding?
In other words: First use escape_path, then use svn_path_cstring_from_utf8?
Perhaps that will make SJIS work?

Cheers,
Stefan

> [[[
> #!/bin/sh
> # assuming UTF-8 encoding in this file
> 
> testdir=/tmp/svn-conflict-edit-filename-test
> 
> if [ ! -d ${testdir} ]; then
>   mkdir -p ${testdir}
> fi
> 
> reposdir=${testdir}/testrepo
> reposurl=file://${reposdir}
> :${svn:=svn}
> 
> svnadmin create ${reposdir}
> cat > ${testdir}/record_filename.sh <<EOF
> #!/bin/sh
> LC_CTYPE=C; export LC_CTYPE
> echo \$* > ${testdir}/svn-conflict-edit-file-name.txt
> exit 0
> EOF
> 
> LC_CTYPE=ja_JP.UTF-8 ; export LC_CTYPE
> 
> # add a file "予定表.txt" (it means schedule in Japanese)
> # in UTF-8 working copy.
> # "予定表.txt" represented in hex are followings:
> #    e4 ba 88 e5 ae 9a e8 a1 a8 2e 74 78 74 (UTF-8)
> #    97 5c 92 e8 95 5c 2e 74 78 74          (SJIS; contains two '\'(== 0x5c))
> #    cd bd c4 ea 9c bd 2e 74 78 74          (EUC-JP)
> schedfn_utf8="予定表.txt"
> schedfn_sjis=`echo ${schedfn_utf8} | iconv -f utf-8 -t sjis`
> schedfn_eucjp=`echo ${schedfn_utf8} | iconv -f utf-8 -t euc-jp`
> 
> ${svn} checkout ${reposurl} ${testdir}/wc-utf-8
> cd ${testdir}/wc-utf-8
> 
> cat > ${schedfn_utf8} <<EOF
> 2020/09/19 foo
> EOF
> 
> ${svn} add ${schedfn_utf8}
> ${svn} commit -m 'add schedule memo.'
> 
> # prepare EUC-JP locale wc.
> (LC_CTYPE=ja_JP.eucJP; export LC_CTYPE ; \
>     ${svn} checkout $reposurl ${testdir}/wc-eucjp)
> 
> # update the file in UTF-8 wc and commit it
> cat >> ${schedfn_utf8} <<EOF
> 2020/09/20 bar
> EOF
> ${svn} commit -m 'add schedule at 2020/09/20' 
> 
> # add local modification in EUC-JP wc
> LC_CTYPE=ja_JP.eucJP ; export LC_CTYPE
> cd ${testdir}/wc-eucjp
> cat >> ${schedfn_eucjp} <<EOF
> 2020/09/21 baz
> EOF
> 
> ${svn} update --force-interactive --accept edit \
>   --editor-cmd "/bin/sh ${testdir}/record_filename.sh"
> 
> LC_CTYPE=C ; export LC_CTYPE
> ls | od -t x1
> od -t x1 ${testdir}/svn-conflict-edit-file-name.txt
> ]]]
> 
> The result with unpatched svn (last 4 lines):
> [[[
> 0000000    cd  bd  c4  ea  c9  bd  2e  74  78  74  0a                    
> 0000013
> 0000000    e4  ba  88  e5  ae  9a  e8  a1  a8  2e  74  78  74  0a        
> 0000016
> ]]]
> 
> The result with patched svn (last 4 lines):
> [[[
> 0000000    cd  bd  c4  ea  c9  bd  2e  74  78  74  0a                    
> 0000013
> 0000000    cd  bd  c4  ea  c9  bd  2e  74  78  74  0a                    
> 0000013
> ]]]
> 
> Cheers,
> -- 
> Yasuhito FUTATSUKI <fu...@yf.bsclub.org>

> Index: subversion/libsvn_subr/cmdline.c
> ===================================================================
> --- subversion/libsvn_subr/cmdline.c	(revision 1881848)
> +++ subversion/libsvn_subr/cmdline.c	(working copy)
> @@ -1400,6 +1400,7 @@
>                                    apr_pool_t *pool)
>  {
>    const char *editor, *cmd, *base_dir, *file_name, *base_dir_apr;
> +  const char *file_name_local;
>    char *old_cwd;
>    int sys_err;
>    apr_status_t apr_err;
> @@ -1423,9 +1424,11 @@
>      return svn_error_wrap_apr
>        (apr_err, _("Can't change working directory to '%s'"), base_dir);
>  
> +  SVN_ERR(svn_path_cstring_from_utf8(&file_name_local, file_name, pool));
>    /* editor is explicitly documented as being interpreted by the user's shell,
>       and as such should already be quoted/escaped as needed. */
> -  cmd = apr_psprintf(pool, "%s %s", editor, escape_path(pool, file_name));
> +  cmd = apr_psprintf(pool, "%s %s", editor,
> +                     escape_path(pool, file_name_local));
>    sys_err = system(cmd);
>  
>    apr_err = apr_filepath_set(old_cwd, pool);


[PATCH] svn_cmdline__edit_file_externally() may not be able to open the target file in locale other than UTF-8

Posted by Yasuhito FUTATSUKI <fu...@yf.bsdclub.org>.
On 2020/09/19 5:26, Yasuhito FUTATSUKI wrote:
> Hi,
> 
> While I tried to make an example that escape_path() does not work as
> expected in specific locale such as ja_JP.SJIS or CP932 (suggested by
> Jun), I found it seems svn_cmdline__edit_file_externally() always 
> passes the file name as UTF-8 even if the LC_CTYPE is other than
> UTF-8.
> 
> I think it is also need svn_path_cstring_from_utf8() conversion for
> file_name in svn_cmdline__edit_file_externally().

As far as I read the code, svn_cmdline__edit_file_externally() is
called with paths come from svn_io_open_unique_file3() or
"local_abspath" from svn_client_conflict_t, so I believe it is
needed. 

I wrote a patch address it.
(attached fix-edit-file-externally-patch.txt)
[[[
Fix file name to edit from utf8 to local style.

* subversion/libsvn_subr/cmdline.c (svn_cmdline__edit_file_externally):
  Apply svn_path_cstring_from_utf8() to target file name.
]]]

> Here is a reproucing script. (Using ja_JP.UTF8 and ja_JP.SJIS locale).

As it is quite few system can use ja_JP.SJIS locale (I only know
FreeBSD and macOS) and there is also escape_path issue in SJIS locale,
I modified this script to use ja_JP.eucJP locale.

[[[
#!/bin/sh
# assuming UTF-8 encoding in this file

testdir=/tmp/svn-conflict-edit-filename-test

if [ ! -d ${testdir} ]; then
  mkdir -p ${testdir}
fi

reposdir=${testdir}/testrepo
reposurl=file://${reposdir}
:${svn:=svn}

svnadmin create ${reposdir}
cat > ${testdir}/record_filename.sh <<EOF
#!/bin/sh
LC_CTYPE=C; export LC_CTYPE
echo \$* > ${testdir}/svn-conflict-edit-file-name.txt
exit 0
EOF

LC_CTYPE=ja_JP.UTF-8 ; export LC_CTYPE

# add a file "予定表.txt" (it means schedule in Japanese)
# in UTF-8 working copy.
# "予定表.txt" represented in hex are followings:
#    e4 ba 88 e5 ae 9a e8 a1 a8 2e 74 78 74 (UTF-8)
#    97 5c 92 e8 95 5c 2e 74 78 74          (SJIS; contains two '\'(== 0x5c))
#    cd bd c4 ea 9c bd 2e 74 78 74          (EUC-JP)
schedfn_utf8="予定表.txt"
schedfn_sjis=`echo ${schedfn_utf8} | iconv -f utf-8 -t sjis`
schedfn_eucjp=`echo ${schedfn_utf8} | iconv -f utf-8 -t euc-jp`

${svn} checkout ${reposurl} ${testdir}/wc-utf-8
cd ${testdir}/wc-utf-8

cat > ${schedfn_utf8} <<EOF
2020/09/19 foo
EOF

${svn} add ${schedfn_utf8}
${svn} commit -m 'add schedule memo.'

# prepare EUC-JP locale wc.
(LC_CTYPE=ja_JP.eucJP; export LC_CTYPE ; \
    ${svn} checkout $reposurl ${testdir}/wc-eucjp)

# update the file in UTF-8 wc and commit it
cat >> ${schedfn_utf8} <<EOF
2020/09/20 bar
EOF
${svn} commit -m 'add schedule at 2020/09/20' 

# add local modification in EUC-JP wc
LC_CTYPE=ja_JP.eucJP ; export LC_CTYPE
cd ${testdir}/wc-eucjp
cat >> ${schedfn_eucjp} <<EOF
2020/09/21 baz
EOF

${svn} update --force-interactive --accept edit \
  --editor-cmd "/bin/sh ${testdir}/record_filename.sh"

LC_CTYPE=C ; export LC_CTYPE
ls | od -t x1
od -t x1 ${testdir}/svn-conflict-edit-file-name.txt
]]]

The result with unpatched svn (last 4 lines):
[[[
0000000    cd  bd  c4  ea  c9  bd  2e  74  78  74  0a                    
0000013
0000000    e4  ba  88  e5  ae  9a  e8  a1  a8  2e  74  78  74  0a        
0000016
]]]

The result with patched svn (last 4 lines):
[[[
0000000    cd  bd  c4  ea  c9  bd  2e  74  78  74  0a                    
0000013
0000000    cd  bd  c4  ea  c9  bd  2e  74  78  74  0a                    
0000013
]]]

Cheers,
-- 
Yasuhito FUTATSUKI <fu...@yf.bsclub.org>