You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@subversion.apache.org by st...@apache.org on 2020/12/14 16:57:10 UTC
svn commit: r1884427 - in /subversion/trunk/tools/hook-scripts/mailer:
mailer.py tests/mailer-t1.output tests/mailer-tweak.py
Author: stsp
Date: Mon Dec 14 16:57:10 2020
New Revision: 1884427
URL: http://svn.apache.org/viewvc?rev=1884427&view=rev
Log:
Make mailer.py work properly with Python 3, and drop Python 2 support.
Most of the changes deal with the handling binary data vs Python strings.
I've made sure that mailer.py will work in a UTF-8 environment. In general,
UTF-8 is recommended for hook scripts. See the SVNUseUTF8 mod_dav_svn option.
Environments using other encodings may not work as expected, but those will
be problematic for hook scripts in general. SVN repositories store internal
data such as paths in UTF-8. Our Python3 bindings do not deal with encoding
or decoding of such data, and thus need to work with raw UTF-8 strings, not
Python strings.
The encoding of file and property contents is not guaranteed to be UTF-8.
This was already a problem before this change. This hook script sends email
with a content type header specifying the UTF-8 encoding. Diffs which contain
non-UTF-8 text will most likely not render properly when viewed in an email
reader. At least this problem is now obvious in mailer.py's implementation,
since all unidiff text is now written out directly as binary data.
As an additional fix, iterate file groups in sorted order. This results in
stable output and makes test cases in our tests/ subdirectory reproducible.
Tested with Python 3.7.5 which is the version I use in my SVN development
setup at present. Tests with newer versions are welcome.
* tools/hook-scripts/mailer/mailer.py:
Drop Python2-specific includes. Adjust includes as per 2to3.
(main): Decode arguments from UTF-8 to string.
(OutputBase:write): Encode string to UTF-8 and pass to write_binary().
OutputBase implementations now need to provide a self.write_binary
member which implements a write() method for binary data.
(MailedOutput): email.Header package is gone, use email.header instead,
and likewise replace use of email.Utils with email.utils
(SMTPOutput): Provide self.write_binary in terms of a BytesIO() object.
We cannot use StringIO since diffs may contain data in arbitrary encodings.
(StandardOutput): Provide self.write_binary in terms of stdout.buffer.
(PipeOutput): Provide self.write_binary in terms of pipe.stdin.
(Commit): Decode log message and paths from UTF-8 to string, and iterate
path groups from mailer.conf in sorted order.
(Lock): Decode directory entries from UTF-8 to string. Encode paths back
to UTF-8 when we ask libsvn_fs for a lock on a path.
Iterate path groups from mailer.conf in sorted order.
(DiffGenerator): Decode repository paths from UTF-8 to string.
(TextCommitRenderer): Decode author, log message, and path from UTF-8 to
string. Write diff data via write_binary, bypassing the re-encoding step.
(Config): Decode paths from UTF-8 to string before matching them against
regular expressions. Also decode the repository directory path from UTF-8.
* tools/hook-scripts/mailer/tests/mailer-t1.output: Adjust expected output.
File groups are now provided in stable sorted order. This should fix
spurious test failures in the future.
* tools/hook-scripts/mailer/tests/mailer-tweak.py: Drop L suffix from long
integers and pass binary data instead of strings into libsvn_fs.
Modified:
subversion/trunk/tools/hook-scripts/mailer/mailer.py
subversion/trunk/tools/hook-scripts/mailer/tests/mailer-t1.output
subversion/trunk/tools/hook-scripts/mailer/tests/mailer-tweak.py
Modified: subversion/trunk/tools/hook-scripts/mailer/mailer.py
URL: http://svn.apache.org/viewvc/subversion/trunk/tools/hook-scripts/mailer/mailer.py?rev=1884427&r1=1884426&r2=1884427&view=diff
==============================================================================
--- subversion/trunk/tools/hook-scripts/mailer/mailer.py (original)
+++ subversion/trunk/tools/hook-scripts/mailer/mailer.py Mon Dec 14 16:57:10 2020
@@ -46,22 +46,11 @@
import os
import sys
-try:
- # Python >=3.0
- import configparser
- from urllib.parse import quote as urllib_parse_quote
-except ImportError:
- # Python <3.0
- import ConfigParser as configparser
- from urllib import quote as urllib_parse_quote
+import configparser
+from urllib.parse import quote as urllib_parse_quote
import time
import subprocess
-if sys.version_info[0] >= 3:
- # Python >=3.0
- from io import StringIO
-else:
- # Python <3.0
- from cStringIO import StringIO
+from io import BytesIO, StringIO
import smtplib
import re
import tempfile
@@ -99,9 +88,12 @@ def main(pool, cmd, config_fname, repos_
messenger = Commit(pool, cfg, repos)
elif cmd == 'propchange' or cmd == 'propchange2':
revision = int(cmd_args[0])
- author = cmd_args[1]
- propname = cmd_args[2]
- action = (cmd == 'propchange2' and cmd_args[3] or 'A')
+ author = cmd_args[1].decode('utf-8')
+ propname = cmd_args[2].decode('utf-8')
+ if cmd == 'propchange2' and cmd_args[3]:
+ action = cmd_args[3].decode('utf-8')
+ else:
+ action = 'A'
repos = Repository(repos_dir, revision, pool)
# Override the repos revision author with the author of the propchange
repos.author = author
@@ -111,7 +103,7 @@ def main(pool, cmd, config_fname, repos_
})
messenger = PropChange(pool, cfg, repos, author, propname, action)
elif cmd == 'lock' or cmd == 'unlock':
- author = cmd_args[0]
+ author = cmd_args[0].decode('utf-8')
repos = Repository(repos_dir, 0, pool) ### any old revision will do
# Override the repos revision author with the author of the lock/unlock
repos.author = author
@@ -180,7 +172,7 @@ class OutputBase:
def write(self, output):
"""Override this method.
Append the literal text string OUTPUT to the output representation."""
- raise NotImplementedError
+ return self.write_binary(output.encode('utf-8'))
def run(self, cmd):
"""Override this method, if the default implementation is not sufficient.
@@ -234,7 +226,7 @@ class MailedOutput(OutputBase):
# Return the result of splitting HDR into tokens (on space
# characters), encoding (per RFC2047) each token as necessary, and
# slapping 'em back to together again.
- from email.Header import Header
+ from email.header import Header
def _maybe_encode_header(hdr_token):
try:
@@ -246,7 +238,7 @@ class MailedOutput(OutputBase):
return ' '.join(map(_maybe_encode_header, hdr.split()))
def mail_headers(self, group, params):
- from email import Utils
+ from email import utils
subject = self._rfc2047_encode(self.make_subject(group, params))
from_hdr = self._rfc2047_encode(self.from_addr)
@@ -265,7 +257,7 @@ class MailedOutput(OutputBase):
'X-Svn-Commit-Revision: %d\n' \
'X-Svn-Commit-Repository: %s\n' \
% (from_hdr, to_hdr, subject,
- Utils.formatdate(), Utils.make_msgid(), group,
+ utils.formatdate(), utils.make_msgid(), group,
self.repos.author or 'no_author', self.repos.rev,
os.path.basename(self.repos.repos_dir))
if self.reply_to:
@@ -279,8 +271,8 @@ class SMTPOutput(MailedOutput):
def start(self, group, params):
MailedOutput.start(self, group, params)
- self.buffer = StringIO()
- self.write = self.buffer.write
+ self.buffer = BytesIO()
+ self.write_binary = self.buffer.write
self.write(self.mail_headers(group, params))
@@ -359,7 +351,7 @@ class StandardOutput(OutputBase):
def __init__(self, cfg, repos, prefix_param):
OutputBase.__init__(self, cfg, repos, prefix_param)
- self.write = sys.stdout.write
+ self.write_binary = sys.stdout.buffer.write
def start(self, group, params):
self.write("Group: " + (group or "defaults") + "\n")
@@ -388,7 +380,7 @@ class PipeOutput(MailedOutput):
# construct the pipe for talking to the mailer
self.pipe = subprocess.Popen(cmd, stdin=subprocess.PIPE,
close_fds=sys.platform != "win32")
- self.write = self.pipe.stdin.write
+ self.write_binary = self.pipe.stdin.write
# start writing out the mail message
self.write(self.mail_headers(group, params))
@@ -430,6 +422,7 @@ class Commit(Messenger):
self.changelist = sorted(editor.get_changes().items())
log = repos.get_rev_prop(svn.core.SVN_PROP_REVISION_LOG) or ''
+ log = log.decode('utf-8')
# collect the set of groups and the unique sets of params for the options
self.groups = { }
@@ -448,6 +441,7 @@ class Commit(Messenger):
# figure out the changed directories
dirs = { }
for path, change in self.changelist:
+ path = path.decode('utf-8')
if change.item_kind == svn.core.svn_node_dir:
dirs[path] = None
else:
@@ -484,7 +478,7 @@ class Commit(Messenger):
# build a renderer, tied to our output stream
renderer = TextCommitRenderer(self.output)
- for (group, param_tuple), (params, paths) in self.groups.items():
+ for (group, param_tuple), (params, paths) in sorted(self.groups.items()):
try:
self.output.start(group, params)
@@ -600,7 +594,9 @@ class Lock(Messenger):
or 'unlock_subject_prefix'))
# read all the locked paths from STDIN and strip off the trailing newlines
- self.dirlist = [x.rstrip() for x in sys.stdin.readlines()]
+ self.dirlist = [x for x in sys.stdin.readlines()]
+ for i in range(len(self.dirlist)):
+ self.dirlist[i] = self.dirlist[i].decode('utf-8')
# collect the set of groups and the unique sets of params for the options
self.groups = { }
@@ -629,11 +625,12 @@ class Lock(Messenger):
# The lock comment is the same for all paths, so we can just pull
# the comment for the first path in the dirlist and cache it.
self.lock = svn.fs.svn_fs_get_lock(self.repos.fs_ptr,
- self.dirlist[0], self.pool)
+ self.dirlist[0].encode('utf-8'),
+ self.pool)
def generate(self):
ret = 0
- for (group, param_tuple), (params, paths) in self.groups.items():
+ for (group, param_tuple), (params, paths) in sorted(self.groups.items()):
try:
self.output.start(group, params)
@@ -879,7 +876,8 @@ class DiffGenerator:
diff = svn.fs.FileDiff(self.repos.get_root(change.base_rev),
base_path, None, None, self.pool)
- label1 = '%s\t%s\t(r%s)' % (base_path, self.date, change.base_rev)
+ label1 = '%s\t%s\t(r%s)' % (base_path.decode('utf-8'), self.date,
+ change.base_rev)
label2 = '/dev/null\t00:00:00 1970\t(deleted)'
singular = True
@@ -902,21 +900,23 @@ class DiffGenerator:
self.repos.root_this, change.path,
self.pool)
label1 = '%s\t%s\t(r%s, copy source)' \
- % (base_path, base_date, change.base_rev)
+ % (base_path.decode('utf-8'), base_date, change.base_rev)
label2 = '%s\t%s\t(r%s)' \
- % (change.path, self.date, self.repos.rev)
+ % (change.path.decode('utf-8'), self.date, \
+ self.repos.rev)
singular = False
else:
# this file was copied.
kind = 'C'
if self.diffsels.copy:
diff = svn.fs.FileDiff(None, None, self.repos.root_this,
- change.path, self.pool)
+ change.path.decode('utf-8'), self.pool)
label1 = '/dev/null\t00:00:00 1970\t' \
'(empty, because file is newly added)'
label2 = '%s\t%s\t(r%s, copy of r%s, %s)' \
- % (change.path, self.date, self.repos.rev, \
- change.base_rev, base_path)
+ % (change.path.decode('utf-8'), self.date,
+ self.repos.rev, change.base_rev,
+ base_path.decode('utf-8'))
singular = False
else:
# the file was added.
@@ -932,7 +932,7 @@ class DiffGenerator:
label1 = '/dev/null\t00:00:00 1970\t' \
'(empty, because file is newly added)'
label2 = '%s\t%s\t(r%s)' \
- % (change.path, self.date, self.repos.rev)
+ % (change.path.decode('utf-8'), self.date, self.repos.rev)
singular = True
elif not change.text_changed:
@@ -952,9 +952,9 @@ class DiffGenerator:
self.repos.root_this, change.path,
self.pool)
label1 = '%s\t%s\t(r%s)' \
- % (base_path, base_date, change.base_rev)
+ % (base_path.decode('utf-8'), base_date, change.base_rev)
label2 = '%s\t%s\t(r%s)' \
- % (change.path, self.date, self.repos.rev)
+ % (change.path.decode('utf-8'), self.date, self.repos.rev)
singular = False
if diff:
@@ -1092,7 +1092,7 @@ class TextCommitRenderer:
w = self.output.write
- w('Author: %s\nDate: %s\nNew Revision: %s\n' % (data.author,
+ w('Author: %s\nDate: %s\nNew Revision: %s\n' % (data.author.decode('utf-8'),
data.date,
data.rev))
@@ -1101,7 +1101,7 @@ class TextCommitRenderer:
else:
w('\n')
- w('Log:\n%s\n\n' % data.log.strip())
+ w('Log:\n%s\n\n' % data.log.strip().decode('utf-8'))
# print summary sections
self._render_list('Added', data.added_data)
@@ -1144,7 +1144,7 @@ class TextCommitRenderer:
props = ' (props changed)'
else:
props = ''
- w(' %s%s%s\n' % (d.path, is_dir, props))
+ w(' %s%s%s\n' % (d.path.decode('utf-8'), is_dir, props))
if d.copied:
if is_dir:
text = ''
@@ -1153,7 +1153,7 @@ class TextCommitRenderer:
else:
text = ' unchanged'
w(' - copied%s from r%d, %s%s\n'
- % (text, d.base_rev, d.base_path, is_dir))
+ % (text, d.base_rev, d.base_path.decode('utf-8'), is_dir))
def _render_diffs(self, diffs, section_header):
"""Render diffs. Write the SECTION_HEADER if there are actually
@@ -1170,18 +1170,20 @@ class TextCommitRenderer:
w(section_header)
section_header_printed = True
if diff.kind == 'D':
- w('\nDeleted: %s\n' % diff.base_path)
+ w('\nDeleted: %s\n' % diff.base_path.decode('utf-8'))
elif diff.kind == 'A':
- w('\nAdded: %s\n' % diff.path)
+ w('\nAdded: %s\n' % diff.path.decode('utf-8'))
elif diff.kind == 'C':
w('\nCopied: %s (from r%d, %s)\n'
- % (diff.path, diff.base_rev, diff.base_path))
+ % (diff.path.decode('utf-8'), diff.base_rev,
+ diff.base_path.decode('utf-8')))
elif diff.kind == 'W':
w('\nCopied and modified: %s (from r%d, %s)\n'
- % (diff.path, diff.base_rev, diff.base_path))
+ % (diff.path.decode('utf-8'), diff.base_rev,
+ diff.base_path.decode('utf-8')))
else:
# kind == 'M'
- w('\nModified: %s\n' % diff.path)
+ w('\nModified: %s\n' % diff.path.decode('utf-8'))
if diff.diff_url:
w('URL: %s\n' % diff.diff_url)
@@ -1198,8 +1200,9 @@ class TextCommitRenderer:
w('Binary file (source and/or target). No diff available.\n')
continue
+ wb = self.output.write_binary
for line in diff.content:
- w(line.raw)
+ wb(line.raw)
class Repository:
@@ -1426,9 +1429,9 @@ class Config:
"Return the path's associated groups."
groups = []
for group, pattern, exclude_pattern, repos_params, search_logmsg_re in self._group_re:
- match = pattern.match(path)
+ match = pattern.match(path.decode('utf-8'))
if match:
- if exclude_pattern and exclude_pattern.match(path):
+ if exclude_pattern and exclude_pattern.match(path.decode('utf-8')):
continue
params = repos_params.copy()
params.update(match.groupdict())
@@ -1508,6 +1511,7 @@ if the property was added, modified or d
cmd = sys.argv[1]
repos_dir = svn.core.svn_path_canonicalize(sys.argv[2])
+ repos_dir = repos_dir.decode('utf-8')
try:
expected_args = cmd_list[cmd]
except KeyError:
Modified: subversion/trunk/tools/hook-scripts/mailer/tests/mailer-t1.output
URL: http://svn.apache.org/viewvc/subversion/trunk/tools/hook-scripts/mailer/tests/mailer-t1.output?rev=1884427&r1=1884426&r2=1884427&view=diff
==============================================================================
--- subversion/trunk/tools/hook-scripts/mailer/tests/mailer-t1.output (original)
+++ subversion/trunk/tools/hook-scripts/mailer/tests/mailer-t1.output Mon Dec 14 16:57:10 2020
@@ -1,4 +1,4 @@
-Group: file
+Group: All
Subject: r1 - dir1 dir2
Author: mailer test
@@ -9,9 +9,43 @@ Log:
initial load
Added:
+ dir1/
+ dir1/file3
+ dir1/file4
+ dir2/
+ dir2/file5
+ dir2/file6
file1
file2
+Added: dir1/file3
+==============================================================================
+--- /dev/null 00:00:00 1970 (empty, because file is newly added)
++++ dir1/file3 Sun Sep 9 01:46:40 2001 (r1)
+@@ -0,0 +1 @@
++file3
+
+Added: dir1/file4
+==============================================================================
+--- /dev/null 00:00:00 1970 (empty, because file is newly added)
++++ dir1/file4 Sun Sep 9 01:46:40 2001 (r1)
+@@ -0,0 +1 @@
++file4
+
+Added: dir2/file5
+==============================================================================
+--- /dev/null 00:00:00 1970 (empty, because file is newly added)
++++ dir2/file5 Sun Sep 9 01:46:40 2001 (r1)
+@@ -0,0 +1 @@
++file5
+
+Added: dir2/file6
+==============================================================================
+--- /dev/null 00:00:00 1970 (empty, because file is newly added)
++++ dir2/file6 Sun Sep 9 01:46:40 2001 (r1)
+@@ -0,0 +1 @@
++file6
+
Added: file1
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
@@ -25,7 +59,7 @@ Added: file2
+++ file2 Sun Sep 9 01:46:40 2001 (r1)
@@ -0,0 +1 @@
+file2
-Group: file plus other areas
+Group: file
Subject: r1 - dir1 dir2
Author: mailer test
@@ -39,15 +73,6 @@ Added:
file1
file2
-Changes in other areas also in this revision:
-Added:
- dir1/
- dir1/file3
- dir1/file4
- dir2/
- dir2/file5
- dir2/file6
-
Added: file1
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
@@ -61,37 +86,7 @@ Added: file2
+++ file2 Sun Sep 9 01:46:40 2001 (r1)
@@ -0,0 +1 @@
+file2
-
-Diffs of changes in other areas also in this revision:
-
-Added: dir1/file3
-==============================================================================
---- /dev/null 00:00:00 1970 (empty, because file is newly added)
-+++ dir1/file3 Sun Sep 9 01:46:40 2001 (r1)
-@@ -0,0 +1 @@
-+file3
-
-Added: dir1/file4
-==============================================================================
---- /dev/null 00:00:00 1970 (empty, because file is newly added)
-+++ dir1/file4 Sun Sep 9 01:46:40 2001 (r1)
-@@ -0,0 +1 @@
-+file4
-
-Added: dir2/file5
-==============================================================================
---- /dev/null 00:00:00 1970 (empty, because file is newly added)
-+++ dir2/file5 Sun Sep 9 01:46:40 2001 (r1)
-@@ -0,0 +1 @@
-+file5
-
-Added: dir2/file6
-==============================================================================
---- /dev/null 00:00:00 1970 (empty, because file is newly added)
-+++ dir2/file6 Sun Sep 9 01:46:40 2001 (r1)
-@@ -0,0 +1 @@
-+file6
-Group: All
+Group: file plus other areas
Subject: r1 - dir1 dir2
Author: mailer test
@@ -102,14 +97,33 @@ Log:
initial load
Added:
+ file1
+ file2
+
+Changes in other areas also in this revision:
+Added:
dir1/
dir1/file3
dir1/file4
dir2/
dir2/file5
dir2/file6
- file1
- file2
+
+Added: file1
+==============================================================================
+--- /dev/null 00:00:00 1970 (empty, because file is newly added)
++++ file1 Sun Sep 9 01:46:40 2001 (r1)
+@@ -0,0 +1 @@
++file1
+
+Added: file2
+==============================================================================
+--- /dev/null 00:00:00 1970 (empty, because file is newly added)
++++ file2 Sun Sep 9 01:46:40 2001 (r1)
+@@ -0,0 +1 @@
++file2
+
+Diffs of changes in other areas also in this revision:
Added: dir1/file3
==============================================================================
@@ -138,21 +152,7 @@ Added: dir2/file6
+++ dir2/file6 Sun Sep 9 01:46:40 2001 (r1)
@@ -0,0 +1 @@
+file6
-
-Added: file1
-==============================================================================
---- /dev/null 00:00:00 1970 (empty, because file is newly added)
-+++ file1 Sun Sep 9 01:46:40 2001 (r1)
-@@ -0,0 +1 @@
-+file1
-
-Added: file2
-==============================================================================
---- /dev/null 00:00:00 1970 (empty, because file is newly added)
-+++ file2 Sun Sep 9 01:46:40 2001 (r1)
-@@ -0,0 +1 @@
-+file2
-Group: file
+Group: All
Subject: r2 - dir1 dir2
Author: mailer test
@@ -163,9 +163,19 @@ Log:
two file changes. Fixes Blah#123
Modified:
+ dir1/ (props changed)
+ dir2/file5
file1 (props changed)
file2 (contents, props changed)
+Modified: dir2/file5
+==============================================================================
+--- dir2/file5 Sun Sep 9 01:46:40 2001 (r1)
++++ dir2/file5 Sun Sep 9 04:33:20 2001 (r2)
+@@ -1 +1,2 @@
+ file5
++change C2
+
Modified: file2
==============================================================================
--- file2 Sun Sep 9 01:46:40 2001 (r1)
@@ -204,7 +214,7 @@ Modified: file2
@@ -1 +1,2 @@
file2
+change C1
-Group: All
+Group: file
Subject: r2 - dir1 dir2
Author: mailer test
@@ -215,19 +225,9 @@ Log:
two file changes. Fixes Blah#123
Modified:
- dir1/ (props changed)
- dir2/file5
file1 (props changed)
file2 (contents, props changed)
-Modified: dir2/file5
-==============================================================================
---- dir2/file5 Sun Sep 9 01:46:40 2001 (r1)
-+++ dir2/file5 Sun Sep 9 04:33:20 2001 (r2)
-@@ -1 +1,2 @@
- file5
-+change C2
-
Modified: file2
==============================================================================
--- file2 Sun Sep 9 01:46:40 2001 (r1)
@@ -283,16 +283,35 @@ two copies
Added:
dir2/file7
- - copied unchanged from r2, file1
+ - copied unchanged from r2, /file1
dir3/ (props changed)
- - copied from r2, dir1/
+ - copied from r2, /dir1/
+Replaced:
+ dir3/file3
+ - copied unchanged from r1, /dir1/file3
+ dir3/file4
+ - copied unchanged from r1, /dir1/file4
-Copied: dir2/file7 (from r2, file1)
+Copied: dir2/file7 (from r2, /file1)
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
-+++ dir2/file7 Sun Sep 9 07:20:00 2001 (r3, copy of r2, file1)
++++ dir2/file7 Sun Sep 9 07:20:00 2001 (r3, copy of r2, /file1)
@@ -0,0 +1 @@
+file1
+
+Copied: dir3/file3 (from r1, /dir1/file3)
+==============================================================================
+--- /dev/null 00:00:00 1970 (empty, because file is newly added)
++++ dir3/file3 Sun Sep 9 07:20:00 2001 (r3, copy of r1, /dir1/file3)
+@@ -0,0 +1 @@
++file3
+
+Copied: dir3/file4 (from r1, /dir1/file4)
+==============================================================================
+--- /dev/null 00:00:00 1970 (empty, because file is newly added)
++++ dir3/file4 Sun Sep 9 07:20:00 2001 (r3, copy of r1, /dir1/file4)
+@@ -0,0 +1 @@
++file4
Group: All
Subject: r4 - dir3
@@ -305,16 +324,16 @@ copied and changed
Added:
dir3/file8
- - copied, changed from r2, file1
+ - copied, changed from r2, /file1
-Copied and modified: dir3/file8 (from r2, file1)
+Copied and modified: dir3/file8 (from r2, /file1)
==============================================================================
---- file1 Sun Sep 9 04:33:20 2001 (r2, copy source)
+--- /file1 Sun Sep 9 04:33:20 2001 (r2, copy source)
+++ dir3/file8 Sun Sep 9 10:06:40 2001 (r4)
@@ -1 +1,2 @@
file1
+change C3
-Group: file
+Group: All
Subject: r5 - dir1 dir3
Author: mailer test
@@ -325,8 +344,10 @@ Log:
changes and deletes of properties
Modified:
+ dir1/ (props changed)
+ dir3/ (props changed)
file2 (props changed)
-Group: file plus other areas
+Group: file
Subject: r5 - dir1 dir3
Author: mailer test
@@ -338,12 +359,7 @@ changes and deletes of properties
Modified:
file2 (props changed)
-
-Changes in other areas also in this revision:
-Modified:
- dir1/ (props changed)
- dir3/ (props changed)
-Group: All
+Group: file plus other areas
Subject: r5 - dir1 dir3
Author: mailer test
@@ -354,10 +370,13 @@ Log:
changes and deletes of properties
Modified:
+ file2 (props changed)
+
+Changes in other areas also in this revision:
+Modified:
dir1/ (props changed)
dir3/ (props changed)
- file2 (props changed)
-Group: file
+Group: All
Subject: r6 - dir1 dir4
Author: mailer test
@@ -368,7 +387,18 @@ Log:
mixed addition and change. Fixes Blaz#456 Blah#987
Added:
+ dir4/
file9
+Modified:
+ dir1/file3
+
+Modified: dir1/file3
+==============================================================================
+--- dir1/file3 Sun Sep 9 12:53:20 2001 (r5)
++++ dir1/file3 Sun Sep 9 15:40:00 2001 (r6)
+@@ -1 +1,2 @@
+ file3
++change C4
Added: file9
==============================================================================
@@ -376,8 +406,8 @@ Added: file9
+++ file9 Sun Sep 9 15:40:00 2001 (r6)
@@ -0,0 +1 @@
+file9
-Group: file plus other areas
-Subject: r6 - dir1 dir4
+Group: bugtracker
+Subject: Fix for Blah#987: r6 - dir1 dir4
Author: mailer test
Date: Sun Sep 9 15:40:00 2001
@@ -387,23 +417,11 @@ Log:
mixed addition and change. Fixes Blaz#456 Blah#987
Added:
- file9
-
-Changes in other areas also in this revision:
-Added:
dir4/
+ file9
Modified:
dir1/file3
-Added: file9
-==============================================================================
---- /dev/null 00:00:00 1970 (empty, because file is newly added)
-+++ file9 Sun Sep 9 15:40:00 2001 (r6)
-@@ -0,0 +1 @@
-+file9
-
-Diffs of changes in other areas also in this revision:
-
Modified: dir1/file3
==============================================================================
--- dir1/file3 Sun Sep 9 12:53:20 2001 (r5)
@@ -411,6 +429,13 @@ Modified: dir1/file3
@@ -1 +1,2 @@
file3
+change C4
+
+Added: file9
+==============================================================================
+--- /dev/null 00:00:00 1970 (empty, because file is newly added)
++++ file9 Sun Sep 9 15:40:00 2001 (r6)
+@@ -0,0 +1 @@
++file9
Group: bugtracker
Subject: Fix for Blaz#456: r6 - dir1 dir4
@@ -441,8 +466,8 @@ Added: file9
+++ file9 Sun Sep 9 15:40:00 2001 (r6)
@@ -0,0 +1 @@
+file9
-Group: bugtracker
-Subject: Fix for Blah#987: r6 - dir1 dir4
+Group: file
+Subject: r6 - dir1 dir4
Author: mailer test
Date: Sun Sep 9 15:40:00 2001
@@ -452,18 +477,7 @@ Log:
mixed addition and change. Fixes Blaz#456 Blah#987
Added:
- dir4/
file9
-Modified:
- dir1/file3
-
-Modified: dir1/file3
-==============================================================================
---- dir1/file3 Sun Sep 9 12:53:20 2001 (r5)
-+++ dir1/file3 Sun Sep 9 15:40:00 2001 (r6)
-@@ -1 +1,2 @@
- file3
-+change C4
Added: file9
==============================================================================
@@ -471,7 +485,7 @@ Added: file9
+++ file9 Sun Sep 9 15:40:00 2001 (r6)
@@ -0,0 +1 @@
+file9
-Group: All
+Group: file plus other areas
Subject: r6 - dir1 dir4
Author: mailer test
@@ -482,11 +496,23 @@ Log:
mixed addition and change. Fixes Blaz#456 Blah#987
Added:
- dir4/
file9
+
+Changes in other areas also in this revision:
+Added:
+ dir4/
Modified:
dir1/file3
+Added: file9
+==============================================================================
+--- /dev/null 00:00:00 1970 (empty, because file is newly added)
++++ file9 Sun Sep 9 15:40:00 2001 (r6)
+@@ -0,0 +1 @@
++file9
+
+Diffs of changes in other areas also in this revision:
+
Modified: dir1/file3
==============================================================================
--- dir1/file3 Sun Sep 9 12:53:20 2001 (r5)
@@ -494,13 +520,47 @@ Modified: dir1/file3
@@ -1 +1,2 @@
file3
+change C4
+Group: All
+Subject: r7 - dir1 dir2 dir3 dir3/dir5
-Added: file9
+Author: mailer test
+Date: Sun Sep 9 18:26:40 2001
+New Revision: 7
+
+Log:
+adds, deletes, and a change
+
+Added:
+ dir1/file10
+ dir3/dir5/
+Deleted:
+ dir2/
+ file2
+Modified:
+ dir3/file3
+
+Added: dir1/file10
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
-+++ file9 Sun Sep 9 15:40:00 2001 (r6)
++++ dir1/file10 Sun Sep 9 18:26:40 2001 (r7)
@@ -0,0 +1 @@
-+file9
++file10
+
+Modified: dir3/file3
+==============================================================================
+--- dir3/file3 Sun Sep 9 15:40:00 2001 (r6)
++++ dir3/file3 Sun Sep 9 18:26:40 2001 (r7)
+@@ -1 +1,2 @@
+ file3
++change C5
+
+Deleted: file2
+==============================================================================
+--- file2 Sun Sep 9 18:26:40 2001 (r6)
++++ /dev/null 00:00:00 1970 (deleted)
+@@ -1,2 +0,0 @@
+-file2
+-change C1
Group: file
Subject: r7 - dir1 dir2 dir3 dir3/dir5
@@ -568,47 +628,6 @@ Modified: dir3/file3
file3
+change C5
Group: All
-Subject: r7 - dir1 dir2 dir3 dir3/dir5
-
-Author: mailer test
-Date: Sun Sep 9 18:26:40 2001
-New Revision: 7
-
-Log:
-adds, deletes, and a change
-
-Added:
- dir1/file10
- dir3/dir5/
-Deleted:
- dir2/
- file2
-Modified:
- dir3/file3
-
-Added: dir1/file10
-==============================================================================
---- /dev/null 00:00:00 1970 (empty, because file is newly added)
-+++ dir1/file10 Sun Sep 9 18:26:40 2001 (r7)
-@@ -0,0 +1 @@
-+file10
-
-Modified: dir3/file3
-==============================================================================
---- dir3/file3 Sun Sep 9 15:40:00 2001 (r6)
-+++ dir3/file3 Sun Sep 9 18:26:40 2001 (r7)
-@@ -1 +1,2 @@
- file3
-+change C5
-
-Deleted: file2
-==============================================================================
---- file2 Sun Sep 9 18:26:40 2001 (r6)
-+++ /dev/null 00:00:00 1970 (deleted)
-@@ -1,2 +0,0 @@
--file2
--change C1
-Group: All
Subject: r8 - in dir6: . dir5
Author: mailer test
@@ -620,31 +639,31 @@ copy dir, then make a change
Added:
dir6/
- - copied from r6, dir3/
+ - copied from r6, /dir3/
dir6/dir5/
- - copied from r7, dir3/dir5/
+ - copied from r7, /dir3/dir5/
Replaced:
dir6/file3
- - copied unchanged from r7, dir3/file3
+ - copied unchanged from r7, /dir3/file3
Modified:
dir6/file4
-Copied: dir6/file3 (from r7, dir3/file3)
+Copied: dir6/file3 (from r7, /dir3/file3)
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
-+++ dir6/file3 Sun Sep 9 21:13:20 2001 (r8, copy of r7, dir3/file3)
++++ dir6/file3 Sun Sep 9 21:13:20 2001 (r8, copy of r7, /dir3/file3)
@@ -0,0 +1,2 @@
+file3
+change C5
Modified: dir6/file4
==============================================================================
---- dir3/file4 Sun Sep 9 15:40:00 2001 (r6)
+--- /dir3/file4 Sun Sep 9 15:40:00 2001 (r6)
+++ dir6/file4 Sun Sep 9 21:13:20 2001 (r8)
@@ -1 +1,2 @@
file4
+change C6
-Group: file
+Group: All
Subject: r9 -
Author: mailer test
@@ -662,7 +681,7 @@ Modified:
Added: file11
==============================================================================
Binary file. No diff available.
-Group: file plus other areas
+Group: file
Subject: r9 -
Author: mailer test
@@ -680,7 +699,7 @@ Modified:
Added: file11
==============================================================================
Binary file. No diff available.
-Group: All
+Group: file plus other areas
Subject: r9 -
Author: mailer test
@@ -698,7 +717,7 @@ Modified:
Added: file11
==============================================================================
Binary file. No diff available.
-Group: file
+Group: All
Subject: r10 -
Author: mailer test
@@ -715,7 +734,7 @@ Modified:
Modified: file11
==============================================================================
Binary file (source and/or target). No diff available.
-Group: file plus other areas
+Group: file
Subject: r10 -
Author: mailer test
@@ -732,7 +751,7 @@ Modified:
Modified: file11
==============================================================================
Binary file (source and/or target). No diff available.
-Group: All
+Group: file plus other areas
Subject: r10 -
Author: mailer test
Modified: subversion/trunk/tools/hook-scripts/mailer/tests/mailer-tweak.py
URL: http://svn.apache.org/viewvc/subversion/trunk/tools/hook-scripts/mailer/tests/mailer-tweak.py?rev=1884427&r1=1884426&r2=1884427&view=diff
==============================================================================
--- subversion/trunk/tools/hook-scripts/mailer/tests/mailer-tweak.py (original)
+++ subversion/trunk/tools/hook-scripts/mailer/tests/mailer-tweak.py Mon Dec 14 16:57:10 2020
@@ -50,10 +50,10 @@ def tweak_dates(pool, home='.'):
for i in range(fs.youngest_rev(fsob, pool)):
# convert secs into microseconds, then a string
- date = core.svn_time_to_cstring((DATE_BASE+i*DATE_INCR) * 1000000L, pool)
+ date = core.svn_time_to_cstring((DATE_BASE+i*DATE_INCR) * 1000000, pool)
#print date
fs.change_rev_prop(fsob, i+1, core.SVN_PROP_REVISION_DATE, date, pool)
- fs.change_rev_prop(fsob, i+1, core.SVN_PROP_REVISION_AUTHOR, 'mailer test', pool)
+ fs.change_rev_prop(fsob, i+1, core.SVN_PROP_REVISION_AUTHOR, b'mailer test', pool)
def main():
if len(sys.argv) != 2:
Re: mailer.py py2/py3 change: non-UTF-8 environments (was: svn
commit: r1884427 - in /subversion/trunk/tools/hook-scripts/mailer: mailer.py
tests/mailer-t1.output tests/mailer-tweak.py)
Posted by Stefan Sperling <st...@apache.org>.
On Mon, Dec 21, 2020 at 07:15:53AM +0000, Daniel Shahaf wrote:
> stsp@apache.org wrote on Mon, 14 Dec 2020 16:57 -0000:
> > URL: http://svn.apache.org/viewvc?rev=1884427&view=rev
> > Log:
> > Make mailer.py work properly with Python 3, and drop Python 2 support.
> >
> > Most of the changes deal with the handling binary data vs Python strings.
> >
> > I've made sure that mailer.py will work in a UTF-8 environment. In general,
> > UTF-8 is recommended for hook scripts. See the SVNUseUTF8 mod_dav_svn option.
> > Environments using other encodings may not work as expected, but those will
> > be problematic for hook scripts in general.
>
> Correct me if I'm wrong, but it sounds like you haven't ruled out the
> possibility that this commit will constitute a regression for anyone
> who runs mailer.py in a non-UTF-8 environment and will upgrade to this
> commit.
Anyone who didn't explicitly set a locale in hook-envs will run this
script in the C locale today, and this case still works.
I would not count broken email generated in non-UTF-8 locales as a regression.
mailer.py's email content encoding header was set to UTF-8 17 years ago:
https://svn.apache.org/r847862
> I suppose it's fair to classify non-UTF-8 environments as "patches
> welcome", following the precedent of libmagic support in the Windows
> build, but:
>
> 1. Can we detect non-UTF-8 environments and warn or error out hard upon
> them? «locale.getlocale()[1]» seems promising?
I don't see what value this adds. When there is an encoding problem it
will already be obvious from gargabe characters in email notifications.
And many people can live with that, so there is no reason to error out.
Given that file diffs can contain an arbitrary mix of encodings, this script
cannot guarantee readable diff output unless ASCII/UTF-8 encoding is used
for all files. I don't see a good way around this limitation. Auto-detecting
file encoding is tricky, and requring people to tag file encodings via the
svn:mime-type property is simply not going to work in pratice.
> 2. A change that hasn't been confirmed *not* to constitute a regression
> merits a release notes entry. Would you do the honours?
I think release notes should mention Python3 support for mailer.py and
recommend that mailer.py is run in the UTF-8 or C locale.
Re: mailer.py py2/py3 change: non-UTF-8 environments (was: svn
commit: r1884427 - in /subversion/trunk/tools/hook-scripts/mailer: mailer.py
tests/mailer-t1.output tests/mailer-tweak.py)
Posted by Stefan Sperling <st...@apache.org>.
On Mon, Dec 21, 2020 at 07:15:53AM +0000, Daniel Shahaf wrote:
> stsp@apache.org wrote on Mon, 14 Dec 2020 16:57 -0000:
> > URL: http://svn.apache.org/viewvc?rev=1884427&view=rev
> > Log:
> > Make mailer.py work properly with Python 3, and drop Python 2 support.
> >
> > Most of the changes deal with the handling binary data vs Python strings.
> >
> > I've made sure that mailer.py will work in a UTF-8 environment. In general,
> > UTF-8 is recommended for hook scripts. See the SVNUseUTF8 mod_dav_svn option.
> > Environments using other encodings may not work as expected, but those will
> > be problematic for hook scripts in general.
>
> Correct me if I'm wrong, but it sounds like you haven't ruled out the
> possibility that this commit will constitute a regression for anyone
> who runs mailer.py in a non-UTF-8 environment and will upgrade to this
> commit.
Anyone who didn't explicitly set a locale in hook-envs will run this
script in the C locale today, and this case still works.
I would not count broken email generated in non-UTF-8 locales as a regression.
mailer.py's email content encoding header was set to UTF-8 17 years ago:
https://svn.apache.org/r847862
> I suppose it's fair to classify non-UTF-8 environments as "patches
> welcome", following the precedent of libmagic support in the Windows
> build, but:
>
> 1. Can we detect non-UTF-8 environments and warn or error out hard upon
> them? «locale.getlocale()[1]» seems promising?
I don't see what value this adds. When there is an encoding problem it
will already be obvious from gargabe characters in email notifications.
And many people can live with that, so there is no reason to error out.
Given that file diffs can contain an arbitrary mix of encodings, this script
cannot guarantee readable diff output unless ASCII/UTF-8 encoding is used
for all files. I don't see a good way around this limitation. Auto-detecting
file encoding is tricky, and requring people to tag file encodings via the
svn:mime-type property is simply not going to work in pratice.
> 2. A change that hasn't been confirmed *not* to constitute a regression
> merits a release notes entry. Would you do the honours?
I think release notes should mention Python3 support for mailer.py and
recommend that mailer.py is run in the UTF-8 or C locale.
Re: mailer.py py2/py3 change: non-UTF-8 environments
Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Yasuhito FUTATSUKI wrote on Wed, Dec 23, 2020 at 03:13:42 +0900:
> On 2020/12/21 16:15, Daniel Shahaf wrote:
> > stsp@apache.org wrote on Mon, 14 Dec 2020 16:57 -0000:
> >> URL: http://svn.apache.org/viewvc?rev=1884427&view=rev
> >> Log:
> >> Make mailer.py work properly with Python 3, and drop Python 2 support.
> >>
> >> Most of the changes deal with the handling binary data vs Python strings.
> >>
> >> I've made sure that mailer.py will work in a UTF-8 environment. In general,
> >> UTF-8 is recommended for hook scripts. See the SVNUseUTF8 mod_dav_svn option.
> >> Environments using other encodings may not work as expected, but those will
> >> be problematic for hook scripts in general.
> >
> > Correct me if I'm wrong, but it sounds like you haven't ruled out the
> > possibility that this commit will constitute a regression for anyone
> > who runs mailer.py in a non-UTF-8 environment and will upgrade to this
> > commit.
>
> It is not for the commit message, but for the committed code, it rather
> supports non-UTF-8 (or C) locale, at least the handling of REPOS-PATH
> (as I wrote another thread).
>
*nod*, but that does leave outstanding the case of non-UTF-8 locale and data
other than REPOS-PATH.
> It seems the code before commit for Python 2 could not handle REPOS-PATH
> if it contains non ascii character on non-UTF-8 environment because it
> didn't care that svn.core.svn_path_canonicalize and svn.repos.open only
> accept bytes object of UTF-8 path. Even if a user might pass REPOS-PATH
> as transcoded to UTF-8, the script searches for a default config file
> under the transcoded REPOS-PATH and it is not desirable behavior.
>
> For Python 2/3 support, I don't want to drop Python 2 support for mailer.py
> if we can easily, but I also think Python 3 support is higher priority.
+1
> And as also I wrote another thread, it seems still incomplete to support
> Python 3. I'll take a look this, but it has not made progress at all
> since I then, sorry.
No worries! I started a separate thread since I thought the issues were independent.
Thanks,
Daniel
Re: mailer.py py2/py3 change: non-UTF-8 environments
Posted by Yasuhito FUTATSUKI <fu...@yf.bsdclub.org>.
On 2020/12/21 16:15, Daniel Shahaf wrote:
> stsp@apache.org wrote on Mon, 14 Dec 2020 16:57 -0000:
>> URL: http://svn.apache.org/viewvc?rev=1884427&view=rev
>> Log:
>> Make mailer.py work properly with Python 3, and drop Python 2 support.
>>
>> Most of the changes deal with the handling binary data vs Python strings.
>>
>> I've made sure that mailer.py will work in a UTF-8 environment. In general,
>> UTF-8 is recommended for hook scripts. See the SVNUseUTF8 mod_dav_svn option.
>> Environments using other encodings may not work as expected, but those will
>> be problematic for hook scripts in general.
>
> Correct me if I'm wrong, but it sounds like you haven't ruled out the
> possibility that this commit will constitute a regression for anyone
> who runs mailer.py in a non-UTF-8 environment and will upgrade to this
> commit.
It is not for the commit message, but for the committed code, it rather
supports non-UTF-8 (or C) locale, at least the handling of REPOS-PATH
(as I wrote another thread).
It seems the code before commit for Python 2 could not handle REPOS-PATH
if it contains non ascii character on non-UTF-8 environment because it
didn't care that svn.core.svn_path_canonicalize and svn.repos.open only
accept bytes object of UTF-8 path. Even if a user might pass REPOS-PATH
as transcoded to UTF-8, the script searches for a default config file
under the transcoded REPOS-PATH and it is not desirable behavior.
For Python 2/3 support, I don't want to drop Python 2 support for mailer.py
if we can easily, but I also think Python 3 support is higher priority.
And as also I wrote another thread, it seems still incomplete to support
Python 3. I'll take a look this, but it has not made progress at all
since I then, sorry.
Cheers,
--
Yasuhito FUTATSUKI <fu...@yf.bsclub.org>
mailer.py py2/py3 change: non-UTF-8 environments (was: svn commit:
r1884427 - in /subversion/trunk/tools/hook-scripts/mailer: mailer.py
tests/mailer-t1.output tests/mailer-tweak.py)
Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
stsp@apache.org wrote on Mon, 14 Dec 2020 16:57 -0000:
> URL: http://svn.apache.org/viewvc?rev=1884427&view=rev
> Log:
> Make mailer.py work properly with Python 3, and drop Python 2 support.
>
> Most of the changes deal with the handling binary data vs Python strings.
>
> I've made sure that mailer.py will work in a UTF-8 environment. In general,
> UTF-8 is recommended for hook scripts. See the SVNUseUTF8 mod_dav_svn option.
> Environments using other encodings may not work as expected, but those will
> be problematic for hook scripts in general.
Correct me if I'm wrong, but it sounds like you haven't ruled out the
possibility that this commit will constitute a regression for anyone
who runs mailer.py in a non-UTF-8 environment and will upgrade to this
commit.
I suppose it's fair to classify non-UTF-8 environments as "patches
welcome", following the precedent of libmagic support in the Windows
build, but:
1. Can we detect non-UTF-8 environments and warn or error out hard upon
them? «locale.getlocale()[1]» seems promising?
2. A change that hasn't been confirmed *not* to constitute a regression
merits a release notes entry. Would you do the honours?
Cheers,
Daniel
> SVN repositories store internal
> data such as paths in UTF-8. Our Python3 bindings do not deal with encoding
> or decoding of such data, and thus need to work with raw UTF-8 strings, not
> Python strings.
>
> The encoding of file and property contents is not guaranteed to be UTF-8.
> This was already a problem before this change. This hook script sends email
> with a content type header specifying the UTF-8 encoding. Diffs which contain
> non-UTF-8 text will most likely not render properly when viewed in an email
> reader. At least this problem is now obvious in mailer.py's implementation,
> since all unidiff text is now written out directly as binary data.
>
> As an additional fix, iterate file groups in sorted order. This results in
> stable output and makes test cases in our tests/ subdirectory reproducible.
>
> Tested with Python 3.7.5 which is the version I use in my SVN development
> setup at present. Tests with newer versions are welcome.
>
> * tools/hook-scripts/mailer/mailer.py:
> Drop Python2-specific includes. Adjust includes as per 2to3.
> (main): Decode arguments from UTF-8 to string.
> (OutputBase:write): Encode string to UTF-8 and pass to write_binary().
> OutputBase implementations now need to provide a self.write_binary
> member which implements a write() method for binary data.
> (MailedOutput): email.Header package is gone, use email.header instead,
> and likewise replace use of email.Utils with email.utils
> (SMTPOutput): Provide self.write_binary in terms of a BytesIO() object.
> We cannot use StringIO since diffs may contain data in arbitrary encodings.
> (StandardOutput): Provide self.write_binary in terms of stdout.buffer.
> (PipeOutput): Provide self.write_binary in terms of pipe.stdin.
> (Commit): Decode log message and paths from UTF-8 to string, and iterate
> path groups from mailer.conf in sorted order.
> (Lock): Decode directory entries from UTF-8 to string. Encode paths back
> to UTF-8 when we ask libsvn_fs for a lock on a path.
> Iterate path groups from mailer.conf in sorted order.
> (DiffGenerator): Decode repository paths from UTF-8 to string.
> (TextCommitRenderer): Decode author, log message, and path from UTF-8 to
> string. Write diff data via write_binary, bypassing the re-encoding step.
> (Config): Decode paths from UTF-8 to string before matching them against
> regular expressions. Also decode the repository directory path from UTF-8.
>
> * tools/hook-scripts/mailer/tests/mailer-t1.output: Adjust expected output.
> File groups are now provided in stable sorted order. This should fix
> spurious test failures in the future.
>
> * tools/hook-scripts/mailer/tests/mailer-tweak.py: Drop L suffix from long
> integers and pass binary data instead of strings into libsvn_fs.
mailer.py py2/py3 change: non-UTF-8 environments (was: svn commit:
r1884427 - in /subversion/trunk/tools/hook-scripts/mailer: mailer.py
tests/mailer-t1.output tests/mailer-tweak.py)
Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
stsp@apache.org wrote on Mon, 14 Dec 2020 16:57 -0000:
> URL: http://svn.apache.org/viewvc?rev=1884427&view=rev
> Log:
> Make mailer.py work properly with Python 3, and drop Python 2 support.
>
> Most of the changes deal with the handling binary data vs Python strings.
>
> I've made sure that mailer.py will work in a UTF-8 environment. In general,
> UTF-8 is recommended for hook scripts. See the SVNUseUTF8 mod_dav_svn option.
> Environments using other encodings may not work as expected, but those will
> be problematic for hook scripts in general.
Correct me if I'm wrong, but it sounds like you haven't ruled out the
possibility that this commit will constitute a regression for anyone
who runs mailer.py in a non-UTF-8 environment and will upgrade to this
commit.
I suppose it's fair to classify non-UTF-8 environments as "patches
welcome", following the precedent of libmagic support in the Windows
build, but:
1. Can we detect non-UTF-8 environments and warn or error out hard upon
them? «locale.getlocale()[1]» seems promising?
2. A change that hasn't been confirmed *not* to constitute a regression
merits a release notes entry. Would you do the honours?
Cheers,
Daniel
> SVN repositories store internal
> data such as paths in UTF-8. Our Python3 bindings do not deal with encoding
> or decoding of such data, and thus need to work with raw UTF-8 strings, not
> Python strings.
>
> The encoding of file and property contents is not guaranteed to be UTF-8.
> This was already a problem before this change. This hook script sends email
> with a content type header specifying the UTF-8 encoding. Diffs which contain
> non-UTF-8 text will most likely not render properly when viewed in an email
> reader. At least this problem is now obvious in mailer.py's implementation,
> since all unidiff text is now written out directly as binary data.
>
> As an additional fix, iterate file groups in sorted order. This results in
> stable output and makes test cases in our tests/ subdirectory reproducible.
>
> Tested with Python 3.7.5 which is the version I use in my SVN development
> setup at present. Tests with newer versions are welcome.
>
> * tools/hook-scripts/mailer/mailer.py:
> Drop Python2-specific includes. Adjust includes as per 2to3.
> (main): Decode arguments from UTF-8 to string.
> (OutputBase:write): Encode string to UTF-8 and pass to write_binary().
> OutputBase implementations now need to provide a self.write_binary
> member which implements a write() method for binary data.
> (MailedOutput): email.Header package is gone, use email.header instead,
> and likewise replace use of email.Utils with email.utils
> (SMTPOutput): Provide self.write_binary in terms of a BytesIO() object.
> We cannot use StringIO since diffs may contain data in arbitrary encodings.
> (StandardOutput): Provide self.write_binary in terms of stdout.buffer.
> (PipeOutput): Provide self.write_binary in terms of pipe.stdin.
> (Commit): Decode log message and paths from UTF-8 to string, and iterate
> path groups from mailer.conf in sorted order.
> (Lock): Decode directory entries from UTF-8 to string. Encode paths back
> to UTF-8 when we ask libsvn_fs for a lock on a path.
> Iterate path groups from mailer.conf in sorted order.
> (DiffGenerator): Decode repository paths from UTF-8 to string.
> (TextCommitRenderer): Decode author, log message, and path from UTF-8 to
> string. Write diff data via write_binary, bypassing the re-encoding step.
> (Config): Decode paths from UTF-8 to string before matching them against
> regular expressions. Also decode the repository directory path from UTF-8.
>
> * tools/hook-scripts/mailer/tests/mailer-t1.output: Adjust expected output.
> File groups are now provided in stable sorted order. This should fix
> spurious test failures in the future.
>
> * tools/hook-scripts/mailer/tests/mailer-tweak.py: Drop L suffix from long
> integers and pass binary data instead of strings into libsvn_fs.
Re: svn commit: r1884427 - in
/subversion/trunk/tools/hook-scripts/mailer: mailer.py tests/mailer-t1.output
tests/mailer-tweak.py
Posted by Yasuhito FUTATSUKI <fu...@yf.bsdclub.org>.
On 2021/01/15 7:30, Stefan Sperling wrote:
> On Fri, Jan 15, 2021 at 01:44:43AM +0900, Yasuhito FUTATSUKI wrote:
>> On 2020/12/29 0:49, Stefan Sperling wrote:
>>> On Tue, Dec 15, 2020 at 07:47:59PM +0900, Yasuhito FUTATSUKI wrote:
>>>> As far as I read the code again, it seems this is already support
>>>> locales other than C/C.UTF-8 if mailer.py is kicked with appropriate
>>>> LC_CTYPE environment. The REPOS-PATH argment comes from sys.argv[2]
>>>> already decoded in file system encoding (i.e. encoding corresponding
>>>> to LC_CTYPE) and Python encodes it if access to file system. On the
>>>> other hand, svn.repos.open() wrapper function accepts str and encode
>>>> it to UTF-8 string, as the svn_repos_open() C API expects.
>>>>
>>>> Then I overlooked sys.argv[x] and cmd_args[x] in main() are
>>>> already str.
>>>>
>>>> Perhaps it needs (although the last hunk is not nessesary):
>>>
>>> Yes, indeed. If python already encodes arguments to strings then this
>>> patch looks correct to me. It also works as expected in my testing.
>> I revised the patch and checked in UTF-8 locale. Now it seems it works
>> with post-revprop-change, post-lock and post-unlock hooks.
>>
>> If this patch is OK, then I'll do:
>>
>> (1) Make it support Python 2 again, to merge it to 1.14.x.
>> (2) Fix an issue to encode Unicode paths in Subject header.
>> (3) Fix an issue of max line length in Subject header.
>>
>> Perhaps (2) and (3) will be resolved at the same time.
>
> Yes, in my opinion this patch is OK.
> The mailer.py test is still passing and your changes look fine to me.
>
> Thank you! My experience with python2/3 conversion is limited, and your
> help with this is much appreciated.
I committed it in r1885557. Perhaps If you did not commit r1884427,
I didn't work for it so quickly (yes, even this is quickly). Thank you.
Then I'll go ahead to resolve above issues.
Cheers,
--
Yasuhito FUTATSUKI <fu...@yf.bsclub.org>
Re: svn commit: r1884427 - in
/subversion/trunk/tools/hook-scripts/mailer: mailer.py
tests/mailer-t1.output tests/mailer-tweak.py
Posted by Stefan Sperling <st...@elego.de>.
On Fri, Jan 15, 2021 at 01:44:43AM +0900, Yasuhito FUTATSUKI wrote:
> On 2020/12/29 0:49, Stefan Sperling wrote:
> > On Tue, Dec 15, 2020 at 07:47:59PM +0900, Yasuhito FUTATSUKI wrote:
> >> As far as I read the code again, it seems this is already support
> >> locales other than C/C.UTF-8 if mailer.py is kicked with appropriate
> >> LC_CTYPE environment. The REPOS-PATH argment comes from sys.argv[2]
> >> already decoded in file system encoding (i.e. encoding corresponding
> >> to LC_CTYPE) and Python encodes it if access to file system. On the
> >> other hand, svn.repos.open() wrapper function accepts str and encode
> >> it to UTF-8 string, as the svn_repos_open() C API expects.
> >>
> >> Then I overlooked sys.argv[x] and cmd_args[x] in main() are
> >> already str.
> >>
> >> Perhaps it needs (although the last hunk is not nessesary):
> >
> > Yes, indeed. If python already encodes arguments to strings then this
> > patch looks correct to me. It also works as expected in my testing.
> I revised the patch and checked in UTF-8 locale. Now it seems it works
> with post-revprop-change, post-lock and post-unlock hooks.
>
> If this patch is OK, then I'll do:
>
> (1) Make it support Python 2 again, to merge it to 1.14.x.
> (2) Fix an issue to encode Unicode paths in Subject header.
> (3) Fix an issue of max line length in Subject header.
>
> Perhaps (2) and (3) will be resolved at the same time.
Yes, in my opinion this patch is OK.
The mailer.py test is still passing and your changes look fine to me.
Thank you! My experience with python2/3 conversion is limited, and your
help with this is much appreciated.
Regards,
Stefan
> Cheers,
> --
> Yasuhito FUTATSUKI <fu...@yf.bsclub.org>
> Follow up to r1884427: mailer.py: Additional fix for Python 3 support
>
> * tools/hook-scripts/mailer/mailer.py
> ():
> - Don't include StringIO.
> - import locale for output class 'StandardOutput'.
> (main): Don't decode cmd_args[x], because they are already str.
> (OutputBase.write_binary): New method, as the abstract base class of
> output classes.
> (OutputBase.write): Tweak comment. This method has implementation now.
> (OutputBase.run): Use self.write_binary() to write content of buf.
> (StandardOutput.write): Override default write method to use default
> encoding for console output.
> (PropChange.generate): Use sys.stdin.buffer instead of sys.stdin to read
> from stdin as bytes.
> (Lock.__init__):
> - Use sys.stdin.buffer instead of sys.stdin to read from stdin as bytes.
> - Strip new line character from items of self.dirlist
> (DiffGenerator.__getitem__): Pass change.path as is to svn.fs.FileDiff.
> (TextCommitRenderer.render): Don't decode data.author. data.author is
> already bytes (or None) here.
> (Repository.__init__): Ensure self.author is str or None.
> ((entry routine)): Encode repository path as UTF-8 before canonicalize
> explicitly.
>
> Index: tools/hook-scripts/mailer/mailer.py
> ===================================================================
> --- tools/hook-scripts/mailer/mailer.py (revision 1885374)
> +++ tools/hook-scripts/mailer/mailer.py (working copy)
> @@ -50,10 +50,11 @@
> from urllib.parse import quote as urllib_parse_quote
> import time
> import subprocess
> -from io import BytesIO, StringIO
> +from io import BytesIO
> import smtplib
> import re
> import tempfile
> +import locale
>
> # Minimal version of Subversion's bindings required
> _MIN_SVN_VERSION = [1, 5, 0]
> @@ -88,10 +89,10 @@
> messenger = Commit(pool, cfg, repos)
> elif cmd == 'propchange' or cmd == 'propchange2':
> revision = int(cmd_args[0])
> - author = cmd_args[1].decode('utf-8')
> - propname = cmd_args[2].decode('utf-8')
> + author = cmd_args[1]
> + propname = cmd_args[2]
> if cmd == 'propchange2' and cmd_args[3]:
> - action = cmd_args[3].decode('utf-8')
> + action = cmd_args[3]
> else:
> action = 'A'
> repos = Repository(repos_dir, revision, pool)
> @@ -103,7 +104,7 @@
> })
> messenger = PropChange(pool, cfg, repos, author, propname, action)
> elif cmd == 'lock' or cmd == 'unlock':
> - author = cmd_args[0].decode('utf-8')
> + author = cmd_args[0]
> repos = Repository(repos_dir, 0, pool) ### any old revision will do
> # Override the repos revision author with the author of the lock/unlock
> repos.author = author
> @@ -169,9 +170,13 @@
> representation."""
> raise NotImplementedError
>
> + def write_binary(self, output):
> + """Override this method.
> + Append the binary data OUTPUT to the output representation."""
> + raise NotImplementedError
> +
> def write(self, output):
> - """Override this method.
> - Append the literal text string OUTPUT to the output representation."""
> + """Append the literal text string OUTPUT to the output representation."""
> return self.write_binary(output.encode('utf-8'))
>
> def run(self, cmd):
> @@ -184,7 +189,7 @@
>
> buf = pipe_ob.stdout.read(self._CHUNKSIZE)
> while buf:
> - self.write(buf)
> + self.write_binary(buf)
> buf = pipe_ob.stdout.read(self._CHUNKSIZE)
>
> # wait on the child so we don't end up with a billion zombies
> @@ -360,7 +365,12 @@
> def finish(self):
> pass
>
> + def write(self, output):
> + """Write text as *default* encoding string"""
> + return self.write_binary(output.encode(locale.getpreferredencoding(),
> + 'backslashreplace'))
>
> +
> class PipeOutput(MailedOutput):
> "Deliver a mail message to an MTA via a pipe."
>
> @@ -532,7 +542,7 @@
> elif self.action == 'M':
> self.output.write('Property diff:\n')
> tempfile1 = tempfile.NamedTemporaryFile()
> - tempfile1.write(sys.stdin.read())
> + tempfile1.write(sys.stdin.buffer.read())
> tempfile1.flush()
> tempfile2 = tempfile.NamedTemporaryFile()
> tempfile2.write(self.repos.get_rev_prop(self.propname))
> @@ -594,9 +604,8 @@
> or 'unlock_subject_prefix'))
>
> # read all the locked paths from STDIN and strip off the trailing newlines
> - self.dirlist = [x for x in sys.stdin.readlines()]
> - for i in range(len(self.dirlist)):
> - self.dirlist[i] = self.dirlist[i].decode('utf-8')
> + self.dirlist = [x.decode('utf-8').rstrip()
> + for x in sys.stdin.buffer.readlines()]
>
> # collect the set of groups and the unique sets of params for the options
> self.groups = { }
> @@ -910,7 +919,7 @@
> kind = 'C'
> if self.diffsels.copy:
> diff = svn.fs.FileDiff(None, None, self.repos.root_this,
> - change.path.decode('utf-8'), self.pool)
> + change.path, self.pool)
> label1 = '/dev/null\t00:00:00 1970\t' \
> '(empty, because file is newly added)'
> label2 = '%s\t%s\t(r%s, copy of r%s, %s)' \
> @@ -1092,7 +1101,7 @@
>
> w = self.output.write
>
> - w('Author: %s\nDate: %s\nNew Revision: %s\n' % (data.author.decode('utf-8'),
> + w('Author: %s\nDate: %s\nNew Revision: %s\n' % (data.author,
> data.date,
> data.rev))
>
> @@ -1221,6 +1230,8 @@
> self.root_this = self.get_root(rev)
>
> self.author = self.get_rev_prop(svn.core.SVN_PROP_REVISION_AUTHOR)
> + if self.author is not None:
> + self.author = self.author.decode('utf-8')
>
> def get_rev_prop(self, propname, rev = None):
> if not rev:
> @@ -1510,7 +1521,7 @@
> usage()
>
> cmd = sys.argv[1]
> - repos_dir = svn.core.svn_path_canonicalize(sys.argv[2])
> + repos_dir = svn.core.svn_path_canonicalize(sys.argv[2].encode('utf-8'))
> repos_dir = repos_dir.decode('utf-8')
> try:
> expected_args = cmd_list[cmd]
Re: svn commit: r1884427 - in
/subversion/trunk/tools/hook-scripts/mailer: mailer.py tests/mailer-t1.output
tests/mailer-tweak.py
Posted by Yasuhito FUTATSUKI <fu...@yf.bsdclub.org>.
On 2020/12/29 0:49, Stefan Sperling wrote:
> On Tue, Dec 15, 2020 at 07:47:59PM +0900, Yasuhito FUTATSUKI wrote:
>> As far as I read the code again, it seems this is already support
>> locales other than C/C.UTF-8 if mailer.py is kicked with appropriate
>> LC_CTYPE environment. The REPOS-PATH argment comes from sys.argv[2]
>> already decoded in file system encoding (i.e. encoding corresponding
>> to LC_CTYPE) and Python encodes it if access to file system. On the
>> other hand, svn.repos.open() wrapper function accepts str and encode
>> it to UTF-8 string, as the svn_repos_open() C API expects.
>>
>> Then I overlooked sys.argv[x] and cmd_args[x] in main() are
>> already str.
>>
>> Perhaps it needs (although the last hunk is not nessesary):
>
> Yes, indeed. If python already encodes arguments to strings then this
> patch looks correct to me. It also works as expected in my testing.
I revised the patch and checked in UTF-8 locale. Now it seems it works
with post-revprop-change, post-lock and post-unlock hooks.
If this patch is OK, then I'll do:
(1) Make it support Python 2 again, to merge it to 1.14.x.
(2) Fix an issue to encode Unicode paths in Subject header.
(3) Fix an issue of max line length in Subject header.
Perhaps (2) and (3) will be resolved at the same time.
Cheers,
--
Yasuhito FUTATSUKI <fu...@yf.bsclub.org>
Re: svn commit: r1884427 - in
/subversion/trunk/tools/hook-scripts/mailer: mailer.py
tests/mailer-t1.output tests/mailer-tweak.py
Posted by Stefan Sperling <st...@elego.de>.
On Tue, Dec 15, 2020 at 07:47:59PM +0900, Yasuhito FUTATSUKI wrote:
> As far as I read the code again, it seems this is already support
> locales other than C/C.UTF-8 if mailer.py is kicked with appropriate
> LC_CTYPE environment. The REPOS-PATH argment comes from sys.argv[2]
> already decoded in file system encoding (i.e. encoding corresponding
> to LC_CTYPE) and Python encodes it if access to file system. On the
> other hand, svn.repos.open() wrapper function accepts str and encode
> it to UTF-8 string, as the svn_repos_open() C API expects.
>
> Then I overlooked sys.argv[x] and cmd_args[x] in main() are
> already str.
>
> Perhaps it needs (although the last hunk is not nessesary):
Yes, indeed. If python already encodes arguments to strings then this
patch looks correct to me. It also works as expected in my testing.
> [[[
> Index: tools/hook-scripts/mailer/mailer.py
> ===================================================================
> --- tools/hook-scripts/mailer/mailer.py (revision 1884434)
> +++ tools/hook-scripts/mailer/mailer.py (working copy)
> @@ -88,10 +88,10 @@
> messenger = Commit(pool, cfg, repos)
> elif cmd == 'propchange' or cmd == 'propchange2':
> revision = int(cmd_args[0])
> - author = cmd_args[1].decode('utf-8')
> - propname = cmd_args[2].decode('utf-8')
> + author = cmd_args[1]
> + propname = cmd_args[2]
> if cmd == 'propchange2' and cmd_args[3]:
> - action = cmd_args[3].decode('utf-8')
> + action = cmd_args[3]
> else:
> action = 'A'
> repos = Repository(repos_dir, revision, pool)
> @@ -103,7 +103,7 @@
> })
> messenger = PropChange(pool, cfg, repos, author, propname, action)
> elif cmd == 'lock' or cmd == 'unlock':
> - author = cmd_args[0].decode('utf-8')
> + author = cmd_args[0]
> repos = Repository(repos_dir, 0, pool) ### any old revision will do
> # Override the repos revision author with the author of the lock/unlock
> repos.author = author
> @@ -1510,7 +1510,7 @@
> usage()
>
> cmd = sys.argv[1]
> - repos_dir = svn.core.svn_path_canonicalize(sys.argv[2])
> + repos_dir = svn.core.svn_path_canonicalize(sys.argv[2].encode('utf-8'))
> repos_dir = repos_dir.decode('utf-8')
> try:
> expected_args = cmd_list[cmd]
> ]]]
Re: svn commit: r1884427 - in
/subversion/trunk/tools/hook-scripts/mailer: mailer.py tests/mailer-t1.output
tests/mailer-tweak.py
Posted by Yasuhito FUTATSUKI <fu...@yf.bsdclub.org>.
On 2020/12/15 5:06, Stefan Sperling wrote:
> On Tue, Dec 15, 2020 at 03:51:03AM +0900, Yasuhito FUTATSUKI wrote:
>> First of all, thank you for doing this. I worried that mailer.py didn't
>> support Python 3 for a long time.
>>
>> In message <20...@svn01-us-east.apache.org>, stsp@apache.o
>> rg writes:
>>> Author: stsp
>>> Date: Mon Dec 14 16:57:10 2020
>>> New Revision: 1884427
>>>
>>> URL: http://svn.apache.org/viewvc?rev=1884427&view=rev
>>> Log:
>>> Make mailer.py work properly with Python 3, and drop Python 2 support.
>>>
>>> Most of the changes deal with the handling binary data vs Python strings.
>>>
>>> I've made sure that mailer.py will work in a UTF-8 environment. In general,
>>> UTF-8 is recommended for hook scripts. See the SVNUseUTF8 mod_dav_svn option.
>>> Environments using other encodings may not work as expected, but those will
>>> be problematic for hook scripts in general.
>>
>> Perhaps the problem on locale other than UTF-8 is only on iterpretation of
>> REPOS-PATH argument. Other paths in the script are all Subversion's
>> internal paths.
>
> Yes, I agree that the REPOS-PATH argument is an exceptional case.
> Unforunately, we have to decode repos_dir because it is used with the
> os.path.join() function.
As far as I read the code again, it seems this is already support
locales other than C/C.UTF-8 if mailer.py is kicked with appropriate
LC_CTYPE environment. The REPOS-PATH argment comes from sys.argv[2]
already decoded in file system encoding (i.e. encoding corresponding
to LC_CTYPE) and Python encodes it if access to file system. On the
other hand, svn.repos.open() wrapper function accepts str and encode
it to UTF-8 string, as the svn_repos_open() C API expects.
Then I overlooked sys.argv[x] and cmd_args[x] in main() are
already str.
Perhaps it needs (although the last hunk is not nessesary):
[[[
Index: tools/hook-scripts/mailer/mailer.py
===================================================================
--- tools/hook-scripts/mailer/mailer.py (revision 1884434)
+++ tools/hook-scripts/mailer/mailer.py (working copy)
@@ -88,10 +88,10 @@
messenger = Commit(pool, cfg, repos)
elif cmd == 'propchange' or cmd == 'propchange2':
revision = int(cmd_args[0])
- author = cmd_args[1].decode('utf-8')
- propname = cmd_args[2].decode('utf-8')
+ author = cmd_args[1]
+ propname = cmd_args[2]
if cmd == 'propchange2' and cmd_args[3]:
- action = cmd_args[3].decode('utf-8')
+ action = cmd_args[3]
else:
action = 'A'
repos = Repository(repos_dir, revision, pool)
@@ -103,7 +103,7 @@
})
messenger = PropChange(pool, cfg, repos, author, propname, action)
elif cmd == 'lock' or cmd == 'unlock':
- author = cmd_args[0].decode('utf-8')
+ author = cmd_args[0]
repos = Repository(repos_dir, 0, pool) ### any old revision will do
# Override the repos revision author with the author of the lock/unlock
repos.author = author
@@ -1510,7 +1510,7 @@
usage()
cmd = sys.argv[1]
- repos_dir = svn.core.svn_path_canonicalize(sys.argv[2])
+ repos_dir = svn.core.svn_path_canonicalize(sys.argv[2].encode('utf-8'))
repos_dir = repos_dir.decode('utf-8')
try:
expected_args = cmd_list[cmd]
]]]
(Also, I found svn.repos.open() and svn.core.svn_path_canonicalize() already
accepted str in commited code :))
> There is no way of knowing which character encoding is used for paths.
> At least on UNIX a filename is just a string of bytes and it is independent
> of the locale.
>
> Note that hook scripts will run in an empty environment by default.
> Which means this script will run in the "C" locale by default.
> If another locale is desired, the administrator needs to configure the
> server as documented here:
> https://subversion.apache.org/docs/release-notes/1.8.html#hooks-env
>
> My assumption is that for hook scripts like this one, the server will be
> configured with SVNUseUTF8 and the hook script environment will contain
> something like LANG=C.UTF-8.
>
> If svnadmin is then also used in the UTF-8 locale to create repositories,
> the hook script should work fine.
I assumed that hook scripts kicked by file:// or svn+ssh:// scheme.
I think even if the repos path is neither UTF-8 nor ascii but appropriate
encoding string, hook scripts themselves know it and can invoke mailer.py
script with appropriate LC_CTYPE by using env(1) utility or directly setting
environment variable within hook scripts themselves.
Of course, we can easily move or make symlink of repository path from
non UTF-8 path to UTF-8 or ascii path, so I don't insist that we should
support non UTF-8 path.
>>> data such as paths in UTF-8. Our Python3 bindings do not deal with encoding
>>> or decoding of such data, and thus need to work with raw UTF-8 strings, not
>>> Python strings.
>>
>> I have no objection the code itself, however it is a bit incorrect.
>>
>> Our Python3 bindings accept str objects as inputs for char *. ^ as well as bytes objects
If str objects are passed, v
>> It always
>> convert them to UTF-8 C strings on all APIs without regarding preferred
>> encoding or file system encoding on Python. As some of APIs accept
>> local paths and they should be encoded as locale's charset/encoding,
>> I also prefer to encode to bytes on caller side explicitly before call
>> APIs, to avoid making bugs. Of course, return values of wrapper APIs
>> corresponding char ** args and return value of C API are not decoded,
>> raw bytes objects.
>
> Thank you, I did not know this. The patch below undoes changes which I
> made in order to encode input to SVN APIs as UTF-8.
>
> The mailer.py test is still passing fine with this change.
Yes, however as I wrote above I prefer not to revert them.
I also found some bytes/str mixture still left around return values
for each call of Repository.get_rev_prop(), however I just found,
but not tested. So I continue to look over the code and I'll try
to fix the problems if they still exist.
Cheers,
--
Yasuhito FUTATSUKI <fu...@yf.bsclub.org>
Re: svn commit: r1884427 - in
/subversion/trunk/tools/hook-scripts/mailer: mailer.py
tests/mailer-t1.output tests/mailer-tweak.py
Posted by Stefan Sperling <st...@elego.de>.
On Tue, Dec 15, 2020 at 03:51:03AM +0900, Yasuhito FUTATSUKI wrote:
> First of all, thank you for doing this. I worried that mailer.py didn't
> support Python 3 for a long time.
>
> In message <20...@svn01-us-east.apache.org>, stsp@apache.o
> rg writes:
> >Author: stsp
> >Date: Mon Dec 14 16:57:10 2020
> >New Revision: 1884427
> >
> >URL: http://svn.apache.org/viewvc?rev=1884427&view=rev
> >Log:
> >Make mailer.py work properly with Python 3, and drop Python 2 support.
> >
> >Most of the changes deal with the handling binary data vs Python strings.
> >
> >I've made sure that mailer.py will work in a UTF-8 environment. In general,
> >UTF-8 is recommended for hook scripts. See the SVNUseUTF8 mod_dav_svn option.
> >Environments using other encodings may not work as expected, but those will
> >be problematic for hook scripts in general.
>
> Perhaps the problem on locale other than UTF-8 is only on iterpretation of
> REPOS-PATH argument. Other paths in the script are all Subversion's
> internal paths.
Yes, I agree that the REPOS-PATH argument is an exceptional case.
Unforunately, we have to decode repos_dir because it is used with the
os.path.join() function.
There is no way of knowing which character encoding is used for paths.
At least on UNIX a filename is just a string of bytes and it is independent
of the locale.
Note that hook scripts will run in an empty environment by default.
Which means this script will run in the "C" locale by default.
If another locale is desired, the administrator needs to configure the
server as documented here:
https://subversion.apache.org/docs/release-notes/1.8.html#hooks-env
My assumption is that for hook scripts like this one, the server will be
configured with SVNUseUTF8 and the hook script environment will contain
something like LANG=C.UTF-8.
If svnadmin is then also used in the UTF-8 locale to create repositories,
the hook script should work fine.
> >data such as paths in UTF-8. Our Python3 bindings do not deal with encoding
> >or decoding of such data, and thus need to work with raw UTF-8 strings, not
> >Python strings.
>
> I have no objection the code itself, however it is a bit incorrect.
>
> Our Python3 bindings accept str objects as inputs for char *. It always
> convert them to UTF-8 C strings on all APIs without regarding preferred
> encoding or file system encoding on Python. As some of APIs accept
> local paths and they should be encoded as locale's charset/encoding,
> I also prefer to encode to bytes on caller side explicitly before call
> APIs, to avoid making bugs. Of course, return values of wrapper APIs
> corresponding char ** args and return value of C API are not decoded,
> raw bytes objects.
Thank you, I did not know this. The patch below undoes changes which I
made in order to encode input to SVN APIs as UTF-8.
The mailer.py test is still passing fine with this change.
Index: tools/hook-scripts/mailer/mailer.py
===================================================================
--- tools/hook-scripts/mailer/mailer.py (revision 1884427)
+++ tools/hook-scripts/mailer/mailer.py (working copy)
@@ -624,8 +624,7 @@ class Lock(Messenger):
# The lock comment is the same for all paths, so we can just pull
# the comment for the first path in the dirlist and cache it.
- self.lock = svn.fs.svn_fs_get_lock(self.repos.fs_ptr,
- self.dirlist[0].encode('utf-8'),
+ self.lock = svn.fs.svn_fs_get_lock(self.repos.fs_ptr, self.dirlist[0],
self.pool)
def generate(self):
@@ -910,7 +909,7 @@ class DiffGenerator:
kind = 'C'
if self.diffsels.copy:
diff = svn.fs.FileDiff(None, None, self.repos.root_this,
- change.path.decode('utf-8'), self.pool)
+ change.path, self.pool)
label1 = '/dev/null\t00:00:00 1970\t' \
'(empty, because file is newly added)'
label2 = '%s\t%s\t(r%s, copy of r%s, %s)' \
Index: tools/hook-scripts/mailer/tests/mailer-tweak.py
===================================================================
--- tools/hook-scripts/mailer/tests/mailer-tweak.py (revision 1884427)
+++ tools/hook-scripts/mailer/tests/mailer-tweak.py (working copy)
@@ -53,7 +53,7 @@ def tweak_dates(pool, home='.'):
date = core.svn_time_to_cstring((DATE_BASE+i*DATE_INCR) * 1000000, pool)
#print date
fs.change_rev_prop(fsob, i+1, core.SVN_PROP_REVISION_DATE, date, pool)
- fs.change_rev_prop(fsob, i+1, core.SVN_PROP_REVISION_AUTHOR, b'mailer test', pool)
+ fs.change_rev_prop(fsob, i+1, core.SVN_PROP_REVISION_AUTHOR, 'mailer test', pool)
def main():
if len(sys.argv) != 2: