You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@subversion.apache.org by st...@apache.org on 2020/12/14 16:57:10 UTC

svn commit: r1884427 - in /subversion/trunk/tools/hook-scripts/mailer: mailer.py tests/mailer-t1.output tests/mailer-tweak.py

Author: stsp
Date: Mon Dec 14 16:57:10 2020
New Revision: 1884427

URL: http://svn.apache.org/viewvc?rev=1884427&view=rev
Log:
Make mailer.py work properly with Python 3, and drop Python 2 support.

Most of the changes deal with the handling binary data vs Python strings.

I've made sure that mailer.py will work in a UTF-8 environment. In general,
UTF-8 is recommended for hook scripts. See the SVNUseUTF8 mod_dav_svn option.
Environments using other encodings may not work as expected, but those will
be problematic for hook scripts in general. SVN repositories store internal
data such as paths in UTF-8. Our Python3 bindings do not deal with encoding
or decoding of such data, and thus need to work with raw UTF-8 strings, not
Python strings.

The encoding of file and property contents is not guaranteed to be UTF-8.
This was already a problem before this change. This hook script sends email
with a content type header specifying the UTF-8 encoding. Diffs which contain
non-UTF-8 text will most likely not render properly when viewed in an email
reader. At least this problem is now obvious in mailer.py's implementation,
since all unidiff text is now written out directly as binary data.

As an additional fix, iterate file groups in sorted order. This results in
stable output and makes test cases in our tests/ subdirectory reproducible.

Tested with Python 3.7.5 which is the version I use in my SVN development
setup at present. Tests with newer versions are welcome.

* tools/hook-scripts/mailer/mailer.py:
  Drop Python2-specific includes. Adjust includes as per 2to3.
  (main): Decode arguments from UTF-8 to string.
  (OutputBase:write): Encode string to UTF-8 and pass to write_binary().
   OutputBase implementations now need to provide a self.write_binary
   member which implements a write() method for binary data.
  (MailedOutput): email.Header package is gone, use email.header instead,
   and likewise replace use of email.Utils with email.utils
  (SMTPOutput): Provide self.write_binary in terms of a BytesIO() object.
   We cannot use StringIO since diffs may contain data in arbitrary encodings.
  (StandardOutput): Provide self.write_binary in terms of stdout.buffer.
  (PipeOutput): Provide self.write_binary in terms of pipe.stdin.
  (Commit): Decode log message and paths from UTF-8 to string, and iterate
    path groups from mailer.conf in sorted order.
  (Lock): Decode directory entries from UTF-8 to string. Encode paths back
   to UTF-8 when we ask libsvn_fs for a lock on a path.
    Iterate path groups from mailer.conf in sorted order.
  (DiffGenerator): Decode repository paths from UTF-8 to string.
  (TextCommitRenderer): Decode author, log message, and path from UTF-8 to
    string. Write diff data via write_binary, bypassing the re-encoding step.
  (Config): Decode paths from UTF-8 to string before matching them against
   regular expressions. Also decode the repository directory path from UTF-8.

* tools/hook-scripts/mailer/tests/mailer-t1.output: Adjust expected output.
   File groups are now provided in stable sorted order. This should fix
   spurious test failures in the future.

 * tools/hook-scripts/mailer/tests/mailer-tweak.py: Drop L suffix from long
    integers and pass binary data instead of strings into libsvn_fs.

Modified:
    subversion/trunk/tools/hook-scripts/mailer/mailer.py
    subversion/trunk/tools/hook-scripts/mailer/tests/mailer-t1.output
    subversion/trunk/tools/hook-scripts/mailer/tests/mailer-tweak.py

Modified: subversion/trunk/tools/hook-scripts/mailer/mailer.py
URL: http://svn.apache.org/viewvc/subversion/trunk/tools/hook-scripts/mailer/mailer.py?rev=1884427&r1=1884426&r2=1884427&view=diff
==============================================================================
--- subversion/trunk/tools/hook-scripts/mailer/mailer.py (original)
+++ subversion/trunk/tools/hook-scripts/mailer/mailer.py Mon Dec 14 16:57:10 2020
@@ -46,22 +46,11 @@
 
 import os
 import sys
-try:
-  # Python >=3.0
-  import configparser
-  from urllib.parse import quote as urllib_parse_quote
-except ImportError:
-  # Python <3.0
-  import ConfigParser as configparser
-  from urllib import quote as urllib_parse_quote
+import configparser
+from urllib.parse import quote as urllib_parse_quote
 import time
 import subprocess
-if sys.version_info[0] >= 3:
-  # Python >=3.0
-  from io import StringIO
-else:
-  # Python <3.0
-  from cStringIO import StringIO
+from io import BytesIO, StringIO
 import smtplib
 import re
 import tempfile
@@ -99,9 +88,12 @@ def main(pool, cmd, config_fname, repos_
     messenger = Commit(pool, cfg, repos)
   elif cmd == 'propchange' or cmd == 'propchange2':
     revision = int(cmd_args[0])
-    author = cmd_args[1]
-    propname = cmd_args[2]
-    action = (cmd == 'propchange2' and cmd_args[3] or 'A')
+    author = cmd_args[1].decode('utf-8')
+    propname = cmd_args[2].decode('utf-8')
+    if cmd == 'propchange2' and cmd_args[3]:
+      action = cmd_args[3].decode('utf-8')
+    else:
+      action = 'A'
     repos = Repository(repos_dir, revision, pool)
     # Override the repos revision author with the author of the propchange
     repos.author = author
@@ -111,7 +103,7 @@ def main(pool, cmd, config_fname, repos_
                  })
     messenger = PropChange(pool, cfg, repos, author, propname, action)
   elif cmd == 'lock' or cmd == 'unlock':
-    author = cmd_args[0]
+    author = cmd_args[0].decode('utf-8')
     repos = Repository(repos_dir, 0, pool) ### any old revision will do
     # Override the repos revision author with the author of the lock/unlock
     repos.author = author
@@ -180,7 +172,7 @@ class OutputBase:
   def write(self, output):
     """Override this method.
     Append the literal text string OUTPUT to the output representation."""
-    raise NotImplementedError
+    return self.write_binary(output.encode('utf-8'))
 
   def run(self, cmd):
     """Override this method, if the default implementation is not sufficient.
@@ -234,7 +226,7 @@ class MailedOutput(OutputBase):
     # Return the result of splitting HDR into tokens (on space
     # characters), encoding (per RFC2047) each token as necessary, and
     # slapping 'em back to together again.
-    from email.Header import Header
+    from email.header import Header
 
     def _maybe_encode_header(hdr_token):
       try:
@@ -246,7 +238,7 @@ class MailedOutput(OutputBase):
     return ' '.join(map(_maybe_encode_header, hdr.split()))
 
   def mail_headers(self, group, params):
-    from email import Utils
+    from email import utils
 
     subject  = self._rfc2047_encode(self.make_subject(group, params))
     from_hdr = self._rfc2047_encode(self.from_addr)
@@ -265,7 +257,7 @@ class MailedOutput(OutputBase):
            'X-Svn-Commit-Revision: %d\n' \
            'X-Svn-Commit-Repository: %s\n' \
            % (from_hdr, to_hdr, subject,
-              Utils.formatdate(), Utils.make_msgid(), group,
+              utils.formatdate(), utils.make_msgid(), group,
               self.repos.author or 'no_author', self.repos.rev,
               os.path.basename(self.repos.repos_dir))
     if self.reply_to:
@@ -279,8 +271,8 @@ class SMTPOutput(MailedOutput):
   def start(self, group, params):
     MailedOutput.start(self, group, params)
 
-    self.buffer = StringIO()
-    self.write = self.buffer.write
+    self.buffer = BytesIO()
+    self.write_binary = self.buffer.write
 
     self.write(self.mail_headers(group, params))
 
@@ -359,7 +351,7 @@ class StandardOutput(OutputBase):
 
   def __init__(self, cfg, repos, prefix_param):
     OutputBase.__init__(self, cfg, repos, prefix_param)
-    self.write = sys.stdout.write
+    self.write_binary = sys.stdout.buffer.write
 
   def start(self, group, params):
     self.write("Group: " + (group or "defaults") + "\n")
@@ -388,7 +380,7 @@ class PipeOutput(MailedOutput):
     # construct the pipe for talking to the mailer
     self.pipe = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                                  close_fds=sys.platform != "win32")
-    self.write = self.pipe.stdin.write
+    self.write_binary = self.pipe.stdin.write
 
     # start writing out the mail message
     self.write(self.mail_headers(group, params))
@@ -430,6 +422,7 @@ class Commit(Messenger):
     self.changelist = sorted(editor.get_changes().items())
 
     log = repos.get_rev_prop(svn.core.SVN_PROP_REVISION_LOG) or ''
+    log = log.decode('utf-8')
 
     # collect the set of groups and the unique sets of params for the options
     self.groups = { }
@@ -448,6 +441,7 @@ class Commit(Messenger):
     # figure out the changed directories
     dirs = { }
     for path, change in self.changelist:
+      path = path.decode('utf-8')
       if change.item_kind == svn.core.svn_node_dir:
         dirs[path] = None
       else:
@@ -484,7 +478,7 @@ class Commit(Messenger):
     # build a renderer, tied to our output stream
     renderer = TextCommitRenderer(self.output)
 
-    for (group, param_tuple), (params, paths) in self.groups.items():
+    for (group, param_tuple), (params, paths) in sorted(self.groups.items()):
       try:
         self.output.start(group, params)
 
@@ -600,7 +594,9 @@ class Lock(Messenger):
                         or 'unlock_subject_prefix'))
 
     # read all the locked paths from STDIN and strip off the trailing newlines
-    self.dirlist = [x.rstrip() for x in sys.stdin.readlines()]
+    self.dirlist = [x for x in sys.stdin.readlines()]
+    for i in range(len(self.dirlist)):
+      self.dirlist[i] = self.dirlist[i].decode('utf-8')
 
     # collect the set of groups and the unique sets of params for the options
     self.groups = { }
@@ -629,11 +625,12 @@ class Lock(Messenger):
     # The lock comment is the same for all paths, so we can just pull
     # the comment for the first path in the dirlist and cache it.
     self.lock = svn.fs.svn_fs_get_lock(self.repos.fs_ptr,
-                                       self.dirlist[0], self.pool)
+                                       self.dirlist[0].encode('utf-8'),
+                                       self.pool)
 
   def generate(self):
     ret = 0
-    for (group, param_tuple), (params, paths) in self.groups.items():
+    for (group, param_tuple), (params, paths) in sorted(self.groups.items()):
       try:
         self.output.start(group, params)
 
@@ -879,7 +876,8 @@ class DiffGenerator:
           diff = svn.fs.FileDiff(self.repos.get_root(change.base_rev),
                                  base_path, None, None, self.pool)
 
-          label1 = '%s\t%s\t(r%s)' % (base_path, self.date, change.base_rev)
+          label1 = '%s\t%s\t(r%s)' % (base_path.decode('utf-8'), self.date,
+              change.base_rev)
           label2 = '/dev/null\t00:00:00 1970\t(deleted)'
           singular = True
 
@@ -902,21 +900,23 @@ class DiffGenerator:
                                      self.repos.root_this, change.path,
                                      self.pool)
               label1 = '%s\t%s\t(r%s, copy source)' \
-                       % (base_path, base_date, change.base_rev)
+                       % (base_path.decode('utf-8'), base_date, change.base_rev)
               label2 = '%s\t%s\t(r%s)' \
-                       % (change.path, self.date, self.repos.rev)
+                       % (change.path.decode('utf-8'), self.date, \
+                          self.repos.rev)
               singular = False
           else:
             # this file was copied.
             kind = 'C'
             if self.diffsels.copy:
               diff = svn.fs.FileDiff(None, None, self.repos.root_this,
-                                     change.path, self.pool)
+                                     change.path.decode('utf-8'), self.pool)
               label1 = '/dev/null\t00:00:00 1970\t' \
                        '(empty, because file is newly added)'
               label2 = '%s\t%s\t(r%s, copy of r%s, %s)' \
-                       % (change.path, self.date, self.repos.rev, \
-                          change.base_rev, base_path)
+                       % (change.path.decode('utf-8'), self.date,
+                          self.repos.rev, change.base_rev,
+                          base_path.decode('utf-8'))
               singular = False
         else:
           # the file was added.
@@ -932,7 +932,7 @@ class DiffGenerator:
             label1 = '/dev/null\t00:00:00 1970\t' \
                      '(empty, because file is newly added)'
             label2 = '%s\t%s\t(r%s)' \
-                     % (change.path, self.date, self.repos.rev)
+                     % (change.path.decode('utf-8'), self.date, self.repos.rev)
             singular = True
 
       elif not change.text_changed:
@@ -952,9 +952,9 @@ class DiffGenerator:
                                  self.repos.root_this, change.path,
                                  self.pool)
           label1 = '%s\t%s\t(r%s)' \
-                   % (base_path, base_date, change.base_rev)
+                   % (base_path.decode('utf-8'), base_date, change.base_rev)
           label2 = '%s\t%s\t(r%s)' \
-                   % (change.path, self.date, self.repos.rev)
+                   % (change.path.decode('utf-8'), self.date, self.repos.rev)
           singular = False
 
       if diff:
@@ -1092,7 +1092,7 @@ class TextCommitRenderer:
 
     w = self.output.write
 
-    w('Author: %s\nDate: %s\nNew Revision: %s\n' % (data.author,
+    w('Author: %s\nDate: %s\nNew Revision: %s\n' % (data.author.decode('utf-8'),
                                                       data.date,
                                                       data.rev))
 
@@ -1101,7 +1101,7 @@ class TextCommitRenderer:
     else:
       w('\n')
 
-    w('Log:\n%s\n\n' % data.log.strip())
+    w('Log:\n%s\n\n' % data.log.strip().decode('utf-8'))
 
     # print summary sections
     self._render_list('Added', data.added_data)
@@ -1144,7 +1144,7 @@ class TextCommitRenderer:
           props = '   (props changed)'
       else:
         props = ''
-      w('   %s%s%s\n' % (d.path, is_dir, props))
+      w('   %s%s%s\n' % (d.path.decode('utf-8'), is_dir, props))
       if d.copied:
         if is_dir:
           text = ''
@@ -1153,7 +1153,7 @@ class TextCommitRenderer:
         else:
           text = ' unchanged'
         w('      - copied%s from r%d, %s%s\n'
-          % (text, d.base_rev, d.base_path, is_dir))
+          % (text, d.base_rev, d.base_path.decode('utf-8'), is_dir))
 
   def _render_diffs(self, diffs, section_header):
     """Render diffs. Write the SECTION_HEADER if there are actually
@@ -1170,18 +1170,20 @@ class TextCommitRenderer:
         w(section_header)
         section_header_printed = True
       if diff.kind == 'D':
-        w('\nDeleted: %s\n' % diff.base_path)
+        w('\nDeleted: %s\n' % diff.base_path.decode('utf-8'))
       elif diff.kind == 'A':
-        w('\nAdded: %s\n' % diff.path)
+        w('\nAdded: %s\n' % diff.path.decode('utf-8'))
       elif diff.kind == 'C':
         w('\nCopied: %s (from r%d, %s)\n'
-          % (diff.path, diff.base_rev, diff.base_path))
+          % (diff.path.decode('utf-8'), diff.base_rev,
+             diff.base_path.decode('utf-8')))
       elif diff.kind == 'W':
         w('\nCopied and modified: %s (from r%d, %s)\n'
-          % (diff.path, diff.base_rev, diff.base_path))
+          % (diff.path.decode('utf-8'), diff.base_rev,
+             diff.base_path.decode('utf-8')))
       else:
         # kind == 'M'
-        w('\nModified: %s\n' % diff.path)
+        w('\nModified: %s\n' % diff.path.decode('utf-8'))
 
       if diff.diff_url:
         w('URL: %s\n' % diff.diff_url)
@@ -1198,8 +1200,9 @@ class TextCommitRenderer:
           w('Binary file (source and/or target). No diff available.\n')
         continue
 
+      wb = self.output.write_binary
       for line in diff.content:
-        w(line.raw)
+        wb(line.raw)
 
 
 class Repository:
@@ -1426,9 +1429,9 @@ class Config:
     "Return the path's associated groups."
     groups = []
     for group, pattern, exclude_pattern, repos_params, search_logmsg_re in self._group_re:
-      match = pattern.match(path)
+      match = pattern.match(path.decode('utf-8'))
       if match:
-        if exclude_pattern and exclude_pattern.match(path):
+        if exclude_pattern and exclude_pattern.match(path.decode('utf-8')):
           continue
         params = repos_params.copy()
         params.update(match.groupdict())
@@ -1508,6 +1511,7 @@ if the property was added, modified or d
 
   cmd = sys.argv[1]
   repos_dir = svn.core.svn_path_canonicalize(sys.argv[2])
+  repos_dir = repos_dir.decode('utf-8')
   try:
     expected_args = cmd_list[cmd]
   except KeyError:

Modified: subversion/trunk/tools/hook-scripts/mailer/tests/mailer-t1.output
URL: http://svn.apache.org/viewvc/subversion/trunk/tools/hook-scripts/mailer/tests/mailer-t1.output?rev=1884427&r1=1884426&r2=1884427&view=diff
==============================================================================
--- subversion/trunk/tools/hook-scripts/mailer/tests/mailer-t1.output (original)
+++ subversion/trunk/tools/hook-scripts/mailer/tests/mailer-t1.output Mon Dec 14 16:57:10 2020
@@ -1,4 +1,4 @@
-Group: file
+Group: All
 Subject: r1 -  dir1 dir2
 
 Author: mailer test
@@ -9,9 +9,43 @@ Log:
 initial load
 
 Added:
+   dir1/
+   dir1/file3
+   dir1/file4
+   dir2/
+   dir2/file5
+   dir2/file6
    file1
    file2
 
+Added: dir1/file3
+==============================================================================
+--- /dev/null	00:00:00 1970	(empty, because file is newly added)
++++ dir1/file3	Sun Sep  9 01:46:40 2001	(r1)
+@@ -0,0 +1 @@
++file3
+
+Added: dir1/file4
+==============================================================================
+--- /dev/null	00:00:00 1970	(empty, because file is newly added)
++++ dir1/file4	Sun Sep  9 01:46:40 2001	(r1)
+@@ -0,0 +1 @@
++file4
+
+Added: dir2/file5
+==============================================================================
+--- /dev/null	00:00:00 1970	(empty, because file is newly added)
++++ dir2/file5	Sun Sep  9 01:46:40 2001	(r1)
+@@ -0,0 +1 @@
++file5
+
+Added: dir2/file6
+==============================================================================
+--- /dev/null	00:00:00 1970	(empty, because file is newly added)
++++ dir2/file6	Sun Sep  9 01:46:40 2001	(r1)
+@@ -0,0 +1 @@
++file6
+
 Added: file1
 ==============================================================================
 --- /dev/null	00:00:00 1970	(empty, because file is newly added)
@@ -25,7 +59,7 @@ Added: file2
 +++ file2	Sun Sep  9 01:46:40 2001	(r1)
 @@ -0,0 +1 @@
 +file2
-Group: file plus other areas
+Group: file
 Subject: r1 -  dir1 dir2
 
 Author: mailer test
@@ -39,15 +73,6 @@ Added:
    file1
    file2
 
-Changes in other areas also in this revision:
-Added:
-   dir1/
-   dir1/file3
-   dir1/file4
-   dir2/
-   dir2/file5
-   dir2/file6
-
 Added: file1
 ==============================================================================
 --- /dev/null	00:00:00 1970	(empty, because file is newly added)
@@ -61,37 +86,7 @@ Added: file2
 +++ file2	Sun Sep  9 01:46:40 2001	(r1)
 @@ -0,0 +1 @@
 +file2
-
-Diffs of changes in other areas also in this revision:
-
-Added: dir1/file3
-==============================================================================
---- /dev/null	00:00:00 1970	(empty, because file is newly added)
-+++ dir1/file3	Sun Sep  9 01:46:40 2001	(r1)
-@@ -0,0 +1 @@
-+file3
-
-Added: dir1/file4
-==============================================================================
---- /dev/null	00:00:00 1970	(empty, because file is newly added)
-+++ dir1/file4	Sun Sep  9 01:46:40 2001	(r1)
-@@ -0,0 +1 @@
-+file4
-
-Added: dir2/file5
-==============================================================================
---- /dev/null	00:00:00 1970	(empty, because file is newly added)
-+++ dir2/file5	Sun Sep  9 01:46:40 2001	(r1)
-@@ -0,0 +1 @@
-+file5
-
-Added: dir2/file6
-==============================================================================
---- /dev/null	00:00:00 1970	(empty, because file is newly added)
-+++ dir2/file6	Sun Sep  9 01:46:40 2001	(r1)
-@@ -0,0 +1 @@
-+file6
-Group: All
+Group: file plus other areas
 Subject: r1 -  dir1 dir2
 
 Author: mailer test
@@ -102,14 +97,33 @@ Log:
 initial load
 
 Added:
+   file1
+   file2
+
+Changes in other areas also in this revision:
+Added:
    dir1/
    dir1/file3
    dir1/file4
    dir2/
    dir2/file5
    dir2/file6
-   file1
-   file2
+
+Added: file1
+==============================================================================
+--- /dev/null	00:00:00 1970	(empty, because file is newly added)
++++ file1	Sun Sep  9 01:46:40 2001	(r1)
+@@ -0,0 +1 @@
++file1
+
+Added: file2
+==============================================================================
+--- /dev/null	00:00:00 1970	(empty, because file is newly added)
++++ file2	Sun Sep  9 01:46:40 2001	(r1)
+@@ -0,0 +1 @@
++file2
+
+Diffs of changes in other areas also in this revision:
 
 Added: dir1/file3
 ==============================================================================
@@ -138,21 +152,7 @@ Added: dir2/file6
 +++ dir2/file6	Sun Sep  9 01:46:40 2001	(r1)
 @@ -0,0 +1 @@
 +file6
-
-Added: file1
-==============================================================================
---- /dev/null	00:00:00 1970	(empty, because file is newly added)
-+++ file1	Sun Sep  9 01:46:40 2001	(r1)
-@@ -0,0 +1 @@
-+file1
-
-Added: file2
-==============================================================================
---- /dev/null	00:00:00 1970	(empty, because file is newly added)
-+++ file2	Sun Sep  9 01:46:40 2001	(r1)
-@@ -0,0 +1 @@
-+file2
-Group: file
+Group: All
 Subject: r2 -  dir1 dir2
 
 Author: mailer test
@@ -163,9 +163,19 @@ Log:
 two file changes.  Fixes Blah#123
 
 Modified:
+   dir1/   (props changed)
+   dir2/file5
    file1   (props changed)
    file2   (contents, props changed)
 
+Modified: dir2/file5
+==============================================================================
+--- dir2/file5	Sun Sep  9 01:46:40 2001	(r1)
++++ dir2/file5	Sun Sep  9 04:33:20 2001	(r2)
+@@ -1 +1,2 @@
+ file5
++change C2
+
 Modified: file2
 ==============================================================================
 --- file2	Sun Sep  9 01:46:40 2001	(r1)
@@ -204,7 +214,7 @@ Modified: file2
 @@ -1 +1,2 @@
  file2
 +change C1
-Group: All
+Group: file
 Subject: r2 -  dir1 dir2
 
 Author: mailer test
@@ -215,19 +225,9 @@ Log:
 two file changes.  Fixes Blah#123
 
 Modified:
-   dir1/   (props changed)
-   dir2/file5
    file1   (props changed)
    file2   (contents, props changed)
 
-Modified: dir2/file5
-==============================================================================
---- dir2/file5	Sun Sep  9 01:46:40 2001	(r1)
-+++ dir2/file5	Sun Sep  9 04:33:20 2001	(r2)
-@@ -1 +1,2 @@
- file5
-+change C2
-
 Modified: file2
 ==============================================================================
 --- file2	Sun Sep  9 01:46:40 2001	(r1)
@@ -283,16 +283,35 @@ two copies
 
 Added:
    dir2/file7
-      - copied unchanged from r2, file1
+      - copied unchanged from r2, /file1
    dir3/   (props changed)
-      - copied from r2, dir1/
+      - copied from r2, /dir1/
+Replaced:
+   dir3/file3
+      - copied unchanged from r1, /dir1/file3
+   dir3/file4
+      - copied unchanged from r1, /dir1/file4
 
-Copied: dir2/file7 (from r2, file1)
+Copied: dir2/file7 (from r2, /file1)
 ==============================================================================
 --- /dev/null	00:00:00 1970	(empty, because file is newly added)
-+++ dir2/file7	Sun Sep  9 07:20:00 2001	(r3, copy of r2, file1)
++++ dir2/file7	Sun Sep  9 07:20:00 2001	(r3, copy of r2, /file1)
 @@ -0,0 +1 @@
 +file1
+
+Copied: dir3/file3 (from r1, /dir1/file3)
+==============================================================================
+--- /dev/null	00:00:00 1970	(empty, because file is newly added)
++++ dir3/file3	Sun Sep  9 07:20:00 2001	(r3, copy of r1, /dir1/file3)
+@@ -0,0 +1 @@
++file3
+
+Copied: dir3/file4 (from r1, /dir1/file4)
+==============================================================================
+--- /dev/null	00:00:00 1970	(empty, because file is newly added)
++++ dir3/file4	Sun Sep  9 07:20:00 2001	(r3, copy of r1, /dir1/file4)
+@@ -0,0 +1 @@
++file4
 Group: All
 Subject: r4 - dir3
 
@@ -305,16 +324,16 @@ copied and changed
 
 Added:
    dir3/file8
-      - copied, changed from r2, file1
+      - copied, changed from r2, /file1
 
-Copied and modified: dir3/file8 (from r2, file1)
+Copied and modified: dir3/file8 (from r2, /file1)
 ==============================================================================
---- file1	Sun Sep  9 04:33:20 2001	(r2, copy source)
+--- /file1	Sun Sep  9 04:33:20 2001	(r2, copy source)
 +++ dir3/file8	Sun Sep  9 10:06:40 2001	(r4)
 @@ -1 +1,2 @@
  file1
 +change C3
-Group: file
+Group: All
 Subject: r5 -  dir1 dir3
 
 Author: mailer test
@@ -325,8 +344,10 @@ Log:
 changes and deletes of properties
 
 Modified:
+   dir1/   (props changed)
+   dir3/   (props changed)
    file2   (props changed)
-Group: file plus other areas
+Group: file
 Subject: r5 -  dir1 dir3
 
 Author: mailer test
@@ -338,12 +359,7 @@ changes and deletes of properties
 
 Modified:
    file2   (props changed)
-
-Changes in other areas also in this revision:
-Modified:
-   dir1/   (props changed)
-   dir3/   (props changed)
-Group: All
+Group: file plus other areas
 Subject: r5 -  dir1 dir3
 
 Author: mailer test
@@ -354,10 +370,13 @@ Log:
 changes and deletes of properties
 
 Modified:
+   file2   (props changed)
+
+Changes in other areas also in this revision:
+Modified:
    dir1/   (props changed)
    dir3/   (props changed)
-   file2   (props changed)
-Group: file
+Group: All
 Subject: r6 -  dir1 dir4
 
 Author: mailer test
@@ -368,7 +387,18 @@ Log:
 mixed addition and change.  Fixes Blaz#456 Blah#987
 
 Added:
+   dir4/
    file9
+Modified:
+   dir1/file3
+
+Modified: dir1/file3
+==============================================================================
+--- dir1/file3	Sun Sep  9 12:53:20 2001	(r5)
++++ dir1/file3	Sun Sep  9 15:40:00 2001	(r6)
+@@ -1 +1,2 @@
+ file3
++change C4
 
 Added: file9
 ==============================================================================
@@ -376,8 +406,8 @@ Added: file9
 +++ file9	Sun Sep  9 15:40:00 2001	(r6)
 @@ -0,0 +1 @@
 +file9
-Group: file plus other areas
-Subject: r6 -  dir1 dir4
+Group: bugtracker
+Subject: Fix for Blah#987: r6 -  dir1 dir4
 
 Author: mailer test
 Date: Sun Sep  9 15:40:00 2001
@@ -387,23 +417,11 @@ Log:
 mixed addition and change.  Fixes Blaz#456 Blah#987
 
 Added:
-   file9
-
-Changes in other areas also in this revision:
-Added:
    dir4/
+   file9
 Modified:
    dir1/file3
 
-Added: file9
-==============================================================================
---- /dev/null	00:00:00 1970	(empty, because file is newly added)
-+++ file9	Sun Sep  9 15:40:00 2001	(r6)
-@@ -0,0 +1 @@
-+file9
-
-Diffs of changes in other areas also in this revision:
-
 Modified: dir1/file3
 ==============================================================================
 --- dir1/file3	Sun Sep  9 12:53:20 2001	(r5)
@@ -411,6 +429,13 @@ Modified: dir1/file3
 @@ -1 +1,2 @@
  file3
 +change C4
+
+Added: file9
+==============================================================================
+--- /dev/null	00:00:00 1970	(empty, because file is newly added)
++++ file9	Sun Sep  9 15:40:00 2001	(r6)
+@@ -0,0 +1 @@
++file9
 Group: bugtracker
 Subject: Fix for Blaz#456: r6 -  dir1 dir4
 
@@ -441,8 +466,8 @@ Added: file9
 +++ file9	Sun Sep  9 15:40:00 2001	(r6)
 @@ -0,0 +1 @@
 +file9
-Group: bugtracker
-Subject: Fix for Blah#987: r6 -  dir1 dir4
+Group: file
+Subject: r6 -  dir1 dir4
 
 Author: mailer test
 Date: Sun Sep  9 15:40:00 2001
@@ -452,18 +477,7 @@ Log:
 mixed addition and change.  Fixes Blaz#456 Blah#987
 
 Added:
-   dir4/
    file9
-Modified:
-   dir1/file3
-
-Modified: dir1/file3
-==============================================================================
---- dir1/file3	Sun Sep  9 12:53:20 2001	(r5)
-+++ dir1/file3	Sun Sep  9 15:40:00 2001	(r6)
-@@ -1 +1,2 @@
- file3
-+change C4
 
 Added: file9
 ==============================================================================
@@ -471,7 +485,7 @@ Added: file9
 +++ file9	Sun Sep  9 15:40:00 2001	(r6)
 @@ -0,0 +1 @@
 +file9
-Group: All
+Group: file plus other areas
 Subject: r6 -  dir1 dir4
 
 Author: mailer test
@@ -482,11 +496,23 @@ Log:
 mixed addition and change.  Fixes Blaz#456 Blah#987
 
 Added:
-   dir4/
    file9
+
+Changes in other areas also in this revision:
+Added:
+   dir4/
 Modified:
    dir1/file3
 
+Added: file9
+==============================================================================
+--- /dev/null	00:00:00 1970	(empty, because file is newly added)
++++ file9	Sun Sep  9 15:40:00 2001	(r6)
+@@ -0,0 +1 @@
++file9
+
+Diffs of changes in other areas also in this revision:
+
 Modified: dir1/file3
 ==============================================================================
 --- dir1/file3	Sun Sep  9 12:53:20 2001	(r5)
@@ -494,13 +520,47 @@ Modified: dir1/file3
 @@ -1 +1,2 @@
  file3
 +change C4
+Group: All
+Subject: r7 -  dir1 dir2 dir3 dir3/dir5
 
-Added: file9
+Author: mailer test
+Date: Sun Sep  9 18:26:40 2001
+New Revision: 7
+
+Log:
+adds, deletes, and a change
+
+Added:
+   dir1/file10
+   dir3/dir5/
+Deleted:
+   dir2/
+   file2
+Modified:
+   dir3/file3
+
+Added: dir1/file10
 ==============================================================================
 --- /dev/null	00:00:00 1970	(empty, because file is newly added)
-+++ file9	Sun Sep  9 15:40:00 2001	(r6)
++++ dir1/file10	Sun Sep  9 18:26:40 2001	(r7)
 @@ -0,0 +1 @@
-+file9
++file10
+
+Modified: dir3/file3
+==============================================================================
+--- dir3/file3	Sun Sep  9 15:40:00 2001	(r6)
++++ dir3/file3	Sun Sep  9 18:26:40 2001	(r7)
+@@ -1 +1,2 @@
+ file3
++change C5
+
+Deleted: file2
+==============================================================================
+--- file2	Sun Sep  9 18:26:40 2001	(r6)
++++ /dev/null	00:00:00 1970	(deleted)
+@@ -1,2 +0,0 @@
+-file2
+-change C1
 Group: file
 Subject: r7 -  dir1 dir2 dir3 dir3/dir5
 
@@ -568,47 +628,6 @@ Modified: dir3/file3
  file3
 +change C5
 Group: All
-Subject: r7 -  dir1 dir2 dir3 dir3/dir5
-
-Author: mailer test
-Date: Sun Sep  9 18:26:40 2001
-New Revision: 7
-
-Log:
-adds, deletes, and a change
-
-Added:
-   dir1/file10
-   dir3/dir5/
-Deleted:
-   dir2/
-   file2
-Modified:
-   dir3/file3
-
-Added: dir1/file10
-==============================================================================
---- /dev/null	00:00:00 1970	(empty, because file is newly added)
-+++ dir1/file10	Sun Sep  9 18:26:40 2001	(r7)
-@@ -0,0 +1 @@
-+file10
-
-Modified: dir3/file3
-==============================================================================
---- dir3/file3	Sun Sep  9 15:40:00 2001	(r6)
-+++ dir3/file3	Sun Sep  9 18:26:40 2001	(r7)
-@@ -1 +1,2 @@
- file3
-+change C5
-
-Deleted: file2
-==============================================================================
---- file2	Sun Sep  9 18:26:40 2001	(r6)
-+++ /dev/null	00:00:00 1970	(deleted)
-@@ -1,2 +0,0 @@
--file2
--change C1
-Group: All
 Subject: r8 - in dir6: . dir5
 
 Author: mailer test
@@ -620,31 +639,31 @@ copy dir, then make a change
 
 Added:
    dir6/
-      - copied from r6, dir3/
+      - copied from r6, /dir3/
    dir6/dir5/
-      - copied from r7, dir3/dir5/
+      - copied from r7, /dir3/dir5/
 Replaced:
    dir6/file3
-      - copied unchanged from r7, dir3/file3
+      - copied unchanged from r7, /dir3/file3
 Modified:
    dir6/file4
 
-Copied: dir6/file3 (from r7, dir3/file3)
+Copied: dir6/file3 (from r7, /dir3/file3)
 ==============================================================================
 --- /dev/null	00:00:00 1970	(empty, because file is newly added)
-+++ dir6/file3	Sun Sep  9 21:13:20 2001	(r8, copy of r7, dir3/file3)
++++ dir6/file3	Sun Sep  9 21:13:20 2001	(r8, copy of r7, /dir3/file3)
 @@ -0,0 +1,2 @@
 +file3
 +change C5
 
 Modified: dir6/file4
 ==============================================================================
---- dir3/file4	Sun Sep  9 15:40:00 2001	(r6)
+--- /dir3/file4	Sun Sep  9 15:40:00 2001	(r6)
 +++ dir6/file4	Sun Sep  9 21:13:20 2001	(r8)
 @@ -1 +1,2 @@
  file4
 +change C6
-Group: file
+Group: All
 Subject: r9 - 
 
 Author: mailer test
@@ -662,7 +681,7 @@ Modified:
 Added: file11
 ==============================================================================
 Binary file. No diff available.
-Group: file plus other areas
+Group: file
 Subject: r9 - 
 
 Author: mailer test
@@ -680,7 +699,7 @@ Modified:
 Added: file11
 ==============================================================================
 Binary file. No diff available.
-Group: All
+Group: file plus other areas
 Subject: r9 - 
 
 Author: mailer test
@@ -698,7 +717,7 @@ Modified:
 Added: file11
 ==============================================================================
 Binary file. No diff available.
-Group: file
+Group: All
 Subject: r10 - 
 
 Author: mailer test
@@ -715,7 +734,7 @@ Modified:
 Modified: file11
 ==============================================================================
 Binary file (source and/or target). No diff available.
-Group: file plus other areas
+Group: file
 Subject: r10 - 
 
 Author: mailer test
@@ -732,7 +751,7 @@ Modified:
 Modified: file11
 ==============================================================================
 Binary file (source and/or target). No diff available.
-Group: All
+Group: file plus other areas
 Subject: r10 - 
 
 Author: mailer test

Modified: subversion/trunk/tools/hook-scripts/mailer/tests/mailer-tweak.py
URL: http://svn.apache.org/viewvc/subversion/trunk/tools/hook-scripts/mailer/tests/mailer-tweak.py?rev=1884427&r1=1884426&r2=1884427&view=diff
==============================================================================
--- subversion/trunk/tools/hook-scripts/mailer/tests/mailer-tweak.py (original)
+++ subversion/trunk/tools/hook-scripts/mailer/tests/mailer-tweak.py Mon Dec 14 16:57:10 2020
@@ -50,10 +50,10 @@ def tweak_dates(pool, home='.'):
 
   for i in range(fs.youngest_rev(fsob, pool)):
     # convert secs into microseconds, then a string
-    date = core.svn_time_to_cstring((DATE_BASE+i*DATE_INCR) * 1000000L, pool)
+    date = core.svn_time_to_cstring((DATE_BASE+i*DATE_INCR) * 1000000, pool)
     #print date
     fs.change_rev_prop(fsob, i+1, core.SVN_PROP_REVISION_DATE, date, pool)
-    fs.change_rev_prop(fsob, i+1, core.SVN_PROP_REVISION_AUTHOR, 'mailer test', pool)
+    fs.change_rev_prop(fsob, i+1, core.SVN_PROP_REVISION_AUTHOR, b'mailer test', pool)
 
 def main():
   if len(sys.argv) != 2:



Re: mailer.py py2/py3 change: non-UTF-8 environments (was: svn commit: r1884427 - in /subversion/trunk/tools/hook-scripts/mailer: mailer.py tests/mailer-t1.output tests/mailer-tweak.py)

Posted by Stefan Sperling <st...@apache.org>.
On Mon, Dec 21, 2020 at 07:15:53AM +0000, Daniel Shahaf wrote:
> stsp@apache.org wrote on Mon, 14 Dec 2020 16:57 -0000:
> > URL: http://svn.apache.org/viewvc?rev=1884427&view=rev
> > Log:
> > Make mailer.py work properly with Python 3, and drop Python 2 support.
> > 
> > Most of the changes deal with the handling binary data vs Python strings.
> > 
> > I've made sure that mailer.py will work in a UTF-8 environment. In general,
> > UTF-8 is recommended for hook scripts. See the SVNUseUTF8 mod_dav_svn option.
> > Environments using other encodings may not work as expected, but those will
> > be problematic for hook scripts in general.
> 
> Correct me if I'm wrong, but it sounds like you haven't ruled out the
> possibility that this commit will constitute a regression for anyone
> who runs mailer.py in a non-UTF-8 environment and will upgrade to this
> commit.

Anyone who didn't explicitly set a locale in hook-envs will run this
script in the C locale today, and this case still works.

I would not count broken email generated in non-UTF-8 locales as a regression.
mailer.py's email content encoding header was set to UTF-8 17 years ago:
https://svn.apache.org/r847862

> I suppose it's fair to classify non-UTF-8 environments as "patches
> welcome", following the precedent of libmagic support in the Windows
> build, but:
> 
> 1. Can we detect non-UTF-8 environments and warn or error out hard upon
> them?  «locale.getlocale()[1]» seems promising?

I don't see what value this adds. When there is an encoding problem it
will already be obvious from gargabe characters in email notifications.
And many people can live with that, so there is no reason to error out.

Given that file diffs can contain an arbitrary mix of encodings, this script
cannot guarantee readable diff output unless ASCII/UTF-8 encoding is used
for all files. I don't see a good way around this limitation. Auto-detecting
file encoding is tricky, and requring people to tag file encodings via the
svn:mime-type property is simply not going to work in pratice.

> 2. A change that hasn't been confirmed *not* to constitute a regression
> merits a release notes entry.  Would you do the honours?

I think release notes should mention Python3 support for mailer.py and
recommend that mailer.py is run in the UTF-8 or C locale.

Re: mailer.py py2/py3 change: non-UTF-8 environments (was: svn commit: r1884427 - in /subversion/trunk/tools/hook-scripts/mailer: mailer.py tests/mailer-t1.output tests/mailer-tweak.py)

Posted by Stefan Sperling <st...@apache.org>.
On Mon, Dec 21, 2020 at 07:15:53AM +0000, Daniel Shahaf wrote:
> stsp@apache.org wrote on Mon, 14 Dec 2020 16:57 -0000:
> > URL: http://svn.apache.org/viewvc?rev=1884427&view=rev
> > Log:
> > Make mailer.py work properly with Python 3, and drop Python 2 support.
> > 
> > Most of the changes deal with the handling binary data vs Python strings.
> > 
> > I've made sure that mailer.py will work in a UTF-8 environment. In general,
> > UTF-8 is recommended for hook scripts. See the SVNUseUTF8 mod_dav_svn option.
> > Environments using other encodings may not work as expected, but those will
> > be problematic for hook scripts in general.
> 
> Correct me if I'm wrong, but it sounds like you haven't ruled out the
> possibility that this commit will constitute a regression for anyone
> who runs mailer.py in a non-UTF-8 environment and will upgrade to this
> commit.

Anyone who didn't explicitly set a locale in hook-envs will run this
script in the C locale today, and this case still works.

I would not count broken email generated in non-UTF-8 locales as a regression.
mailer.py's email content encoding header was set to UTF-8 17 years ago:
https://svn.apache.org/r847862

> I suppose it's fair to classify non-UTF-8 environments as "patches
> welcome", following the precedent of libmagic support in the Windows
> build, but:
> 
> 1. Can we detect non-UTF-8 environments and warn or error out hard upon
> them?  «locale.getlocale()[1]» seems promising?

I don't see what value this adds. When there is an encoding problem it
will already be obvious from gargabe characters in email notifications.
And many people can live with that, so there is no reason to error out.

Given that file diffs can contain an arbitrary mix of encodings, this script
cannot guarantee readable diff output unless ASCII/UTF-8 encoding is used
for all files. I don't see a good way around this limitation. Auto-detecting
file encoding is tricky, and requring people to tag file encodings via the
svn:mime-type property is simply not going to work in pratice.

> 2. A change that hasn't been confirmed *not* to constitute a regression
> merits a release notes entry.  Would you do the honours?

I think release notes should mention Python3 support for mailer.py and
recommend that mailer.py is run in the UTF-8 or C locale.

Re: mailer.py py2/py3 change: non-UTF-8 environments

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Yasuhito FUTATSUKI wrote on Wed, Dec 23, 2020 at 03:13:42 +0900:
> On 2020/12/21 16:15, Daniel Shahaf wrote:
> > stsp@apache.org wrote on Mon, 14 Dec 2020 16:57 -0000:
> >> URL: http://svn.apache.org/viewvc?rev=1884427&view=rev
> >> Log:
> >> Make mailer.py work properly with Python 3, and drop Python 2 support.
> >>
> >> Most of the changes deal with the handling binary data vs Python strings.
> >>
> >> I've made sure that mailer.py will work in a UTF-8 environment. In general,
> >> UTF-8 is recommended for hook scripts. See the SVNUseUTF8 mod_dav_svn option.
> >> Environments using other encodings may not work as expected, but those will
> >> be problematic for hook scripts in general.
> > 
> > Correct me if I'm wrong, but it sounds like you haven't ruled out the
> > possibility that this commit will constitute a regression for anyone
> > who runs mailer.py in a non-UTF-8 environment and will upgrade to this
> > commit.
> 
> It is not for the commit message, but for the committed code, it rather
> supports non-UTF-8 (or C) locale, at least the handling of REPOS-PATH
> (as I wrote another thread).
> 

*nod*, but that does leave outstanding the case of non-UTF-8 locale and data
other than REPOS-PATH.

> It seems the code before commit for Python 2 could not handle REPOS-PATH
> if it contains non ascii character on non-UTF-8 environment because it
> didn't care that svn.core.svn_path_canonicalize and svn.repos.open only
> accept bytes object of UTF-8 path. Even if a user might pass REPOS-PATH
> as transcoded to UTF-8, the script searches for a default config file 
> under the transcoded REPOS-PATH and it is not desirable behavior.
> 
> For Python 2/3 support, I don't want to drop Python 2 support for mailer.py
> if we can easily, but I also think Python 3 support is higher priority.

+1

> And as also I wrote another thread, it seems still incomplete to support
> Python 3. I'll take a look this, but it has not made progress at all
> since I then, sorry.

No worries!  I started a separate thread since I thought the issues were independent.

Thanks,

Daniel

Re: mailer.py py2/py3 change: non-UTF-8 environments

Posted by Yasuhito FUTATSUKI <fu...@yf.bsdclub.org>.
On 2020/12/21 16:15, Daniel Shahaf wrote:
> stsp@apache.org wrote on Mon, 14 Dec 2020 16:57 -0000:
>> URL: http://svn.apache.org/viewvc?rev=1884427&view=rev
>> Log:
>> Make mailer.py work properly with Python 3, and drop Python 2 support.
>>
>> Most of the changes deal with the handling binary data vs Python strings.
>>
>> I've made sure that mailer.py will work in a UTF-8 environment. In general,
>> UTF-8 is recommended for hook scripts. See the SVNUseUTF8 mod_dav_svn option.
>> Environments using other encodings may not work as expected, but those will
>> be problematic for hook scripts in general.
> 
> Correct me if I'm wrong, but it sounds like you haven't ruled out the
> possibility that this commit will constitute a regression for anyone
> who runs mailer.py in a non-UTF-8 environment and will upgrade to this
> commit.

It is not for the commit message, but for the committed code, it rather
supports non-UTF-8 (or C) locale, at least the handling of REPOS-PATH
(as I wrote another thread).

It seems the code before commit for Python 2 could not handle REPOS-PATH
if it contains non ascii character on non-UTF-8 environment because it
didn't care that svn.core.svn_path_canonicalize and svn.repos.open only
accept bytes object of UTF-8 path. Even if a user might pass REPOS-PATH
as transcoded to UTF-8, the script searches for a default config file 
under the transcoded REPOS-PATH and it is not desirable behavior.

For Python 2/3 support, I don't want to drop Python 2 support for mailer.py
if we can easily, but I also think Python 3 support is higher priority.

And as also I wrote another thread, it seems still incomplete to support
Python 3. I'll take a look this, but it has not made progress at all
since I then, sorry.

Cheers,
-- 
Yasuhito FUTATSUKI <fu...@yf.bsclub.org>

mailer.py py2/py3 change: non-UTF-8 environments (was: svn commit: r1884427 - in /subversion/trunk/tools/hook-scripts/mailer: mailer.py tests/mailer-t1.output tests/mailer-tweak.py)

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
stsp@apache.org wrote on Mon, 14 Dec 2020 16:57 -0000:
> URL: http://svn.apache.org/viewvc?rev=1884427&view=rev
> Log:
> Make mailer.py work properly with Python 3, and drop Python 2 support.
> 
> Most of the changes deal with the handling binary data vs Python strings.
> 
> I've made sure that mailer.py will work in a UTF-8 environment. In general,
> UTF-8 is recommended for hook scripts. See the SVNUseUTF8 mod_dav_svn option.
> Environments using other encodings may not work as expected, but those will
> be problematic for hook scripts in general.

Correct me if I'm wrong, but it sounds like you haven't ruled out the
possibility that this commit will constitute a regression for anyone
who runs mailer.py in a non-UTF-8 environment and will upgrade to this
commit.

I suppose it's fair to classify non-UTF-8 environments as "patches
welcome", following the precedent of libmagic support in the Windows
build, but:

1. Can we detect non-UTF-8 environments and warn or error out hard upon
them?  «locale.getlocale()[1]» seems promising?

2. A change that hasn't been confirmed *not* to constitute a regression
merits a release notes entry.  Would you do the honours?

Cheers,

Daniel

>                                            SVN repositories store internal
> data such as paths in UTF-8. Our Python3 bindings do not deal with encoding
> or decoding of such data, and thus need to work with raw UTF-8 strings, not
> Python strings.
> 
> The encoding of file and property contents is not guaranteed to be UTF-8.
> This was already a problem before this change. This hook script sends email
> with a content type header specifying the UTF-8 encoding. Diffs which contain
> non-UTF-8 text will most likely not render properly when viewed in an email
> reader. At least this problem is now obvious in mailer.py's implementation,
> since all unidiff text is now written out directly as binary data.
> 
> As an additional fix, iterate file groups in sorted order. This results in
> stable output and makes test cases in our tests/ subdirectory reproducible.
> 
> Tested with Python 3.7.5 which is the version I use in my SVN development
> setup at present. Tests with newer versions are welcome.
> 
> * tools/hook-scripts/mailer/mailer.py:
>   Drop Python2-specific includes. Adjust includes as per 2to3.
>   (main): Decode arguments from UTF-8 to string.
>   (OutputBase:write): Encode string to UTF-8 and pass to write_binary().
>    OutputBase implementations now need to provide a self.write_binary
>    member which implements a write() method for binary data.
>   (MailedOutput): email.Header package is gone, use email.header instead,
>    and likewise replace use of email.Utils with email.utils
>   (SMTPOutput): Provide self.write_binary in terms of a BytesIO() object.
>    We cannot use StringIO since diffs may contain data in arbitrary encodings.
>   (StandardOutput): Provide self.write_binary in terms of stdout.buffer.
>   (PipeOutput): Provide self.write_binary in terms of pipe.stdin.
>   (Commit): Decode log message and paths from UTF-8 to string, and iterate
>     path groups from mailer.conf in sorted order.
>   (Lock): Decode directory entries from UTF-8 to string. Encode paths back
>    to UTF-8 when we ask libsvn_fs for a lock on a path.
>     Iterate path groups from mailer.conf in sorted order.
>   (DiffGenerator): Decode repository paths from UTF-8 to string.
>   (TextCommitRenderer): Decode author, log message, and path from UTF-8 to
>     string. Write diff data via write_binary, bypassing the re-encoding step.
>   (Config): Decode paths from UTF-8 to string before matching them against
>    regular expressions. Also decode the repository directory path from UTF-8.
> 
> * tools/hook-scripts/mailer/tests/mailer-t1.output: Adjust expected output.
>    File groups are now provided in stable sorted order. This should fix
>    spurious test failures in the future.
> 
>  * tools/hook-scripts/mailer/tests/mailer-tweak.py: Drop L suffix from long
>     integers and pass binary data instead of strings into libsvn_fs.

mailer.py py2/py3 change: non-UTF-8 environments (was: svn commit: r1884427 - in /subversion/trunk/tools/hook-scripts/mailer: mailer.py tests/mailer-t1.output tests/mailer-tweak.py)

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
stsp@apache.org wrote on Mon, 14 Dec 2020 16:57 -0000:
> URL: http://svn.apache.org/viewvc?rev=1884427&view=rev
> Log:
> Make mailer.py work properly with Python 3, and drop Python 2 support.
> 
> Most of the changes deal with the handling binary data vs Python strings.
> 
> I've made sure that mailer.py will work in a UTF-8 environment. In general,
> UTF-8 is recommended for hook scripts. See the SVNUseUTF8 mod_dav_svn option.
> Environments using other encodings may not work as expected, but those will
> be problematic for hook scripts in general.

Correct me if I'm wrong, but it sounds like you haven't ruled out the
possibility that this commit will constitute a regression for anyone
who runs mailer.py in a non-UTF-8 environment and will upgrade to this
commit.

I suppose it's fair to classify non-UTF-8 environments as "patches
welcome", following the precedent of libmagic support in the Windows
build, but:

1. Can we detect non-UTF-8 environments and warn or error out hard upon
them?  «locale.getlocale()[1]» seems promising?

2. A change that hasn't been confirmed *not* to constitute a regression
merits a release notes entry.  Would you do the honours?

Cheers,

Daniel

>                                            SVN repositories store internal
> data such as paths in UTF-8. Our Python3 bindings do not deal with encoding
> or decoding of such data, and thus need to work with raw UTF-8 strings, not
> Python strings.
> 
> The encoding of file and property contents is not guaranteed to be UTF-8.
> This was already a problem before this change. This hook script sends email
> with a content type header specifying the UTF-8 encoding. Diffs which contain
> non-UTF-8 text will most likely not render properly when viewed in an email
> reader. At least this problem is now obvious in mailer.py's implementation,
> since all unidiff text is now written out directly as binary data.
> 
> As an additional fix, iterate file groups in sorted order. This results in
> stable output and makes test cases in our tests/ subdirectory reproducible.
> 
> Tested with Python 3.7.5 which is the version I use in my SVN development
> setup at present. Tests with newer versions are welcome.
> 
> * tools/hook-scripts/mailer/mailer.py:
>   Drop Python2-specific includes. Adjust includes as per 2to3.
>   (main): Decode arguments from UTF-8 to string.
>   (OutputBase:write): Encode string to UTF-8 and pass to write_binary().
>    OutputBase implementations now need to provide a self.write_binary
>    member which implements a write() method for binary data.
>   (MailedOutput): email.Header package is gone, use email.header instead,
>    and likewise replace use of email.Utils with email.utils
>   (SMTPOutput): Provide self.write_binary in terms of a BytesIO() object.
>    We cannot use StringIO since diffs may contain data in arbitrary encodings.
>   (StandardOutput): Provide self.write_binary in terms of stdout.buffer.
>   (PipeOutput): Provide self.write_binary in terms of pipe.stdin.
>   (Commit): Decode log message and paths from UTF-8 to string, and iterate
>     path groups from mailer.conf in sorted order.
>   (Lock): Decode directory entries from UTF-8 to string. Encode paths back
>    to UTF-8 when we ask libsvn_fs for a lock on a path.
>     Iterate path groups from mailer.conf in sorted order.
>   (DiffGenerator): Decode repository paths from UTF-8 to string.
>   (TextCommitRenderer): Decode author, log message, and path from UTF-8 to
>     string. Write diff data via write_binary, bypassing the re-encoding step.
>   (Config): Decode paths from UTF-8 to string before matching them against
>    regular expressions. Also decode the repository directory path from UTF-8.
> 
> * tools/hook-scripts/mailer/tests/mailer-t1.output: Adjust expected output.
>    File groups are now provided in stable sorted order. This should fix
>    spurious test failures in the future.
> 
>  * tools/hook-scripts/mailer/tests/mailer-tweak.py: Drop L suffix from long
>     integers and pass binary data instead of strings into libsvn_fs.

Re: svn commit: r1884427 - in /subversion/trunk/tools/hook-scripts/mailer: mailer.py tests/mailer-t1.output tests/mailer-tweak.py

Posted by Yasuhito FUTATSUKI <fu...@yf.bsdclub.org>.
On 2021/01/15 7:30, Stefan Sperling wrote:
> On Fri, Jan 15, 2021 at 01:44:43AM +0900, Yasuhito FUTATSUKI wrote:
>> On 2020/12/29 0:49, Stefan Sperling wrote:
>>> On Tue, Dec 15, 2020 at 07:47:59PM +0900, Yasuhito FUTATSUKI wrote:
>>>> As far as I read the code again, it seems this is already support
>>>> locales other than C/C.UTF-8 if mailer.py is kicked with appropriate
>>>> LC_CTYPE environment. The REPOS-PATH argment comes from sys.argv[2]
>>>> already decoded in file system encoding (i.e. encoding corresponding
>>>> to LC_CTYPE) and Python encodes it if access to file system. On the
>>>> other hand, svn.repos.open() wrapper function accepts str and encode
>>>> it to UTF-8 string, as the svn_repos_open() C API expects.
>>>>
>>>> Then I overlooked sys.argv[x] and cmd_args[x] in main() are
>>>> already str.
>>>>
>>>> Perhaps it needs (although the last hunk is not nessesary):
>>>
>>> Yes, indeed. If python already encodes arguments to strings then this
>>> patch looks correct to me. It also works as expected in my testing.
>> I revised the patch and checked in UTF-8 locale. Now it seems it works
>> with post-revprop-change, post-lock and post-unlock hooks.
>>
>> If this patch is OK, then I'll do:
>>
>> (1) Make it support Python 2 again, to merge it to 1.14.x.
>> (2) Fix an issue to encode Unicode paths in Subject header.
>> (3) Fix an issue of max line length in Subject header.
>>
>> Perhaps (2) and (3) will be resolved at the same time.
> 
> Yes, in my opinion this patch is OK.
> The mailer.py test is still passing and your changes look fine to me.
> 
> Thank you! My experience with python2/3 conversion is limited, and your
> help with this is much appreciated.

I committed it in r1885557. Perhaps If you did not commit r1884427,
I didn't work for it so quickly (yes, even this is quickly). Thank you.

Then I'll go ahead to resolve above issues.

Cheers,
-- 
Yasuhito FUTATSUKI <fu...@yf.bsclub.org>

Re: svn commit: r1884427 - in /subversion/trunk/tools/hook-scripts/mailer: mailer.py tests/mailer-t1.output tests/mailer-tweak.py

Posted by Stefan Sperling <st...@elego.de>.
On Fri, Jan 15, 2021 at 01:44:43AM +0900, Yasuhito FUTATSUKI wrote:
> On 2020/12/29 0:49, Stefan Sperling wrote:
> > On Tue, Dec 15, 2020 at 07:47:59PM +0900, Yasuhito FUTATSUKI wrote:
> >> As far as I read the code again, it seems this is already support
> >> locales other than C/C.UTF-8 if mailer.py is kicked with appropriate
> >> LC_CTYPE environment. The REPOS-PATH argment comes from sys.argv[2]
> >> already decoded in file system encoding (i.e. encoding corresponding
> >> to LC_CTYPE) and Python encodes it if access to file system. On the
> >> other hand, svn.repos.open() wrapper function accepts str and encode
> >> it to UTF-8 string, as the svn_repos_open() C API expects.
> >>
> >> Then I overlooked sys.argv[x] and cmd_args[x] in main() are
> >> already str.
> >>
> >> Perhaps it needs (although the last hunk is not nessesary):
> > 
> > Yes, indeed. If python already encodes arguments to strings then this
> > patch looks correct to me. It also works as expected in my testing.
> I revised the patch and checked in UTF-8 locale. Now it seems it works
> with post-revprop-change, post-lock and post-unlock hooks.
> 
> If this patch is OK, then I'll do:
> 
> (1) Make it support Python 2 again, to merge it to 1.14.x.
> (2) Fix an issue to encode Unicode paths in Subject header.
> (3) Fix an issue of max line length in Subject header.
> 
> Perhaps (2) and (3) will be resolved at the same time.

Yes, in my opinion this patch is OK.
The mailer.py test is still passing and your changes look fine to me.

Thank you! My experience with python2/3 conversion is limited, and your
help with this is much appreciated.

Regards,
Stefan

> Cheers,
> -- 
> Yasuhito FUTATSUKI <fu...@yf.bsclub.org>

> Follow up to r1884427: mailer.py: Additional fix for Python 3 support 
> 
> * tools/hook-scripts/mailer/mailer.py
>   ():
>    - Don't include StringIO.
>    - import locale for output class 'StandardOutput'.
>   (main): Don't decode cmd_args[x], because they are already str.
>   (OutputBase.write_binary): New method, as the abstract base class of
>    output classes.
>   (OutputBase.write): Tweak comment. This method has implementation now.
>   (OutputBase.run): Use self.write_binary() to write content of buf.
>   (StandardOutput.write): Override default write method to use default
>    encoding for console output.
>   (PropChange.generate): Use sys.stdin.buffer instead of sys.stdin to read
>    from stdin as bytes.
>   (Lock.__init__):
>     - Use sys.stdin.buffer instead of sys.stdin to read from stdin as bytes.
>     - Strip new line character from items of self.dirlist
>   (DiffGenerator.__getitem__): Pass change.path as is to svn.fs.FileDiff.
>   (TextCommitRenderer.render): Don't decode data.author. data.author is
>    already bytes (or None) here.
>   (Repository.__init__): Ensure self.author is str or None.
>   ((entry routine)): Encode repository path as UTF-8 before canonicalize
>    explicitly.
>    
> Index: tools/hook-scripts/mailer/mailer.py
> ===================================================================
> --- tools/hook-scripts/mailer/mailer.py	(revision 1885374)
> +++ tools/hook-scripts/mailer/mailer.py	(working copy)
> @@ -50,10 +50,11 @@
>  from urllib.parse import quote as urllib_parse_quote
>  import time
>  import subprocess
> -from io import BytesIO, StringIO
> +from io import BytesIO
>  import smtplib
>  import re
>  import tempfile
> +import locale
>  
>  # Minimal version of Subversion's bindings required
>  _MIN_SVN_VERSION = [1, 5, 0]
> @@ -88,10 +89,10 @@
>      messenger = Commit(pool, cfg, repos)
>    elif cmd == 'propchange' or cmd == 'propchange2':
>      revision = int(cmd_args[0])
> -    author = cmd_args[1].decode('utf-8')
> -    propname = cmd_args[2].decode('utf-8')
> +    author = cmd_args[1]
> +    propname = cmd_args[2]
>      if cmd == 'propchange2' and cmd_args[3]:
> -      action = cmd_args[3].decode('utf-8')
> +      action = cmd_args[3]
>      else:
>        action = 'A'
>      repos = Repository(repos_dir, revision, pool)
> @@ -103,7 +104,7 @@
>                   })
>      messenger = PropChange(pool, cfg, repos, author, propname, action)
>    elif cmd == 'lock' or cmd == 'unlock':
> -    author = cmd_args[0].decode('utf-8')
> +    author = cmd_args[0]
>      repos = Repository(repos_dir, 0, pool) ### any old revision will do
>      # Override the repos revision author with the author of the lock/unlock
>      repos.author = author
> @@ -169,9 +170,13 @@
>      representation."""
>      raise NotImplementedError
>  
> +  def write_binary(self, output):
> +    """Override this method.
> +    Append the binary data OUTPUT to the output representation."""
> +    raise NotImplementedError
> +
>    def write(self, output):
> -    """Override this method.
> -    Append the literal text string OUTPUT to the output representation."""
> +    """Append the literal text string OUTPUT to the output representation."""
>      return self.write_binary(output.encode('utf-8'))
>  
>    def run(self, cmd):
> @@ -184,7 +189,7 @@
>  
>      buf = pipe_ob.stdout.read(self._CHUNKSIZE)
>      while buf:
> -      self.write(buf)
> +      self.write_binary(buf)
>        buf = pipe_ob.stdout.read(self._CHUNKSIZE)
>  
>      # wait on the child so we don't end up with a billion zombies
> @@ -360,7 +365,12 @@
>    def finish(self):
>      pass
>  
> +  def write(self, output):
> +    """Write text as *default* encoding string"""
> +    return self.write_binary(output.encode(locale.getpreferredencoding(),
> +                                           'backslashreplace'))
>  
> +
>  class PipeOutput(MailedOutput):
>    "Deliver a mail message to an MTA via a pipe."
>  
> @@ -532,7 +542,7 @@
>          elif self.action == 'M':
>            self.output.write('Property diff:\n')
>            tempfile1 = tempfile.NamedTemporaryFile()
> -          tempfile1.write(sys.stdin.read())
> +          tempfile1.write(sys.stdin.buffer.read())
>            tempfile1.flush()
>            tempfile2 = tempfile.NamedTemporaryFile()
>            tempfile2.write(self.repos.get_rev_prop(self.propname))
> @@ -594,9 +604,8 @@
>                          or 'unlock_subject_prefix'))
>  
>      # read all the locked paths from STDIN and strip off the trailing newlines
> -    self.dirlist = [x for x in sys.stdin.readlines()]
> -    for i in range(len(self.dirlist)):
> -      self.dirlist[i] = self.dirlist[i].decode('utf-8')
> +    self.dirlist = [x.decode('utf-8').rstrip()
> +                    for x in sys.stdin.buffer.readlines()]
>  
>      # collect the set of groups and the unique sets of params for the options
>      self.groups = { }
> @@ -910,7 +919,7 @@
>              kind = 'C'
>              if self.diffsels.copy:
>                diff = svn.fs.FileDiff(None, None, self.repos.root_this,
> -                                     change.path.decode('utf-8'), self.pool)
> +                                     change.path, self.pool)
>                label1 = '/dev/null\t00:00:00 1970\t' \
>                         '(empty, because file is newly added)'
>                label2 = '%s\t%s\t(r%s, copy of r%s, %s)' \
> @@ -1092,7 +1101,7 @@
>  
>      w = self.output.write
>  
> -    w('Author: %s\nDate: %s\nNew Revision: %s\n' % (data.author.decode('utf-8'),
> +    w('Author: %s\nDate: %s\nNew Revision: %s\n' % (data.author,
>                                                        data.date,
>                                                        data.rev))
>  
> @@ -1221,6 +1230,8 @@
>      self.root_this = self.get_root(rev)
>  
>      self.author = self.get_rev_prop(svn.core.SVN_PROP_REVISION_AUTHOR)
> +    if self.author is not None:
> +      self.author = self.author.decode('utf-8')
>  
>    def get_rev_prop(self, propname, rev = None):
>      if not rev:
> @@ -1510,7 +1521,7 @@
>      usage()
>  
>    cmd = sys.argv[1]
> -  repos_dir = svn.core.svn_path_canonicalize(sys.argv[2])
> +  repos_dir = svn.core.svn_path_canonicalize(sys.argv[2].encode('utf-8'))
>    repos_dir = repos_dir.decode('utf-8')
>    try:
>      expected_args = cmd_list[cmd]


Re: svn commit: r1884427 - in /subversion/trunk/tools/hook-scripts/mailer: mailer.py tests/mailer-t1.output tests/mailer-tweak.py

Posted by Yasuhito FUTATSUKI <fu...@yf.bsdclub.org>.
On 2020/12/29 0:49, Stefan Sperling wrote:
> On Tue, Dec 15, 2020 at 07:47:59PM +0900, Yasuhito FUTATSUKI wrote:
>> As far as I read the code again, it seems this is already support
>> locales other than C/C.UTF-8 if mailer.py is kicked with appropriate
>> LC_CTYPE environment. The REPOS-PATH argment comes from sys.argv[2]
>> already decoded in file system encoding (i.e. encoding corresponding
>> to LC_CTYPE) and Python encodes it if access to file system. On the
>> other hand, svn.repos.open() wrapper function accepts str and encode
>> it to UTF-8 string, as the svn_repos_open() C API expects.
>>
>> Then I overlooked sys.argv[x] and cmd_args[x] in main() are
>> already str.
>>
>> Perhaps it needs (although the last hunk is not nessesary):
> 
> Yes, indeed. If python already encodes arguments to strings then this
> patch looks correct to me. It also works as expected in my testing.
I revised the patch and checked in UTF-8 locale. Now it seems it works
with post-revprop-change, post-lock and post-unlock hooks.

If this patch is OK, then I'll do:

(1) Make it support Python 2 again, to merge it to 1.14.x.
(2) Fix an issue to encode Unicode paths in Subject header.
(3) Fix an issue of max line length in Subject header.

Perhaps (2) and (3) will be resolved at the same time.

Cheers,
-- 
Yasuhito FUTATSUKI <fu...@yf.bsclub.org>

Re: svn commit: r1884427 - in /subversion/trunk/tools/hook-scripts/mailer: mailer.py tests/mailer-t1.output tests/mailer-tweak.py

Posted by Stefan Sperling <st...@elego.de>.
On Tue, Dec 15, 2020 at 07:47:59PM +0900, Yasuhito FUTATSUKI wrote:
> As far as I read the code again, it seems this is already support
> locales other than C/C.UTF-8 if mailer.py is kicked with appropriate
> LC_CTYPE environment. The REPOS-PATH argment comes from sys.argv[2]
> already decoded in file system encoding (i.e. encoding corresponding
> to LC_CTYPE) and Python encodes it if access to file system. On the
> other hand, svn.repos.open() wrapper function accepts str and encode
> it to UTF-8 string, as the svn_repos_open() C API expects.
> 
> Then I overlooked sys.argv[x] and cmd_args[x] in main() are
> already str.
> 
> Perhaps it needs (although the last hunk is not nessesary):

Yes, indeed. If python already encodes arguments to strings then this
patch looks correct to me. It also works as expected in my testing.

> [[[
> Index: tools/hook-scripts/mailer/mailer.py
> ===================================================================
> --- tools/hook-scripts/mailer/mailer.py (revision 1884434)
> +++ tools/hook-scripts/mailer/mailer.py (working copy)
> @@ -88,10 +88,10 @@
>      messenger = Commit(pool, cfg, repos)
>    elif cmd == 'propchange' or cmd == 'propchange2':
>      revision = int(cmd_args[0])
> -    author = cmd_args[1].decode('utf-8')
> -    propname = cmd_args[2].decode('utf-8')
> +    author = cmd_args[1]
> +    propname = cmd_args[2]
>      if cmd == 'propchange2' and cmd_args[3]:
> -      action = cmd_args[3].decode('utf-8')
> +      action = cmd_args[3]
>      else:
>        action = 'A'
>      repos = Repository(repos_dir, revision, pool)
> @@ -103,7 +103,7 @@
>                   })
>      messenger = PropChange(pool, cfg, repos, author, propname, action)
>    elif cmd == 'lock' or cmd == 'unlock':
> -    author = cmd_args[0].decode('utf-8')
> +    author = cmd_args[0]
>      repos = Repository(repos_dir, 0, pool) ### any old revision will do
>      # Override the repos revision author with the author of the lock/unlock
>      repos.author = author
> @@ -1510,7 +1510,7 @@
>      usage()
>  
>    cmd = sys.argv[1]
> -  repos_dir = svn.core.svn_path_canonicalize(sys.argv[2])
> +  repos_dir = svn.core.svn_path_canonicalize(sys.argv[2].encode('utf-8'))
>    repos_dir = repos_dir.decode('utf-8')
>    try:
>      expected_args = cmd_list[cmd]
> ]]]

Re: svn commit: r1884427 - in /subversion/trunk/tools/hook-scripts/mailer: mailer.py tests/mailer-t1.output tests/mailer-tweak.py

Posted by Yasuhito FUTATSUKI <fu...@yf.bsdclub.org>.
On 2020/12/15 5:06, Stefan Sperling wrote:
> On Tue, Dec 15, 2020 at 03:51:03AM +0900, Yasuhito FUTATSUKI wrote:
>> First of all, thank you for doing this. I worried that mailer.py didn't
>> support Python 3 for a long time.
>>
>> In message <20...@svn01-us-east.apache.org>, stsp@apache.o
>> rg writes:
>>> Author: stsp
>>> Date: Mon Dec 14 16:57:10 2020
>>> New Revision: 1884427
>>>
>>> URL: http://svn.apache.org/viewvc?rev=1884427&view=rev
>>> Log:
>>> Make mailer.py work properly with Python 3, and drop Python 2 support.
>>>
>>> Most of the changes deal with the handling binary data vs Python strings.
>>>
>>> I've made sure that mailer.py will work in a UTF-8 environment. In general,
>>> UTF-8 is recommended for hook scripts. See the SVNUseUTF8 mod_dav_svn option.
>>> Environments using other encodings may not work as expected, but those will
>>> be problematic for hook scripts in general.
>>
>> Perhaps the problem on locale other than UTF-8 is only on iterpretation of
>> REPOS-PATH argument. Other paths in the script are all Subversion's
>> internal paths.
> 
> Yes, I agree that the REPOS-PATH argument is an exceptional case.
> Unforunately, we have to decode repos_dir because it is used with the
> os.path.join() function.

As far as I read the code again, it seems this is already support
locales other than C/C.UTF-8 if mailer.py is kicked with appropriate
LC_CTYPE environment. The REPOS-PATH argment comes from sys.argv[2]
already decoded in file system encoding (i.e. encoding corresponding
to LC_CTYPE) and Python encodes it if access to file system. On the
other hand, svn.repos.open() wrapper function accepts str and encode
it to UTF-8 string, as the svn_repos_open() C API expects.

Then I overlooked sys.argv[x] and cmd_args[x] in main() are
already str.

Perhaps it needs (although the last hunk is not nessesary):
[[[
Index: tools/hook-scripts/mailer/mailer.py
===================================================================
--- tools/hook-scripts/mailer/mailer.py (revision 1884434)
+++ tools/hook-scripts/mailer/mailer.py (working copy)
@@ -88,10 +88,10 @@
     messenger = Commit(pool, cfg, repos)
   elif cmd == 'propchange' or cmd == 'propchange2':
     revision = int(cmd_args[0])
-    author = cmd_args[1].decode('utf-8')
-    propname = cmd_args[2].decode('utf-8')
+    author = cmd_args[1]
+    propname = cmd_args[2]
     if cmd == 'propchange2' and cmd_args[3]:
-      action = cmd_args[3].decode('utf-8')
+      action = cmd_args[3]
     else:
       action = 'A'
     repos = Repository(repos_dir, revision, pool)
@@ -103,7 +103,7 @@
                  })
     messenger = PropChange(pool, cfg, repos, author, propname, action)
   elif cmd == 'lock' or cmd == 'unlock':
-    author = cmd_args[0].decode('utf-8')
+    author = cmd_args[0]
     repos = Repository(repos_dir, 0, pool) ### any old revision will do
     # Override the repos revision author with the author of the lock/unlock
     repos.author = author
@@ -1510,7 +1510,7 @@
     usage()
 
   cmd = sys.argv[1]
-  repos_dir = svn.core.svn_path_canonicalize(sys.argv[2])
+  repos_dir = svn.core.svn_path_canonicalize(sys.argv[2].encode('utf-8'))
   repos_dir = repos_dir.decode('utf-8')
   try:
     expected_args = cmd_list[cmd]
]]]

(Also, I found svn.repos.open() and svn.core.svn_path_canonicalize() already
accepted str in commited code :))

> There is no way of knowing which character encoding is used for paths.
> At least on UNIX a filename is just a string of bytes and it is independent
> of the locale.
> 
> Note that hook scripts will run in an empty environment by default.
> Which means this script will run in the "C" locale by default.
> If another locale is desired, the administrator needs to configure the
> server as documented here:
> https://subversion.apache.org/docs/release-notes/1.8.html#hooks-env
> 
> My assumption is that for hook scripts like this one, the server will be
> configured with SVNUseUTF8 and the hook script environment will contain
> something like LANG=C.UTF-8.
> 
> If svnadmin is then also used in the UTF-8 locale to create repositories,
> the hook script should work fine.

I assumed that hook scripts kicked by file:// or svn+ssh:// scheme.

I think even if the repos path is neither UTF-8 nor ascii but appropriate
encoding string, hook scripts themselves know it and can invoke mailer.py
script with appropriate LC_CTYPE by using env(1) utility or directly setting
environment variable within hook scripts themselves.

Of course, we can easily move or make symlink of repository path from
non UTF-8 path to UTF-8 or ascii path, so I don't insist that we should
support non UTF-8 path.

>>> data such as paths in UTF-8. Our Python3 bindings do not deal with encoding
>>> or decoding of such data, and thus need to work with raw UTF-8 strings, not
>>> Python strings.
>>
>> I have no objection the code itself, however it is a bit incorrect.
>>
>> Our Python3 bindings accept str objects as inputs for char *.                                           ^ as well as bytes objects

                                     If str objects are passed, v
>>                                                               It always
>> convert them to UTF-8 C strings on all APIs without regarding preferred
>> encoding or file system encoding on Python.  As some of APIs accept
>> local paths and they should be encoded as locale's charset/encoding,
>> I also prefer to encode to bytes on caller side explicitly before call
>> APIs, to avoid making bugs.  Of course, return values of wrapper APIs
>> corresponding char ** args and return value of C API are not decoded,
>> raw bytes objects. 
> 
> Thank you, I did not know this. The patch below undoes changes which I
> made in order to encode input to SVN APIs as UTF-8.
> 
> The mailer.py test is still passing fine with this change.

Yes, however as I wrote above I prefer not to revert them.
 
I also found some bytes/str mixture still left around return values
for each call of Repository.get_rev_prop(), however I just found,
but not tested. So I continue to look over the code and I'll try
to fix the problems if they still exist.

Cheers,
-- 
Yasuhito FUTATSUKI <fu...@yf.bsclub.org>

Re: svn commit: r1884427 - in /subversion/trunk/tools/hook-scripts/mailer: mailer.py tests/mailer-t1.output tests/mailer-tweak.py

Posted by Stefan Sperling <st...@elego.de>.
On Tue, Dec 15, 2020 at 03:51:03AM +0900, Yasuhito FUTATSUKI wrote:
> First of all, thank you for doing this. I worried that mailer.py didn't
> support Python 3 for a long time.
> 
> In message <20...@svn01-us-east.apache.org>, stsp@apache.o
> rg writes:
> >Author: stsp
> >Date: Mon Dec 14 16:57:10 2020
> >New Revision: 1884427
> >
> >URL: http://svn.apache.org/viewvc?rev=1884427&view=rev
> >Log:
> >Make mailer.py work properly with Python 3, and drop Python 2 support.
> >
> >Most of the changes deal with the handling binary data vs Python strings.
> >
> >I've made sure that mailer.py will work in a UTF-8 environment. In general,
> >UTF-8 is recommended for hook scripts. See the SVNUseUTF8 mod_dav_svn option.
> >Environments using other encodings may not work as expected, but those will
> >be problematic for hook scripts in general.
> 
> Perhaps the problem on locale other than UTF-8 is only on iterpretation of
> REPOS-PATH argument. Other paths in the script are all Subversion's
> internal paths.

Yes, I agree that the REPOS-PATH argument is an exceptional case.
Unforunately, we have to decode repos_dir because it is used with the
os.path.join() function.

There is no way of knowing which character encoding is used for paths.
At least on UNIX a filename is just a string of bytes and it is independent
of the locale.

Note that hook scripts will run in an empty environment by default.
Which means this script will run in the "C" locale by default.
If another locale is desired, the administrator needs to configure the
server as documented here:
https://subversion.apache.org/docs/release-notes/1.8.html#hooks-env

My assumption is that for hook scripts like this one, the server will be
configured with SVNUseUTF8 and the hook script environment will contain
something like LANG=C.UTF-8.

If svnadmin is then also used in the UTF-8 locale to create repositories,
the hook script should work fine.

> >data such as paths in UTF-8. Our Python3 bindings do not deal with encoding
> >or decoding of such data, and thus need to work with raw UTF-8 strings, not
> >Python strings.
> 
> I have no objection the code itself, however it is a bit incorrect.
> 
> Our Python3 bindings accept str objects as inputs for char *. It always
> convert them to UTF-8 C strings on all APIs without regarding preferred
> encoding or file system encoding on Python.  As some of APIs accept
> local paths and they should be encoded as locale's charset/encoding,
> I also prefer to encode to bytes on caller side explicitly before call
> APIs, to avoid making bugs.  Of course, return values of wrapper APIs
> corresponding char ** args and return value of C API are not decoded,
> raw bytes objects. 

Thank you, I did not know this. The patch below undoes changes which I
made in order to encode input to SVN APIs as UTF-8.

The mailer.py test is still passing fine with this change.

Index: tools/hook-scripts/mailer/mailer.py
===================================================================
--- tools/hook-scripts/mailer/mailer.py	(revision 1884427)
+++ tools/hook-scripts/mailer/mailer.py	(working copy)
@@ -624,8 +624,7 @@ class Lock(Messenger):
 
     # The lock comment is the same for all paths, so we can just pull
     # the comment for the first path in the dirlist and cache it.
-    self.lock = svn.fs.svn_fs_get_lock(self.repos.fs_ptr,
-                                       self.dirlist[0].encode('utf-8'),
+    self.lock = svn.fs.svn_fs_get_lock(self.repos.fs_ptr, self.dirlist[0],
                                        self.pool)
 
   def generate(self):
@@ -910,7 +909,7 @@ class DiffGenerator:
             kind = 'C'
             if self.diffsels.copy:
               diff = svn.fs.FileDiff(None, None, self.repos.root_this,
-                                     change.path.decode('utf-8'), self.pool)
+                                     change.path, self.pool)
               label1 = '/dev/null\t00:00:00 1970\t' \
                        '(empty, because file is newly added)'
               label2 = '%s\t%s\t(r%s, copy of r%s, %s)' \
Index: tools/hook-scripts/mailer/tests/mailer-tweak.py
===================================================================
--- tools/hook-scripts/mailer/tests/mailer-tweak.py	(revision 1884427)
+++ tools/hook-scripts/mailer/tests/mailer-tweak.py	(working copy)
@@ -53,7 +53,7 @@ def tweak_dates(pool, home='.'):
     date = core.svn_time_to_cstring((DATE_BASE+i*DATE_INCR) * 1000000, pool)
     #print date
     fs.change_rev_prop(fsob, i+1, core.SVN_PROP_REVISION_DATE, date, pool)
-    fs.change_rev_prop(fsob, i+1, core.SVN_PROP_REVISION_AUTHOR, b'mailer test', pool)
+    fs.change_rev_prop(fsob, i+1, core.SVN_PROP_REVISION_AUTHOR, 'mailer test', pool)
 
 def main():
   if len(sys.argv) != 2: