You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@subversion.apache.org by Kenneth Porter <sh...@sewingwitch.com> on 2018/01/23 15:14:47 UTC

mailer.py commit says TypeError: must be unicode, not str

I upgraded my repo server from CentOS 6 to 7 and my commit hook is now 
failing with this message. I manually upgraded mailer.py from that found 
in Subversion 1.7.14 to the latest in trunk and still get the same 
error. So I suspect it's something in the Subversion Python bindings and 
not the mailer script. Or do I need to upgrade my repo through a dump 
and reload? (Repo format is 3.)

Here's the full traceback. I do get the email.

bash-4.2$ /usr/local/bin/subversion/mailer.py commit /srv/svn/MPA 7598 
/usr/local/bin/subversion/mailer.conf
Traceback (most recent call last):
   File "/usr/local/bin/subversion/mailer.py", line 1465, in <module>
     sys.argv[3:3+expected_args])
   File "/usr/lib64/python2.7/site-packages/svn/core.py", line 307, in 
run_app
     return func(application_pool, *args, **kw)
   File "/usr/local/bin/subversion/mailer.py", line 132, in main
     messenger.generate()
   File "/usr/local/bin/subversion/mailer.py", line 439, in generate
     group, params, paths, subpool)
   File "/usr/local/bin/subversion/mailer.py", line 709, in generate_content
     renderer.render(data)
   File "/usr/local/bin/subversion/mailer.py", line 1056, in render
     self._render_diffs(data.diffs, '')
   File "/usr/local/bin/subversion/mailer.py", line 1099, in _render_diffs
     for diff in diffs:
   File "/usr/local/bin/subversion/mailer.py", line 898, in __getitem__
     src_fname, dst_fname = diff.get_files()
   File "/usr/lib64/python2.7/site-packages/svn/fs.py", line 103, in 
get_files
     self._dump_contents(self.tempfile2, self.root2, self.path2)
   File "/usr/lib64/python2.7/site-packages/svn/fs.py", line 87, in 
_dump_contents
     fp.write(chunk)
TypeError: must be unicode, not str

Re: mailer.py commit says TypeError: must be unicode, not str

Posted by Kenneth Porter <sh...@sewingwitch.com>.
--On Thursday, February 08, 2018 2:27 AM +0000 Troy Curtis Jr 
<tr...@gmail.com> wrote:

> I believe that Kenneth is actually referring to putting the type(fp) test
> in svn/fs.py, where it is doing the prefixing internally, not that it is
> leaking out into his application. I think he was simply trying to debug
> our different experiences.

That's correct. I was just trying to debug the fs.py module. 

Re: mailer.py commit says TypeError: must be unicode, not str

Posted by Troy Curtis Jr <tr...@gmail.com>.
On Wed, Feb 7, 2018 at 1:06 PM Daniel Shahaf <d....@daniel.shahaf.name> wrote:

> Kenneth Porter wrote on Wed, 07 Feb 2018 09:35 -0800:
> > The overload of type was happening due to the unprefix stuff in fs.py.
> Red
> > herring, though interesting.
>
> This should only happen with 1.7 and earlier, see
> https://subversion.apache.org/docs/release-notes/1.8#swig-py-star
>
>
I believe that Kenneth is actually referring to putting the type(fp) test
in svn/fs.py,
where it is doing the prefixing internally, not that it is leaking out into
his application.
I think he was simply trying to debug our different experiences.

Troy


> Cheers,
>
> Daniel
>

Re: mailer.py commit says TypeError: must be unicode, not str

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Kenneth Porter wrote on Wed, 07 Feb 2018 09:35 -0800:
> The overload of type was happening due to the unprefix stuff in fs.py. Red 
> herring, though interesting.

This should only happen with 1.7 and earlier, see
https://subversion.apache.org/docs/release-notes/1.8#swig-py-star

Cheers,

Daniel

Re: mailer.py commit says TypeError: must be unicode, not str

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Troy Curtis Jr wrote on Sun, 11 Feb 2018 14:27 +0000:
> All this being said, I'm fine reverting the import swap if that is what is
> desired, as all indications are that it works either way now.  I simply
> liked the communication of intent better with this order.

I think the primary reason for the 'try: import py3name...' convention
is to make the code a little faster in the common case, which is
presumed to be the newer language version.  I'm not sure whether the
difference is significant enough to be worth the churn of another commit
unchanging the order.  It certainly isn't significant enough to block the
backport (the bugfix outweighs it by orders of magnitude).

I'm not sure I understand exactly in what way you see the order in HEAD
as better.  As I see it, the code wants to import the module that's
known as '__builtin__' on py2 and 'builtins' on py3 because it wants to
call <that module>.open(), an alias of the builtin open() function.  I
don't think the code should care whether "import builtins" succeeds
because it runs under py3 or under py2+future, so long as "import
builtins" results in locals()['builtins'] having the semantics it would
have under py3.  I view this as a form of duck typing.

This may be an academic question, of course; the important thing is that
the patch is committed so trunk now works with and without future
installed.

Cheers,

Daniel

Re: mailer.py commit says TypeError: must be unicode, not str

Posted by Troy Curtis Jr <tr...@gmail.com>.
On Sun, Feb 11, 2018 at 2:38 AM Daniel Shahaf <d....@daniel.shahaf.name>
wrote:

> Troy Curtis Jr wrote on Sun, 11 Feb 2018 02:28 +0000:
> > I committed the fix to the bindings in
> > https://svn.apache.org/viewvc?view=revision&revision=1823802 .  In
> addition
> > to Kenneth's suggestion of opening in binary mode, I switched the imports
> > so that the python2-future's implementation would not get inadvertently
> > pulled in.  Everything looked fine with the how python2-future's open
> > worked (since it did in fact use the Python 3's open() semantics), but I
> > think it best that on the intended modules are included.  I also added a
> > test which duplicated the issue (with python2-future installed at least),
> > and confirms the fix.
> >
> > This is a relatively isolated change, but fixes surprising behavior (as
> > Kenneth can attest to), does something like this make sense to propose
> for
> > the 1.10 branch?
>
> The change is non-invasive, backwards-compatible, fixes a user-visible
> bug, is unlikely to introduce new ones, and has a test. That makes it a
> good backport candidate.
>
> Usually we have just two 'live' branches in addition to trunk, but
> currently
> we have three (1.8 under security support, 1.9 supported, 1.10 in beta), so
> you might consider nominating this to 1.9 in addition to 1.10.
>
> That said, I'm a bit puzzled at changing the order of imports.  Our
> practise throughout the codebase appears to be 'try: import py3name;
> except ImportError: import py2name', and I don't understand why this one
> case should be different.  Secondly, if I understand correctly, the
> 'past' package makes 'import __builtins__' work on py3, so changing the
> order of imports simply trades one potential problem for another,
> doesn't it?  And thirdly, why would it be a problem if 'from builtins
> import
> open' worked under py2, so long as it had the same semantics as it
> does under py3?  (I.e., so long as it set locals()['open'] to a function
> with
> the same semantics as py3's open() function)
>
>
Not only is that convention, but with Kenneth's suggestion, the open()
works correctly, with or without the python future package installed (in my
testing at least).  So it should work fine either way.

For me it was more of a semantics thing.  The intention of the original
block was to import builtins when running in Python 3 (which would not have
actually worked anyway, since this module is obviously not actually ready
to run under Python 3), and __builtin__ in Python 2. But on some systems,
those with python2-future installed, it actually imports a separate
'builtins' package even in Python 2.  You could make the case that as long
as the end result works, it doesn't really matter, even if that wasn't the
intention of the code.  In that case the prudent thing to do would be to
add a test environment with python2-future installed and a different one
without.  Admittedly, for this extremely limited usage, adding a new
"dedicated" builder is likely way overkill.

I hadn't noticed the 'past' package.  Poking around at it though it looks
like the Python 2 behavior is tucked behind a 'past.builtins' module:

  /usr/lib/python3.6/site-packages/past/builtins

So I don't think there will be an issue with the import __builtin__ pulling
in a separate, external, module even when the python future package is
installed.

Of course, the "real fix" is to make all the modules Py2/Py3 compatible,
then the import of Py3's open() semantics is perfectly acceptable and
expected throughout the module ;)

All this being said, I'm fine reverting the import swap if that is what is
desired, as all indications are that it works either way now.  I simply
liked the communication of intent better with this order.

Troy

Re: mailer.py commit says TypeError: must be unicode, not str

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Troy Curtis Jr wrote on Sun, 11 Feb 2018 02:28 +0000:
> I committed the fix to the bindings in
> https://svn.apache.org/viewvc?view=revision&revision=1823802 .  In addition
> to Kenneth's suggestion of opening in binary mode, I switched the imports
> so that the python2-future's implementation would not get inadvertently
> pulled in.  Everything looked fine with the how python2-future's open
> worked (since it did in fact use the Python 3's open() semantics), but I
> think it best that on the intended modules are included.  I also added a
> test which duplicated the issue (with python2-future installed at least),
> and confirms the fix.
> 
> This is a relatively isolated change, but fixes surprising behavior (as
> Kenneth can attest to), does something like this make sense to propose for
> the 1.10 branch?

The change is non-invasive, backwards-compatible, fixes a user-visible
bug, is unlikely to introduce new ones, and has a test. That makes it a
good backport candidate.

Usually we have just two 'live' branches in addition to trunk, but currently
we have three (1.8 under security support, 1.9 supported, 1.10 in beta), so
you might consider nominating this to 1.9 in addition to 1.10.

That said, I'm a bit puzzled at changing the order of imports.  Our
practise throughout the codebase appears to be 'try: import py3name;
except ImportError: import py2name', and I don't understand why this one
case should be different.  Secondly, if I understand correctly, the
'past' package makes 'import __builtins__' work on py3, so changing the
order of imports simply trades one potential problem for another,
doesn't it?  And thirdly, why would it be a problem if 'from builtins import
open' worked under py2, so long as it had the same semantics as it
does under py3?  (I.e., so long as it set locals()['open'] to a function with
the same semantics as py3's open() function)

Cheers,

Daniel

Re: mailer.py commit says TypeError: must be unicode, not str

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Troy Curtis Jr wrote on Sun, 11 Feb 2018 03:36 +0000:
> I believe that Subversion has an internal diff generation utility, could
> someone point me in the right direction?

tools/diff/diff

svn_diff_diff_2()

> I think just changing the implementation to using an internal diff
> would make this more portable.

Yes, or we could use 'import difflib' from the standard library, couldn't we?

Excuse brevity

Daniel

Re: mailer.py commit says TypeError: must be unicode, not str

Posted by Troy Curtis Jr <tr...@gmail.com>.
On Sat, Feb 10, 2018 at 8:28 PM Troy Curtis Jr <tr...@gmail.com>
wrote:

> On Thu, Feb 8, 2018 at 12:22 PM Kenneth Porter <sh...@sewingwitch.com>
> wrote:
>
>> On 2/8/2018 4:35 AM, Daniel Shahaf wrote:
>> > So, just to be clear, the problem is that svn/fs.py is not py3
>> > compatible, and having the 'builtins' module under py2 merely
>> > exposes that?  I.e., we have no reason to suspect a bug in the 'future'
>> > package's implementation of builtins.open() under py2.
>>
>> That's my interpretation. As I said at the start of the thread, it was
>> never clear why the internal temporary file was opened in text mode and
>> update mode when the code was first written.
>>
>>
> I committed the fix to the bindings in
> https://svn.apache.org/viewvc?view=revision&revision=1823802 .  In
> addition to Kenneth's suggestion of opening in binary mode, I switched the
> imports so that the python2-future's implementation would not get
> inadvertently pulled in.  Everything looked fine with the how
> python2-future's open worked (since it did in fact use the Python 3's
> open() semantics), but I think it best that on the intended modules are
> included.  I also added a test which duplicated the issue (with
> python2-future installed at least), and confirms the fix.
>
> This is a relatively isolated change, but fixes surprising behavior (as
> Kenneth can attest to), does something like this make sense to propose for
> the 1.10 branch?
>
>
https://ci.apache.org/builders/svn-windows-ra/builds/1998

It did occur to me at one point, "I wonder how well this works on Windows",
since it shells out to external 'diff' command.  I presume fs.FileDiff has
never worked on Windows (well presumably it would if a compatible 'diff'
command was in the PATH), but there was never a test for it previously.

I believe that Subversion has an internal diff generation utility, could
someone point me in the right direction?  I think just changing the
implementation to using an internal diff would make this more portable.

Troy

Re: mailer.py commit says TypeError: must be unicode, not str

Posted by Troy Curtis Jr <tr...@gmail.com>.
On Thu, Feb 8, 2018 at 12:22 PM Kenneth Porter <sh...@sewingwitch.com>
wrote:

> On 2/8/2018 4:35 AM, Daniel Shahaf wrote:
> > So, just to be clear, the problem is that svn/fs.py is not py3
> > compatible, and having the 'builtins' module under py2 merely
> > exposes that?  I.e., we have no reason to suspect a bug in the 'future'
> > package's implementation of builtins.open() under py2.
>
> That's my interpretation. As I said at the start of the thread, it was
> never clear why the internal temporary file was opened in text mode and
> update mode when the code was first written.
>
>
I committed the fix to the bindings in
https://svn.apache.org/viewvc?view=revision&revision=1823802 .  In addition
to Kenneth's suggestion of opening in binary mode, I switched the imports
so that the python2-future's implementation would not get inadvertently
pulled in.  Everything looked fine with the how python2-future's open
worked (since it did in fact use the Python 3's open() semantics), but I
think it best that on the intended modules are included.  I also added a
test which duplicated the issue (with python2-future installed at least),
and confirms the fix.

This is a relatively isolated change, but fixes surprising behavior (as
Kenneth can attest to), does something like this make sense to propose for
the 1.10 branch?

Troy

Re: mailer.py commit says TypeError: must be unicode, not str

Posted by Kenneth Porter <sh...@sewingwitch.com>.
On 2/8/2018 4:35 AM, Daniel Shahaf wrote:
> So, just to be clear, the problem is that svn/fs.py is not py3
> compatible, and having the 'builtins' module under py2 merely
> exposes that?  I.e., we have no reason to suspect a bug in the 'future'
> package's implementation of builtins.open() under py2.

That's my interpretation. As I said at the start of the thread, it was 
never clear why the internal temporary file was opened in text mode and 
update mode when the code was first written.


Re: mailer.py commit says TypeError: must be unicode, not str

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Troy Curtis Jr wrote on Thu, 08 Feb 2018 03:29 +0000:
> but sure enough, a 'rpm -ql python2-future' reveals:
> ...
> /usr/lib/python2.7/site-packages/builtins
> ...
> 
> That was certainly unexpected..and presumably a bad way to detect
> python3/python2.  Though the case could be
> made that if the code was py3 compatible, and python2-futures brought the
> py3 features, all should be well.  In this case the py3
> compat was only starting to happen, but hadn't finished yet.
> 
> It certainly seems like something to address, the mere presence of the
> compat library causes the bindings to misbehave.
> 
> I'll take a look at it tonight or tomorrow night unless someone beats me to
> it.

So, just to be clear, the problem is that svn/fs.py is not py3
compatible, and having the 'builtins' module under py2 merely
exposes that?  I.e., we have no reason to suspect a bug in the 'future'
package's implementation of builtins.open() under py2.

Cheers,

Daniel

Re: mailer.py commit says TypeError: must be unicode, not str

Posted by Troy Curtis Jr <tr...@gmail.com>.
On Wed, Feb 7, 2018 at 8:53 PM Kenneth Porter <sh...@sewingwitch.com> wrote:

> --On Thursday, February 08, 2018 2:30 AM +0000 Troy Curtis Jr
> <tr...@gmail.com> wrote:
>
> > If that package was being imported, I'd definitely expect that kind of
> > behavior, but merely it's presence on your system should not be enough to
> > actually cause your python scripts to use the "future" behavior within
> > python 2.7.  Has the mailer.py been changed, or is it being imported by
> > some other script as a module that does import the future module?
>
> It happens at the top of fs.py with this sequence:
>
> try:
>   # Python >=3.0
>   import builtins
> except ImportError:
>   # Python <3.0
>   import __builtin__ as builtins
>
> The code should be taking the except clause but the presence of the
> compatibility package fools fs.py into thinking it's on a 3.0 system. This
> might work except for the improper Unicode assumption about the kind of
> object returned by open().
>

Egads! You are right!  I had in mind the 'import __future__ <blah>'
statement, thinking you'd be
safe as long as you didn't do that, but sure enough, a 'rpm -ql
python2-future' reveals:
...
/usr/lib/python2.7/site-packages/builtins
...

That was certainly unexpected..and presumably a bad way to detect
python3/python2.  Though the case could be
made that if the code was py3 compatible, and python2-futures brought the
py3 features, all should be well.  In this case the py3
compat was only starting to happen, but hadn't finished yet.

It certainly seems like something to address, the mere presence of the
compat library causes the bindings to misbehave.

I'll take a look at it tonight or tomorrow night unless someone beats me to
it.

Troy

Re: mailer.py commit says TypeError: must be unicode, not str

Posted by Kenneth Porter <sh...@sewingwitch.com>.
--On Thursday, February 08, 2018 2:30 AM +0000 Troy Curtis Jr 
<tr...@gmail.com> wrote:

> If that package was being imported, I'd definitely expect that kind of
> behavior, but merely it's presence on your system should not be enough to
> actually cause your python scripts to use the "future" behavior within
> python 2.7.  Has the mailer.py been changed, or is it being imported by
> some other script as a module that does import the future module?

It happens at the top of fs.py with this sequence:

try:
  # Python >=3.0
  import builtins
except ImportError:
  # Python <3.0
  import __builtin__ as builtins

The code should be taking the except clause but the presence of the 
compatibility package fools fs.py into thinking it's on a 3.0 system. This 
might work except for the improper Unicode assumption about the kind of 
object returned by open(). 

Re: mailer.py commit says TypeError: must be unicode, not str

Posted by Troy Curtis Jr <tr...@gmail.com>.
On Wed, Feb 7, 2018 at 11:54 AM Kenneth Porter <sh...@sewingwitch.com>
wrote:

> --On Wednesday, February 07, 2018 9:35 AM -0800 Kenneth Porter
> <sh...@sewingwitch.com> wrote:
>
> > So there's a builtins package hiding in this system somewhere.
>
> Found it. The system has the python2-future package which is a dependency
> of the certbot package. I think I got that from the epel repo.
>
> Here's the metadata for that package:
>
>  rpm -qi python2-future
> Name        : python2-future
> Version     : 0.16.0
> Release     : 6.el7
> Architecture: noarch
> Install Date: Sun Jan  7 21:05:17 2018
> Group       : Applications/Engineering
> Size        : 3796094
> License     : MIT
> Signature   : RSA/SHA256, Fri Dec 15 05:27:05 2017, Key ID 6a2faea2352c64e5
> Source RPM  : future-0.16.0-6.el7.src.rpm
> Build Date  : Fri Dec 15 05:24:22 2017
> Build Host  : buildvm-aarch64-18.arm.fedoraproject.org
> Relocations : (not relocatable)
> Packager    : Fedora Project
> Vendor      : Fedora Project
> URL         : http://python-future.org/
> Summary     : Easy, clean, reliable Python 2/3 compatibility
> Description :
> Python2 future is the missing compatibility layer between Python 2 and
> Python 3. It allows you to use a single, clean Python 3.x-compatible
> codebase to support both Python 2 and Python 3 with minimal overhead.
>
> It provides ``future`` and ``past`` packages with backports and forward
> ports of features from Python 3 and 2. It also comes with ``futurize`` and
> ``pasteurize``, customized 2to3-based scripts that helps you to convert
> either Py2 or Py3 code easily to support both Python 2 and 3 in a single
> clean Py3-style codebase, module by module.
>
>
If that package was being imported, I'd definitely expect that kind of
behavior, but merely it's presence on your system should not be enough to
actually cause your python scripts to use the "future" behavior within
python 2.7.  Has the mailer.py been changed, or is it being imported by
some other script as a module that does import the future module?

Troy

Re: mailer.py commit says TypeError: must be unicode, not str

Posted by Kenneth Porter <sh...@sewingwitch.com>.
--On Wednesday, February 07, 2018 9:35 AM -0800 Kenneth Porter 
<sh...@sewingwitch.com> wrote:

> So there's a builtins package hiding in this system somewhere.

Found it. The system has the python2-future package which is a dependency 
of the certbot package. I think I got that from the epel repo.

Here's the metadata for that package:

 rpm -qi python2-future
Name        : python2-future
Version     : 0.16.0
Release     : 6.el7
Architecture: noarch
Install Date: Sun Jan  7 21:05:17 2018
Group       : Applications/Engineering
Size        : 3796094
License     : MIT
Signature   : RSA/SHA256, Fri Dec 15 05:27:05 2017, Key ID 6a2faea2352c64e5
Source RPM  : future-0.16.0-6.el7.src.rpm
Build Date  : Fri Dec 15 05:24:22 2017
Build Host  : buildvm-aarch64-18.arm.fedoraproject.org
Relocations : (not relocatable)
Packager    : Fedora Project
Vendor      : Fedora Project
URL         : http://python-future.org/
Summary     : Easy, clean, reliable Python 2/3 compatibility
Description :
Python2 future is the missing compatibility layer between Python 2 and
Python 3. It allows you to use a single, clean Python 3.x-compatible
codebase to support both Python 2 and Python 3 with minimal overhead.

It provides ``future`` and ``past`` packages with backports and forward
ports of features from Python 3 and 2. It also comes with ``futurize`` and
``pasteurize``, customized 2to3-based scripts that helps you to convert
either Py2 or Py3 code easily to support both Python 2 and 3 in a single
clean Py3-style codebase, module by module.


Re: mailer.py commit says TypeError: must be unicode, not str

Posted by Kenneth Porter <sh...@sewingwitch.com>.
The overload of type was happening due to the unprefix stuff in fs.py. Red 
herring, though interesting.

I distilled the failure down to this foo.py program. Of interest is that 
"import builtins" succeeds on a Python 2.5 system. It should be throwing 
unless Python is v3. So there's a builtins package hiding in this system 
somewhere.

#!/usr/bin/env python

# test subversion-python bug revealed by mailer.py

import tempfile as _tempfile
import builtins

tempfile = _tempfile.mktemp()
print tempfile
#print sys.version
fp = builtins.open(tempfile, 'w+')
print type(fp)
chunk = ""
print type(chunk)
fp.write(chunk)
fp.close()


Re: mailer.py commit says TypeError: must be unicode, not str

Posted by Kenneth Porter <sh...@sewingwitch.com>.
Here's an interesting clue. I added "print type(fp)" after the temp file is 
opened to see what kind of file object was returned by the open call and 
got this traceback. How is the type function getting overloaded with 
svn_fs_type? So far I haven't been able to recreate this with a short 
hello-world program. So I'm sure something is going wrong in the chain of 
imports somewhere between mailer.py and fs.py.

  File "/usr/lib64/python2.7/site-packages/svn/fs.py", line 80, in 
_dump_contents
    print type(fp)
  File "/usr/lib64/python2.7/site-packages/libsvn/fs.py", line 122, in 
svn_fs_type
    return _fs.svn_fs_type(*args)
TypeError: must be string, not _io.TextIOWrapper


Re: mailer.py commit says TypeError: must be unicode, not str

Posted by Kenneth Porter <sh...@sewingwitch.com>.
On 2/6/2018 7:56 PM, Troy Curtis Jr wrote:
> Kenneth, I'm having trouble reproducing your issue.  Any other hints 
> at what might causing the trouble in your environment that you can 
> think of?  I've tried changing my locale, changing the files diffed to 
> being utf8, all with no luck.  Regardless your suggested change needs 
> to be done on my swig-py3 branch, since it for sure needs it for 
> Python 3, but I'd really like to understand what is going on here to 
> make sure the issue is well and truly resolved.

Alas, I'm not familiar with how Python resolves symbols, so I'm not sure 
how to find the true source of the problem. (I'm a long-time C/C++/asm 
programmer and hardware designer but very green with Python.) I'll see 
if I can reproduce it with a "hello world" program to see if it's 
specific to the symbols being resolved in the library or some more 
general problem with this build of Python.


Re: mailer.py commit says TypeError: must be unicode, not str

Posted by Troy Curtis Jr <tr...@gmail.com>.
Proposed edit to fs.py: Change 'w+' to 'wb' when copying svn stream object
>> to temporary file. Update isn't needed, and the code just needs to dump
>> the
>> raw data into a file for the external diff to access, so no
>> encoding/decoding should occur. Hence we should open the file in binary
>> mode. I just tested this edit and it seems to cure the problem.
>>
>> It looks like this line is the same since it was originally added in
>> r843330 and hasn't changed in Troy's swig-py3 branch.
>>
>
> I've been leaning heavily on the test coverage for validating my py3
> updates. At first glance it looks like this FileDiff isn't referenced in
> any existing test. I'll add a test and confirm the behavior, and then test
> with your fix, unless you'd like to do so.
>
>
Kenneth, I'm having trouble reproducing your issue.  Any other hints at
what might causing the trouble in your environment that you can think of?
I've tried changing my locale, changing the files diffed to being utf8, all
with no luck.  Regardless your suggested change needs to be done on my
swig-py3 branch, since it for sure needs it for Python 3, but I'd really
like to understand what is going on here to make sure the issue is well and
truly resolved.

Troy


>> >From my initial report in the users list:
>>
>> <https://svn.haxx.se/users/archive-2018-01/0094.shtml>
>> <https://svn.haxx.se/users/archive-2018-02/0000.shtml>
>>
>> I'm using mailer.py in my post-commit hook and it's throwing a Unicode
>> type
>> error during the diff phase. Digging through the source code, I figured
>> out
>> that it's happening during the creation of the two temporary files for
>> diff'ing. Somehow the output file is getting opened in Unicode text mode
>> but the input source (the Subversion object stream) is a raw byte stream.
>> The write call fails.
>>
>> OS: CentOS 7.4
>> subversion-python-1.7.14-11.el7_4.x86_64
>> python-2.7.5-58.el7.x86_64
>>
>>
>>

Re: mailer.py commit says TypeError: must be unicode, not str

Posted by Troy Curtis Jr <tr...@gmail.com>.
Proposed edit to fs.py: Change 'w+' to 'wb' when copying svn stream object
>> to temporary file. Update isn't needed, and the code just needs to dump
>> the
>> raw data into a file for the external diff to access, so no
>> encoding/decoding should occur. Hence we should open the file in binary
>> mode. I just tested this edit and it seems to cure the problem.
>>
>> It looks like this line is the same since it was originally added in
>> r843330 and hasn't changed in Troy's swig-py3 branch.
>>
>
> I've been leaning heavily on the test coverage for validating my py3
> updates. At first glance it looks like this FileDiff isn't referenced in
> any existing test. I'll add a test and confirm the behavior, and then test
> with your fix, unless you'd like to do so.
>
>
Kenneth, I'm having trouble reproducing your issue.  Any other hints at
what might causing the trouble in your environment that you can think of?
I've tried changing my locale, changing the files diffed to being utf8, all
with no luck.  Regardless your suggested change needs to be done on my
swig-py3 branch, since it for sure needs it for Python 3, but I'd really
like to understand what is going on here to make sure the issue is well and
truly resolved.

Troy


>> >From my initial report in the users list:
>>
>> <https://svn.haxx.se/users/archive-2018-01/0094.shtml>
>> <https://svn.haxx.se/users/archive-2018-02/0000.shtml>
>>
>> I'm using mailer.py in my post-commit hook and it's throwing a Unicode
>> type
>> error during the diff phase. Digging through the source code, I figured
>> out
>> that it's happening during the creation of the two temporary files for
>> diff'ing. Somehow the output file is getting opened in Unicode text mode
>> but the input source (the Subversion object stream) is a raw byte stream.
>> The write call fails.
>>
>> OS: CentOS 7.4
>> subversion-python-1.7.14-11.el7_4.x86_64
>> python-2.7.5-58.el7.x86_64
>>
>>
>>

Re: mailer.py commit says TypeError: must be unicode, not str

Posted by Troy Curtis Jr <tr...@gmail.com>.
On Thu, Feb 1, 2018, 12:34 PM Kenneth Porter <sh...@sewingwitch.com> wrote:

> [moving discussion to dev list as I think this is now the correct fix.]
>
> --On Wednesday, January 31, 2018 7:40 PM -0800 Kenneth Porter
> <sh...@sewingwitch.com> wrote:
>
> > --On Wednesday, January 31, 2018 7:23 PM -0800 Kenneth Porter
> > <sh...@sewingwitch.com> wrote:
> >
> >>     fp = builtins.open(file, 'w+') # avoid namespace clash with
> >>                                    # trimmed-down svn_fs_open()
> >
> > I'm now thinking the problem is in the open call, and that I'm somehow
> > getting a Python 3 open function even though I've got Python 2.7
> > installed. Should the mode be 'wb' instead of 'w+'? That would insure
> > that the raw data from the Subversion object is getting dumped into the
> > temporary fle without interpretation. I don't understand why update
> > (denoted by the plus) is wanted. The temp file isn't being read from.
>
>
>
That seems strange, for py3 sure, but certainly odd on py2. Perhaps your
locale is set to utf8? I'll have to research to see if that even makes
sense.

Proposed edit to fs.py: Change 'w+' to 'wb' when copying svn stream object
> to temporary file. Update isn't needed, and the code just needs to dump the
> raw data into a file for the external diff to access, so no
> encoding/decoding should occur. Hence we should open the file in binary
> mode. I just tested this edit and it seems to cure the problem.
>
> It looks like this line is the same since it was originally added in
> r843330 and hasn't changed in Troy's swig-py3 branch.
>

I've been leaning heavily on the test coverage for validating my py3
updates. At first glance it looks like this FileDiff isn't referenced in
any existing test. I'll add a test and confirm the behavior, and then test
with your fix, unless you'd like to do so.

Troy


> >From my initial report in the users list:
>
> <https://svn.haxx.se/users/archive-2018-01/0094.shtml>
> <https://svn.haxx.se/users/archive-2018-02/0000.shtml>
>
> I'm using mailer.py in my post-commit hook and it's throwing a Unicode type
> error during the diff phase. Digging through the source code, I figured out
> that it's happening during the creation of the two temporary files for
> diff'ing. Somehow the output file is getting opened in Unicode text mode
> but the input source (the Subversion object stream) is a raw byte stream.
> The write call fails.
>
> OS: CentOS 7.4
> subversion-python-1.7.14-11.el7_4.x86_64
> python-2.7.5-58.el7.x86_64
>
>
>

Re: mailer.py commit says TypeError: must be unicode, not str

Posted by Troy Curtis Jr <tr...@gmail.com>.
On Thu, Feb 1, 2018, 12:34 PM Kenneth Porter <sh...@sewingwitch.com> wrote:

> [moving discussion to dev list as I think this is now the correct fix.]
>
> --On Wednesday, January 31, 2018 7:40 PM -0800 Kenneth Porter
> <sh...@sewingwitch.com> wrote:
>
> > --On Wednesday, January 31, 2018 7:23 PM -0800 Kenneth Porter
> > <sh...@sewingwitch.com> wrote:
> >
> >>     fp = builtins.open(file, 'w+') # avoid namespace clash with
> >>                                    # trimmed-down svn_fs_open()
> >
> > I'm now thinking the problem is in the open call, and that I'm somehow
> > getting a Python 3 open function even though I've got Python 2.7
> > installed. Should the mode be 'wb' instead of 'w+'? That would insure
> > that the raw data from the Subversion object is getting dumped into the
> > temporary fle without interpretation. I don't understand why update
> > (denoted by the plus) is wanted. The temp file isn't being read from.
>
>
>
That seems strange, for py3 sure, but certainly odd on py2. Perhaps your
locale is set to utf8? I'll have to research to see if that even makes
sense.

Proposed edit to fs.py: Change 'w+' to 'wb' when copying svn stream object
> to temporary file. Update isn't needed, and the code just needs to dump the
> raw data into a file for the external diff to access, so no
> encoding/decoding should occur. Hence we should open the file in binary
> mode. I just tested this edit and it seems to cure the problem.
>
> It looks like this line is the same since it was originally added in
> r843330 and hasn't changed in Troy's swig-py3 branch.
>

I've been leaning heavily on the test coverage for validating my py3
updates. At first glance it looks like this FileDiff isn't referenced in
any existing test. I'll add a test and confirm the behavior, and then test
with your fix, unless you'd like to do so.

Troy


> >From my initial report in the users list:
>
> <https://svn.haxx.se/users/archive-2018-01/0094.shtml>
> <https://svn.haxx.se/users/archive-2018-02/0000.shtml>
>
> I'm using mailer.py in my post-commit hook and it's throwing a Unicode type
> error during the diff phase. Digging through the source code, I figured out
> that it's happening during the creation of the two temporary files for
> diff'ing. Somehow the output file is getting opened in Unicode text mode
> but the input source (the Subversion object stream) is a raw byte stream.
> The write call fails.
>
> OS: CentOS 7.4
> subversion-python-1.7.14-11.el7_4.x86_64
> python-2.7.5-58.el7.x86_64
>
>
>

Re: mailer.py commit says TypeError: must be unicode, not str

Posted by Kenneth Porter <sh...@sewingwitch.com>.
[moving discussion to dev list as I think this is now the correct fix.]

--On Wednesday, January 31, 2018 7:40 PM -0800 Kenneth Porter 
<sh...@sewingwitch.com> wrote:

> --On Wednesday, January 31, 2018 7:23 PM -0800 Kenneth Porter
> <sh...@sewingwitch.com> wrote:
>
>>     fp = builtins.open(file, 'w+') # avoid namespace clash with
>>                                    # trimmed-down svn_fs_open()
>
> I'm now thinking the problem is in the open call, and that I'm somehow
> getting a Python 3 open function even though I've got Python 2.7
> installed. Should the mode be 'wb' instead of 'w+'? That would insure
> that the raw data from the Subversion object is getting dumped into the
> temporary fle without interpretation. I don't understand why update
> (denoted by the plus) is wanted. The temp file isn't being read from.

Proposed edit to fs.py: Change 'w+' to 'wb' when copying svn stream object 
to temporary file. Update isn't needed, and the code just needs to dump the 
raw data into a file for the external diff to access, so no 
encoding/decoding should occur. Hence we should open the file in binary 
mode. I just tested this edit and it seems to cure the problem.

It looks like this line is the same since it was originally added in 
r843330 and hasn't changed in Troy's swig-py3 branch.

>From my initial report in the users list:

<https://svn.haxx.se/users/archive-2018-01/0094.shtml>
<https://svn.haxx.se/users/archive-2018-02/0000.shtml>

I'm using mailer.py in my post-commit hook and it's throwing a Unicode type 
error during the diff phase. Digging through the source code, I figured out 
that it's happening during the creation of the two temporary files for 
diff'ing. Somehow the output file is getting opened in Unicode text mode 
but the input source (the Subversion object stream) is a raw byte stream. 
The write call fails.

OS: CentOS 7.4
subversion-python-1.7.14-11.el7_4.x86_64
python-2.7.5-58.el7.x86_64



Re: mailer.py commit says TypeError: must be unicode, not str

Posted by Kenneth Porter <sh...@sewingwitch.com>.
[moving discussion to dev list as I think this is now the correct fix.]

--On Wednesday, January 31, 2018 7:40 PM -0800 Kenneth Porter 
<sh...@sewingwitch.com> wrote:

> --On Wednesday, January 31, 2018 7:23 PM -0800 Kenneth Porter
> <sh...@sewingwitch.com> wrote:
>
>>     fp = builtins.open(file, 'w+') # avoid namespace clash with
>>                                    # trimmed-down svn_fs_open()
>
> I'm now thinking the problem is in the open call, and that I'm somehow
> getting a Python 3 open function even though I've got Python 2.7
> installed. Should the mode be 'wb' instead of 'w+'? That would insure
> that the raw data from the Subversion object is getting dumped into the
> temporary fle without interpretation. I don't understand why update
> (denoted by the plus) is wanted. The temp file isn't being read from.

Proposed edit to fs.py: Change 'w+' to 'wb' when copying svn stream object 
to temporary file. Update isn't needed, and the code just needs to dump the 
raw data into a file for the external diff to access, so no 
encoding/decoding should occur. Hence we should open the file in binary 
mode. I just tested this edit and it seems to cure the problem.

It looks like this line is the same since it was originally added in 
r843330 and hasn't changed in Troy's swig-py3 branch.

>From my initial report in the users list:

<https://svn.haxx.se/users/archive-2018-01/0094.shtml>
<https://svn.haxx.se/users/archive-2018-02/0000.shtml>

I'm using mailer.py in my post-commit hook and it's throwing a Unicode type 
error during the diff phase. Digging through the source code, I figured out 
that it's happening during the creation of the two temporary files for 
diff'ing. Somehow the output file is getting opened in Unicode text mode 
but the input source (the Subversion object stream) is a raw byte stream. 
The write call fails.

OS: CentOS 7.4
subversion-python-1.7.14-11.el7_4.x86_64
python-2.7.5-58.el7.x86_64



Re: mailer.py commit says TypeError: must be unicode, not str

Posted by Nico Kadel-Garcia <nk...@gmail.com>.
On Wed, Jan 31, 2018 at 10:40 PM, Kenneth Porter <sh...@sewingwitch.com> wrote:
> --On Wednesday, January 31, 2018 7:23 PM -0800 Kenneth Porter
> <sh...@sewingwitch.com> wrote:
>
>>     fp = builtins.open(file, 'w+') # avoid namespace clash with
>>                                    # trimmed-down svn_fs_open()
>
>
> I'm now thinking the problem is in the open call, and that I'm somehow
> getting a Python 3 open function even though I've got Python 2.7 installed.
> Should the mode be 'wb' instead of 'w+'? That would insure that the raw data
> from the Subversion object is getting dumped into the temporary fle without
> interpretation. I don't understand why update (denoted by the plus) is
> wanted. The temp file isn't being read from.

If you are on RHEL based operating systems, some of them install both,
and sometimes with some fascinating "PATH" settings. It can be very
handy to edit customized .py scripts to explicitly call the version of
Python you reall want to use.

Re: mailer.py commit says TypeError: must be unicode, not str

Posted by Kenneth Porter <sh...@sewingwitch.com>.
--On Wednesday, January 31, 2018 7:23 PM -0800 Kenneth Porter 
<sh...@sewingwitch.com> wrote:

>     fp = builtins.open(file, 'w+') # avoid namespace clash with
>                                    # trimmed-down svn_fs_open()

I'm now thinking the problem is in the open call, and that I'm somehow 
getting a Python 3 open function even though I've got Python 2.7 installed. 
Should the mode be 'wb' instead of 'w+'? That would insure that the raw 
data from the Subversion object is getting dumped into the temporary fle 
without interpretation. I don't understand why update (denoted by the plus) 
is wanted. The temp file isn't being read from.



Re: mailer.py commit says TypeError: must be unicode, not str

Posted by Kenneth Porter <sh...@sewingwitch.com>.
--On Tuesday, January 23, 2018 7:14 AM -0800 Kenneth Porter 
<sh...@sewingwitch.com> wrote:

>    File "/usr/lib64/python2.7/site-packages/svn/fs.py", line 87, in
> _dump_contents
>      fp.write(chunk)
> TypeError: must be unicode, not str

Here's the code where this is going wrong. I think svn_stream_read is 
returning a byte stream and the file object here is expecting a unicode 
string. Is there a missing decode('utf-8') call? (I'm a very new Python 
coder but have lots of experience in C++ and some understanding of unicode.)

Package is subversion-python-1.7.14-11.el7_4.x86_64 in CentOS 7.4.

>From /usr/lib64/python2.7/site-packages/svn/fs.py

  def _dump_contents(self, file, root, path, pool=None):
    fp = builtins.open(file, 'w+') # avoid namespace clash with
                                   # trimmed-down svn_fs_open()
    if path is not None:
      stream = file_contents(root, path, pool)
      try:
        while True:
          chunk = _svncore.svn_stream_read(stream, 
_svncore.SVN_STREAM_CHUNK_SIZE)
          if not chunk:
            break
          fp.write(chunk)
      finally:
        _svncore.svn_stream_close(stream)
    fp.close()

BTW, I found this nice treatment of unicode in Python 2 and 3:

<https://nedbatchelder.com/text/unipain.html>