You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Yasuhito FUTATSUKI <fu...@yf.bsdclub.org> on 2019/01/14 09:30:12 UTC

Re: [PATCH]On branch swig-py3: accept both of bytes/str for inputchar * args

On 1/14/19 3:10 PM, Yasuhito FUTATSUKI wrote:
> In article <CA...@mail.gmail.com>
> troycurtisjr@gmail.com writes:
>> On Thu, Jan 10, 2019 at 9:50 AM Yasuhito FUTATSUKI
>> <fu...@yf.bsdclub.org> wrote:
> 
>>> The patch attached modifies 4 kind of input argment translations.
>>>
>>> (1) typemap(in) char * (with/without const modifiers); not allow NULL,
>>>       typemap(in) const char * MAY_BE_NULL; allows NULL
>>> These had done by using 'parse=' typemap modifier, however there is no
>>> PyArg_Parse() format unit to convert both of str and bytes in py3.
>>> So I make a function svn_swig_py_string_to_cstring() in swigutil_py.c,
>>> and use it in for new typemap definition.
>>>
>>> consideration:
>>> * For py2, my patch code uses svn_swig_py_string_to_cstring()
>>>     - It isn't allow Unicode for input, however 's' and 'z' format units
>>>      PyArg_Parse() Unicode in py2. If it is need to accept Unicode in py2,
>>>      it is need to fix. (use svn_swig_py_string_to_cstring() py3 only, or
>>>      add code to conversion for py2)
>>
>>
>> Yes I think you should support Unicode in this case, but it turns out
>> you are most of the way there. If you just remove the IS_PY3
>> conditional, it will support unicode!  The "PyBytes_*" and "PyStr_*"
>> functions are wrappers provided by the py3c library.  The names point
>> to the concept that they target, and then map it appropriately in Py2
>> and Py3. So
>>
>> PyBytes: Sequence of byte values, e.g. "raw data"
>>     In Py2: str
>>     In Py3: bytes
>>
>> PyStr: Character data
>>     In Py2: Unicode
>>     In Py3: str
> 
> Unfortunately, PyStr in py3c compatibility layer API is the intersection
> of PyString in Python 2, and PyUnicode in Python 3, so we must explicitly
> use PyUnicode_* for handling Unicode in py2.

In py2, it seems there is no C API to return (const) char * buffer
corresponding to Unicode object directly In py3, PyUnicode_AsUTF8() returns
(const) char *, without extra object reference (and py3c uses it as the entity
of PyStr_AsUTF8()).

So, for py2, we should explicitly convert Unicode object into new Bytes object
and hold its reference while API call, then release it after API call
in %typemap(freearg).

Alternatively, revert to use %typemap(in, parse="s"), %typemap(in, parse="z")
in py2 (only).

-- 
Yasuhito FUTATSUKI

Re: [PATCH]On branch swig-py3: accept both of bytes/str for inputchar * args

Posted by Yasuhito FUTATSUKI <fu...@yf.bsdclub.org>.
On 2/20/19 11:05 PM, Troy Curtis Jr wrote:
> On Sun, Feb 17, 2019, 10:32 AM Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> 
>> Yasuhito FUTATSUKI wrote on Sun, 17 Feb 2019 06:13 +00:00:
>>> On 2/17/19 9:04 AM, Daniel Shahaf wrote:

>>> * If we want to install py2 bindings and py3 bindings at the same time,
>>>     we must rename libsvn_swig_py library for py3.
>>
>> Good question, but I don't know the answer.
>>
> 
> Do we think we want to address this directly, or rely on packagers to do
> whatever makes since for their environment? Perhaps we should just include
> an easy, optional, library name override option?

I think the former is better to save users who build from source from
unintentionally overwrite. It is also convenient to distinguish what
version of Python has been used if target Python major/minor version
number is added to libsvn_swig_py for py3 just like libpythonX.Ym is.

However, it doesn't intend to separate build tree immediately. To build
py3 bindings in addition to py2 bindings (or contrary), as we just do
currently, (1) clean swig-py stuff, (2) configure or gen-make.py again,
(3) and then rebuild swig-py.

Isn't it sufficient to include an easy, optional, library name override
option with different default value between py2 and py3?

Cheers,
-- 
Yasuhito FUTATSUKI

Re: [PATCH]On branch swig-py3: accept both of bytes/str for inputchar * args

Posted by Troy Curtis Jr <tr...@gmail.com>.
On Sun, Feb 17, 2019, 10:32 AM Daniel Shahaf <d....@daniel.shahaf.name> wrote:

> Yasuhito FUTATSUKI wrote on Sun, 17 Feb 2019 06:13 +00:00:
> > On 2/17/19 9:04 AM, Daniel Shahaf wrote:
> >
> > > I was wondering what are the blockers for merging swig-py3 into trunk?
> > > The only item in BRANCH-README is "Fix a compile error on windows with
> > > py3", but I don't even know if that's a current issue or an old one
> that has
> > > since been fixed.
> >
> > I'm sorry I'm not sure the compile error still exists or not. (I don't
> > have Windows build environment yet)
> >
>
> There's no need to apologize; we don't expect every developer to be able
> to build on every platform.  If you are able to set up a Windows build
> environment that would be very helpful, of course, but if not, a Windows
> developer can do that.
>
> If you just need a quick tests run, you can also trigger buildbot builds
> of the swig-py3 branch on Windows by joining #svn-dev on IRC and
> issueing an appropriate command to buildbot's IRC bot.
>

The Py3 windows build issue still exists. I have started rebuilding my
Windows Dev environment to see about addressing it, but I do not yet have a
functioning build system. :P

It does sound like there may be a Py2 issue introduced now as well. I could
see the Py2 issue being a blocker for a trunk merge, but presumably the Py3
wouldn't necessarily be one.


> > The only thing I know about it, my patch committed on r1849784 might
> break
> > Windows build because of "PY3" macro for swig which added only Unix/Linux
> > build. However, it might not be a compile error, but might produce
> improper
> > codes under py3 environment, and this might be dissolved on r1853592,
> > for "PY3" macro in swig typemaps no longer used.
> >
>
> OK.
>
> >
> > I think there are still some other points need to look over.
> >
> > * Do scrpts under tools/ work on py3?
>
> Good call: the branch should not cause regressions in tools/ scripts
> that use the SWIG bindings.  That said, I would assume that Python
> scripts that don't use the swig bindings are orthogonal to the branch;
> they _should_ be ported to Python 3, but that work can happen on trunk,
> before or after the merge.
>
> > * If we want to install py2 bindings and py3 bindings at the same time,
> >    we must rename libsvn_swig_py library for py3.
>
> Good question, but I don't know the answer.
>

Do we think we want to address this directly, or rely on packagers to do
whatever makes since for their environment? Perhaps we should just include
an easy, optional, library name override option?


> Thanks,
>
> Daniel
>

Re: [PATCH]On branch swig-py3: accept both of bytes/str for inputchar * args

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Yasuhito FUTATSUKI wrote on Sun, 17 Feb 2019 06:13 +00:00:
> On 2/17/19 9:04 AM, Daniel Shahaf wrote:
> 
> > I was wondering what are the blockers for merging swig-py3 into trunk?
> > The only item in BRANCH-README is "Fix a compile error on windows with
> > py3", but I don't even know if that's a current issue or an old one that has
> > since been fixed.
> 
> I'm sorry I'm not sure the compile error still exists or not. (I don't
> have Windows build environment yet)
> 

There's no need to apologize; we don't expect every developer to be able
to build on every platform.  If you are able to set up a Windows build
environment that would be very helpful, of course, but if not, a Windows
developer can do that.

If you just need a quick tests run, you can also trigger buildbot builds
of the swig-py3 branch on Windows by joining #svn-dev on IRC and
issueing an appropriate command to buildbot's IRC bot.

> The only thing I know about it, my patch committed on r1849784 might break
> Windows build because of "PY3" macro for swig which added only Unix/Linux
> build. However, it might not be a compile error, but might produce improper
> codes under py3 environment, and this might be dissolved on r1853592,
> for "PY3" macro in swig typemaps no longer used.
> 

OK.

> 
> I think there are still some other points need to look over.
> 
> * Do scrpts under tools/ work on py3?

Good call: the branch should not cause regressions in tools/ scripts
that use the SWIG bindings.  That said, I would assume that Python
scripts that don't use the swig bindings are orthogonal to the branch;
they _should_ be ported to Python 3, but that work can happen on trunk,
before or after the merge.

> * If we want to install py2 bindings and py3 bindings at the same time,
>    we must rename libsvn_swig_py library for py3.

Good question, but I don't know the answer.

Thanks,

Daniel

Re: [PATCH]On branch swig-py3: accept both of bytes/str for inputchar * args

Posted by Yasuhito FUTATSUKI <fu...@yf.bsdclub.org>.
On 2/17/19 9:04 AM, Daniel Shahaf wrote:

> I was wondering what are the blockers for merging swig-py3 into trunk?
> The only item in BRANCH-README is "Fix a compile error on windows with
> py3", but I don't even know if that's a current issue or an old one that has
> since been fixed.

I'm sorry I'm not sure the compile error still exists or not. (I don't
have Windows build environment yet)

The only thing I know about it, my patch committed on r1849784 might break
Windows build because of "PY3" macro for swig which added only Unix/Linux
build. However, it might not be a compile error, but might produce improper
codes under py3 environment, and this might be dissolved on r1853592,
for "PY3" macro in swig typemaps no longer used.


I think there are still some other points need to look over.

* Do scrpts under tools/ work on py3?
* If we want to install py2 bindings and py3 bindings at the same time,
   we must rename libsvn_swig_py library for py3.

Cheers,
-- 
Yasuhito FUTATSUKI



Re: [PATCH]On branch swig-py3: accept both of bytes/str for inputchar * args

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Yasuhito FUTATSUKI wrote on Sat, 16 Feb 2019 07:37 +00:00:
> On 1/20/19 5:50 PM, Yasuhito FUTATSUKI wrote:
> > Patch updated.
> 
> I've commited it with modifications below, as r1853592.

Thanks!

I was wondering what are the blockers for merging swig-py3 into trunk?
The only item in BRANCH-README is "Fix a compile error on windows with
py3", but I don't even know if that's a current issue or an old one that has
since been fixed.

Cheers,

Daniel

Re: [PATCH]On branch swig-py3: accept both of bytes/str for inputchar * args

Posted by Yasuhito FUTATSUKI <fu...@yf.bsdclub.org>.
On 1/20/19 5:50 PM, Yasuhito FUTATSUKI wrote:
> Patch updated.

I've commited it with modifications below, as r1853592.

* Add support to convert py_file from unicode file name in
  svn_swig_py_make_file()
* Revise test case for svn_stream_write() in python/tests/core.py

Thanks,
-- 
Yasuhito FUTATSUKI

Re: [PATCH]On branch swig-py3: accept both of bytes/str for inputchar * args

Posted by Yasuhito FUTATSUKI <fu...@yf.bsdclub.org>.
Patch updated.

In article <CA...@mail.gmail.com>
troycurtisjr@gmail.com writes:
> On Thu, Jan 10, 2019 at 9:50 AM Yasuhito FUTATSUKI
> <fu...@yf.bsdclub.org> wrote:

>> The patch attached modifies 4 kind of input argment translations.
>>
>> (1) typemap(in) char * (with/without const modifiers); not allow NULL,
>>      typemap(in) const char * MAY_BE_NULL; allows NULL
>> These had done by using 'parse=' typemap modifier, however there is no
>> PyArg_Parse() format unit to convert both of str and bytes in py3.
>> So I make a function svn_swig_py_string_to_cstring() in swigutil_py.c,
>> and use it in for new typemap definition.
>>
>> consideration:
>> * For py2, my patch code uses svn_swig_py_string_to_cstring()
>>    - It isn't allow Unicode for input, however 's' and 'z' format units
>>     PyArg_Parse() Unicode in py2. If it is need to accept Unicode in py2,
>>     it is need to fix. (use svn_swig_py_string_to_cstring() py3 only, or
>>     add code to conversion for py2)
> 
> 
> Yes I think you should support Unicode in this case, but it turns out
> you are most of the way there. If you just remove the IS_PY3
> conditional, it will support unicode!  The "PyBytes_*" and "PyStr_*"
> functions are wrappers provided by the py3c library.  The names point
> to the concept that they target, and then map it appropriately in Py2
> and Py3. So
> 
> PyBytes: Sequence of byte values, e.g. "raw data"
>    In Py2: str
>    In Py3: bytes
> 
> PyStr: Character data
>    In Py2: Unicode
>    In Py3: str

Now it uses PyUnicode_Check() and PyStr_UTF8(), then it works in py2 and py3
without IS_PY3 conditional here.

>>    - Difference of TypeError exception message. Pyrg_Parse() reports
>>     argment by argnum in Python wrapper function. However it can't
>>     determin in typemap definition code, so my patch code uses argment
>>     symbol instead.

This is still left, if we treat it as a regression.

>> * For py3, it seems to need more kindly Exception message for
>>   UnicodeEncodeError, which can be caused if input str contains surrogate
>>   data (U+DC00 - U+DCFF).

I reconsidered it and abandoned, because of UnicodeEncodeError structure.
It is enought to analyse why it causes.

> * test case for UnicodeEncodeError is needed

Added.

>> (2) in core.i, typemap for svn_stream_write
>>
>> consideration:
>> * As this typemap doesn't seems to accept Unicode on py2, there seems to
>>   be no regression like described in (1)
>> * As this typemap only used for svn_stream_write wrapper which have only
>>   one char * arg, default UnicodeEncodeError message seems to be sufficient.

Not changed in this patch.

>> (3) typemap(in) apr_hash_t * (for various types)
>> These are using make_string_from_ob() for hash key string conversion,
>> and typemap(in) apr_hash_t *HASH_CSTRING uses it for hash value
>> conversion, too. Similarly typemap(in) apr_hash_t *PROPHASH uses
>> make_svn_strinf_from_ob() for hash value conversion
>>
>> consideration:
>> * It seems some of API (e.g. svn_prop_diffs()) allows NULL for hash value,
>>   but current implementation of conversion function doesn't allows. Isn't
>>   it needed? (I added test for this case, but disabled until it make clear)
>> * test case for UnicodeEncodeError is needed (for both of hash key and
>>   hash value)
>
>
> Yes this looks like a good catch, the generic conversion function
> should handle the NULL/None case.

I've fixed it. To fix this, check key and value separately in
svn_swig_py_*_from_dict().

Test case for UnicodeEncodeError is added, and it detect make_svn_strinf_from_ob()
couldn't handle this case :) Yes, I've fixed it, of course.

>> (4) typemap(in) apr_array_header_t *STRINGLIST
>> This typemap is using svn_swig_py_unwrap_string() through the function
>> svn_swig_py_seq_to_array().
>>
>> consideration:
>> * test case for UnicodeEncodeError is needed (for both of hash key and
>>   hash value)

Test case for UnicodeEncodeError is added in tests/client.py.


> And here is the start of my patch review:

I've intended to fix problems pointed out, except the comment for
svn_swig_py_string_to_cstring(). For svn_swig_py_string_to_cstring(),
I've added a comment for the reason of return type and why it seems to
sufficient not to duplicate string entity.

Thanks,
-- 
Yasuhito FUTATSUKI

Re: [PATCH]On branch swig-py3: accept both of bytes/str for inputchar * args

Posted by Yasuhito FUTATSUKI <fu...@yf.bsdclub.org>.
On 1/14/19 6:30 PM, Yasuhito FUTATSUKI wrote:
> On 1/14/19 3:10 PM, Yasuhito FUTATSUKI wrote:
>> In article <CA...@mail.gmail.com>
>> troycurtisjr@gmail.com writes:
>>> On Thu, Jan 10, 2019 at 9:50 AM Yasuhito FUTATSUKI


>>> PyBytes: Sequence of byte values, e.g. "raw data"
>>>     In Py2: str
>>>     In Py3: bytes
>>>
>>> PyStr: Character data
>>>     In Py2: Unicode
>>>     In Py3: str
>>
>> Unfortunately, PyStr in py3c compatibility layer API is the intersection
>> of PyString in Python 2, and PyUnicode in Python 3, so we must explicitly
>> use PyUnicode_* for handling Unicode in py2.

Aha, both of those are partially correct and partially wrong.

PyStr_Check:
     In Py2: PyString_Check, accept string and its subtype object.
             (not accept unicode object)
     In Py3: PyUnicode_Check, accept string and its subtype object.

PyStr_UTF8:
     In Py2: PyString_AsString, accept both of bytes(str) and unicode.
     In Py3: PyUnicode_AsUTF8, accept str(Unicode object).

and, PyUnicode_Check() is also provided for py2.

So, we can use code fragment like:

         ...
     if (PyBytes_Check(input))
       {
         retval = PyBytes_AsString(input);
       }
     else if (PyUnicode_Check(input))
       {
         retval = (char *)PyStr_AsUTF8(input);
       }
      else
          ...

-- 
Yasuhito FUTATSUKI