Posted to dev@allura.apache.org by Dave Brondsema <da...@brondsema.net> on 2015/04/23 22:27:17 UTC

__future__ unicode_literals

Fairly often we have accidental unicode issues that are a result of our code
using '...' instead of u'...'.  And then we come along with a simple fix like
https://forge-allura.apache.org/p/allura/git/ci/db076001f36e41d432f03c08da6fe7af785251fe/

If we had `from __future__ import unicode_literals` at the top of the file, then
'...' would be unicode by default instead of str, and we wouldn't have these
accidental bugs.  I'm interested in maybe putting that in all our files.  I
wonder if anything would break?
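
A minimal sketch of what the import changes, assuming nothing about Allura's
code. Instead of editing a module header it passes the feature's compiler flag
to `compile()`, so the same literal can be checked with and without it. The
helper name `literal_type` is made up for illustration.

```python
import __future__

# The text type: unicode on Python 2, str on Python 3.
TEXT_TYPE = type(u'')


def literal_type(source, flags=0):
    """Compile a literal expression and return the type of its value.

    dont_inherit=True keeps this module's own future flags out of the result.
    """
    code = compile(source, '<literal>', 'eval', flags, True)
    return type(eval(code))


FLAG = __future__.unicode_literals.compiler_flag

plain = literal_type("'abc'")            # byte str on Python 2, str on Python 3
with_flag = literal_type("'abc'", FLAG)  # text type on both
still_bytes = literal_type("b'abc'", FLAG)  # a b'' prefix is not affected

print('%r %r %r' % (plain, with_flag, still_bytes))
```

The last line is the relevant escape hatch: literals that really must stay
bytes can be written as b'...' and are left alone by the import.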

If we wanted to be even more forward-looking, we could do `from __future__
import absolute_import, division, print_function, unicode_literals`
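
For reference, a small sketch of what the other features change, again using
`compile()` flags rather than a real module header so it can be run as-is. The
`evaluate` helper is hypothetical and only exists for this illustration.

```python
import __future__

# Combine the compiler flags of the four features named above.
FLAGS = 0
for feature in ('absolute_import', 'division',
                'print_function', 'unicode_literals'):
    FLAGS |= getattr(__future__, feature).compiler_flag


def evaluate(source):
    """Evaluate an expression as if its module began with those imports."""
    return eval(compile(source, '<demo>', 'eval', FLAGS, True))


half = evaluate("7 / 2")     # division: / is true division
floor = evaluate("7 // 2")   # // is still floor division
text = evaluate("'abc'")     # unicode_literals: text type
printer = evaluate("print")  # print_function: print is an ordinary name

print('%r %r %r %r' % (half, floor, text, callable(printer)))
```

absolute_import has no visible effect in a one-line expression; it only
changes how `import foo` is resolved inside a package.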

Thoughts?

-- 
Dave Brondsema : dave@brondsema.net
http://www.brondsema.net : personal
http://www.splike.com : programming
              <><

Re: __future__ unicode_literals

Posted by Dave Brondsema <da...@brondsema.net>.
Interesting, I came across a different case of this today and discovered that %
interpolation will "upgrade" to unicode automatically but .format() doesn't.

    In [3]: '%s %s' % ('foo', u'b£ar')
    Out[3]: u'foo b\xc2\xa3ar'

    In [4]: '{} {}'.format('foo', u'b£ar')
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-2: ordinal not in range(128)
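
My understanding of the difference, offered as a sketch rather than a
definitive explanation: on Python 2, `%` on a str returns unicode as soon as
one argument is unicode, while `str.format()` always returns str and so has to
encode the unicode argument with the ascii codec. Making the format string
itself unicode (which is what unicode_literals would do for every literal)
avoids the error with either style. The pound sign is written as an escape
here so the result does not depend on the console encoding; the two-character
`\xc2\xa3` in the session above looks like a console artifact.

```python
# u'b\xa3ar' is u'b£ar' spelled with an escape.
name = u'b\xa3ar'

# With a unicode format string both styles produce unicode,
# on Python 2.7 as well as Python 3.
percent = u'%s %s' % ('foo', name)
formatted = u'{} {}'.format('foo', name)

print(repr(percent))
print(repr(formatted))
```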



Re: __future__ unicode_literals

Posted by Dave Brondsema <da...@brondsema.net>.
I believe really_unicode will not change, since it deals with conversion, and
the unicode_literals setting only affects literal strings we type into our .py
files.

The standard library incompatibilities seem small to me:
http://python-future.org/stdlib_incompatibilities.html#stdlib-incompatibilities
But it is quite possible we have other issues with other libraries, e.g. HTTP
headers needing to be str type instead of unicode type.
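
To illustrate the header case: WSGI (PEP 3333) expects header names and values
to be the interpreter's native str type, so a module using unicode_literals
would need to convert at that boundary. The `native_str` helper below is
hypothetical, not something Allura or a library provides, and the latin-1
default is an assumption based on PEP 3333.

```python
import sys

PY2 = sys.version_info[0] == 2


def native_str(value, encoding='latin-1'):
    """Return value as the native str type of the running interpreter.

    Python 2: unicode is encoded to a byte str.
    Python 3: bytes are decoded to a text str.
    """
    if isinstance(value, str):
        return value
    if PY2:
        return value.encode(encoding)
    return value.decode(encoding)


# Header pairs built from unicode literals, converted before being handed on.
headers = [
    (native_str(u'Content-Type'), native_str(u'text/html; charset=utf-8')),
    (native_str(u'X-Frame-Options'), native_str(u'DENY')),
]
print(headers)
```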

I agree that it'd be worth trying it for a few hours to see how much work is
needed.

-Dave


Re: __future__ unicode_literals

Posted by Igor Bondarenko <je...@gmail.com>.
I agree with Heith. I also like the idea and want Allura to move towards
Python 3, but I also read the article, and there can be some subtle issues
with the “global flag day” approach...

I guess we can test how `h.really_unicode` will behave, but there can still
be hard-to-predict issues with the standard library or something.

Perhaps we can spend a couple of hours trying this approach and see how
much breaks by running the tests and testing by hand. If a lot of things
break, maybe we need a more incremental approach.


Re: __future__ unicode_literals

Posted by heith seewald <he...@gmail.com>.
I had the same basic question yesterday while working on a unicode related
issue.  I read this article:
http://python-future.org/imports.html#unicode-literals  -- which broke down
the pros/cons of using unicode_literals.

My main concern was the number of modules being called from templates and
run through the 'h.really_unicode' helper function.  But that may not even
be an issue.

That said, I like the idea in theory.  In addition to reducing the number
of unicode related errors now,  it would be one of the larger steps towards
becoming python3 compatible (which may be useful in the distant future).
