Posted to dev@thrift.apache.org by "Terry Jones (JIRA)" <ji...@apache.org> on 2009/07/09 12:29:15 UTC

[jira] Commented: (THRIFT-395) Python library + compiler does not support unicode strings

    [ https://issues.apache.org/jira/browse/THRIFT-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729145#action_12729145 ] 

Terry Jones commented on THRIFT-395:
------------------------------------

It seems that a decision/consensus was almost reached here, specifically David's suggestion at http://bit.ly/ofFr0

Can we re-animate this issue and get it resolved?  I somehow skipped this discussion when it was going on, as I knew (or thought I knew) that strings were sent as UTF-8, and I mistakenly assumed that the Python support did the Right Thing: that if an app passed a Python unicode object in a call, you'd get a Python unicode object out on the other end. Last night I found out, to my great surprise, that this is not the case.

It would be *really* nice to have this resolved. Otherwise it's going to mean a bunch of crufty manual encoding and decoding. It's made worse in our case, as we have a dozen internal services that all speak to each other extensively using Thrift. So not only do we need to deal with outside clients somehow being able to pass unicode, we'd also have to manually decode each arg in each method in each service, and then manually encode them again to call another Thrift method inside our own service. Either that or keep things as UTF-8 strings, which isn't an option.
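
To make the cost concrete, here is a minimal sketch of the kind of boilerplate the manual approach implies. It is written in Python 3 terms, where `str` plays the role of the Python 2 unicode object and `bytes` plays the role of the Python 2 byte string. All names (`encode_arg`, `decode_arg`, `decodes_utf8_args`, `greet`) are hypothetical and are not part of Thrift or of the attached patches; the sketch only assumes that string values cross the wire as UTF-8 bytes.

```python
import functools


def encode_arg(value):
    """Encode text to UTF-8 bytes; pass any other value through unchanged."""
    return value.encode("utf-8") if isinstance(value, str) else value


def decode_arg(value):
    """Decode UTF-8 bytes to text; pass any other value through unchanged."""
    return value.decode("utf-8") if isinstance(value, bytes) else value


def decodes_utf8_args(handler):
    """Wrap a (hypothetical) service handler so that it works on text.

    Incoming byte-string arguments are decoded before the handler runs,
    and a text return value is encoded again before it goes back out.
    """
    @functools.wraps(handler)
    def wrapper(*args, **kwargs):
        args = [decode_arg(a) for a in args]
        kwargs = {k: decode_arg(v) for k, v in kwargs.items()}
        return encode_arg(handler(*args, **kwargs))
    return wrapper


@decodes_utf8_args
def greet(name):
    # Hypothetical handler body: operates on text internally.
    return "Hello, " + name


wire_name = encode_arg("Zoë")    # the caller encodes by hand before the call
wire_reply = greet(wire_name)    # bytes in, bytes out
reply = decode_arg(wire_reply)   # and the caller decodes the result by hand
```

Even with a decorator hiding some of it, every handler in every service needs the wrapper and every call site needs the explicit encode and decode, which is exactly the cruft that doing the conversion inside the library would remove.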

The patches are in, and backwards compatibility is not an issue with David's suggestion. Real users need it ASAP to avoid real pain :-)  What's still stopping this from being resolved/applied/committed?

Terry


> Python library + compiler does not support unicode strings
> ----------------------------------------------------------
>
>                 Key: THRIFT-395
>                 URL: https://issues.apache.org/jira/browse/THRIFT-395
>             Project: Thrift
>          Issue Type: Improvement
>          Components: Compiler (Python), Library (Python)
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.2
>
>         Attachments: 0001-python-Minor-cleanup-of-protocols-don-t-use-str.patch, 0002-THRIFT-395.-python-Phase-One-of-support-for-unicode.patch, 0003-THRIFT-395.-python-Phase-Two-of-support-for-unicode.patch, 0004-python-Remove-ridiculous-semicolons-from-gen-code.patch, python-utf8-v2.patch, python-utf8.patch
>
>
> Effectively, all strings in the python bindings are treated as binary strings -- no encoding/decoding to UTF-8 is done.  So if a unicode object is passed to a (regular, non-binary) string, an exception is raised.
