Posted to dev@thrift.apache.org by "XB (JIRA)" <ji...@apache.org> on 2012/10/11 19:33:03 UTC

[jira] [Created] (THRIFT-1726) Ruby-1.9: "binary" fields are represented by string whose encoding is "UTF-8"

XB created THRIFT-1726:
--------------------------

             Summary: Ruby-1.9: "binary" fields are represented by string whose encoding is "UTF-8"
                 Key: THRIFT-1726
                 URL: https://issues.apache.org/jira/browse/THRIFT-1726
             Project: Thrift
          Issue Type: Bug
          Components: Ruby - Library
    Affects Versions: 0.9
         Environment: JRuby 1.6.8 using "--1.9" command line parameter.
            Reporter: XB


When a Thrift object is read with Thrift::BinaryProtocol and it has a field of type "binary", accessing that field yields a string whose encoding is "UTF-8".
The encoding should be "ASCII-8BIT" instead. It may be reasonable to assume that "string" fields have a character encoding (such as "UTF-8"), but "binary" fields have no character encoding at all. For such cases Ruby provides the pseudo-encoding "ASCII-8BIT", which treats the string as an opaque sequence of bytes.
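
To illustrate the difference, here is a minimal sketch in plain Ruby 1.9+ that does not use Thrift at all. The helper name as_binary is hypothetical and only stands in for whatever the library would do internally; the sample bytes are made up.

```ruby
# Hypothetical helper for illustration; not part of the Thrift library.
# Returns a copy of the string tagged as ASCII-8BIT (alias Encoding::BINARY).
# force_encoding only changes the encoding tag, never the bytes themselves.
def as_binary(str)
  str.dup.force_encoding(Encoding::BINARY)
end

# Four arbitrary bytes, wrongly tagged as UTF-8 (as described in this issue).
raw = [0xFF, 0xFE, 0x00, 0x01].pack("C*").force_encoding(Encoding::UTF_8)
raw.encoding         # Encoding::UTF_8
raw.valid_encoding?  # false, because 0xFF cannot occur in UTF-8

bin = as_binary(raw)
bin.encoding         # Encoding::ASCII_8BIT
bin.valid_encoding?  # true, every byte sequence is valid as binary
bin.bytes == raw.bytes  # true, the payload is untouched
```

A string that is tagged UTF-8 but holds arbitrary bytes can raise errors in later operations such as regexp matching or concatenation with other strings, which is why the tag matters even though the bytes are the same.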

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (THRIFT-1726) Ruby-1.9: "binary" fields are represented by string whose encoding is "UTF-8"

Posted by "XB (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/THRIFT-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

XB updated THRIFT-1726:
-----------------------

    Patch Info: Patch Available
    

[jira] [Commented] (THRIFT-1726) Ruby-1.9: "binary" fields are represented by string whose encoding is "UTF-8"

Posted by "XB (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474383#comment-13474383 ] 

XB commented on THRIFT-1726:
----------------------------

This patch should fix this issue:
{noformat}
diff --git a/lib/rb/lib/thrift/struct_union.rb b/lib/rb/lib/thrift/struct_union.rb
index 4e0afcf..7df859c 100644
--- a/lib/rb/lib/thrift/struct_union.rb
+++ b/lib/rb/lib/thrift/struct_union.rb
@@ -100,6 +100,12 @@ module Thrift
           end
         end
         iprot.read_set_end
+      when Types::STRING
+        if field[:binary]
+          value = Bytes.force_binary_encoding(iprot.read_type(field[:type]))
+        else
+          value = iprot.read_type(field[:type])
+        end
       else
         value = iprot.read_type(field[:type])
       end
{noformat}
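
The effect of the new branch can be sketched in isolation as follows. Everything here is a stand-in for illustration: FakeProtocol, the STRING constant, and read_field are hypothetical and are not the Thrift library's classes; the only assumption taken from the patch is that a field's metadata hash carries a :binary flag.

```ruby
# Stand-ins for illustration only; these are not the Thrift library's classes.
STRING = :string

# Pretends to be a protocol that tags everything it reads as UTF-8.
class FakeProtocol
  def initialize(bytes)
    @bytes = bytes
  end

  def read_type(_type)
    @bytes.pack("C*").force_encoding(Encoding::UTF_8)
  end
end

# Mirrors the idea of the patch: only STRING fields flagged :binary are
# re-tagged as binary; every other value is returned exactly as read.
def read_field(iprot, field)
  value = iprot.read_type(field[:type])
  if field[:type] == STRING && field[:binary]
    value = value.force_encoding(Encoding::BINARY)
  end
  value
end

iprot = FakeProtocol.new([0xC3, 0xA9]) # the two bytes that encode "é" in UTF-8

text = read_field(iprot, { :type => STRING })
blob = read_field(iprot, { :type => STRING, :binary => true })

text.encoding  # Encoding::UTF_8, one character long
blob.encoding  # Encoding::ASCII_8BIT, two bytes long
```

The same two bytes come back as one character for the "string" field and as two opaque bytes for the "binary" field, which is the behaviour this issue asks for.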
                

[jira] [Commented] (THRIFT-1726) Ruby-1.9: "binary" fields are represented by string whose encoding is "UTF-8"

Posted by "XB (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474371#comment-13474371 ] 

XB commented on THRIFT-1726:
----------------------------

This is related to the fixes for THRIFT-1023: https://issues.apache.org/jira/browse/THRIFT-1023
                

[jira] [Commented] (THRIFT-1726) Ruby-1.9: "binary" fields are represented by string whose encoding is "UTF-8"

Posted by "Nathan Beyer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477532#comment-13477532 ] 

Nathan Beyer commented on THRIFT-1726:
--------------------------------------

[~xb] Can you add some test cases to your patch?
                