Posted to dev@thrift.apache.org by "Nathan Beyer (JIRA)" <ji...@apache.org> on 2012/10/20 01:54:12 UTC

[jira] [Comment Edited] (THRIFT-1727) Ruby-1.9: data loss: "binary" fields are re-encoded

    [ https://issues.apache.org/jira/browse/THRIFT-1727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480537#comment-13480537 ] 

Nathan Beyer edited comment on THRIFT-1727 at 10/19/12 11:53 PM:
-----------------------------------------------------------------

{quote}
XB added a comment - 17/Oct/12 21:26
Are there places where 'convert_to_utf8_buffer' is used for things other than Thrift 'string' fields?
Yes. Everywhere where 'convert_to_utf8_buffer' is used.{quote}

I'm going to assume you're not being snarky with this comment for the moment ...

As such, we must not be talking about the same thing, because there are valid uses of that method. Rather than describing a use case to me, can you please provide either a spec/test or a simple code sample demonstrating the issue?
                
      was (Author: nbeyer):
    {quote}
XB added a comment - 17/Oct/12 21:26
Are there places where 'convert_to_utf8_buffer' is used for things other than Thrift 'string' fields?
Yes. Everywhere where 'convert_to_utf8_buffer' is used.{quote}

I'm going to assume your not being snarky with this comment for the moment ...

As such, we must not be talking about the same thing because there are valid uses of that method. Rather than describing a use case to me, can you please provide either a spec/test or a simple code sample demonstrating the issue.
                  
> Ruby-1.9: data loss: "binary" fields are re-encoded
> ---------------------------------------------------
>
>                 Key: THRIFT-1727
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1727
>             Project: Thrift
>          Issue Type: Bug
>          Components: Ruby - Library
>    Affects Versions: 0.9
>         Environment: JRuby 1.6.8 using "--1.9" command line parameter.
>            Reporter: XB
>
> When a binary field of a Thrift object is set to binary data (e.g. a string whose encoding is "ASCII-8BIT") and the object is then serialized, the binary data is re-encoded. That is, it is treated not as a sequence of bytes but as a sequence of characters encoded in ISO-8859-1. This assumed ISO-8859-1 character sequence is then converted into UTF-8 (by BinaryProtocol or CompactProtocol). As a result, every byte whose value is between 0x80 (inclusive) and 0x100 (exclusive) is converted into a multi-byte sequence, which corrupts the data.
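
The re-encoding described above can be reproduced in plain Ruby 1.9+ without Thrift. The following is a minimal sketch, not the library's actual code path: the payload bytes and variable names are made up for illustration, and the ISO-8859-1 to UTF-8 transcoding step is an assumption taken from the issue description.

```ruby
# Hypothetical payload: four raw bytes, two of them >= 0x80.
# Array#pack("C*") returns a string tagged ASCII-8BIT (binary).
raw = [0x41, 0x80, 0xFF, 0x42].pack("C*")

# Assumed faulty path (per the issue description): the bytes are
# interpreted as ISO-8859-1 characters and transcoded to UTF-8.
reencoded = raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)

# Byte-preserving path: keep the string tagged as binary, no transcoding.
preserved = raw.dup.force_encoding(Encoding::BINARY)

puts raw.bytes.to_a.inspect        # [65, 128, 255, 66]
puts reencoded.bytes.to_a.inspect  # [65, 194, 128, 195, 191, 66]
puts preserved.bytes.to_a.inspect  # [65, 128, 255, 66]
```

If the description is accurate, a receiver would get six bytes where four were sent: 0x80 becomes 0xC2 0x80 and 0xFF becomes 0xC3 0xBF. Keeping the string tagged as binary leaves the bytes untouched.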
