Posted to dev@thrift.apache.org by "Alessandro Morandi (JIRA)" <ji...@apache.org> on 2011/06/28 18:24:16 UTC

[jira] [Created] (THRIFT-1224) Cannot insert UTF-8 text

Cannot insert UTF-8 text
------------------------

                 Key: THRIFT-1224
                 URL: https://issues.apache.org/jira/browse/THRIFT-1224
             Project: Thrift
          Issue Type: Bug
          Components: Ruby - Library
    Affects Versions: 0.6
         Environment: Ruby 1.9.2, Cassandra 0.8, thrift_client gem 0.6.2, cassandra gem 0.11.1
            Reporter: Alessandro Morandi


I can't seem to find a way to save UTF-8 data into Cassandra.

I'm using the cassandra gem 0.11.1 (https://github.com/fauna/cassandra/), which in turn uses thrift_client (0.6.2), which in turn uses the thrift library (0.6.0).

As an example, the following code

bq. cassandra.insert(:Cache, "123", {"unicode string" => "ä"})

will raise an exception:

bq.    {{Encoding::CompatibilityError: incompatible character encodings: ASCII-8BIT and UTF-8}}


The stacktrace points to `thrift-0.6.0/lib/thrift/transport/framed_transport.rb:58`. What seems to be happening is that `@wbuf` is encoded as ASCII-8BIT, while `buf` is encoded as UTF-8, which causes the concatenation operation (<<) to fail with the exception above.
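
A minimal sketch of the failure described above, outside of Thrift and Cassandra. The variable names mirror `@wbuf` and `buf` from the report, but the buffer contents are made up for illustration: any ASCII-8BIT string that holds a byte above 0x7F should behave the same way.

```ruby
# Packing an integer yields an ASCII-8BIT string; this one contains a high
# (non-ASCII) byte, standing in for a buffer that already holds binary data.
wbuf = [0x80010001].pack("N")
buf  = "\u00E4"                      # "ä" as a UTF-8 string

puts wbuf.encoding
puts buf.encoding

error = nil
begin
  wbuf << buf                        # non-ASCII bytes on both sides, different encodings
rescue Encoding::CompatibilityError => e
  error = e
end
puts error.class
```

Ruby allows the append when either operand contains only 7-bit ASCII bytes, which may be why the error shows up only for some inputs.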

This issue might be connected to https://issues.apache.org/jira/browse/THRIFT-1023.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Issue Comment Edited] (THRIFT-1224) Cannot insert UTF-8 text

Posted by "Alexis (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151134#comment-13151134 ] 

Alexis edited comment on THRIFT-1224 at 11/17/11 6:02 PM:
----------------------------------------------------------

This is a Ruby 1.9 issue.
As suggested, we convert non-ASCII strings to binary before writing them. Here is a suggested patch for ~/.rvm/gems/ruby-1.9.2-p290/gems/thrift-0.7.0/lib/thrift/protocol/binary_protocol.rb:

{code}
def write_string(str)
  # Re-read anything not tagged US-ASCII as raw bytes (ASCII-8BIT)
  if str.encoding.to_s != "US-ASCII"
    str = str.unpack("a*").first
  end
  # US-ASCII and ASCII-8BIT are single-byte encodings, so length is the byte count
  write_i32(str.length)
  trans.write(str)
end
{code}
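
The idea behind the patch can be tried in isolation. The sketch below is not the thrift gem's code: `encode_string` is a hypothetical helper that retags a string as binary and prefixes it with its byte length, and the initial buffer contents are invented for the example.

```ruby
# Hypothetical helper, not part of the thrift gem: returns a 4-byte
# big-endian length followed by the string's raw bytes.
def encode_string(str)
  bytes = str.dup.force_encoding(Encoding::BINARY)  # same bytes, tagged ASCII-8BIT
  [bytes.bytesize].pack("N") + bytes
end

wbuf = [0x80010001].pack("N")        # binary buffer that already holds a high byte
wbuf << encode_string("\u00E4")      # appending binary to binary does not raise

puts wbuf.bytesize
```

Using `force_encoding` on a copy leaves the caller's string untouched, which `unpack("a*").first` also achieves by returning a new string.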

                
      was (Author: alexis779):
    This is a Ruby 1.9 issue.
I tried converting UTF-8 strings in the write method of /Users/alexis/.rvm/gems/ruby-1.9.2-p290/gems/thrift-0.7.0/lib/thrift/transport/framed_transport.rb, with something like a force_encoding("UTF-8"):

{code}
str = sz ? buf[0...sz] : buf
# Re-read UTF-8 strings as raw bytes (ASCII-8BIT) before appending to the write buffer
if str.encoding.to_s == "UTF-8"
  str = str.unpack("a*").first
end
@wbuf << str
{code}

With this change, reading from the socket raises an exception: "Socket: Timed out reading 4 bytes from 127.0.0.1:9160". Any clue?

{noformat}
/Users/alexis/.rvm/gems/ruby-1.9.2-p290/gems/thrift-0.7.0/lib/thrift/transport/socket.rb:109:in `read': CassandraThrift::Cassandra::Client::TransportException
	from /Users/alexis/.rvm/gems/ruby-1.9.2-p290/gems/thrift-0.7.0/lib/thrift/transport/base_transport.rb:87:in `read_all'
	from /Users/alexis/.rvm/gems/ruby-1.9.2-p290/gems/thrift-0.7.0/lib/thrift/transport/framed_transport.rb:105:in `read_frame'
	from /Users/alexis/.rvm/gems/ruby-1.9.2-p290/gems/thrift-0.7.0/lib/thrift/transport/framed_transport.rb:69:in `read_into_buffer'
	from /Users/alexis/.rvm/gems/ruby-1.9.2-p290/gems/thrift-0.7.0/lib/thrift/protocol/binary_protocol.rb:192:in `read_i32'
	from /Users/alexis/.rvm/gems/ruby-1.9.2-p290/gems/thrift-0.7.0/lib/thrift/protocol/binary_protocol.rb:118:in `read_message_begin'
	from /Users/alexis/.rvm/gems/ruby-1.9.2-p290/gems/thrift-0.7.0/lib/thrift/client.rb:45:in `receive_message'
	from /Users/alexis/.rvm/gems/ruby-1.9.2-p290/gems/cassandra-0.12.1/vendor/0.8/gen-rb/cassandra.rb:251:in `recv_batch_mutate'
	from /Users/alexis/.rvm/gems/ruby-1.9.2-p290/gems/cassandra-0.12.1/vendor/0.8/gen-rb/cassandra.rb:243:in `batch_mutate'
	from /Users/alexis/.rvm/gems/ruby-1.9.2-p290/gems/thrift_client-0.7.1/lib/thrift_client/abstract_thrift_client.rb:150:in `handled_proxy'
	from /Users/alexis/.rvm/gems/ruby-1.9.2-p290/gems/thrift_client-0.7.1/lib/thrift_client/abstract_thrift_client.rb:60:in `batch_mutate'
	from /Users/alexis/.rvm/gems/ruby-1.9.2-p290/gems/cassandra-0.12.1/lib/cassandra/protocol.rb:7:in `_mutate'
	from /Users/alexis/.rvm/gems/ruby-1.9.2-p290/gems/cassandra-0.12.1/lib/cassandra/cassandra.rb:459:in `insert'
{noformat} 
                  
> Cannot insert UTF-8 text
> ------------------------
>
>                 Key: THRIFT-1224
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1224
>             Project: Thrift
>          Issue Type: Bug
>          Components: Ruby - Library
>    Affects Versions: 0.6
>         Environment: Ruby 1.9.2, Cassandra 0.8, thrift_client gem 0.6.2, cassandra gem 0.11.1
>            Reporter: Alessandro Morandi
>              Labels: charset, encoding, ruby, utf, utf-8, utf8
>             Fix For: 0.8
>
>
> I can't seem to find a way to save UTF-8 data into Cassandra.
> I'm using the cassandra gem 0.11.1 (https://github.com/fauna/cassandra/), which in turn uses thrift_client (0.6.2), which in turn uses the thrift library (0.6.0).
> As an example, the following code[1]
> bq. cassandra.insert(:Cache, "123", {"unicode string" => "ä"})
> will raise an exception:
> bq.    {{Encoding::CompatibilityError: incompatible character encodings: ASCII-8BIT and UTF-8}}
> The stacktrace points to `thrift-0.6.0/lib/thrift/transport/framed_transport.rb:58`. What seems to be happening is that `@wbuf` is encoded as ASCII-8BIT, while `buf` is encoded as UTF-8, which causes the concatenation operation (<<) to fail with the exception above.
> This issue might be connected to https://issues.apache.org/jira/browse/THRIFT-1023.
> [1] Of course, this assumes a "cassandra" object created using the cassandra gem and a schema initialized with a column family called "Cache"


[jira] [Commented] (THRIFT-1224) Cannot insert UTF-8 text

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059143#comment-13059143 ] 

Jonathan Ellis commented on THRIFT-1224:
----------------------------------------

Is this the Ruby equivalent of THRIFT-395?


[jira] [Commented] (THRIFT-1224) Cannot insert UTF-8 text

Posted by "Alexis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151134#comment-13151134 ] 

Alexis commented on THRIFT-1224:
--------------------------------

This is a Ruby 1.9 issue.
I tried converting UTF-8 strings in the write method of /Users/alexis/.rvm/gems/ruby-1.9.2-p290/gems/thrift-0.7.0/lib/thrift/transport/framed_transport.rb, with something like a force_encoding("UTF-8"):

{code}
str = sz ? buf[0...sz] : buf
# Re-read UTF-8 strings as raw bytes (ASCII-8BIT) before appending to the write buffer
if str.encoding.to_s == "UTF-8"
  str = str.unpack("a*").first
end
@wbuf << str
{code}

With this change, reading from the socket raises an exception: "Socket: Timed out reading 4 bytes from 127.0.0.1:9160". Any clue?

{noformat}
/Users/alexis/.rvm/gems/ruby-1.9.2-p290/gems/thrift-0.7.0/lib/thrift/transport/socket.rb:109:in `read': CassandraThrift::Cassandra::Client::TransportException
	from /Users/alexis/.rvm/gems/ruby-1.9.2-p290/gems/thrift-0.7.0/lib/thrift/transport/base_transport.rb:87:in `read_all'
	from /Users/alexis/.rvm/gems/ruby-1.9.2-p290/gems/thrift-0.7.0/lib/thrift/transport/framed_transport.rb:105:in `read_frame'
	from /Users/alexis/.rvm/gems/ruby-1.9.2-p290/gems/thrift-0.7.0/lib/thrift/transport/framed_transport.rb:69:in `read_into_buffer'
	from /Users/alexis/.rvm/gems/ruby-1.9.2-p290/gems/thrift-0.7.0/lib/thrift/protocol/binary_protocol.rb:192:in `read_i32'
	from /Users/alexis/.rvm/gems/ruby-1.9.2-p290/gems/thrift-0.7.0/lib/thrift/protocol/binary_protocol.rb:118:in `read_message_begin'
	from /Users/alexis/.rvm/gems/ruby-1.9.2-p290/gems/thrift-0.7.0/lib/thrift/client.rb:45:in `receive_message'
	from /Users/alexis/.rvm/gems/ruby-1.9.2-p290/gems/cassandra-0.12.1/vendor/0.8/gen-rb/cassandra.rb:251:in `recv_batch_mutate'
	from /Users/alexis/.rvm/gems/ruby-1.9.2-p290/gems/cassandra-0.12.1/vendor/0.8/gen-rb/cassandra.rb:243:in `batch_mutate'
	from /Users/alexis/.rvm/gems/ruby-1.9.2-p290/gems/thrift_client-0.7.1/lib/thrift_client/abstract_thrift_client.rb:150:in `handled_proxy'
	from /Users/alexis/.rvm/gems/ruby-1.9.2-p290/gems/thrift_client-0.7.1/lib/thrift_client/abstract_thrift_client.rb:60:in `batch_mutate'
	from /Users/alexis/.rvm/gems/ruby-1.9.2-p290/gems/cassandra-0.12.1/lib/cassandra/protocol.rb:7:in `_mutate'
	from /Users/alexis/.rvm/gems/ruby-1.9.2-p290/gems/cassandra-0.12.1/lib/cassandra/cassandra.rb:459:in `insert'
{noformat} 
                

[jira] [Updated] (THRIFT-1224) Cannot insert UTF-8 text

Posted by "Alessandro Morandi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/THRIFT-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alessandro Morandi updated THRIFT-1224:
---------------------------------------

    Description: 
I can't seem to find a way to save UTF-8 data into Cassandra.

I'm using the cassandra gem 0.11.1 (https://github.com/fauna/cassandra/), which in turn uses thrift_client (0.6.2), which in turn uses the thrift library (0.6.0).

As an example, the following code[1]

bq. cassandra.insert(:Cache, "123", {"unicode string" => "ä"})

will raise an exception:

bq.    {{Encoding::CompatibilityError: incompatible character encodings: ASCII-8BIT and UTF-8}}


The stacktrace points to `thrift-0.6.0/lib/thrift/transport/framed_transport.rb:58`. What seems to be happening is that `@wbuf` is encoded as ASCII-8BIT, while `buf` is encoded as UTF-8, which causes the concatenation operation (<<) to fail with the exception above.

This issue might be connected to https://issues.apache.org/jira/browse/THRIFT-1023.

[1] Of course, this assumes a "cassandra" object created using the cassandra gem and a schema initialized with a column family called "Cache"



[jira] [Assigned] (THRIFT-1224) Cannot insert UTF-8 text

Posted by "Jake Farrell (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/THRIFT-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jake Farrell reassigned THRIFT-1224:
------------------------------------

    Assignee: Jake Farrell
    
> Cannot insert UTF-8 text
> ------------------------
>
>                 Key: THRIFT-1224
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1224
>             Project: Thrift
>          Issue Type: Bug
>          Components: Ruby - Library
>    Affects Versions: 0.6
>         Environment: Ruby 1.9.2, Cassandra 0.8, thrift_client gem 0.6.2, cassandra gem 0.11.1
>            Reporter: Alessandro Morandi
>            Assignee: Jake Farrell
>              Labels: charset, encoding, ruby, utf, utf-8, utf8
>             Fix For: 0.9
>
>
> I can't seem to find a way to save UTF-8 data into Cassandra.
> I'm using the cassandra gem 0.11.1 (https://github.com/fauna/cassandra/), which in turn uses thrift_client (0.6.2), which in turn uses the thrift library (0.6.0).
> As an example, the following code[1]
> bq. cassandra.insert(:Cache, "123", {"unicode string" => "ä"})
> will raise an exception:
> bq.    {{Encoding::CompatibilityError: incompatible character encodings: ASCII-8BIT and UTF-8}}
> The stacktrace points to `thrift-0.6.0/lib/thrift/transport/framed_transport.rb:58`. What seems to be happening is that `@wbuf` is encoded as ASCII-8BIT, while `buf` is encoded as UTF-8, which causes the concatenation operation (<<) to fail with the exception above.
> This issue might be connected to https://issues.apache.org/jira/browse/THRIFT-1023.
> [1] Of course, this assumes a "cassandra" object created using the cassandra gem and a schema initialized with a column family called "Cache"


[jira] [Commented] (THRIFT-1224) Cannot insert UTF-8 text

Posted by "Arya Goudarzi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420062#comment-13420062 ] 

Arya Goudarzi commented on THRIFT-1224:
---------------------------------------

Alexis's solution has fixed most of our issues with inserting UTF-8 strings into Cassandra. However, we are now sporadically getting the classic 'end of file reached' error, which suggests that it somehow corrupts the framing for the transport.
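
The thread does not establish what breaks the framing. As one illustration of how a length-prefixed frame can go wrong with multibyte text, the sketch below (plain Ruby, with made-up variable names and no Thrift code) builds a frame whose prefix is a character count instead of a byte count, then reads it back the way a receiver would.

```ruby
text = "\u00E4bc"                    # "äbc": three characters, four bytes in UTF-8

char_count = text.length             # counts characters
byte_count = text.bytesize           # counts bytes

# A frame whose prefix uses the character count under-reports the payload.
payload   = text.dup.force_encoding(Encoding::BINARY)
bad_frame = [char_count].pack("N") + payload

# A reader that trusts the prefix consumes too few bytes and leaves one behind,
# which would then be misread as the start of the next frame.
declared = bad_frame[0, 4].unpack("N").first
consumed = bad_frame[4, declared]
leftover = bad_frame[(4 + declared)..-1]

puts "declared #{declared}, actual #{byte_count}, leftover #{leftover.bytesize} byte(s)"
```

Computing the prefix from `bytesize`, or from `length` only after the string has been retagged as binary, keeps the declared and actual sizes in step.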
                

[jira] [Updated] (THRIFT-1224) Cannot insert UTF-8 text

Posted by "Jake Farrell (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/THRIFT-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jake Farrell updated THRIFT-1224:
---------------------------------

    Fix Version/s: 0.8
    

[jira] [Commented] (THRIFT-1224) Cannot insert UTF-8 text

Posted by "Bryan Duxbury (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056653#comment-13056653 ] 

Bryan Duxbury commented on THRIFT-1224:
---------------------------------------

Sounds like we should detect UTF-8 strings and convert them before concatenating?


[jira] [Closed] (THRIFT-1224) Cannot insert UTF-8 text

Posted by "Jake Farrell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/THRIFT-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jake Farrell closed THRIFT-1224.
--------------------------------

    Resolution: Duplicate

Duplicates THRIFT-1023. Closing to avoid multiple threads for this issue.
                

[jira] [Commented] (THRIFT-1224) Cannot insert UTF-8 text

Posted by "Sven Casimir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204748#comment-13204748 ] 

Sven Casimir commented on THRIFT-1224:
--------------------------------------

It may also be relevant that I can insert UTF-8 strings (as column values) into Cassandra without problems when I'm not using super column families (at least it works with German umlauts), like this:

insert(cf, key, "test" => "Ä")

It doesn't work with super column families, e.g.:

insert(cf, key, "super" => {"test" => "Ä"})
                

[jira] [Commented] (THRIFT-1224) Cannot insert UTF-8 text

Posted by "Aaron Kimball (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197503#comment-13197503 ] 

Aaron Kimball commented on THRIFT-1224:
---------------------------------------

@Alexis,

I have a problem encoding certain binary data in Thrift (0.8.0). I get the following error:

{code}
Encoding::CompatibilityError: incompatible character encodings: ASCII-8BIT and US-ASCII
{code}

Your patch doesn't fix it for me, but if I replace {{if str.encoding.to_s != "US-ASCII"}} with {{if str.encoding.to_s != "ASCII-8BIT"}}, it works.
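
A sketch of why the changed condition helps, assuming the failing data is a string tagged US-ASCII that nevertheless contains a byte above 0x7F (the thread does not show the actual data). `to_binary` is a hypothetical helper, not part of the thrift gem, and the buffer contents are invented.

```ruby
# Hypothetical helper: retag anything that is not already binary,
# which is the effect of testing for "ASCII-8BIT" instead of "US-ASCII".
def to_binary(str)
  return str if str.encoding == Encoding::BINARY
  str.dup.force_encoding(Encoding::BINARY)
end

wbuf  = [0x80010001].pack("N")                          # binary buffer with a high byte
ascii = [0xFF].pack("C").force_encoding(Encoding::US_ASCII)  # US-ASCII tag, high byte inside

error = nil
begin
  wbuf.dup << ascii                  # skipped by a "!= US-ASCII" check, so it reaches << as-is
rescue Encoding::CompatibilityError => e
  error = e
end
puts error.class

wbuf << to_binary(ascii)             # retagged as binary first, so the append succeeds
puts wbuf.bytesize
```

With the `!= "ASCII-8BIT"` test, UTF-8 and US-ASCII strings are both converted, and strings that are already binary pass through unchanged.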
                

[jira] [Commented] (THRIFT-1224) Cannot insert UTF-8 text

Posted by "Sven Casimir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13165454#comment-13165454 ] 

Sven Casimir commented on THRIFT-1224:
--------------------------------------

Okay, thanks for the response. Meanwhile I tried Alexis' suggestion, and so far it seems to work.
At least I am able to write UTF-8 strings into the database and read them back correctly.
                

       

[jira] [Commented] (THRIFT-1224) Cannot insert UTF-8 text

Posted by "Sven Casimir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13165340#comment-13165340 ] 

Sven Casimir commented on THRIFT-1224:
--------------------------------------

I get the same error when trying to insert umlauts.
The error occurs at 
thrift (0.8.0) lib/thrift/transport/framed_transport.rb:84:in `write'
thrift (0.8.0) lib/thrift/protocol/binary_protocol.rb:112:in `write_string'
thrift (0.8.0) lib/thrift/client.rb:35:in `write'
thrift (0.8.0) lib/thrift/client.rb:35:in `send_message'

Will this issue ever be fixed? Or do you have any suggestions on what to do about it?
                

       

[jira] [Issue Comment Edited] (THRIFT-1224) Cannot insert UTF-8 text

Posted by "Sven Casimir (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13165340#comment-13165340 ] 

Sven Casimir edited comment on THRIFT-1224 at 12/8/11 5:24 PM:
---------------------------------------------------------------

I get the same error when trying to insert umlauts.
The error occurs at 
thrift (0.8.0) lib/thrift/transport/framed_transport.rb:84:in `write'
thrift (0.8.0) lib/thrift/protocol/binary_protocol.rb:112:in `write_string'
thrift (0.8.0) lib/thrift/client.rb:35:in `write'
thrift (0.8.0) lib/thrift/client.rb:35:in `send_message'

Will this issue ever be fixed in the official release?
                
      was (Author: fewking):
    I get the same error when trying to insert umlauts.
The error occurs at 
thrift (0.8.0) lib/thrift/transport/framed_transport.rb:84:in `write'
thrift (0.8.0) lib/thrift/protocol/binary_protocol.rb:112:in `write_string'
thrift (0.8.0) lib/thrift/client.rb:35:in `write'
thrift (0.8.0) lib/thrift/client.rb:35:in `send_message'

Will this issue ever be fixed? Or do you have any suggestions on what to do about it?
                  

       

[jira] [Commented] (THRIFT-1224) Cannot insert UTF-8 text

Posted by "Jake Farrell (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13165397#comment-13165397 ] 

Jake Farrell commented on THRIFT-1224:
--------------------------------------

This was a known issue during the 0.8 release and was pushed back because it pertains only to Ruby 1.9.x. It is currently slated to be looked at in the 0.9 release.
                