Posted to dev@thrift.apache.org by "Nathan Beyer (JIRA)" <ji...@apache.org> on 2012/08/06 04:24:03 UTC

[jira] [Commented] (THRIFT-1023) Thrift encoding (UTF-8) issue with Ruby 1.9.2

    [ https://issues.apache.org/jira/browse/THRIFT-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428958#comment-13428958 ] 

Nathan Beyer commented on THRIFT-1023:
--------------------------------------

I did some testing on Ubuntu 12.04 and everything seems to work fine with Ruby 1.8.7-p370 (via RVM), 1.9.3-p194 (via RVM), and 1.9.3-p0 (via Ubuntu APT).

Getting the full build to work with Ruby 1.9.3 on Ubuntu requires additional tweaks. The Ubuntu packages 'ruby1.9.1' and 'ruby1.9.1-dev' are required, along with the changes in the patch file 'THRIFT-1023-build-ruby19.patch'.

The patch 'THRIFT-1023-build-ruby19.patch' is a bit kludgy and needs some consideration. Here's what was changed:
* Makefile.am - There seems to be a slight behavior change in the interpreter's load mechanism (Ruby 1.9.2 and later no longer include the current directory in the load path). To get the integration tests to run, a '-I.' was needed, which adds the local directory to the load path.
* thrift.gemspec - The mongrel dependency must be removed, and the easiest way to do that was to add the RUBY_VERSION conditional. There's no mechanism in RubyGems to define a dependency as "optional". My suggestion would be to just remove this dependency from the gemspec altogether.
* lib/rb/spec/mongrel_http_server_spec.rb - This adds a similar RUBY_VERSION check, which essentially removes the contents of the file when Ruby 1.9 is used. If mongrel is removed from the gemspec, then this should be tweaked to rescue a LoadError on the load of 'thrift/server/mongrel_http_server'.
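
The LoadError approach from the last bullet could look roughly like the sketch below. The names optional_require and MONGREL_AVAILABLE are hypothetical and used only for illustration; they are not part of Thrift or of the attached patch, and the real spec file may need a different shape.

```ruby
# Hypothetical helper (not part of Thrift): try to load an optional library
# and report whether it succeeded instead of letting LoadError propagate.
def optional_require(path)
  require path
  true
rescue LoadError
  false
end

# True only when the mongrel-backed server (and so mongrel itself) can be loaded.
MONGREL_AVAILABLE = optional_require('thrift/server/mongrel_http_server')

if MONGREL_AVAILABLE
  # The existing contents of mongrel_http_server_spec.rb would go here.
else
  warn 'mongrel not available, skipping the mongrel HTTP server specs'
end
```

Unlike a RUBY_VERSION check, this keys off whether the library actually loads, so the specs would still run on any interpreter where mongrel happens to be installed.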

I'm open to suggestions on the best approach to handle the mongrel portions of the code. My suggestion would be to remove it from the gemspec and make the specs run only when mongrel can be loaded. I think mongrel could be added to the Gemfile, such that it's optional, but I'll have to look into that.
                
> Thrift encoding  (UTF-8) issue with Ruby 1.9.2
> ----------------------------------------------
>
>                 Key: THRIFT-1023
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1023
>             Project: Thrift
>          Issue Type: Bug
>          Components: Ruby - Library
>    Affects Versions: 0.5
>         Environment: OSX, Ruby 1.9.2, Thrift Gem version 0.5.0
>            Reporter: Vincent Peres
>            Assignee: Jake Farrell
>             Fix For: 0.9
>
>         Attachments: THRIFT-1023-build-ruby19.patch, THRIFT-1023-refactor-transport-protocol-for-ruby19-v2.patch, THRIFT-1023-refactor-transport-protocol-for-ruby19-v3.patch, THRIFT-1023-refactor-transport-protocol-for-ruby19.patch, thrift-1023-utf8-encoding-issue.path
>
>
> I came across an encoding issue in the Thrift library, specifically in the BufferedTransport class.
> I've written a few tests to give you a concrete example:
> # encoding: utf-8
> require 'spec_helper'
> describe "encoding" do
>  before do
>    transport = Thrift::BufferedTransport.new(Thrift::Socket.new(MR_CONFIG['host'], 9090))
>    protocol  = Thrift::BinaryProtocol.new(transport)
>    @client   = Apache::Hadoop::Hbase::Thrift::Hbase::Client.new(protocol)
>    transport.open()
>    @table_name = "encoding_test"
>    @column_family = "info:"
>  end
>  it "should create a new table" do
>    column = Apache::Hadoop::Hbase::Thrift::ColumnDescriptor.new{|c| c.name= @column_family}
>    @client.createTable(@table_name, [column]).should be_nil
>  end
>  it "should save standard caracteres" do
>    m        = Apache::Hadoop::Hbase::Thrift::Mutation.new
>    m.column = "info:first_name"
>    m.value  = "Vincent"
>    m.value.encoding.should == Encoding::UTF_8
>    @client.mutateRow(@table_name, "ID1", [m]).should be_nil
>  end
>  it "should save UTF8 caracteres" do
>    m        = Apache::Hadoop::Hbase::Thrift::Mutation.new
>    m.column = "info:first_name"
>    m.value  = "Thorbjørn"
>    m.value.encoding.should == Encoding::UTF_8
>    @client.mutateRow(@table_name, "ID1", [m]).should be_nil
>  end
>  it "should destroy the table" do
>    @client.disableTable(@table_name).should be_nil
>    @client.deleteTable(@table_name).should be_nil
>  end
> end
> It fails when it tries to save the UTF-8 string that includes the character 'ø'.
> Here is the output:
>  1) encoding should save UTF8 caracteres
>     Failure/Error: @client.mutateRow(@table_name, "ID1", [m]).should be_nil
>     incompatible character encodings: ASCII-8BIT and UTF-8
>     #/Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/transport/buffered_transport.rb:59:in `write'
>     #/Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/protocol/binary_protocol.rb:107:in `write_string'
>     #/Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/client.rb:35:in `write'
>     #/Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/client.rb:35:in `send_message'
>     # ./lib/thrift/hbase.rb:289:in `send_mutateRow'
>     # ./lib/thrift/hbase.rb:284:in `mutateRow'
>     # ./spec/thrift/cases/encoding_spec.rb:37:in `block (2 levels) in <top (required)>'
> Let me know if you need any other details, thank you !
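
For reference, the error in the quoted report can be reproduced without Thrift or HBase. The sketch below assumes that the failure comes from appending a UTF-8 string to a write buffer that is already tagged ASCII-8BIT and holds non-ASCII bytes; the trace points at BufferedTransport#write, which is consistent with that reading but does not prove it. The variables buffer and value are stand-ins for illustration, not the library's code.

```ruby
# encoding: utf-8

# Stand-in for a binary write buffer that already holds a non-ASCII byte.
buffer = "\xFF".dup.force_encoding(Encoding::BINARY)
value  = "Thorbjørn" # UTF-8 string, as in the failing spec

raised = false
begin
  buffer << value # mixing non-ASCII BINARY and UTF-8 data
rescue Encoding::CompatibilityError
  raised = true
end

# Re-tagging a copy of the string as binary changes no bytes,
# but it lets the append succeed.
buffer << value.dup.force_encoding(Encoding::BINARY)
```

Re-tagging the string before it reaches the buffer is only one possible way around the error; whether the attached refactoring patches take this route is not something this sketch claims.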
