Posted to dev@avro.apache.org by "Tophe Vigny (JIRA)" <ji...@apache.org> on 2012/11/16 14:28:12 UTC

[jira] [Created] (AVRO-1206) utf-8 serialisation problems

Tophe Vigny created AVRO-1206:
---------------------------------

             Summary: utf-8 serialisation problems 
                 Key: AVRO-1206
                 URL: https://issues.apache.org/jira/browse/AVRO-1206
             Project: Avro
          Issue Type: Bug
          Components: ruby
    Affects Versions: 1.7.2
         Environment: ruby-1.9.3p194, avro gem 1.7.2.
            Reporter: Tophe Vigny


Some serialized UTF-8 characters, such as "家", cannot be read back later; Avro fails with:
/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/io.rb:230:in `match_schemas': undefined method `type' for nil:NilClass (NoMethodError)
	from /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/io.rb:288:in `read_data'
	from /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/io.rb:384:in `read_union'
	from /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/io.rb:317:in `read_data'
	from /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/io.rb:392:in `block in read_record'
	from /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/io.rb:390:in `each'
	from /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/io.rb:390:in `read_record'
	from /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/io.rb:318:in `read_data'
	from /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/io.rb:283:in `read'
	from /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/data_file.rb:223:in `block in each'
	from /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/data_file.rb:211:in `loop'
	from /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/data_file.rb:211:in `each'
	from avr_err_example.rb:42:in `block in <main>'


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (AVRO-1206) utf-8 serialisation problems

Posted by "Tophe Vigny (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502001#comment-13502001 ] 

Tophe Vigny commented on AVRO-1206:
-----------------------------------

Hi Doug,

You are using Ruby 1.8.x (the old branch). Try Ruby 1.9.x or later (the official branch); you can use rvm (Ruby Version Manager) to install multiple Ruby versions.

Tophe@info3:~/work/svn_1/trunk/lang/ruby$ rvm use 1.8.7
Using /home/Tophe/.rvm/gems/ruby-1.8.7-p371
Tophe@info3:~/work/svn_1/trunk/lang/ruby$ rake test
/home/Tophe/work/svn_1/trunk/lang/ruby/Rakefile:19: warning: already initialized constant VERSION
/home/Tophe/.rvm/rubies/ruby-1.8.7-p371/bin/ruby -I"lib:ext:bin:test" -I"/home/Tophe/.rvm/gems/ruby-1.8.7-p371@global/gems/rake-10.0.2/lib" "/home/Tophe/.rvm/gems/ruby-1.8.7-p371@global/gems/rake-10.0.2/lib/rake/rake_test_loader.rb" "test/test_socket_transport.rb" "test/test_io.rb" "test/test_datafile.rb" "test/test_help.rb" "test/test_protocol.rb" 
Loaded suite /home/Tophe/.rvm/gems/ruby-1.8.7-p371@global/gems/rake-10.0.2/lib/rake/rake_test_loader
Started
................................
Finished in 0.536805 seconds.

32 tests, 710 assertions, 0 failures, 0 errors


Tophe@info3:~/work/svn_1/trunk/lang/ruby$ rvm use 1.9.3
Using /home/Tophe/.rvm/gems/ruby-1.9.3-p327
Tophe@info3:~/work/svn_1/trunk/lang/ruby$ rake test
/home/Tophe/.rvm/rubies/ruby-1.9.3-p327/bin/ruby -I"lib:ext:bin:test" -I"/home/Tophe/.rvm/gems/ruby-1.9.3-p327@global/gems/rake-10.0.2/lib" "/home/Tophe/.rvm/gems/ruby-1.9.3-p327@global/gems/rake-10.0.2/lib/rake/rake_test_loader.rb" "test/test_socket_transport.rb" "test/test_io.rb" "test/test_datafile.rb" "test/test_help.rb" "test/test_protocol.rb" 
Run options: 

# Running tests:

...F............................

Finished tests in 0.212220s, 150.7870 tests/s, 3345.5875 assertions/s.

  1) Failure:
test_utf8(TestDataFile) [/home/Tophe/work/svn_1/trunk/lang/ruby/test/test_datafile.rb:152]:
<"家"> expected but was
<"\xE5\xAE\xB6">.

32 tests, 710 assertions, 1 failures, 0 errors, 0 skips
rake aborted!

Apply this modification:

Index: test/test_datafile.rb
===================================================================
--- test/test_datafile.rb	(revision 1410649)
+++ test/test_datafile.rb	(working copy)
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 # Licensed to the Apache Software Foundation (ASF) under one
 # or more contributor license agreements.  See the NOTICE file
 # distributed with this work for additional information
@@ -140,4 +141,17 @@
       assert_equal(block_count+1, dw.block_count)
     end
   end
+  def test_utf8
+    datafile = Avro::DataFile::open('data.avr', 'w', '"string"')
+    datafile << "家"
+    datafile.close
+    
+    datafile = Avro::DataFile.open('data.avr')
+    datafile.each do |s|
+      (rmaj,rmin,rlast) = RUBY_VERSION.split(".").map {|a| a.to_i}
+      if rmaj <2 && rmin < 9
+        assert_equal "家", s
+      else
+        assert_equal "家", s.force_encoding('UTF-8')
+      end
+    end
+    datafile.close
+    end
+  end

Tophe@info3:~/work/svn_1/trunk/lang/ruby$ rake test
/home/Tophe/.rvm/rubies/ruby-1.9.3-p327/bin/ruby -I"lib:ext:bin:test" -I"/home/Tophe/.rvm/gems/ruby-1.9.3-p327@global/gems/rake-10.0.2/lib" "/home/Tophe/.rvm/gems/ruby-1.9.3-p327@global/gems/rake-10.0.2/lib/rake/rake_test_loader.rb" "test/test_socket_transport.rb" "test/test_io.rb" "test/test_datafile.rb" "test/test_help.rb" "test/test_protocol.rb" 
Run options: 

# Running tests:

................................

Finished tests in 0.166176s, 192.5669 tests/s, 4272.5791 assertions/s.

32 tests, 710 assertions, 0 failures, 0 errors, 0 skips

Now change write_bytes in io.rb back to the unpatched version, which uses datum.size:
      def write_bytes(datum)
        write_long(datum.size)
        @writer.write(datum)
      end
      
      
and run the tests under 1.9.3; test_utf8 fails:
Tophe@info3:~/work/svn_1/trunk/lang/ruby$ rake test
/home/Tophe/.rvm/rubies/ruby-1.9.3-p327/bin/ruby -I"lib:ext:bin:test" -I"/home/Tophe/.rvm/gems/ruby-1.9.3-p327@global/gems/rake-10.0.2/lib" "/home/Tophe/.rvm/gems/ruby-1.9.3-p327@global/gems/rake-10.0.2/lib/rake/rake_test_loader.rb" "test/test_socket_transport.rb" "test/test_io.rb" "test/test_datafile.rb" "test/test_help.rb" "test/test_protocol.rb" 
Run options: 

# Running tests:

...F............................

Finished tests in 0.186894s, 171.2203 tests/s, 3798.9507 assertions/s.

  1) Failure:
test_utf8(TestDataFile) [/home/Tophe/work/svn_1/trunk/lang/ruby/test/test_datafile.rb:156]:
<"家"> expected but was
<"\xE5">.

32 tests, 710 assertions, 1 failures, 0 errors, 0 skips
rake aborted!

The same tests do not fail under 1.8.7:

Tophe@info3:~/work/svn_1/trunk/lang/ruby$ rvm use 1.8.7
Using /home/Tophe/.rvm/gems/ruby-1.8.7-p371
Tophe@info3:~/work/svn_1/trunk/lang/ruby$ rake test
/home/Tophe/work/svn_1/trunk/lang/ruby/Rakefile:19: warning: already initialized constant VERSION
/home/Tophe/.rvm/rubies/ruby-1.8.7-p371/bin/ruby -I"lib:ext:bin:test" -I"/home/Tophe/.rvm/gems/ruby-1.8.7-p371@global/gems/rake-10.0.2/lib" "/home/Tophe/.rvm/gems/ruby-1.8.7-p371@global/gems/rake-10.0.2/lib/rake/rake_test_loader.rb" "test/test_socket_transport.rb" "test/test_io.rb" "test/test_datafile.rb" "test/test_help.rb" "test/test_protocol.rb" 
Loaded suite /home/Tophe/.rvm/gems/ruby-1.8.7-p371@global/gems/rake-10.0.2/lib/rake/rake_test_loader
Started
................................
Finished in 0.379195 seconds.

32 tests, 710 assertions, 0 failures, 0 errors

It seems that String#size returns the character count in Ruby 1.9 and later, and not the byte count as in Ruby before 1.9.
The patch corrects that and works on all Ruby versions.
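
The difference between character count and byte count can be sketched as follows. This is a minimal illustration and not the actual Avro code: write_bytes_sketch, read_bytes_sketch and round_trip are hypothetical helpers, and the one-byte length prefix is a simplification assumed here in place of Avro's real long encoding.

```ruby
# encoding: utf-8
require 'stringio'

# Hypothetical helper: writes a simplified one-byte length prefix, then the data.
def write_bytes_sketch(io, datum, length)
  io.write([length].pack('C'))
  io.write(datum)
end

# Hypothetical helper: reads the prefix, then that many bytes.
def read_bytes_sketch(io)
  length = io.read(1).unpack('C').first
  io.read(length)
end

# Writes datum with the given length prefix and reads it back.
def round_trip(datum, length)
  io = StringIO.new(String.new)   # empty binary buffer
  write_bytes_sketch(io, datum, length)
  io.rewind
  read_bytes_sketch(io)
end

s = "\u5BB6"              # the character 家 as a UTF-8 string

char_count = s.size       # number of characters on Ruby >= 1.9
byte_count = s.bytesize   # number of bytes

truncated = round_trip(s, char_count)   # prefix too small: only the first byte comes back
complete  = round_trip(s, byte_count)   # prefix matches: all bytes come back
```

With the character count as the prefix, the reader gets back only the first byte, which matches the <"\xE5"> shown in the failing test output. With the byte count, all three bytes are returned.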
It can surely work with JRuby too, but yajl would need to be removed; perhaps Ruby's json library can do the job? Then Avro could be used in JRuby through the avro gem.
Alternatively, yajl could be optional: if the require works it is used, and if it is not present JSON.load/JSON.dump can be used instead.
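
The optional-yajl idea could look roughly like this. It is only a sketch under assumptions: JSON_BACKEND and parse_json are names invented for illustration, and nothing here is taken from the avro gem itself.

```ruby
# Prefer yajl when it is installed, otherwise fall back to the json library.
begin
  require 'yajl'
  JSON_BACKEND = :yajl
rescue LoadError
  require 'json'
  JSON_BACKEND = :json
end

# Hypothetical wrapper so callers do not care which backend was loaded.
def parse_json(text)
  if JSON_BACKEND == :yajl
    Yajl::Parser.parse(text)
  else
    JSON.parse(text)
  end
end

schema = parse_json('{"type": "string"}')
```

On a platform where yajl cannot be installed, the rescue branch is taken and only the json library is required.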




                

[jira] [Updated] (AVRO-1206) utf-8 serialisation problems

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-1206:
-------------------------------

    Attachment: AVRO-1206.patch

Here's a patch that includes the patch from AVRO-1134 and updates the description.

I also added a test, but the test passes for me with or without the change to io.rb in either Ruby 1.8.7 or 1.9.3.  I would rather not commit the change until we have a test that fails without it.  Can someone please help improve this test?
                

[jira] [Commented] (AVRO-1206) utf-8 serialisation problems

Posted by "Tophe Vigny (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13501145#comment-13501145 ] 

Tophe Vigny commented on AVRO-1206:
-----------------------------------

Hi Doug, 
I can help with the test issue.

I checked out a copy from trunk:
svn checkout http://svn.apache.org/repos/asf/avro/trunk/lang/ruby/
A    ruby/test
A    ruby/test/tool.rb
A    ruby/test/test_protocol.rb
A    ruby/test/sample_ipc_http_server.rb
A    ruby/test/sample_ipc_server.rb
A    ruby/test/test_socket_transport.rb
A    ruby/test/test_io.rb
A    ruby/test/test_help.rb
A    ruby/test/test_datafile.rb
A    ruby/test/random_data.rb
A    ruby/test/sample_ipc_http_client.rb
A    ruby/test/sample_ipc_client.rb
A    ruby/interop
A    ruby/interop/test_interop.rb
A    ruby/Rakefile
A    ruby/.gitignore
A    ruby/Manifest
A    ruby/lib
A    ruby/lib/avro
A    ruby/lib/avro/schema.rb
A    ruby/lib/avro/protocol.rb
A    ruby/lib/avro/io.rb
A    ruby/lib/avro/collect_hash.rb
A    ruby/lib/avro/data_file.rb
A    ruby/lib/avro/ipc.rb
A    ruby/lib/avro.rb
A    ruby/CHANGELOG
 U   ruby
Checked out revision 1411599.

and then
patch < AVRO-1206.patch 
patching file Rakefile
patching file io.rb
Hunk #1 FAILED at 201.
1 out of 1 hunk FAILED -- saving rejects to file io.rb.rej
patching file test_datafile.rb
Hunk #1 FAILED at 1.
Hunk #2 FAILED at 140.
2 out of 2 hunks FAILED -- saving rejects to file test_datafile.rb.rej

What is the matter?
I merged the test manually and made some code changes to ensure that ../lib/avro is loaded.
So, with your test and the original io.rb:
Tophe@info3:~/work/ruby/test$ ruby test_datafile.rb 
Run options: 

# Running tests:

...F

Finished tests in 0.088778s, 45.0561 tests/s, 878.5939 assertions/s.

  1) Failure:
test_utf8(TestDataFile) [test_datafile.rb:155]:
<"家"> expected but was
<"\xE5">.

4 tests, 78 assertions, 1 failures, 0 errors, 0 skips
Only one byte is stored for a 3-byte character. With the modified io.rb:
Tophe@info3:~/work/ruby/test$ ruby test_datafile.rb 
Run options: 

# Running tests:

....

Finished tests in 0.088450s, 45.2230 tests/s, 881.8492 assertions/s.

4 tests, 78 assertions, 0 failures, 0 errors, 0 skips

You need to add

#encoding: utf-8

at the beginning of test_datafile.rb.
For the assertion, we can do:
      (rmaj,rmin,rlast) = RUBY_VERSION.split(".").map {|a| a.to_i}
      if rmaj <2 &&  rmin < 9
        assert_equal "家", s
      else
        assert_equal "家", s.force_encoding('UTF-8') 
      end
That test works with both Ruby 1.8 and Ruby >= 1.9. Because the Ruby 1.9 branch is encoding-aware, you need to specify the encoding there, or else we need to compare in binary.
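
The two options (tagging the string as UTF-8, or comparing raw bytes) can be sketched as below. This is an illustration only: as_utf8 is a hypothetical helper, and the respond_to? check is an assumed alternative to parsing RUBY_VERSION, not something taken from the patch.

```ruby
# encoding: utf-8
expected = "\u5BB6"                     # the character 家 as a UTF-8 string
raw = [0xE5, 0xAE, 0xB6].pack('C*')     # the same bytes, tagged as binary (ASCII-8BIT)

# Same bytes but different encodings: the strings do not compare equal.
equal_as_is = (raw == expected)

# Option 1: tag the bytes as UTF-8 before comparing.
equal_as_utf8 = (raw.dup.force_encoding('UTF-8') == expected)

# Option 2: compare the raw bytes and ignore the encoding tag.
equal_bytes = (raw.bytes.to_a == expected.bytes.to_a)

# Hypothetical helper: only calls force_encoding where String supports it,
# so the same code also runs on Ruby 1.8.
def as_utf8(str)
  str.respond_to?(:force_encoding) ? str.dup.force_encoding('UTF-8') : str
end

equal_via_helper = (as_utf8(raw) == expected)
```

The first comparison corresponds to the failure reported above, where <"家"> was expected but <"\xE5\xAE\xB6"> was returned.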

Is it possible to specify the encoding in the schema, either for all data or per string type? That would help the reader return strings with the correct encoding.
It would also be simpler to use, because the reader would not need to know the encoding.

The problem for you is that you are loading the gem and not ../lib, and you made the correction in the gem. (I had the same problem and spent some time on it.)
Try this:
gem uninstall avro (all)

and run the test. It should not run, because there are some require 'avro' statements in the code, and they load the gem, not the source code.
To load the source code, do this in test_help.rb:

$LOAD_PATH << '../lib/'
require 'avro'
That way, avro.rb should be loaded from ../lib and not from $GEM_HOME/...
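
A variant that does not depend on the current working directory might look like this. It is a sketch that assumes test_help.rb sits in test/ next to lib/, as in the checkout listed earlier; lib_dir is a name chosen here for illustration.

```ruby
# Resolve ../lib relative to this file rather than to the current directory.
lib_dir = File.expand_path('../lib', File.dirname(__FILE__))

# Add it at the front of the load path unless it is already listed.
$LOAD_PATH.unshift(lib_dir) unless $LOAD_PATH.include?(lib_dir)

# require 'avro'   # would now be looked up in lib_dir
```

Because the path is absolute, the tests can be started from trunk/lang/ruby with rake or from inside test/ with ruby directly.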

I can send you a patch if I can apply your patch to trunk. Tell me if you need one, and what to do.
By the way, you can remove the FIXME in
def write_string(datum)
    # FIXME utf-8 encode this in 1.9
    write_bytes(datum)
end



                

[jira] [Commented] (AVRO-1206) utf-8 serialisation problems

Posted by "Tophe Vigny (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500110#comment-13500110 ] 

Tophe Vigny commented on AVRO-1206:
-----------------------------------

hi all,

I have merged the patch from AVRO-1134, and now everything works.
I ran the tests with ruby-1.9.3 and ruby-1.8.7, and all of them passed.
Some tests had to be modified because in 1.9.3 ./ is not in the load path; changing require 'test_help' to require './test_help' solves the problem.
If you release a new 1.7.3 gem, can you also change:
s.description = "Apache is a data serialization and RPC format" to s.description = "Avro is a data serialization and RPC format"

                

[jira] [Commented] (AVRO-1206) utf-8 serialisation problems

Posted by "Tophe Vigny (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502649#comment-13502649 ] 

Tophe Vigny commented on AVRO-1206:
-----------------------------------

Hi Doug

That's amazing.
For me the test does not pass with a compiled Ruby 1.9.1.
Anyway, thank you for the commit.



Tophe@info3:~/work/svn_1/trunk/lang/ruby$ rvm use 1.9.1
Using /home/Tophe/.rvm/gems/ruby-1.9.1-p431
Tophe@info3:~/work/svn_1/trunk/lang/ruby$ rake test
/home/Tophe/.rvm/rubies/ruby-1.9.1-p431/bin/ruby -I"lib:ext:bin:test" -I"/home/Tophe/.rvm/gems/ruby-1.9.1-p431@global/gems/rake-10.0.2/lib" "/home/Tophe/.rvm/gems/ruby-1.9.1-p431@global/gems/rake-10.0.2/lib/rake/rake_test_loader.rb" "test/test_socket_transport.rb" "test/test_io.rb" "test/test_datafile.rb" "test/test_help.rb" "test/test_protocol.rb" 
Loaded suite /home/Tophe/.rvm/gems/ruby-1.9.1-p431@global/gems/rake-10.0.2/lib/rake/rake_test_loader
Started
...F............................
Finished in 0.221537 seconds.

  1) Failure:
test_utf8(TestDataFile) [/home/Tophe/work/svn_1/trunk/lang/ruby/test/test_datafile.rb:156]:
<"家"> expected but was
<"\xE5">.

32 tests, 710 assertions, 1 failures, 0 errors, 0 skips
rake aborted!

                

[jira] [Commented] (AVRO-1206) utf-8 serialisation problems

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13499230#comment-13499230 ] 

Doug Cutting commented on AVRO-1206:
------------------------------------

Is this the same problem as AVRO-1134?  Should we commit the fix there?
                

[jira] [Commented] (AVRO-1206) utf-8 serialisation problems

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13501539#comment-13501539 ] 

Doug Cutting commented on AVRO-1206:
------------------------------------

Test passes for me when I apply it as follows:
{code}
 svn co http://svn.apache.org/repos/asf/avro/trunk
 cd trunk/lang/ruby
 patch -p 0 < AVRO-1206.patch
 rake test
{code}

If I run 'gem uninstall avro' I get the message:

  INFO:  gem "avro" is not installed



                

[jira] [Commented] (AVRO-1206) utf-8 serialisation problems

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502406#comment-13502406 ] 

Doug Cutting commented on AVRO-1206:
------------------------------------

Hmm.  'ruby --version' prints 1.9.1 for me, but if this patch fixes things for you then I'll commit it.
                

[jira] [Resolved] (AVRO-1206) utf-8 serialisation problems

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting resolved AVRO-1206.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 1.7.3

I committed this.
                