Posted to dev@avro.apache.org by "Martin Kleppmann (JIRA)" <ji...@apache.org> on 2016/01/11 23:35:39 UTC

[jira] [Created] (AVRO-1783) Gracefully handle strings with wrong character encoding

Martin Kleppmann created AVRO-1783:
--------------------------------------

             Summary: Gracefully handle strings with wrong character encoding
                 Key: AVRO-1783
                 URL: https://issues.apache.org/jira/browse/AVRO-1783
             Project: Avro
          Issue Type: Bug
          Components: ruby
    Affects Versions: 1.7.7
            Reporter: Martin Kleppmann


In the [vote thread for Avro 1.8.0-rc2|http://mail-archives.apache.org/mod_mbox/avro-dev/201601.mbox/%3CCAGHyZ6K-oe35%2BOYROK6MSwrHxfPHvjmqhJAfRJL2dzexYw6YSw%40mail.gmail.com%3E], [~busbey] noticed that [phunt's avro-rpc-quickstart|https://github.com/phunt/avro-rpc-quickstart] fails:

{code}
busbey$ ruby sample_ipc_client.rb avro_user pat Hello_World
Avro::IO::AvroTypeError: The datum
"\x89\xA9\xD1\xFF@NUm\xEA\x9A\xFB\xDAx\xF5Zq"
is not an example of schema
{"type":"fixed","name":"MD5","namespace":"org.apache.avro.ipc","size":16}
              write_data at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:543
            write_record at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:610
                    each at org/jruby/RubyArray.java:1613
            write_record at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:609
              write_data at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:561
                   write at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:538
 write_handshake_request at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/ipc.rb:136
                 request at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/ipc.rb:105
                 request at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/ipc.rb:117
                  (root) at sample_ipc_client.rb:49
{code}

I tried to reproduce the error, and the result is quite strange: avro-rpc-quickstart works fine for me in Ruby (MRI) 2.2 and 2.1, and in JRuby 1.7.23. However, [~busbey] was using JRuby 1.7.3 (as the path names above show), and in that particular version of JRuby I was able to reproduce the issue.

It seems that in some circumstances (but, bizarrely, not always), JRuby 1.7.3 returns a UTF-8 encoded string from {{Digest::MD5.digest}} rather than a binary-encoded string. {{Schema.validate}} checks that the string is suitable for writing as a datum for a {{fixed}} type by calling {{#size}}, which counts characters rather than bytes. In this case, although the MD5 digest of the schema is a 16-byte string, interpreting it as UTF-8 yields only 13 characters (some byte sequences are read as multibyte characters), so the check against the schema's size of 16 fails.
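
The character/byte mismatch can be reproduced on any Ruby 2.x or later without JRuby 1.7.3. The sketch below is only an illustration: the 16-byte value is made up (three 2-byte UTF-8 sequences plus ten ASCII bytes), not the real digest, and the helper names {{sample_bytes}} and {{sizes_under}} are hypothetical.

```ruby
# Illustrative 16-byte value (NOT the real MD5 digest): three 2-byte UTF-8
# sequences followed by ten ASCII bytes, tagged as binary via String#b.
def sample_bytes
  "\xC3\xA9\xC3\xA9\xC3\xA9abcdefghij".b
end

# Hypothetical helper: returns [character count, byte count] for the same
# bytes when they are tagged with the given encoding.
def sizes_under(bytes, encoding)
  tagged = bytes.dup.force_encoding(encoding)
  [tagged.size, tagged.bytesize]
end

binary_sizes = sizes_under(sample_bytes, Encoding::BINARY)
utf8_sizes   = sizes_under(sample_bytes, Encoding::UTF_8)

puts "binary: size=#{binary_sizes[0]} bytesize=#{binary_sizes[1]}"  # size=16 bytesize=16
puts "UTF-8:  size=#{utf8_sizes[0]} bytesize=#{utf8_sizes[1]}"      # size=13 bytesize=16
```

The bytes are identical in both cases; only the encoding tag changes what {{#size}} reports, while {{#bytesize}} stays at 16.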

Rather than trying to divine why JRuby is being weird here, I think this is an opportunity to fix Avro's handling of strings to make it robust against unexpected encodings.
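
One possible shape for such a fix, as a rough sketch only: validate {{fixed}} data by byte count and re-tag strings as binary before writing. The helpers {{fixed_datum_valid?}}, {{to_binary}} and {{mis_tagged_sample}} are hypothetical names for illustration and are not part of the Avro gem; the sample value is made up.

```ruby
# Hypothetical helper (not Avro's API): check a datum against a fixed
# schema's size by counting bytes, so the encoding tag does not matter.
def fixed_datum_valid?(datum, expected_size)
  datum.is_a?(String) && datum.bytesize == expected_size
end

# Hypothetical helper: return a copy of the same bytes tagged as binary,
# leaving the caller's string untouched.
def to_binary(datum)
  datum.dup.force_encoding(Encoding::BINARY)
end

# Made-up 16-byte value that is tagged as UTF-8, mimicking the JRuby 1.7.3 case.
def mis_tagged_sample
  "\xC3\xA9\xC3\xA9\xC3\xA9abcdefghij".dup.force_encoding(Encoding::UTF_8)
end

datum = mis_tagged_sample
puts datum.size                               # 13 characters under UTF-8
puts fixed_datum_valid?(datum, 16)            # true, because bytesize is 16
puts to_binary(datum).size                    # 16 once tagged as binary
```

Counting bytes keeps the check independent of whatever encoding the caller's string happens to carry, and copying before {{force_encoding}} avoids mutating the caller's data.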


