You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@thrift.apache.org by "Bryan Duxbury (JIRA)" <ji...@apache.org> on 2009/02/10 06:38:59 UTC

[jira] Created: (THRIFT-332) Compact Protocol in Ruby

Compact Protocol in Ruby
------------------------

                 Key: THRIFT-332
                 URL: https://issues.apache.org/jira/browse/THRIFT-332
             Project: Thrift
          Issue Type: Sub-task
          Components: Library (Ruby)
            Reporter: Bryan Duxbury
            Priority: Trivial




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (THRIFT-332) Compact Protocol in Ruby

Posted by "Bryan Duxbury (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/THRIFT-332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Duxbury updated THRIFT-332:
---------------------------------

    Attachment: thrift-332-v5.patch

OK, here we go. It turns out we had a bug in struct.c that wasn't setting the native method table correctly when you go down the read path. I've fixed that and now both native protocols use the special hooks. Here's the updated performance table:

||proto||operation||time||
|binary protocol| write|2.025033|
|accelerated binary protocol|write|0.322201|
|compact protocol|write|0.361371|
|binary protocol|read|2.006195|
|accelerated binary protocol|read|0.551998|
|compact protocol|read|0.560213|

You'll notice that both the accelerated protocols got faster on read as a result of this fix. The accelerated binary protocol was very likely not using the right reading methods, which should be fixed now. On the whole, the performance of the compact accelerated protocol is on par with the binary accelerated protocol, which I think is pretty awesome.

> Compact Protocol in Ruby
> ------------------------
>
>                 Key: THRIFT-332
>                 URL: https://issues.apache.org/jira/browse/THRIFT-332
>             Project: Thrift
>          Issue Type: Sub-task
>          Components: Library (Ruby)
>            Reporter: Bryan Duxbury
>            Assignee: Bryan Duxbury
>            Priority: Trivial
>         Attachments: thrift-332-v2.patch, thrift-332-v3.patch, thrift-332-v4.patch, thrift-332-v5.patch, thrift-332.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (THRIFT-332) Compact Protocol in Ruby

Posted by "Bryan Duxbury (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/THRIFT-332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Duxbury updated THRIFT-332:
---------------------------------

    Attachment: thrift-332.patch

Here's my first go at it. I ported the Ruby implementation more or less completely from the Java version. There's a pretty thorough spec included, too, which is also modeled on the Java compact protocol's test. 

This patch also includes some tiny changes to struct.rb designed to improve performance. I don't think those are critical changes, and I'm not that attached to them. They probably won't even really get executed on most platforms since we have the thrift_native extension in the mix.

I haven't yet benchmarked this implementation, but I fully expect it to be "too slow", and will probably end up writing a C extension for it.

Looking forward to comments from any Ruby folks out there.

> Compact Protocol in Ruby
> ------------------------
>
>                 Key: THRIFT-332
>                 URL: https://issues.apache.org/jira/browse/THRIFT-332
>             Project: Thrift
>          Issue Type: Sub-task
>          Components: Library (Ruby)
>            Reporter: Bryan Duxbury
>            Assignee: Bryan Duxbury
>            Priority: Trivial
>         Attachments: thrift-332.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (THRIFT-332) Compact Protocol in Ruby

Posted by "Kevin Clark (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683337#action_12683337 ] 

Kevin Clark commented on THRIFT-332:
------------------------------------

In general this looks great. Good work.

re: sets and maps don't hash well
Maybe you could iterate through the structs and compare by hand? It's ugly though. Should probably try to find a way to fix this.

lib/rb/lib/thrift/transport.rb.orig is in the patch

I'm getting a failure I don't see in trunk:

{monospace}
NameError in 'deprecate_module! should skip thrift library code when printing caller'
undefined method `struct_fields' for module `Thrift::Struct'
./spec/../lib/thrift/deprecation.rb:98:in `instance_method'
./spec/../lib/thrift/deprecation.rb:98:in `method_missing'
./spec/../lib/thrift/struct.rb:41:in `fields_with_default_values'
./spec/../lib/thrift/deprecation.rb:98:in `call'
./spec/../lib/thrift/deprecation.rb:98:in `method_missing'
./spec/../lib/thrift/struct.rb:8:in `initialize'
./spec/deprecation_spec.rb:441:in `new'
./spec/deprecation_spec.rb:441:
{monospace}

Maybe caused by this?

-    # Obsoleted by THRIFT-246, which generates this method inline
-    # TODO: Should be removed at some point. -- Kevin Clark
-    def struct_fields
-      self.class.const_get(:FIELDS)
-    end

But I appear to be regenerating the stuff in gen-rb, so I don't know what's up.


compact_protocol.rb:

In write_double:

@trans.write([dub].pack("G").reverse) #TODO: make sure this is right

Is it right?


compact_protocol.c:
The 5 defines at the top also show up in binary_protocol_accelerated. Could we move them into a .h?

Is this going to hook into the native protocol hooks you wrote? If so, doesn't rb_thrift_compact_proto_native_qmark need to return true? Looks like the native hooks are commented out.

I guess this is a general issue, but do we have a way to test both the native and non-native bindings?

In rb_thrift_compact_proto_read_message_begin, the max length of buf for PROTOCOL_ID is 51 and for VERSION is 50(string.gsub('%d', '4294967296').length is 49 + 1 for zero termination). Make sure to null out the last byte, or rb_str_new2 could include garbage.

Do you need the memcpy in rb_thrift_compact_proto_read_double? Can you just cast instead? return rb_float_new((double)RSTRING(bytes)->ptr)

> Compact Protocol in Ruby
> ------------------------
>
>                 Key: THRIFT-332
>                 URL: https://issues.apache.org/jira/browse/THRIFT-332
>             Project: Thrift
>          Issue Type: Sub-task
>          Components: Library (Ruby)
>            Reporter: Bryan Duxbury
>            Assignee: Bryan Duxbury
>            Priority: Trivial
>         Attachments: thrift-332-v2.patch, thrift-332-v3.patch, thrift-332.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (THRIFT-332) Compact Protocol in Ruby

Posted by "Bryan Duxbury (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/THRIFT-332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Duxbury reassigned THRIFT-332:
------------------------------------

    Assignee: Bryan Duxbury

> Compact Protocol in Ruby
> ------------------------
>
>                 Key: THRIFT-332
>                 URL: https://issues.apache.org/jira/browse/THRIFT-332
>             Project: Thrift
>          Issue Type: Sub-task
>          Components: Library (Ruby)
>            Reporter: Bryan Duxbury
>            Assignee: Bryan Duxbury
>            Priority: Trivial
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (THRIFT-332) Compact Protocol in Ruby

Posted by "Kevin Clark (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683562#action_12683562 ] 

Kevin Clark commented on THRIFT-332:
------------------------------------

Looks great. +1

> Compact Protocol in Ruby
> ------------------------
>
>                 Key: THRIFT-332
>                 URL: https://issues.apache.org/jira/browse/THRIFT-332
>             Project: Thrift
>          Issue Type: Sub-task
>          Components: Library (Ruby)
>            Reporter: Bryan Duxbury
>            Assignee: Bryan Duxbury
>            Priority: Trivial
>         Attachments: thrift-332-v2.patch, thrift-332-v3.patch, thrift-332-v4.patch, thrift-332-v5.patch, thrift-332.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (THRIFT-332) Compact Protocol in Ruby

Posted by "Bryan Duxbury (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/THRIFT-332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Duxbury updated THRIFT-332:
---------------------------------

    Patch Info: [Patch Available]

Would love some review comments on this latest patch.

> Compact Protocol in Ruby
> ------------------------
>
>                 Key: THRIFT-332
>                 URL: https://issues.apache.org/jira/browse/THRIFT-332
>             Project: Thrift
>          Issue Type: Sub-task
>          Components: Library (Ruby)
>            Reporter: Bryan Duxbury
>            Assignee: Bryan Duxbury
>            Priority: Trivial
>         Attachments: thrift-332-v2.patch, thrift-332-v3.patch, thrift-332.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (THRIFT-332) Compact Protocol in Ruby

Posted by "Bryan Duxbury (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/THRIFT-332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Duxbury updated THRIFT-332:
---------------------------------

    Attachment: thrift-332-v3.patch

This version of the patch includes changes to thrift_native to support Compact Protocol, with substantial performance benefits:

||protocol||operation||time||
|binary protocol|write|2.031031|
|accelerated binary protocol|write|0.325992|
|compact protocol|write|0.402717|
|binary protocol|read|2.009999|
|accelerated binary protocol|read|0.598706|
|compact protocol|read|0.604265|

This brings the compact protocol almost to exact parity with the accelerated binary protocol. I suspect it'd be even faster if IO were taken into account, instead of just CPU time. 

This implementation does have one issue - it seems that the double serialization stuff is currently architecture dependent. This area requires some more attention.

> Compact Protocol in Ruby
> ------------------------
>
>                 Key: THRIFT-332
>                 URL: https://issues.apache.org/jira/browse/THRIFT-332
>             Project: Thrift
>          Issue Type: Sub-task
>          Components: Library (Ruby)
>            Reporter: Bryan Duxbury
>            Assignee: Bryan Duxbury
>            Priority: Trivial
>         Attachments: thrift-332-v2.patch, thrift-332-v3.patch, thrift-332.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (THRIFT-332) Compact Protocol in Ruby

Posted by "Bryan Duxbury (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/THRIFT-332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Duxbury updated THRIFT-332:
---------------------------------

    Attachment: thrift-332-v2.patch

This patch contains the completed and tested pure-Ruby implementation of the compact protocol. The spec passes fully.

I also included a few new scripts to be used for testing serialization across languages. WriteStruct.java and ReadStruct.java will write a certain struct to any file in any protocol based on command line specifications. Likewise lib/rb/script/write_struct.rb and lib/rb/script/read_struct.rb do exactly the same thing, but in Ruby. Using these tests to verify cross-language protocol compatibility indicates that everything is working (though I ran into an issue with set and map equality). 

I also wrote up a comparative protocol benchmark to see what kind of performance we're getting:

||proto||operation||time for test (sec)||
|binary protocol|write|2.039999|
|accelerated binary protocol|write|0.326499|
|compact protocol|write|1.803780|
|binary protocol|read|2.020129|
|accelerated binary protocol|read|0.604584|
|compact protocol|read|2.620025|

So it looks like, without doing any optimization so far, we're not that much slower than the pure Ruby binary protocol, but we have some room to try and catch up with the accelerated binary protocol. 

> Compact Protocol in Ruby
> ------------------------
>
>                 Key: THRIFT-332
>                 URL: https://issues.apache.org/jira/browse/THRIFT-332
>             Project: Thrift
>          Issue Type: Sub-task
>          Components: Library (Ruby)
>            Reporter: Bryan Duxbury
>            Assignee: Bryan Duxbury
>            Priority: Trivial
>         Attachments: thrift-332-v2.patch, thrift-332.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (THRIFT-332) Compact Protocol in Ruby

Posted by "Bryan Duxbury (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/THRIFT-332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Duxbury resolved THRIFT-332.
----------------------------------

    Resolution: Fixed

Committed. Thanks for the reviews, Kevin!

> Compact Protocol in Ruby
> ------------------------
>
>                 Key: THRIFT-332
>                 URL: https://issues.apache.org/jira/browse/THRIFT-332
>             Project: Thrift
>          Issue Type: Sub-task
>          Components: Library (Ruby)
>            Reporter: Bryan Duxbury
>            Assignee: Bryan Duxbury
>            Priority: Trivial
>         Attachments: thrift-332-v2.patch, thrift-332-v3.patch, thrift-332-v4.patch, thrift-332-v5.patch, thrift-332.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (THRIFT-332) Compact Protocol in Ruby

Posted by "Kevin Clark (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683337#action_12683337 ] 

Kevin Clark edited comment on THRIFT-332 at 3/18/09 11:23 PM:
--------------------------------------------------------------

In general this looks great. Good work.

re: sets and maps don't hash well
Maybe you could iterate through the structs and compare by hand? It's ugly though. Should probably try to find a way to fix this.

lib/rb/lib/thrift/transport.rb.orig is in the patch

I'm getting a failure I don't see in trunk:

{quote}
NameError in 'deprecate_module! should skip thrift library code when printing caller'
undefined method `struct_fields' for module `Thrift::Struct'
./spec/../lib/thrift/deprecation.rb:98:in `instance_method'
./spec/../lib/thrift/deprecation.rb:98:in `method_missing'
./spec/../lib/thrift/struct.rb:41:in `fields_with_default_values'
./spec/../lib/thrift/deprecation.rb:98:in `call'
./spec/../lib/thrift/deprecation.rb:98:in `method_missing'
./spec/../lib/thrift/struct.rb:8:in `initialize'
./spec/deprecation_spec.rb:441:in `new'
./spec/deprecation_spec.rb:441:
{quote}

Maybe caused by this?

{quote}
--    # Obsoleted by THRIFT-246, which generates this method inline
--    # TODO: Should be removed at some point. -- Kevin Clark
--    def struct_fields
--      self.class.const_get(:FIELDS)
--    end
{quote}

But I appear to be regenerating the stuff in gen-rb, so I don't know what's up.


compact_protocol.rb:

In write_double:

{quote}@trans.write([dub].pack("G").reverse) #TODO: make sure this is right{quote}

Is it right?


compact_protocol.c:
The 5 defines at the top also show up in binary_protocol_accelerated. Could we move them into a .h?

Is this going to hook into the native protocol hooks you wrote? If so, doesn't rb_thrift_compact_proto_native_qmark need to return true? Looks like the native hooks are commented out.

I guess this is a general issue, but do we have a way to test both the native and non-native bindings?

In rb_thrift_compact_proto_read_message_begin, the max length of buf for PROTOCOL_ID is 51 and for VERSION is 50(string.gsub('%d', '4294967296').length is 49 + 1 for zero termination). Make sure to null out the last byte, or rb_str_new2 could include garbage.

Do you need the memcpy in rb_thrift_compact_proto_read_double? Can you just cast instead? 
{quote}return rb_float_new((double)RSTRING(bytes)->ptr){quote}

      was (Author: kclark):
    In general this looks great. Good work.

re: sets and maps don't hash well
Maybe you could iterate through the structs and compare by hand? It's ugly though. Should probably try to find a way to fix this.

lib/rb/lib/thrift/transport.rb.orig is in the patch

I'm getting a failure I don't see in trunk:

{monospace}
NameError in 'deprecate_module! should skip thrift library code when printing caller'
undefined method `struct_fields' for module `Thrift::Struct'
./spec/../lib/thrift/deprecation.rb:98:in `instance_method'
./spec/../lib/thrift/deprecation.rb:98:in `method_missing'
./spec/../lib/thrift/struct.rb:41:in `fields_with_default_values'
./spec/../lib/thrift/deprecation.rb:98:in `call'
./spec/../lib/thrift/deprecation.rb:98:in `method_missing'
./spec/../lib/thrift/struct.rb:8:in `initialize'
./spec/deprecation_spec.rb:441:in `new'
./spec/deprecation_spec.rb:441:
{monospace}

Maybe caused by this?

-    # Obsoleted by THRIFT-246, which generates this method inline
-    # TODO: Should be removed at some point. -- Kevin Clark
-    def struct_fields
-      self.class.const_get(:FIELDS)
-    end

But I appear to be regenerating the stuff in gen-rb, so I don't know what's up.


compact_protocol.rb:

In write_double:

@trans.write([dub].pack("G").reverse) #TODO: make sure this is right

Is it right?


compact_protocol.c:
The 5 defines at the top also show up in binary_protocol_accelerated. Could we move them into a .h?

Is this going to hook into the native protocol hooks you wrote? If so, doesn't rb_thrift_compact_proto_native_qmark need to return true? Looks like the native hooks are commented out.

I guess this is a general issue, but do we have a way to test both the native and non-native bindings?

In rb_thrift_compact_proto_read_message_begin, the max length of buf for PROTOCOL_ID is 51 and for VERSION is 50(string.gsub('%d', '4294967296').length is 49 + 1 for zero termination). Make sure to null out the last byte, or rb_str_new2 could include garbage.

Do you need the memcpy in rb_thrift_compact_proto_read_double? Can you just cast instead? return rb_float_new((double)RSTRING(bytes)->ptr)
  
> Compact Protocol in Ruby
> ------------------------
>
>                 Key: THRIFT-332
>                 URL: https://issues.apache.org/jira/browse/THRIFT-332
>             Project: Thrift
>          Issue Type: Sub-task
>          Components: Library (Ruby)
>            Reporter: Bryan Duxbury
>            Assignee: Bryan Duxbury
>            Priority: Trivial
>         Attachments: thrift-332-v2.patch, thrift-332-v3.patch, thrift-332.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Closed: (THRIFT-332) Compact Protocol in Ruby

Posted by "Kevin Clark (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/THRIFT-332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Clark closed THRIFT-332.
------------------------------


> Compact Protocol in Ruby
> ------------------------
>
>                 Key: THRIFT-332
>                 URL: https://issues.apache.org/jira/browse/THRIFT-332
>             Project: Thrift
>          Issue Type: Sub-task
>          Components: Library (Ruby)
>            Reporter: Bryan Duxbury
>            Assignee: Bryan Duxbury
>            Priority: Trivial
>         Attachments: thrift-332-v2.patch, thrift-332-v3.patch, thrift-332-v4.patch, thrift-332-v5.patch, thrift-332.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (THRIFT-332) Compact Protocol in Ruby

Posted by "Bryan Duxbury (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/THRIFT-332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Duxbury updated THRIFT-332:
---------------------------------

    Attachment: thrift-332-v4.patch

bq. lib/rb/lib/thrift/transport.rb.orig is in the patch

I couldn't find this svn added anywhere. Maybe I fixed it already.

bq. I'm getting a failure I don't see in trunk: ...

I figured it out. The spec was mimicking a generated class incompletely.

bq. In write_double: ...

Yeah, this is the correct behavior. Compact Protocol defines little-endian doubles on the wire.

bq. The 5 defines at the top also show up in binary_protocol_accelerated. Could we move them into a .h?

Done. Added macros.h, both binary and compact share macros from there now.

bq. Is this going to hook into the native protocol hooks you wrote?

Ideally, yes. I ran into some weirdness trying to do that, and rather than debug it then, I just wanted to get a patch on JIRA. I will take another look at it though.

bq. I guess this is a general issue, but do we have a way to test both the native and non-native bindings?

Yeah, I struggle with this issue too. Basically all you can do is comment out the require of thrift_native in spec_helper.rb. I don't know if there's a better way to go about it. We could have two "separate" protocol classes like the binary protocol /accelerated stuff, but we still have struct.c and memory_buffer.c in the mix, which cannot be separated. 

bq. In rb_thrift_compact_proto_read_message_begin, the max length of buf for PROTOCOL_ID is 51 and for VERSION is 50(string.gsub('%d', '4294967296').length is 49 + 1 for zero termination). Make sure to null out the last byte, or rb_str_new2 could include garbage.

I *think* I fixed this problem in this patch by actually reading the docs for sprintf. Double check me, though - I am definitely not a c string master.

bq. Do you need the memcpy in rb_thrift_compact_proto_read_double? Can you just cast instead? 

Casting definitely did not work. I got rid of the memcpy though with an explicit little-endian transform instead. This should be appropriately cross-architecture. 

> Compact Protocol in Ruby
> ------------------------
>
>                 Key: THRIFT-332
>                 URL: https://issues.apache.org/jira/browse/THRIFT-332
>             Project: Thrift
>          Issue Type: Sub-task
>          Components: Library (Ruby)
>            Reporter: Bryan Duxbury
>            Assignee: Bryan Duxbury
>            Priority: Trivial
>         Attachments: thrift-332-v2.patch, thrift-332-v3.patch, thrift-332-v4.patch, thrift-332.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.