You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@thrift.apache.org by "Bryan Duxbury (JIRA)" <ji...@apache.org> on 2009/05/20 19:39:45 UTC

[jira] Created: (THRIFT-511) Better performing hash method for generated structs

Better performing hash method for generated structs
---------------------------------------------------

                 Key: THRIFT-511
                 URL: https://issues.apache.org/jira/browse/THRIFT-511
             Project: Thrift
          Issue Type: New Feature
          Components: Library (Ruby)
            Reporter: Bryan Duxbury
             Fix For: 0.2


As discussed in THRIFT-231, ruby generated structs' hash code method just always returns zero. While this is semantically correct, it leads to hashes having O(n) performance instead of O(1), which is critical for some applications. 

We can either take the approach that the Java library uses (optionally produce complex hash function), or just generate a good hash all the time.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (THRIFT-511) Better performing hash method for generated structs

Posted by "Bryan Duxbury (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/THRIFT-511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Duxbury updated THRIFT-511:
---------------------------------

    Patch Info: [Patch Available]

> Better performing hash method for generated structs
> ---------------------------------------------------
>
>                 Key: THRIFT-511
>                 URL: https://issues.apache.org/jira/browse/THRIFT-511
>             Project: Thrift
>          Issue Type: New Feature
>          Components: Library (Ruby)
>            Reporter: Bryan Duxbury
>            Assignee: Dan Kinder
>             Fix For: 0.2
>
>         Attachments: thrift-511.patch
>
>
> As discussed in THRIFT-231, ruby generated structs' hash code method just always returns zero. While this is semantically correct, it leads to hashes having O(n) performance instead of O(1), which is critical for some applications. 
> We can either take the approach that the Java library uses (optionally produce complex hash function), or just generate a good hash all the time.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (THRIFT-511) Better performing hash method for generated structs

Posted by "Dan Kinder (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711432#action_12711432 ] 

Dan Kinder edited comment on THRIFT-511 at 5/20/09 5:42 PM:
------------------------------------------------------------

turns out ruby arrays hash by item, two different arrays with the same elements have the same hashcode, but otherwise the hashes are very random-looking. Does it seem like there would be a problem with this?:

   def hash
      field_values = []
      each_field do |fid, field_info|
        name = field_info[:name]
        field_values << self.instance_variable_get("@#{name}")
      end
      field_values.hash
    end

      was (Author: dkinder):
    turns out ruby arrays hash by item, two different arrays with the same elements have the same hashcode, but otherwise the hashes are very random-looking. Does it seem like there would be a problem with this?:

  def hash
      struct_fields.values.sort.hash 
  end
  
> Better performing hash method for generated structs
> ---------------------------------------------------
>
>                 Key: THRIFT-511
>                 URL: https://issues.apache.org/jira/browse/THRIFT-511
>             Project: Thrift
>          Issue Type: New Feature
>          Components: Library (Ruby)
>            Reporter: Bryan Duxbury
>            Assignee: Dan Kinder
>             Fix For: 0.2
>
>
> As discussed in THRIFT-231, ruby generated structs' hash code method just always returns zero. While this is semantically correct, it leads to hashes having O(n) performance instead of O(1), which is critical for some applications. 
> We can either take the approach that the Java library uses (optionally produce complex hash function), or just generate a good hash all the time.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (THRIFT-511) Better performing hash method for generated structs

Posted by "Bryan Duxbury (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714513#action_12714513 ] 

Bryan Duxbury commented on THRIFT-511:
--------------------------------------

I tested out this patch's performance difference. Before:

{code}
10
                                    user     system      total        real
string set create               0.000000   0.000000   0.000000 (  0.000143)
struct set create               0.010000   0.000000   0.010000 (  0.003211)
100
                                    user     system      total        real
string set create               0.000000   0.000000   0.000000 (  0.001208)
struct set create               0.330000   0.010000   0.340000 (  0.337008)
1000
                                    user     system      total        real
string set create               0.030000   0.000000   0.030000 (  0.037109)
struct set create              33.420000   0.190000  33.610000 ( 34.300607)
{code}

after:

{code}
10
                                    user     system      total        real
string set create               0.000000   0.000000   0.000000 (  0.000243)
struct set create               0.000000   0.000000   0.000000 (  0.001049)
100
                                    user     system      total        real
string set create               0.010000   0.000000   0.010000 (  0.001654)
struct set create               0.000000   0.000000   0.000000 (  0.006906)
1000
                                    user     system      total        real
string set create               0.040000   0.000000   0.040000 (  0.036426)
struct set create               0.080000   0.000000   0.080000 (  0.090610)
{code}

The performance boost is pretty stark. Also, there's zero testing to be added, because verifying that hash codes for == structs before would have worked equally well. I think this patch is ready to go.

If no one objects, I will commit it later today.

> Better performing hash method for generated structs
> ---------------------------------------------------
>
>                 Key: THRIFT-511
>                 URL: https://issues.apache.org/jira/browse/THRIFT-511
>             Project: Thrift
>          Issue Type: New Feature
>          Components: Library (Ruby)
>            Reporter: Bryan Duxbury
>            Assignee: Dan Kinder
>             Fix For: 0.2
>
>         Attachments: thrift-511.patch
>
>
> As discussed in THRIFT-231, ruby generated structs' hash code method just always returns zero. While this is semantically correct, it leads to hashes having O(n) performance instead of O(1), which is critical for some applications. 
> We can either take the approach that the Java library uses (optionally produce complex hash function), or just generate a good hash all the time.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (THRIFT-511) Better performing hash method for generated structs

Posted by "Dan Kinder (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/THRIFT-511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dan Kinder reassigned THRIFT-511:
---------------------------------

    Assignee: Dan Kinder

> Better performing hash method for generated structs
> ---------------------------------------------------
>
>                 Key: THRIFT-511
>                 URL: https://issues.apache.org/jira/browse/THRIFT-511
>             Project: Thrift
>          Issue Type: New Feature
>          Components: Library (Ruby)
>            Reporter: Bryan Duxbury
>            Assignee: Dan Kinder
>             Fix For: 0.2
>
>
> As discussed in THRIFT-231, ruby generated structs' hash code method just always returns zero. While this is semantically correct, it leads to hashes having O(n) performance instead of O(1), which is critical for some applications. 
> We can either take the approach that the Java library uses (optionally produce complex hash function), or just generate a good hash all the time.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (THRIFT-511) Better performing hash method for generated structs

Posted by "Bryan Duxbury (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/THRIFT-511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Duxbury resolved THRIFT-511.
----------------------------------

    Resolution: Fixed

I just committed this. Thanks for the patch, Dan!

> Better performing hash method for generated structs
> ---------------------------------------------------
>
>                 Key: THRIFT-511
>                 URL: https://issues.apache.org/jira/browse/THRIFT-511
>             Project: Thrift
>          Issue Type: New Feature
>          Components: Library (Ruby)
>            Reporter: Bryan Duxbury
>            Assignee: Dan Kinder
>             Fix For: 0.2
>
>         Attachments: thrift-511.patch
>
>
> As discussed in THRIFT-231, ruby generated structs' hash code method just always returns zero. While this is semantically correct, it leads to hashes having O(n) performance instead of O(1), which is critical for some applications. 
> We can either take the approach that the Java library uses (optionally produce complex hash function), or just generate a good hash all the time.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (THRIFT-511) Better performing hash method for generated structs

Posted by "Dan Kinder (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/THRIFT-511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dan Kinder updated THRIFT-511:
------------------------------

    Attachment: thrift-511.patch

changed hash definition to hash of array elements

> Better performing hash method for generated structs
> ---------------------------------------------------
>
>                 Key: THRIFT-511
>                 URL: https://issues.apache.org/jira/browse/THRIFT-511
>             Project: Thrift
>          Issue Type: New Feature
>          Components: Library (Ruby)
>            Reporter: Bryan Duxbury
>            Assignee: Dan Kinder
>             Fix For: 0.2
>
>         Attachments: thrift-511.patch
>
>
> As discussed in THRIFT-231, ruby generated structs' hash code method just always returns zero. While this is semantically correct, it leads to hashes having O(n) performance instead of O(1), which is critical for some applications. 
> We can either take the approach that the Java library uses (optionally produce complex hash function), or just generate a good hash all the time.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (THRIFT-511) Better performing hash method for generated structs

Posted by "Dan Kinder (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711432#action_12711432 ] 

Dan Kinder commented on THRIFT-511:
-----------------------------------

turns out ruby arrays hash by item, two different arrays with the same elements have the same hashcode, but otherwise the hashes are very random-looking. Does it seem like there would be a problem with this?:

  def hash
      struct_fields.values.sort.hash 
  end

> Better performing hash method for generated structs
> ---------------------------------------------------
>
>                 Key: THRIFT-511
>                 URL: https://issues.apache.org/jira/browse/THRIFT-511
>             Project: Thrift
>          Issue Type: New Feature
>          Components: Library (Ruby)
>            Reporter: Bryan Duxbury
>            Assignee: Dan Kinder
>             Fix For: 0.2
>
>
> As discussed in THRIFT-231, ruby generated structs' hash code method just always returns zero. While this is semantically correct, it leads to hashes having O(n) performance instead of O(1), which is critical for some applications. 
> We can either take the approach that the Java library uses (optionally produce complex hash function), or just generate a good hash all the time.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.