You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@thrift.apache.org by "Bryan Duxbury (JIRA)" <ji...@apache.org> on 2009/05/20 19:39:45 UTC
[jira] Created: (THRIFT-511) Better performing hash method for
generated structs
Better performing hash method for generated structs
---------------------------------------------------
Key: THRIFT-511
URL: https://issues.apache.org/jira/browse/THRIFT-511
Project: Thrift
Issue Type: New Feature
Components: Library (Ruby)
Reporter: Bryan Duxbury
Fix For: 0.2
As discussed in THRIFT-231, ruby generated structs' hash code method just always returns zero. While this is semantically correct, it leads to hashes having O(n) performance instead of O(1), which is critical for some applications.
We can either take the approach that the Java library uses (optionally produce complex hash function), or just generate a good hash all the time.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (THRIFT-511) Better performing hash method for
generated structs
Posted by "Bryan Duxbury (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/THRIFT-511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bryan Duxbury updated THRIFT-511:
---------------------------------
Patch Info: [Patch Available]
> Better performing hash method for generated structs
> ---------------------------------------------------
>
> Key: THRIFT-511
> URL: https://issues.apache.org/jira/browse/THRIFT-511
> Project: Thrift
> Issue Type: New Feature
> Components: Library (Ruby)
> Reporter: Bryan Duxbury
> Assignee: Dan Kinder
> Fix For: 0.2
>
> Attachments: thrift-511.patch
>
>
> As discussed in THRIFT-231, ruby generated structs' hash code method just always returns zero. While this is semantically correct, it leads to hashes having O(n) performance instead of O(1), which is critical for some applications.
> We can either take the approach that the Java library uses (optionally produce complex hash function), or just generate a good hash all the time.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (THRIFT-511) Better performing hash
method for generated structs
Posted by "Dan Kinder (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/THRIFT-511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711432#action_12711432 ]
Dan Kinder edited comment on THRIFT-511 at 5/20/09 5:42 PM:
------------------------------------------------------------
turns out ruby arrays hash by item, two different arrays with the same elements have the same hashcode, but otherwise the hashes are very random-looking. Does it seem like there would be a problem with this?:
def hash
field_values = []
each_field do |fid, field_info|
name = field_info[:name]
field_values << self.instance_variable_get("@#{name}")
end
field_values.hash
end
was (Author: dkinder):
turns out ruby arrays hash by item, two different arrays with the same elements have the same hashcode, but otherwise the hashes are very random-looking. Does it seem like there would be a problem with this?:
def hash
struct_fields.values.sort.hash
end
> Better performing hash method for generated structs
> ---------------------------------------------------
>
> Key: THRIFT-511
> URL: https://issues.apache.org/jira/browse/THRIFT-511
> Project: Thrift
> Issue Type: New Feature
> Components: Library (Ruby)
> Reporter: Bryan Duxbury
> Assignee: Dan Kinder
> Fix For: 0.2
>
>
> As discussed in THRIFT-231, ruby generated structs' hash code method just always returns zero. While this is semantically correct, it leads to hashes having O(n) performance instead of O(1), which is critical for some applications.
> We can either take the approach that the Java library uses (optionally produce complex hash function), or just generate a good hash all the time.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (THRIFT-511) Better performing hash method for
generated structs
Posted by "Bryan Duxbury (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/THRIFT-511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714513#action_12714513 ]
Bryan Duxbury commented on THRIFT-511:
--------------------------------------
I tested out this patch's performance difference. Before:
{code}
10
user system total real
string set create 0.000000 0.000000 0.000000 ( 0.000143)
struct set create 0.010000 0.000000 0.010000 ( 0.003211)
100
user system total real
string set create 0.000000 0.000000 0.000000 ( 0.001208)
struct set create 0.330000 0.010000 0.340000 ( 0.337008)
1000
user system total real
string set create 0.030000 0.000000 0.030000 ( 0.037109)
struct set create 33.420000 0.190000 33.610000 ( 34.300607)
{code}
after:
{code}
10
user system total real
string set create 0.000000 0.000000 0.000000 ( 0.000243)
struct set create 0.000000 0.000000 0.000000 ( 0.001049)
100
user system total real
string set create 0.010000 0.000000 0.010000 ( 0.001654)
struct set create 0.000000 0.000000 0.000000 ( 0.006906)
1000
user system total real
string set create 0.040000 0.000000 0.040000 ( 0.036426)
struct set create 0.080000 0.000000 0.080000 ( 0.090610)
{code}
The performance boost is pretty stark. Also, there's zero testing to be added, because verifying that hash codes for == structs before would have worked equally well. I think this patch is ready to go.
If no one objects, I will commit it later today.
> Better performing hash method for generated structs
> ---------------------------------------------------
>
> Key: THRIFT-511
> URL: https://issues.apache.org/jira/browse/THRIFT-511
> Project: Thrift
> Issue Type: New Feature
> Components: Library (Ruby)
> Reporter: Bryan Duxbury
> Assignee: Dan Kinder
> Fix For: 0.2
>
> Attachments: thrift-511.patch
>
>
> As discussed in THRIFT-231, ruby generated structs' hash code method just always returns zero. While this is semantically correct, it leads to hashes having O(n) performance instead of O(1), which is critical for some applications.
> We can either take the approach that the Java library uses (optionally produce complex hash function), or just generate a good hash all the time.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Assigned: (THRIFT-511) Better performing hash method for
generated structs
Posted by "Dan Kinder (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/THRIFT-511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dan Kinder reassigned THRIFT-511:
---------------------------------
Assignee: Dan Kinder
> Better performing hash method for generated structs
> ---------------------------------------------------
>
> Key: THRIFT-511
> URL: https://issues.apache.org/jira/browse/THRIFT-511
> Project: Thrift
> Issue Type: New Feature
> Components: Library (Ruby)
> Reporter: Bryan Duxbury
> Assignee: Dan Kinder
> Fix For: 0.2
>
>
> As discussed in THRIFT-231, ruby generated structs' hash code method just always returns zero. While this is semantically correct, it leads to hashes having O(n) performance instead of O(1), which is critical for some applications.
> We can either take the approach that the Java library uses (optionally produce complex hash function), or just generate a good hash all the time.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Resolved: (THRIFT-511) Better performing hash method for
generated structs
Posted by "Bryan Duxbury (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/THRIFT-511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bryan Duxbury resolved THRIFT-511.
----------------------------------
Resolution: Fixed
I just committed this. Thanks for the patch, Dan!
> Better performing hash method for generated structs
> ---------------------------------------------------
>
> Key: THRIFT-511
> URL: https://issues.apache.org/jira/browse/THRIFT-511
> Project: Thrift
> Issue Type: New Feature
> Components: Library (Ruby)
> Reporter: Bryan Duxbury
> Assignee: Dan Kinder
> Fix For: 0.2
>
> Attachments: thrift-511.patch
>
>
> As discussed in THRIFT-231, ruby generated structs' hash code method just always returns zero. While this is semantically correct, it leads to hashes having O(n) performance instead of O(1), which is critical for some applications.
> We can either take the approach that the Java library uses (optionally produce complex hash function), or just generate a good hash all the time.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (THRIFT-511) Better performing hash method for
generated structs
Posted by "Dan Kinder (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/THRIFT-511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dan Kinder updated THRIFT-511:
------------------------------
Attachment: thrift-511.patch
changed hash definition to hash of array elements
> Better performing hash method for generated structs
> ---------------------------------------------------
>
> Key: THRIFT-511
> URL: https://issues.apache.org/jira/browse/THRIFT-511
> Project: Thrift
> Issue Type: New Feature
> Components: Library (Ruby)
> Reporter: Bryan Duxbury
> Assignee: Dan Kinder
> Fix For: 0.2
>
> Attachments: thrift-511.patch
>
>
> As discussed in THRIFT-231, ruby generated structs' hash code method just always returns zero. While this is semantically correct, it leads to hashes having O(n) performance instead of O(1), which is critical for some applications.
> We can either take the approach that the Java library uses (optionally produce complex hash function), or just generate a good hash all the time.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (THRIFT-511) Better performing hash method for
generated structs
Posted by "Dan Kinder (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/THRIFT-511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711432#action_12711432 ]
Dan Kinder commented on THRIFT-511:
-----------------------------------
turns out ruby arrays hash by item, two different arrays with the same elements have the same hashcode, but otherwise the hashes are very random-looking. Does it seem like there would be a problem with this?:
def hash
struct_fields.values.sort.hash
end
> Better performing hash method for generated structs
> ---------------------------------------------------
>
> Key: THRIFT-511
> URL: https://issues.apache.org/jira/browse/THRIFT-511
> Project: Thrift
> Issue Type: New Feature
> Components: Library (Ruby)
> Reporter: Bryan Duxbury
> Assignee: Dan Kinder
> Fix For: 0.2
>
>
> As discussed in THRIFT-231, ruby generated structs' hash code method just always returns zero. While this is semantically correct, it leads to hashes having O(n) performance instead of O(1), which is critical for some applications.
> We can either take the approach that the Java library uses (optionally produce complex hash function), or just generate a good hash all the time.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.