Posted to dev@hive.apache.org by "Zheng Shao (JIRA)" <ji...@apache.org> on 2009/05/23 02:12:45 UTC
[jira] Created: (HIVE-511) Change the hashcode for DoubleWritable
Change the hashcode for DoubleWritable
--------------------------------------
Key: HIVE-511
URL: https://issues.apache.org/jira/browse/HIVE-511
Project: Hadoop Hive
Issue Type: Bug
Reporter: Zheng Shao
The current DoubleWritable hashCode takes only the last (low-order) 32 bits of the value. This is a big problem because for small integer values like 1.0, 2.0, and 15.0, the hash codes are all 0.
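
To make the collision concrete, here is a minimal Java sketch. It assumes the old hash simply truncated the 64-bit IEEE 754 bit pattern to an int; that is an inference from the description above, not code copied from the patch. The names DoubleHashDemo, lowBitsHash and foldedHash are illustrative only. The folded variant uses the formula that java.lang.Double.hashCode() documents.

```java
// Sketch: why keeping only the low 32 bits of a double's bit pattern collides.
class DoubleHashDemo {
    // Assumed shape of the problematic hash: drop the high 32 bits entirely.
    static int lowBitsHash(double v) {
        return (int) Double.doubleToLongBits(v);
    }

    // Fold the high half into the low half, as java.lang.Double.hashCode() does.
    static int foldedHash(double v) {
        long bits = Double.doubleToLongBits(v);
        return (int) (bits ^ (bits >>> 32));
    }

    public static void main(String[] args) {
        // Small whole numbers have an all-zero low mantissa, so lowBitsHash is 0.
        for (double v : new double[] {1.0, 2.0, 15.0}) {
            System.out.printf("%4.1f bits=%016x low=%d folded=%d%n",
                v, Double.doubleToLongBits(v), lowBitsHash(v), foldedHash(v));
        }
    }
}
```

For these inputs the sign, exponent and leading mantissa bits all sit in the upper half of the pattern, which is why discarding that half leaves nothing to distinguish the values.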
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-511) Change the hashcode for DoubleWritable
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713761#action_12713761 ]
Zheng Shao commented on HIVE-511:
---------------------------------
Sorry, I should have added a comment.
The hash function follows the algorithm documented at http://java.sun.com/j2se/1.4.2/docs/api/java/util/List.html#hashCode().
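
For readers unfamiliar with that algorithm, the following Java sketch reproduces the combination rule that the java.util.List.hashCode() documentation specifies: start at 1, then multiply by 31 and add each element's hash. The class and method names (ListStyleHash, combine) are made up for illustration, and the starting value used by the actual patch is not shown in this thread.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the List.hashCode() combination rule applied to a row of values.
class ListStyleHash {
    // Combine per-element hashes the way java.util.List.hashCode() is documented to.
    static int combine(Object... values) {
        int r = 1;
        for (Object v : values) {
            // A null element contributes 0, as in the List contract.
            r = 31 * r + (v == null ? 0 : v.hashCode());
        }
        return r;
    }

    public static void main(String[] args) {
        List<Object> row = Arrays.<Object>asList("a", 42, 3.5);
        // Compare the hand-rolled loop against the library implementation.
        System.out.println(combine("a", 42, 3.5) == row.hashCode());
    }
}
```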
[jira] Updated: (HIVE-511) Change the hashcode for DoubleWritable
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Shao updated HIVE-511:
----------------------------
Attachment: HIVE-511.3.patch
Added a comment for the hash function.
[jira] Updated: (HIVE-511) Change the hashcode for DoubleWritable
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Shao updated HIVE-511:
----------------------------
Fix Version/s: 0.4.0
Affects Version/s: 0.4.0
Status: Patch Available (was: Open)
[jira] Commented: (HIVE-511) Change the hashcode for DoubleWritable
Posted by "Raghotham Murthy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713796#action_12713796 ]
Raghotham Murthy commented on HIVE-511:
---------------------------------------
+1
Looks good. Will commit once tests pass.
[jira] Assigned: (HIVE-511) Change the hashcode for DoubleWritable
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Shao reassigned HIVE-511:
-------------------------------
Assignee: Zheng Shao
[jira] Commented: (HIVE-511) Change the hashcode for DoubleWritable
Posted by "Raghotham Murthy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713751#action_12713751 ]
Raghotham Murthy commented on HIVE-511:
---------------------------------------
r = r * 31 + ObjectInspectorUtils.hashCode(arguments[i].get(), argumentOIs[i]);
Won't this overflow?
Maybe change it to something like the following?
r = r ^ ObjectInspectorUtils.hashCode(arguments[i].get(), argumentOIs[i]);
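
As background for this question, the Java sketch below compares the two combination rules on plain int hashes. In Java, int multiplication and addition wrap around silently on overflow rather than throwing, so the multiply-and-add form stays well defined. XOR, by contrast, is symmetric and self-cancelling, so it ignores argument order and maps equal pairs to 0. The names OverflowAndXor, mulAdd and xor are illustrative, and the starting value of 0 is an assumption; the thread does not show how r is initialized.

```java
// Sketch: multiply-and-add versus XOR as ways to combine per-argument hashes.
class OverflowAndXor {
    // r = r * 31 + h; int arithmetic wraps on overflow and never throws.
    static int mulAdd(int... hashes) {
        int r = 0; // assumed starting value
        for (int h : hashes) {
            r = r * 31 + h;
        }
        return r;
    }

    // r = r ^ h; cannot overflow, but loses ordering and cancels duplicates.
    static int xor(int... hashes) {
        int r = 0;
        for (int h : hashes) {
            r = r ^ h;
        }
        return r;
    }

    public static void main(String[] args) {
        int big = Integer.MAX_VALUE;
        System.out.println("wrapped product: " + (big * 31));
        System.out.println("mulAdd order-sensitive: " + (mulAdd(1, 2) != mulAdd(2, 1)));
        System.out.println("xor order-insensitive: " + (xor(1, 2) == xor(2, 1)));
        System.out.println("xor of equal values: " + xor(7, 7));
    }
}
```

If overflow did need to be detected, Math.multiplyExact and Math.addExact throw ArithmeticException instead of wrapping, but for hashing the wrap-around is usually acceptable.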
[jira] Commented: (HIVE-511) Change the hashcode for DoubleWritable
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712329#action_12712329 ]
Zheng Shao commented on HIVE-511:
---------------------------------
Text and String both map to the "STRING" primitive type.
The relationship between Text and String is the same as that between IntWritable and Integer: they will have WritableStringOI and JavaStringOI, which both implement StringOI.
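
The idea can be sketched in self-contained Java. Everything below is a hypothetical stand-in rather than Hive's real API: StringOI, JavaStringOI and WritableStringOI only borrow the names used in this comment, Utf8Holder substitutes for a writable text object, and hashing via the decoded Java String is an assumption made for illustration. The point is only that a hash routine keyed on the shared inspector interface handles both representations without falling into a default case.

```java
import java.nio.charset.StandardCharsets;

// Sketch: one hash routine serving two string representations via an inspector.
class StringHashDemo {
    // Hash through the inspector, so the caller never switches on the object's class.
    static int hash(Object o, StringOI oi) {
        return oi.getJavaString(o).hashCode();
    }

    public static void main(String[] args) {
        int a = hash("hive", new JavaStringOI());
        int b = hash(new Utf8Holder("hive"), new WritableStringOI());
        System.out.println(a == b);
    }
}

// Hypothetical common interface for anything of the STRING primitive type.
interface StringOI {
    String getJavaString(Object o);
}

// Inspector for values stored as java.lang.String.
class JavaStringOI implements StringOI {
    public String getJavaString(Object o) {
        return (String) o;
    }
}

// Stand-in for a writable text object that stores UTF-8 bytes.
class Utf8Holder {
    final byte[] bytes;

    Utf8Holder(String s) {
        this.bytes = s.getBytes(StandardCharsets.UTF_8);
    }
}

// Inspector for values stored as the writable stand-in.
class WritableStringOI implements StringOI {
    public String getJavaString(Object o) {
        return new String(((Utf8Holder) o).bytes, StandardCharsets.UTF_8);
    }
}
```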
[jira] Updated: (HIVE-511) Change the hashcode for DoubleWritable
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Shao updated HIVE-511:
----------------------------
Attachment: HIVE-511.2.patch
This patch moves the hash function to GenericUDF. We now support a variable number of arguments in the hash function.
[jira] Commented: (HIVE-511) Change the hashcode for DoubleWritable
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712328#action_12712328 ]
Namit Jain commented on HIVE-511:
---------------------------------
I am not sure I follow. What if it is a Text Writable object?
Won't it go to the default case and error out?
[jira] Updated: (HIVE-511) Change the hashcode for DoubleWritable
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Shao updated HIVE-511:
----------------------------
Attachment: HIVE-511.1.patch
Changed hash code implementation.
[jira] Commented: (HIVE-511) Change the hashcode for DoubleWritable
Posted by "Raghotham Murthy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712337#action_12712337 ]
Raghotham Murthy commented on HIVE-511:
---------------------------------------
I don't think this change works well for longs with the code in UDFDefaultSampleHashFn, since we just call o.hashCode(). Long.hashCode() returns (int)(value ^ (value >>> 32)), which is not the same as what you are doing. Notice '>>>' instead of '>>'. Also, maybe you want to change UDFDefaultSampleHashFn to use your hashCode method as well.
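
The patch under review is not reproduced in this thread, so whether its long hash really differed from Long.hashCode() cannot be checked here. As a side note, the Java sketch below lets readers compare the two shift operators on their own inputs. The names LongHashShift, unsignedFold and signedFold are illustrative. The two shifts produce different 64-bit values for negative inputs, but the (int) cast keeps only the low 32 bits, and those bits are the same in both cases.

```java
// Sketch: folding a long into an int with '>>>' versus '>>'.
class LongHashShift {
    // The formula documented for java.lang.Long.hashCode().
    static int unsignedFold(long v) {
        return (int) (v ^ (v >>> 32));
    }

    // The same fold written with a sign-extending shift.
    static int signedFold(long v) {
        return (int) (v ^ (v >> 32));
    }

    public static void main(String[] args) {
        long[] samples = {1L, -1L, -5L, Long.MIN_VALUE, 0x0123456789ABCDEFL};
        for (long v : samples) {
            System.out.printf("v=%d shiftsDiffer=%b foldsAgree=%b matchesLong=%b%n",
                v,
                (v >> 32) != (v >>> 32),
                signedFold(v) == unsignedFold(v),
                unsignedFold(v) == Long.valueOf(v).hashCode());
        }
    }
}
```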
[jira] Updated: (HIVE-511) Change the hashcode for DoubleWritable
Posted by "Raghotham Murthy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raghotham Murthy updated HIVE-511:
----------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)
Committed. Thanks Zheng!