Posted to dev@hive.apache.org by "Zheng Shao (JIRA)" <ji...@apache.org> on 2009/05/23 02:12:45 UTC

[jira] Created: (HIVE-511) Change the hashcode for DoubleWritable

Change the hashcode for DoubleWritable
--------------------------------------

                 Key: HIVE-511
                 URL: https://issues.apache.org/jira/browse/HIVE-511
             Project: Hadoop Hive
          Issue Type: Bug
            Reporter: Zheng Shao


The current DoubleWritable hashCode takes only the low 32 bits of the double's 64-bit representation. This is a big problem because for small integer values such as 1.0, 2.0, and 15.0, the hashCode is always 0.
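
A minimal sketch of the problem described above, assuming the hash is computed by casting `Double.doubleToLongBits(value)` to `int`, which keeps only the low 32 bits. The class and method names are hypothetical. The folded variant is shown only for comparison (it is the formula `java.lang.Double.hashCode()` uses) and is not necessarily what the attached patches do.

```java
// Hypothetical demo class; names are illustrative and not taken from Hive or Hadoop.
final class DoubleHashDemo {

    // Assumed "old" behaviour: keep only the low 32 bits of the IEEE 754 bit pattern.
    static int lowBitsHash(double value) {
        return (int) Double.doubleToLongBits(value);
    }

    // Folds the high 32 bits into the low 32 bits, as java.lang.Double.hashCode() does.
    static int foldedHash(double value) {
        long bits = Double.doubleToLongBits(value);
        return (int) (bits ^ (bits >>> 32));
    }

    public static void main(String[] args) {
        for (double d : new double[] {1.0, 2.0, 15.0}) {
            System.out.printf("%5.1f  low=%d  folded=%d%n", d, lowBitsHash(d), foldedHash(d));
        }
    }
}
```

Small whole numbers have an all-zero low half in their bit pattern (1.0 is 0x3FF0000000000000), which is why the low-bits variant maps them all to the same bucket.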


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-511) Change the hashcode for DoubleWritable

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713761#action_12713761 ] 

Zheng Shao commented on HIVE-511:
---------------------------------

Sorry, I should have added a comment.
We are using the formula from http://java.sun.com/j2se/1.4.2/docs/api/java/util/List.html#hashCode() here.
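
For reference, the linked `List.hashCode()` contract starts from 1 and applies `hashCode = 31 * hashCode + (e == null ? 0 : e.hashCode())` for each element. The sketch below is a hand-rolled combiner of the same shape, checked against a real `List`. The class name, the method name `combine`, and the seed are illustrative assumptions; the actual Hive code may seed its accumulator differently.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of Hive.
final class ListStyleHashDemo {

    // Combines per-element hash codes the way the List.hashCode() contract specifies.
    static int combine(int... elementHashes) {
        int r = 1; // seed required by the List.hashCode() contract
        for (int h : elementHashes) {
            r = 31 * r + h; // int arithmetic wraps around on overflow
        }
        return r;
    }

    public static void main(String[] args) {
        // Integer.hashCode() is the int value itself, so the two results agree.
        List<Integer> values = Arrays.asList(7, 42, -3);
        System.out.println(combine(7, 42, -3) == values.hashCode());
    }
}
```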


> Change the hashcode for DoubleWritable
> --------------------------------------
>
>                 Key: HIVE-511
>                 URL: https://issues.apache.org/jira/browse/HIVE-511
>             Project: Hadoop Hive
>          Issue Type: Bug
>    Affects Versions: 0.4.0
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>             Fix For: 0.4.0
>
>         Attachments: HIVE-511.1.patch, HIVE-511.2.patch
>
>
> The current DoubleWritable hashCode takes only the low 32 bits of the double's 64-bit representation. This is a big problem because for small integer values such as 1.0, 2.0, and 15.0, the hashCode is always 0.



[jira] Updated: (HIVE-511) Change the hashcode for DoubleWritable

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-511:
----------------------------

    Attachment: HIVE-511.3.patch

Added a comment for the hash function.

> Change the hashcode for DoubleWritable
> --------------------------------------
>
>                 Key: HIVE-511
>                 URL: https://issues.apache.org/jira/browse/HIVE-511
>             Project: Hadoop Hive
>          Issue Type: Bug
>    Affects Versions: 0.4.0
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>             Fix For: 0.4.0
>
>         Attachments: HIVE-511.1.patch, HIVE-511.2.patch, HIVE-511.3.patch
>
>
> The current DoubleWritable hashCode takes only the low 32 bits of the double's 64-bit representation. This is a big problem because for small integer values such as 1.0, 2.0, and 15.0, the hashCode is always 0.



[jira] Updated: (HIVE-511) Change the hashcode for DoubleWritable

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-511:
----------------------------

        Fix Version/s: 0.4.0
    Affects Version/s: 0.4.0
               Status: Patch Available  (was: Open)




[jira] Commented: (HIVE-511) Change the hashcode for DoubleWritable

Posted by "Raghotham Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713796#action_12713796 ] 

Raghotham Murthy commented on HIVE-511:
---------------------------------------

+1

Looks good. Will commit once tests pass.




[jira] Assigned: (HIVE-511) Change the hashcode for DoubleWritable

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao reassigned HIVE-511:
-------------------------------

    Assignee: Zheng Shao




[jira] Commented: (HIVE-511) Change the hashcode for DoubleWritable

Posted by "Raghotham Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713751#action_12713751 ] 

Raghotham Murthy commented on HIVE-511:
---------------------------------------

    r = r * 31 + ObjectInspectorUtils.hashCode(arguments[i].get(), argumentOIs[i]);

Won't this overflow?

Maybe change it to something like the following?

    r = r ^ ObjectInspectorUtils.hashCode(arguments[i].get(), argumentOIs[i]);
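
To make the trade-off concrete: Java `int` arithmetic wraps around on overflow rather than throwing, so `r * 31 + h` always yields a valid hash. A plain XOR combiner, by contrast, ignores argument order and cancels out equal arguments. The sketch below uses hypothetical helper names and plain `int` values in place of the `ObjectInspectorUtils.hashCode(...)` results; the seed of 0 is an assumption.

```java
// Hypothetical demo class, not part of Hive.
final class CombineDemo {

    // Multiply-and-add combiner, same shape as the first quoted line (seed 0 assumed).
    static int mulAdd(int... hashes) {
        int r = 0;
        for (int h : hashes) {
            r = r * 31 + h; // wraps silently on overflow, no exception
        }
        return r;
    }

    // XOR combiner, same shape as the suggested alternative.
    static int xor(int... hashes) {
        int r = 0;
        for (int h : hashes) {
            r = r ^ h;
        }
        return r;
    }

    public static void main(String[] args) {
        System.out.println(mulAdd(Integer.MAX_VALUE, 1)); // overflows without an exception
        System.out.println(xor(1, 2) == xor(2, 1));       // true: argument order is lost
        System.out.println(xor(5, 5));                    // 0: equal arguments cancel
        System.out.println(mulAdd(1, 2) == mulAdd(2, 1)); // false: order matters
    }
}
```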




[jira] Commented: (HIVE-511) Change the hashcode for DoubleWritable

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712329#action_12712329 ] 

Zheng Shao commented on HIVE-511:
---------------------------------

Text and String are both of the "STRING" primitive type.
The relationship between Text and String is the same as that between IntWritable and Integer: they have WritableStringOI and JavaStringOI respectively, both of which implement StringOI.
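
A rough sketch of the pattern described here, using hypothetical stand-in types rather than Hive's real ObjectInspector classes. Two inspectors wrap different in-memory representations of a string but expose the same interface, so code such as a hash function can be written once against that interface. `StringBuilder` stands in for a mutable writable type such as Text.

```java
// All types below are illustrative stand-ins, not Hive's actual API.
interface StringInspector {
    String getJavaString(Object o);
}

// Inspector for values already held as java.lang.String.
final class JavaStringInspector implements StringInspector {
    public String getJavaString(Object o) {
        return (String) o;
    }
}

// Inspector for a mutable wrapper; StringBuilder plays the role of Text here.
final class WritableLikeStringInspector implements StringInspector {
    public String getJavaString(Object o) {
        return ((StringBuilder) o).toString();
    }
}

final class InspectorHashDemo {
    // Hashes through the inspector, so both representations hash identically.
    static int hash(Object o, StringInspector oi) {
        return o == null ? 0 : oi.getJavaString(o).hashCode();
    }

    public static void main(String[] args) {
        int a = hash("abc", new JavaStringInspector());
        int b = hash(new StringBuilder("abc"), new WritableLikeStringInspector());
        System.out.println(a == b);
    }
}
```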


> Change the hashcode for DoubleWritable
> --------------------------------------
>
>                 Key: HIVE-511
>                 URL: https://issues.apache.org/jira/browse/HIVE-511
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Zheng Shao
>         Attachments: HIVE-511.1.patch
>
>
> The current DoubleWritable hashCode takes only the low 32 bits of the double's 64-bit representation. This is a big problem because for small integer values such as 1.0, 2.0, and 15.0, the hashCode is always 0.



[jira] Updated: (HIVE-511) Change the hashcode for DoubleWritable

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-511:
----------------------------

    Attachment: HIVE-511.2.patch

This patch moves the hash function to GenericUDF. We now support a variable number of arguments in the hash function.


> Change the hashcode for DoubleWritable
> --------------------------------------
>
>                 Key: HIVE-511
>                 URL: https://issues.apache.org/jira/browse/HIVE-511
>             Project: Hadoop Hive
>          Issue Type: Bug
>    Affects Versions: 0.4.0
>            Reporter: Zheng Shao
>             Fix For: 0.4.0
>
>         Attachments: HIVE-511.1.patch, HIVE-511.2.patch
>
>
> The current DoubleWritable hashCode takes only the low 32 bits of the double's 64-bit representation. This is a big problem because for small integer values such as 1.0, 2.0, and 15.0, the hashCode is always 0.



[jira] Commented: (HIVE-511) Change the hashcode for DoubleWritable

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712328#action_12712328 ] 

Namit Jain commented on HIVE-511:
---------------------------------



I am not sure I follow. What if it is a Text Writable object?

Won't it go to the default case and error out?




[jira] Updated: (HIVE-511) Change the hashcode for DoubleWritable

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-511:
----------------------------

    Attachment: HIVE-511.1.patch

Changed hash code implementation.




[jira] Commented: (HIVE-511) Change the hashcode for DoubleWritable

Posted by "Raghotham Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712337#action_12712337 ] 

Raghotham Murthy commented on HIVE-511:
---------------------------------------

I don't think this change works well for longs with the code in UDFDefaultSampleHashFn, since we just call o.hashCode() there. Long.hashCode() returns (int)(value ^ (value >>> 32)), which is not the same as what you are doing. Notice '>>>' instead of '>>'. Also, maybe you want to change UDFDefaultSampleHashFn to use your hashCode method as well.
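
For readers unfamiliar with the two operators: `>>>` shifts in zeros while `>>` copies the sign bit, so the shifted 64-bit values differ for negative longs. The sketch below lets one inspect both the intermediate values and the final `int` results. The class and method names are hypothetical, and the signed-shift variant is an assumed reconstruction, not code copied from the patch. Note that when the result is truncated to `int` immediately, as in this particular formula, only the low 32 bits survive, so whether the two variants disagree depends on where the cast happens in the real code.

```java
// Hypothetical demo class, not part of Hive.
final class ShiftDemo {

    // Same formula as java.lang.Long.hashCode(): unsigned shift, then truncate to int.
    static int unsignedShiftHash(long value) {
        return (int) (value ^ (value >>> 32));
    }

    // Assumed variant using a signed (arithmetic) shift.
    static int signedShiftHash(long value) {
        return (int) (value ^ (value >> 32));
    }

    public static void main(String[] args) {
        long v = -1L;
        System.out.println(Long.toHexString(v >>> 32)); // upper half filled with zeros
        System.out.println(Long.toHexString(v >> 32));  // upper half filled with sign bits
        System.out.println(unsignedShiftHash(v) == Long.valueOf(v).hashCode());
        System.out.println(unsignedShiftHash(v) == signedShiftHash(v));
    }
}
```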




[jira] Updated: (HIVE-511) Change the hashcode for DoubleWritable

Posted by "Raghotham Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghotham Murthy updated HIVE-511:
----------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Committed. Thanks Zheng!

