Posted to common-dev@hadoop.apache.org by "Owen O'Malley (JIRA)" <ji...@apache.org> on 2006/03/14 08:19:57 UTC
[jira] Created: (HADOOP-80) binary key
binary key
----------
Key: HADOOP-80
URL: http://issues.apache.org/jira/browse/HADOOP-80
Project: Hadoop
Type: New Feature
Components: io
Versions: 0.1
Reporter: Owen O'Malley
Assigned to: Owen O'Malley
Fix For: 0.1
Attachments: binary-key.patch
I needed a binary key type, so I extended BytesWritable to be comparable also.
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira
[jira] Commented: (HADOOP-80) binary key
Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-80?page=comments#action_12370606 ]
Doug Cutting commented on HADOOP-80:
------------------------------------
If we think that the default Java hash function is bad, then we should probably switch it in all of the places where we hash, e.g., in UTF8, here, etc. For now I will add a static hashBytes() method to WritableComparator and use it in BytesWritable.hashCode(). Then, if we someday decide to change the hash function, we'll have a single place to do it. Allocating a new digester per call to hashCode() seems expensive (although I have not benchmarked this).
MapReduce does use the hash function by default for partitioning, but HashMaps will also use it, so MapReduce is not the only client.
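The idea of routing every byte-array hash through one static helper can be sketched roughly as follows. This is a minimal illustration, not the committed code: the ByteHashing and BinaryKey classes are hypothetical stand-ins, and only the method name hashBytes() comes from the comment above.

```java
import java.util.Arrays;

// Hypothetical helper: only the method name hashBytes() comes from the thread.
final class ByteHashing {
    private ByteHashing() {}

    /** Hashes the first {@code length} bytes with the 31-multiplier scheme. */
    static int hashBytes(byte[] bytes, int length) {
        int hash = 1;
        for (int i = 0; i < length; i++) {
            hash = (31 * hash) + bytes[i];
        }
        return hash;
    }
}

// Hypothetical stand-in for a binary key type; not the real BytesWritable.
final class BinaryKey {
    private final byte[] bytes; // backing buffer, may be larger than size
    private final int size;     // number of valid bytes

    BinaryKey(byte[] buffer, int size) {
        this.bytes = buffer.clone();
        this.size = size;
    }

    @Override
    public int hashCode() {
        // The only place this class touches the hash function.
        return ByteHashing.hashBytes(bytes, size);
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof BinaryKey)) {
            return false;
        }
        BinaryKey that = (BinaryKey) other;
        return size == that.size
            && Arrays.equals(Arrays.copyOf(bytes, size), Arrays.copyOf(that.bytes, size));
    }
}
```

Because hashCode() only delegates, swapping the hash function later means editing hashBytes() alone, and both partitioning code and HashMap lookups pick up the change.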
[jira] Closed: (HADOOP-80) binary key
Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-80?page=all ]
Owen O'Malley closed HADOOP-80:
-------------------------------
Resolution: Fixed
Doug committed this.
[jira] Commented: (HADOOP-80) binary key
Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-80?page=comments#action_12370609 ]
Doug Cutting commented on HADOOP-80:
------------------------------------
I just committed this as r386219.
[jira] Updated: (HADOOP-80) binary key
Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-80?page=all ]
Owen O'Malley updated HADOOP-80:
--------------------------------
Attachment: binary-key.patch
[jira] Commented: (HADOOP-80) binary key
Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-80?page=comments#action_12370406 ]
Doug Cutting commented on HADOOP-80:
------------------------------------
Overall this looks good. A couple of questions:
1. Why call setSize(0) in read()? This looks like a no-op. Am I missing something?
2. Why bother to use md5 for hashCode()? That could be expensive. Why not implement this like java.util.Arrays.hashCode() and UTF8.hashCode():
    public int hashCode() {
      int hash = 1;
      for (int i = 0; i < size; i++)
        hash = (31 * hash) + (int)bytes[i];
      return hash;
    }
[jira] Updated: (HADOOP-80) binary key
Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-80?page=all ]
Owen O'Malley updated HADOOP-80:
--------------------------------
Attachment: binary-key-2.patch
This updated patch handles the case of shrinking the buffer's capacity. It also adds testing code for that situation into the unit test.
Re: [jira] Commented: (HADOOP-80) binary key
Posted by Andrzej Bialecki <ab...@getopt.org>.
Owen O'Malley (JIRA) wrote:
>> 2. Why bother to use md5 for hashCode()? That could be expensive. Why not implement this like
>> java.util.Arrays.hashCode() and UTF8.hashCode():
>>
>
> Yeah, I considered doing something lighter than md5, but using md5 prevents pathological cases from doing bad things. We also use md5 a lot around here, so it is a really useful default for us, but it might make sense to have a lighter hash alternative. However, since in map/reduce the hash function is only used for partitioning the map output, it seemed better to use a known good hash function than taking a chance on a fast but sloppy hash function.
>
You can find an FNV hash implementation here: http://www.getopt.org
(Apache license). Computationally it's similar in complexity to the
above hashing schemes, but gives much better distribution. Perhaps worth
a try.
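
For readers unfamiliar with FNV, a minimal sketch of the 32-bit FNV-1a variant follows. It is not the implementation linked above, which is not reproduced in this thread; the class name Fnv1a is hypothetical, and the choice of the 1a variant is an assumption. The offset basis and prime are the standard published 32-bit FNV constants.

```java
// Hypothetical class name; a generic FNV-1a sketch, not the linked implementation.
final class Fnv1a {
    private static final int OFFSET_BASIS = 0x811c9dc5; // standard 32-bit FNV offset basis
    private static final int PRIME = 0x01000193;        // standard 32-bit FNV prime

    private Fnv1a() {}

    /** Hashes the first {@code length} bytes: XOR each byte in, then multiply. */
    static int hash(byte[] bytes, int length) {
        int hash = OFFSET_BASIS;
        for (int i = 0; i < length; i++) {
            hash ^= (bytes[i] & 0xff); // mask so negative bytes do not sign-extend
            hash *= PRIME;             // int overflow wraps, i.e. arithmetic mod 2^32
        }
        return hash;
    }
}
```

Like the 31-multiplier loop, this costs one multiply and one cheap operation per byte, so the per-call work is comparable.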
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
[jira] Commented: (HADOOP-80) binary key
Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-80?page=comments#action_12370477 ]
Owen O'Malley commented on HADOOP-80:
-------------------------------------
> 1. Why call setSize(0) in read()? This looks like a no-op. Am I missing something?
Yeah, that is a little subtle and I should have commented it better. Basically, when I do the second setSize(), it may call setCapacity(), which will copy the current data. If the current size is 0, then it won't copy anything. It won't change the user-visible behavior, but it will save a useless copy.
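The effect described here can be sketched with a small stand-in buffer class. Everything below is hypothetical and written only for illustration: the GrowableBytes class, the 3/2 growth factor, the replaceWith() method standing in for read(), and the bytesCopied counter are assumptions, not the contents of the patch.

```java
// Hypothetical stand-in for a resizable byte buffer; not the real BytesWritable.
final class GrowableBytes {
    private byte[] bytes = new byte[0];
    private int size = 0;
    int bytesCopied = 0; // instrumentation for this example only

    int getSize() {
        return size;
    }

    int getCapacity() {
        return bytes.length;
    }

    /** Reallocates the buffer, preserving the current valid bytes. */
    void setCapacity(int newCapacity) {
        if (newCapacity != bytes.length) {
            byte[] fresh = new byte[newCapacity];
            if (newCapacity < size) {
                size = newCapacity; // shrinking truncates the valid data
            }
            if (size != 0) {
                System.arraycopy(bytes, 0, fresh, 0, size);
                bytesCopied += size;
            }
            bytes = fresh;
        }
    }

    /** Sets the number of valid bytes, growing the buffer if needed. */
    void setSize(int newSize) {
        if (newSize > bytes.length) {
            setCapacity(newSize * 3 / 2); // growth factor is an assumption
        }
        size = newSize;
    }

    /** Overwrites the contents, as a read() would after decoding a length. */
    void replaceWith(byte[] src, boolean resetFirst) {
        if (resetFirst) {
            setSize(0); // old data is about to be overwritten, so skip copying it
        }
        setSize(src.length);
        System.arraycopy(src, 0, bytes, 0, src.length);
    }
}
```

With resetFirst set to true, growing from a 4-byte value to a 100-byte value copies nothing; with it set to false, the 4 stale bytes are copied into the new buffer and then immediately overwritten.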
> 2. Why bother to use md5 for hashCode()? That could be expensive. Why not implement this like
> java.util.Arrays.hashCode() and UTF8.hashCode():
Yeah, I considered doing something lighter than MD5, but using MD5 prevents pathological cases from doing bad things. We also use MD5 a lot around here, so it is a really useful default for us, but it might make sense to have a lighter hash alternative. However, since in map/reduce the hash function is only used for partitioning the map output, it seemed better to use a known good hash function than to take a chance on a fast but sloppy hash function.
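To make the trade-off concrete, here is a rough sketch of an MD5-backed hash. The patch itself is not shown in this thread, so the details are assumptions: the class name Md5Hashing is hypothetical, and folding the digest into an int by taking its first four bytes is just one plausible choice. Only the java.security.MessageDigest calls are real JDK API.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical class; how the original patch derived an int from MD5 is not shown.
final class Md5Hashing {
    private Md5Hashing() {}

    /** Hashes the first {@code length} bytes using the leading 4 bytes of their MD5 digest. */
    static int md5Hash(byte[] bytes, int length) {
        try {
            // A new digester is allocated on every call in this sketch.
            MessageDigest digester = MessageDigest.getInstance("MD5");
            digester.update(bytes, 0, length);
            byte[] digest = digester.digest();
            return ((digest[0] & 0xff) << 24)
                 | ((digest[1] & 0xff) << 16)
                 | ((digest[2] & 0xff) << 8)
                 |  (digest[3] & 0xff);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }
}
```

As a small example of the kind of collision a 31-multiplier hash allows, java.util.Arrays.hashCode() returns 992 for both the byte arrays {0, 31} and {1, 0}; a digest-based hash makes such collisions much harder to construct, at the cost of far more work per call.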