You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by vitamon <gi...@git.apache.org> on 2017/10/02 14:23:47 UTC
[GitHub] spark pull request #19409: fix openHashSet to actually use quadratic probing...
GitHub user vitamon opened a pull request:
https://github.com/apache/spark/pull/19409
fix openHashSet to actually use quadratic probing instead of linear
The comments in the code state that OpehHashSet uses quadratic probing, but in fact it uses linear probing, which "results in primary clustering, and as the cluster grows larger, the search for those items hashing within the cluster becomes less efficient."
see https://en.wikipedia.org/wiki/Quadratic_probing
OpenHashSetSuite pass with both probing methods.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/vitamon/spark openhashset
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19409.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19409
----
commit 6fb0a407edd3ae8c8d4b9154076768ed03028a09
Author: Vitalii Tamazian <vt...@google.com>
Date: 2017-10-02T14:15:26Z
fix openHashSet to actually use quadratic probing instead of linear
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19409: fix openHashSet to actually use quadratic probing instea...
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/19409
Sure, that's not what your change does. That's also not the only valid probing scheme; lots work. The current one is however quadratic.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19409: fix openHashSet to actually use quadratic probing instea...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19409
Can one of the admins verify this patch?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19409: fix openHashSet to actually use quadratic probing instea...
Posted by vitamon <gi...@git.apache.org>.
Github user vitamon commented on the issue:
https://github.com/apache/spark/pull/19409
Ah, I see now, thanks
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19409: fix openHashSet to actually use quadratic probing...
Posted by vitamon <gi...@git.apache.org>.
Github user vitamon closed the pull request at:
https://github.com/apache/spark/pull/19409
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19409: fix openHashSet to actually use quadratic probing instea...
Posted by vitamon <gi...@git.apache.org>.
Github user vitamon commented on the issue:
https://github.com/apache/spark/pull/19409
@srowen see the reference algorithms here: https://en.wikipedia.org/wiki/Quadratic_probing
index = (key + i*i) % SIZE;
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19409: fix openHashSet to actually use quadratic probing instea...
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/19409
No, this is wrong. delta increases by 1 each iteration. The probe offsets are 1, 3, 6, 10, 15... or $n(n+1)/2$. It's not $n^2$ but it's quadratic. This change would not be quadratic.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org