You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by vitamon <gi...@git.apache.org> on 2017/10/02 14:23:47 UTC

[GitHub] spark pull request #19409: fix openHashSet to actually use quadratic probing...

GitHub user vitamon opened a pull request:

    https://github.com/apache/spark/pull/19409

    fix openHashSet to actually use quadratic probing instead of linear

    The comments in the code state that OpehHashSet uses quadratic probing, but in fact it uses linear probing, which "results in primary clustering, and as the cluster grows larger, the search for those items hashing within the cluster becomes less efficient."
    see https://en.wikipedia.org/wiki/Quadratic_probing
    
    OpenHashSetSuite pass with both probing methods.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vitamon/spark openhashset

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19409.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19409
    
----
commit 6fb0a407edd3ae8c8d4b9154076768ed03028a09
Author: Vitalii Tamazian <vt...@google.com>
Date:   2017-10-02T14:15:26Z

    fix openHashSet to actually use quadratic probing instead of linear

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19409: fix openHashSet to actually use quadratic probing instea...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/19409
  
    Sure, that's not what your change does. That's also not the only valid probing scheme; lots work. The current one is however quadratic.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19409: fix openHashSet to actually use quadratic probing instea...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19409
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19409: fix openHashSet to actually use quadratic probing instea...

Posted by vitamon <gi...@git.apache.org>.
Github user vitamon commented on the issue:

    https://github.com/apache/spark/pull/19409
  
    Ah, I see now, thanks


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19409: fix openHashSet to actually use quadratic probing...

Posted by vitamon <gi...@git.apache.org>.
Github user vitamon closed the pull request at:

    https://github.com/apache/spark/pull/19409


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19409: fix openHashSet to actually use quadratic probing instea...

Posted by vitamon <gi...@git.apache.org>.
Github user vitamon commented on the issue:

    https://github.com/apache/spark/pull/19409
  
    @srowen see the reference algorithms here: https://en.wikipedia.org/wiki/Quadratic_probing
    
            index = (key + i*i) % SIZE;



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19409: fix openHashSet to actually use quadratic probing instea...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/19409
  
    No, this is wrong. delta increases by 1 each iteration. The probe offsets are 1, 3, 6, 10, 15... or $n(n+1)/2$. It's not $n^2$ but it's quadratic. This change would not be quadratic.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org