Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/30 21:53:30 UTC

[GitHub] [arrow] westonpace commented on pull request #13487: ARROW-16945: [C++] Add new scalar compute function for 32-bit hashing

westonpace commented on PR #13487:
URL: https://github.com/apache/arrow/pull/13487#issuecomment-1171714784

   > I'm not sure how to validate the hash outputs are "as expected". Further, unit tests for the hashing functions don't seem to validate hash outputs.
   
   For a hashing function I would expect:
    * If two values are equal then their hashes are equal
    * Given a random selection of non-equal values there should be some kind of expected false positive rate (i.e. equal hashes on unequal values).  Ideally we would include, as part of this, a benchmark that measures the FPR on random values.  You could then take that measurement, pick a safe threshold (e.g. if the benchmark tends to show a 5% FPR then pick 10%) and put that into the unit test (e.g. assert the FPR is less than the safe threshold).
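
   As a rough illustration of the two checks above, here is a minimal sketch in Python. It is not the code from this PR: `hash32` is a stand-in built on the standard library's CRC-32, and `collision_rate`, `random_values`, and `SAFE_THRESHOLD` are hypothetical names chosen for the example. It also assumes one particular reading of "FPR": the fraction of distinct values whose hash matches that of some other distinct value.

```python
import random
import zlib
from collections import Counter


def hash32(value: bytes) -> int:
    # Stand-in 32-bit hash (CRC-32); an assumption, NOT the function added in the PR.
    return zlib.crc32(value) & 0xFFFFFFFF


def collision_rate(hash_fn, values) -> float:
    # Fraction of distinct values that share a hash with another distinct value.
    distinct = set(values)
    if not distinct:
        return 0.0
    counts = Counter(hash_fn(v) for v in distinct)
    collided = sum(c for c in counts.values() if c > 1)
    return collided / len(distinct)


def random_values(n: int, seed: int = 0, size: int = 16) -> list:
    # Seeded so the measurement is reproducible from run to run.
    rng = random.Random(seed)
    return [bytes(rng.getrandbits(8) for _ in range(size)) for _ in range(n)]


# Hypothetical threshold: in practice, measure the rate in a benchmark first,
# then pick a value comfortably above what the benchmark tends to show.
SAFE_THRESHOLD = 0.01

values = random_values(10_000)

# Property 1: equal values must hash to equal outputs.
assert all(hash32(v) == hash32(bytes(v)) for v in values[:100])

# Property 2: the measured rate on random values stays under the threshold.
assert collision_rate(hash32, values) < SAFE_THRESHOLD
```

   The first assertion is the hard requirement; the second is statistical, which is why the threshold should come from a measured baseline rather than a guess.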


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org