You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@datasketches.apache.org by GitBox <gi...@apache.org> on 2023/01/16 10:48:45 UTC

[GitHub] [datasketches-cpp] c-dickens commented on a diff in pull request #325: Cpp countmin

c-dickens commented on code in PR #325:
URL: https://github.com/apache/datasketches-cpp/pull/325#discussion_r1071094476


##########
count/include/count_min.hpp:
##########
@@ -0,0 +1,166 @@
+#ifndef COUNT_MIN_HPP_
+#define COUNT_MIN_HPP_
+
+#include <cstdint>
+#include <vector>
+#include <iterator>
+#include <algorithm>
+
+#include "common_defs.hpp"
+
+namespace datasketches {
+
+  /*
+   * C++ implementation of the CountMin sketch data structure of Cormode and Muthukrishnan.
+   * [1] - http://dimacs.rutgers.edu/~graham/pubs/papers/cm-full.pdf
+   * @author Charlie Dickens
+   */
+
+template<typename T> class count_min_sketch ;
+
+template<typename T>
+class count_min_sketch{
+  static_assert(std::is_arithmetic<T>::value, "Arithmetic type expected");
+public:
+  uint64_t num_hashes, num_buckets, seed, sketch_length ;

Review Comment:
   - num_hashes has been changed to uint8.  The failure probability `delta = exp(-d)` and `exp(-255)` is on the order of 10^(-100) which will be small enough for applications.  On the other hand using a smaller integer type eg `exp(-15)` would rule out sketches with say 20 hash functions.  This is already a small probability of failure though.
   - `num_buckets` has been changed to 32 bits by similar reasoning. 
   - In future, we could make this flexible because 16 bit num_hashes and 32 bit num_buckets might still be larger than necessary for some applications.  Eg if the user can tolerate failure probability up to 1E-6, 1E-7, then 8 bit uints could be used for the number of hash functions, followed by a similar reduction in num_buckets if lower error can be tolerated.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org