You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@datasketches.apache.org by "AlexanderSaydakov (via GitHub)" <gi...@apache.org> on 2023/06/29 04:45:20 UTC

[GitHub] [datasketches-cpp] AlexanderSaydakov opened a new pull request, #379: documentation fixes

AlexanderSaydakov opened a new pull request, #379:
URL: https://github.com/apache/datasketches-cpp/pull/379

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


[GitHub] [datasketches-cpp] coveralls commented on pull request #379: documentation fixes

Posted by "coveralls (via GitHub)" <gi...@apache.org>.
coveralls commented on PR #379:
URL: https://github.com/apache/datasketches-cpp/pull/379#issuecomment-1612427508

   ## Pull Request Test Coverage Report for [Build 5408574033](https://coveralls.io/builds/61034035)
   
   * **0** of **0**   changed or added relevant lines in **0** files are covered.
   * **8** unchanged lines in **2** files lost coverage.
   * Overall coverage decreased (**-0.2%**) to **96.853%**
   
   ---
   
   
   |  Files with Coverage Reduction | New Missed Lines | % |
   | :-----|--------------|--: |
   | [hll/include/CouponList-internal.hpp](https://coveralls.io/builds/61034035/source?filename=hll%2Finclude%2FCouponList-internal.hpp#L61) | 4 | 65.52% |
   | [hll/include/Hll6Array-internal.hpp](https://coveralls.io/builds/61034035/source?filename=hll%2Finclude%2FHll6Array-internal.hpp#L68) | 4 | 93.55% |
   <!-- | **Total:** | **8** |  | -->
   
   |  Totals | [![Coverage Status](https://coveralls.io/builds/61034035/badge)](https://coveralls.io/builds/61034035) |
   | :-- | --: |
   | Change from base [Build 5326369309](https://coveralls.io/builds/60851868): |  -0.2% |
   | Covered Lines: | 4894 |
   | Relevant Lines: | 5053 |
   
   ---
   ##### 💛  - [Coveralls](https://coveralls.io)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


[GitHub] [datasketches-cpp] AlexanderSaydakov commented on a diff in pull request #379: documentation fixes

Posted by "AlexanderSaydakov (via GitHub)" <gi...@apache.org>.
AlexanderSaydakov commented on code in PR #379:
URL: https://github.com/apache/datasketches-cpp/pull/379#discussion_r1253680081


##########
theta/include/theta_sketch.hpp:
##########
@@ -106,6 +107,7 @@ class base_theta_sketch_alloc {
   virtual void print_items(std::ostringstream& os) const = 0;
 };
 
+/// Base class for the Theta Sketch, a generalization of the Kth Minimum Value (KMV) sketch.

Review Comment:
   we have this on the web site: "Theta Sketches are a generalization of the well known Kth Minimum Value (KMV)1,2 sketches in that KMV sketches are a form of Theta Sketch, but not all Theta Sketches are KMV."



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


[GitHub] [datasketches-cpp] AlexanderSaydakov merged pull request #379: documentation fixes

Posted by "AlexanderSaydakov (via GitHub)" <gi...@apache.org>.
AlexanderSaydakov merged PR #379:
URL: https://github.com/apache/datasketches-cpp/pull/379


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


[GitHub] [datasketches-cpp] AlexanderSaydakov commented on a diff in pull request #379: documentation fixes

Posted by "AlexanderSaydakov (via GitHub)" <gi...@apache.org>.
AlexanderSaydakov commented on code in PR #379:
URL: https://github.com/apache/datasketches-cpp/pull/379#discussion_r1253681842


##########
tuple/include/tuple_sketch.hpp:
##########
@@ -522,8 +558,7 @@ class compact_tuple_sketch: public tuple_sketch<Summary, Allocator> {
 
 };
 
-// builder
-
+// base builder

Review Comment:
   I think we could expose this one as well



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


[GitHub] [datasketches-cpp] jmalkin commented on a diff in pull request #379: documentation fixes

Posted by "jmalkin (via GitHub)" <gi...@apache.org>.
jmalkin commented on code in PR #379:
URL: https://github.com/apache/datasketches-cpp/pull/379#discussion_r1248189466


##########
count/include/count_min.hpp:
##########
@@ -25,26 +25,27 @@
 
 namespace datasketches {
 
-  /*
-   * C++ implementation of the CountMin sketch data structure of Cormode and Muthukrishnan.
-   * [1] - http://dimacs.rutgers.edu/~graham/pubs/papers/cm-full.pdf
-   * The template type W is the type of the vector that contains the weights of the objects inserted into the sketch,
-   * not the type of the input items themselves.
-   * @author Charlie Dickens
-   */
-
+/**
+ * C++ implementation of the CountMin sketch data structure of Cormode and Muthukrishnan.
+ * [1] - http://dimacs.rutgers.edu/~graham/pubs/papers/cm-full.pdf
+ * The template type W is the type of the vector that contains the weights of the objects inserted into the sketch,
+ * not the type of the input items themselves.
+ * @author Charlie Dickens
+ */
 template <typename W,
           typename Allocator = std::allocator<W>>
 class count_min_sketch{
   static_assert(std::is_arithmetic<W>::value, "Arithmetic type expected");
 public:
   using allocator_type = Allocator;
+  using const_iterator = typename std::vector<W, Allocator>::const_iterator;
 
   /**
    * Creates an instance of the sketch given parameters _num_hashes, _num_buckets and hash seed, `seed`.
    * @param num_hashes : number of hash functions in the sketch. Equivalently the number of rows in the array
    * @param num_buckets : number of buckets that hash functions map into. Equivalently the number of columns in the array
    * @param seed for hash function
+   * @param allocator to use by this instance

Review Comment:
   for/with rather than by?



##########
common/include/serde.hpp:
##########
@@ -30,23 +30,56 @@
 
 namespace datasketches {
 
-// serialize and deserialize
+/// Interface for serializing and deserializing items
 template<typename T, typename Enable = void> struct serde {
-  // stream serialization
+  /**
+   * Stream serialization
+   * @param os output stream
+   * @param items pointer to array of items
+   * @param num number of items
+   */
   void serialize(std::ostream& os, const T* items, unsigned num) const;
+
+  /**
+   * Stream deserialization
+   * @param is input stream
+   * @param items pointer to array of items

Review Comment:
   I think we should specify allocation but not initialized here as opposed to inline with the declaration



##########
tuple/include/tuple_sketch.hpp:
##########
@@ -522,8 +558,7 @@ class compact_tuple_sketch: public tuple_sketch<Summary, Allocator> {
 
 };
 
-// builder
-
+// base builder

Review Comment:
   intentional to have only `//` rather than `///`?



##########
req/include/req_sketch.hpp:
##########
@@ -39,26 +81,55 @@ class req_sketch {
   using comparator = Comparator;
   using Compactor = req_compactor<T, Comparator, Allocator>;
   using AllocCompactor = typename std::allocator_traits<Allocator>::template rebind_alloc<Compactor>;
+  using vector_double = typename quantiles_sorted_view<T, Comparator, Allocator>::vector_double;
+
+  /**
+   * Quantile return type.
+   * This is to return quantiles either by value (for arithmetic types) or by const reference (for all other types)
+   */
+  using quantile_return_type = typename quantiles_sorted_view<T, Comparator, Allocator>::quantile_return_type;
 
   /**
    * Constructor
    * @param k Controls the size and error of the sketch. It must be even and in the range [4, 1024], inclusive.
    * Value of 12 roughly corresponds to 1% relative error guarantee at 95% confidence.
    * @param hra if true, the default, the high ranks are prioritized for better
    * accuracy. Otherwise the low ranks are prioritized for better accuracy.
-   * @param comparator to use by this instance
-   * @param allocator to use by this instance
+   * @param comparator instance to use by this sketch instance

Review Comment:
   `for use by` or `to use with`
   and in the next row



##########
count/include/count_min.hpp:
##########
@@ -111,7 +114,7 @@ class count_min_sketch{
   /**
    * Specific get_estimate function for int64_t type
    * see generic get_estimate function
-   * @param item : uint64_t type.
+   * @param item : int64_t type.

Review Comment:
   remove the `:`



##########
theta/include/theta_sketch.hpp:
##########
@@ -106,6 +107,7 @@ class base_theta_sketch_alloc {
   virtual void print_items(std::ostringstream& os) const = 0;
 };
 
+/// Base class for the Theta Sketch, a generalization of the Kth Minimum Value (KMV) sketch.

Review Comment:
   Is it a generalization of or a type of KMV sketch? Maybe sidestep and describe it as a type of KMV sketch for counting distinct items?



##########
common/include/quantiles_sorted_view.hpp:
##########
@@ -27,37 +27,129 @@
 
 namespace datasketches {
 
+/**
+ * Sorted view for quantiles sketches (REQ, KLL and Quantiles)
+ */
 template<
   typename T,
   typename Comparator, // strict weak ordering function (see C++ named requirements: Compare)
   typename Allocator
 >
 class quantiles_sorted_view {
 public:
+  /// Entry type
   using Entry = typename std::conditional<std::is_arithmetic<T>::value, std::pair<T, uint64_t>, std::pair<const T*, uint64_t>>::type;
   using AllocEntry = typename std::allocator_traits<Allocator>::template rebind_alloc<Entry>;
   using Container = std::vector<Entry, AllocEntry>;
 
+  /// @private
   quantiles_sorted_view(uint32_t num, const Comparator& comparator, const Allocator& allocator);
 
+  /// @private
   template<typename Iterator>
   void add(Iterator begin, Iterator end, uint64_t weight);
 
+  /// @private
   void convert_to_cummulative();
 
   class const_iterator;
+
+  /**
+   * Iterator pointing to the first entry in the view.
+   * If the view is empty, the returned iterator must not be dereferenced or incremented.
+   * @return iterator pointing to the first entry
+   */
   const_iterator begin() const;
+
+  /**
+   * Iterator pointing to the past-the-end entry in the view.
+   * The past-the-end entry is the hypothetical entry that would follow the last entry.
+   * It does not point to any entry, and must not be dereferenced or incremented.
+   * @return iterator pointing to the past-the-end entry
+   */
   const_iterator end() const;
 
+  /// @return size of the view
   size_t size() const;
 
+  /**
+   * Returns an approximation to the normalized rank of the given item from 0 to 1, inclusive.

Review Comment:
   I first read this as computing rank of an item from 0 to 1, which confused me. I think it might be worth splitting it into 2 sentences (end after item) to note that the result will be in [0,1] in a more detailed description.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


[GitHub] [datasketches-cpp] jmalkin commented on a diff in pull request #379: documentation fixes

Posted by "jmalkin (via GitHub)" <gi...@apache.org>.
jmalkin commented on code in PR #379:
URL: https://github.com/apache/datasketches-cpp/pull/379#discussion_r1253721913


##########
theta/include/theta_sketch.hpp:
##########
@@ -106,6 +107,7 @@ class base_theta_sketch_alloc {
   virtual void print_items(std::ostringstream& os) const = 0;
 };
 
+/// Base class for the Theta Sketch, a generalization of the Kth Minimum Value (KMV) sketch.

Review Comment:
   Ok, although a bit misleading in the context of C++ where we don't offer anything but KMV. That's fine though.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


[GitHub] [datasketches-cpp] AlexanderSaydakov commented on pull request #379: documentation fixes

Posted by "AlexanderSaydakov (via GitHub)" <gi...@apache.org>.
AlexanderSaydakov commented on PR #379:
URL: https://github.com/apache/datasketches-cpp/pull/379#issuecomment-1613500210

   Besides doc fixes I made a couple of HLL operator= methods return ref instead of value, and removed a couple of update methods in count_min_sketch, which became unnecessary after adding default weight of 1. I believe these changes must be invisible for the user.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org