You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@datasketches.apache.org by GitBox <gi...@apache.org> on 2023/01/03 18:28:09 UTC

[GitHub] [datasketches-cpp] AlexanderSaydakov opened a new pull request, #322: Python wrapper improvement

AlexanderSaydakov opened a new pull request, #322:
URL: https://github.com/apache/datasketches-cpp/pull/322

   added iterators, rearranged and simplified existing code


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


[GitHub] [datasketches-cpp] jmalkin commented on a diff in pull request #322: Python wrapper improvement

Posted by GitBox <gi...@apache.org>.
jmalkin commented on code in PR #322:
URL: https://github.com/apache/datasketches-cpp/pull/322#discussion_r1062762503


##########
python/tests/vo_test.py:
##########
@@ -45,6 +45,13 @@ def test_vo_example(self):
     items = vo.get_samples()
     self.assertEqual(len(items), k)
 
+    count = 0
+    for tuple in vo:
+      sample = tuple[0]
+      weight = tuple[1]

Review Comment:
   Discussed offline. It's exact modulo machine precision with the arithmetic, but that's a fair point. It is reasonable to test that part only in C++.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


[GitHub] [datasketches-cpp] AlexanderSaydakov merged pull request #322: Python wrapper improvement

Posted by GitBox <gi...@apache.org>.
AlexanderSaydakov merged PR #322:
URL: https://github.com/apache/datasketches-cpp/pull/322


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


[GitHub] [datasketches-cpp] jmalkin commented on a diff in pull request #322: Python wrapper improvement

Posted by GitBox <gi...@apache.org>.
jmalkin commented on code in PR #322:
URL: https://github.com/apache/datasketches-cpp/pull/322#discussion_r1061817824


##########
python/src/quantiles_wrapper.cpp:
##########
@@ -114,12 +34,27 @@ void bind_quantiles_sketch(py::module &m, const char* name) {
   py::class_<quantiles_sketch<T>>(m, name)
     .def(py::init<uint16_t>(), py::arg("k")=quantiles_constants::DEFAULT_K)
     .def(py::init<const quantiles_sketch<T>&>())
-    .def("update", (void (quantiles_sketch<T>::*)(const T&)) &quantiles_sketch<T>::update, py::arg("item"),
-         "Updates the sketch with the given value")
-    .def("update", &dspy::quantiles_sketch_update<T>, py::arg("array"),
-         "Updates the sketch with the values in the given array")
+    .def(
+        "update",

Review Comment:
   By the time we have conditionals and exceptions, I feel like encapsulating the logic in a function elsewhere is probably a win for interface readability.



##########
python/tests/vo_test.py:
##########
@@ -45,6 +45,13 @@ def test_vo_example(self):
     items = vo.get_samples()
     self.assertEqual(len(items), k)
 
+    count = 0
+    for tuple in vo:
+      sample = tuple[0]
+      weight = tuple[1]

Review Comment:
   worth summing weights and ensuring they equal the total?



##########
python/src/fi_wrapper.cpp:
##########
@@ -105,15 +53,55 @@ void bind_fi_sketch(py::module &m, const char* name) {
          "Returns the guaranteed upper bound weight (frequency) of the given item.")
     .def("get_sketch_epsilon", (double (frequent_items_sketch<T>::*)(void) const) &frequent_items_sketch<T>::get_epsilon,
          "Returns the epsilon value used by the sketch to compute error")
-    .def_static("get_epsilon_for_lg_size", &dspy::fi_sketch_get_generic_epsilon<T>, py::arg("lg_max_map_size"),
-         "Returns the epsilon value used to compute a priori error for a given log2(max_map_size)")
-    .def_static("get_apriori_error", &frequent_items_sketch<T>::get_apriori_error, py::arg("lg_max_map_size"), py::arg("estimated_total_weight"),
-         "Returns the estimated a priori error given the max_map_size for the sketch and the estimated_total_stream_weight.")
-    .def("get_serialized_size_bytes", &dspy::fi_sketch_get_serialized_size_bytes<T>,
-         "Computes the size needed to serialize the current state of the sketch. This can be expensive since every item needs to be looked at.")
-    .def("serialize", &dspy::fi_sketch_serialize<T>, "Serializes the sketch into a bytes object")
-    .def_static("deserialize", &dspy::fi_sketch_deserialize<T>, "Reads a bytes object and returns the corresponding frequent_strings_sketch")
-    ;
+    .def(

Review Comment:
   I'm not actually convinced that this long a lambda is an improvement to readability.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


[GitHub] [datasketches-cpp] AlexanderSaydakov commented on a diff in pull request #322: Python wrapper improvement

Posted by GitBox <gi...@apache.org>.
AlexanderSaydakov commented on code in PR #322:
URL: https://github.com/apache/datasketches-cpp/pull/322#discussion_r1061976333


##########
python/src/fi_wrapper.cpp:
##########
@@ -105,15 +53,55 @@ void bind_fi_sketch(py::module &m, const char* name) {
          "Returns the guaranteed upper bound weight (frequency) of the given item.")
     .def("get_sketch_epsilon", (double (frequent_items_sketch<T>::*)(void) const) &frequent_items_sketch<T>::get_epsilon,
          "Returns the epsilon value used by the sketch to compute error")
-    .def_static("get_epsilon_for_lg_size", &dspy::fi_sketch_get_generic_epsilon<T>, py::arg("lg_max_map_size"),
-         "Returns the epsilon value used to compute a priori error for a given log2(max_map_size)")
-    .def_static("get_apriori_error", &frequent_items_sketch<T>::get_apriori_error, py::arg("lg_max_map_size"), py::arg("estimated_total_weight"),
-         "Returns the estimated a priori error given the max_map_size for the sketch and the estimated_total_stream_weight.")
-    .def("get_serialized_size_bytes", &dspy::fi_sketch_get_serialized_size_bytes<T>,
-         "Computes the size needed to serialize the current state of the sketch. This can be expensive since every item needs to be looked at.")
-    .def("serialize", &dspy::fi_sketch_serialize<T>, "Serializes the sketch into a bytes object")
-    .def_static("deserialize", &dspy::fi_sketch_deserialize<T>, "Reads a bytes object and returns the corresponding frequent_strings_sketch")
-    ;
+    .def(

Review Comment:
   I thought that getting rid of that large section with namespaces would be good. Yes, this particular function is not trivial, but in my view it is not worse than having that large section just for this function.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


[GitHub] [datasketches-cpp] AlexanderSaydakov commented on a diff in pull request #322: Python wrapper improvement

Posted by GitBox <gi...@apache.org>.
AlexanderSaydakov commented on code in PR #322:
URL: https://github.com/apache/datasketches-cpp/pull/322#discussion_r1061977242


##########
python/src/quantiles_wrapper.cpp:
##########
@@ -114,12 +34,27 @@ void bind_quantiles_sketch(py::module &m, const char* name) {
   py::class_<quantiles_sketch<T>>(m, name)
     .def(py::init<uint16_t>(), py::arg("k")=quantiles_constants::DEFAULT_K)
     .def(py::init<const quantiles_sketch<T>&>())
-    .def("update", (void (quantiles_sketch<T>::*)(const T&)) &quantiles_sketch<T>::update, py::arg("item"),
-         "Updates the sketch with the given value")
-    .def("update", &dspy::quantiles_sketch_update<T>, py::arg("array"),
-         "Updates the sketch with the values in the given array")
+    .def(
+        "update",

Review Comment:
   perhaps. I just didn't want to make an exception for one function



##########
python/tests/vo_test.py:
##########
@@ -45,6 +45,13 @@ def test_vo_example(self):
     items = vo.get_samples()
     self.assertEqual(len(items), k)
 
+    count = 0
+    for tuple in vo:
+      sample = tuple[0]
+      weight = tuple[1]

Review Comment:
   This seems to be a bit tricky since it is not exact. So I would suggest not test sketch correctness here.



##########
python/tests/vo_test.py:
##########
@@ -45,6 +45,13 @@ def test_vo_example(self):
     items = vo.get_samples()
     self.assertEqual(len(items), k)
 
+    count = 0
+    for tuple in vo:
+      sample = tuple[0]
+      weight = tuple[1]

Review Comment:
   maybe, but not necessary. I just wanted to exercise the iterator and show how to access the pieces



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


[GitHub] [datasketches-cpp] coveralls commented on pull request #322: Python wrapper improvement

Posted by GitBox <gi...@apache.org>.
coveralls commented on PR #322:
URL: https://github.com/apache/datasketches-cpp/pull/322#issuecomment-1370098723

   ## Pull Request Test Coverage Report for [Build 3832015948](https://coveralls.io/builds/55608574)
   
   * **0** of **0**   changed or added relevant lines in **0** files are covered.
   * No unchanged relevant lines lost coverage.
   * Overall coverage remained the same at **93.878%**
   
   ---
   
   
   
   |  Totals | [![Coverage Status](https://coveralls.io/builds/55608574/badge)](https://coveralls.io/builds/55608574) |
   | :-- | --: |
   | Change from base [Build 3740251941](https://coveralls.io/builds/55319887): |  0.0% |
   | Covered Lines: | 2331 |
   | Relevant Lines: | 2483 |
   
   ---
   ##### 💛  - [Coveralls](https://coveralls.io)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org