Posted to commits@datasketches.apache.org by jm...@apache.org on 2021/09/21 18:45:39 UTC

[datasketches-cpp] branch py_readme updated: update python readme, remove unused .gitmodules

This is an automated email from the ASF dual-hosted git repository.

jmalkin pushed a commit to branch py_readme
in repository https://gitbox.apache.org/repos/asf/datasketches-cpp.git


The following commit(s) were added to refs/heads/py_readme by this push:
     new 04e2272  update python readme, remove unused .gitmodules
04e2272 is described below

commit 04e22729d3e08a5ff18e6a1c79b28758ab9e772d
Author: Jon Malkin <jm...@users.noreply.github.com>
AuthorDate: Tue Sep 21 11:45:31 2021 -0700

    update python readme, remove unused .gitmodules
---
 .gitmodules      |  0
 python/README.md | 89 +++++++++++++++++++++++++-------------------------------
 2 files changed, 40 insertions(+), 49 deletions(-)

diff --git a/.gitmodules b/.gitmodules
deleted file mode 100644
index e69de29..0000000
diff --git a/python/README.md b/python/README.md
index fcc595f..cdc01c8 100644
--- a/python/README.md
+++ b/python/README.md
@@ -1,41 +1,20 @@
 # Python Wrapper for Apache DataSketches
 
-## Installation
+This is the official version of the [Apache DataSketches](https://datasketches.apache.org) Python library.
 
-The release files do not include the needed python binding library ([pybind11](https://github.com/pybind/pybind11)). If building
-from a relase package, you must ensure that the pybind11 directory points to a local copy of pybind11.
+In the analysis of big data there are often problem queries that don’t scale because they require huge compute resources and time to generate exact results. Examples include count distinct, quantiles, most-frequent items, joins, matrix computations, and graph analysis.
 
-An official pypi build is eventually planned but not yet available.
+If approximate results are acceptable, there is a class of specialized algorithms, called streaming algorithms or sketches, that can produce results orders of magnitude faster and with mathematically proven error bounds. For interactive queries there may not be other viable alternatives, and in the case of real-time analysis, sketches are the only known solution.
 
-If you instead want to take a (possibly ill-advised) gamble on the current state of the master branch being useable, you can run:
-```pip install git+https://github.com/apache/datasketches-cpp.git```
+This package provides a variety of sketches as described below. Wherever a specific type of sketch exists in Apache DataSketches packages for other languages, the sketches will be portable between languages (for platforms with the same endianness).
 
-## Developer Instructions
-
-### Building
-
-When cloning the source repository, you should include the pybind11 submodule with the `--recursive` option to the clone command:
-```
-git clone --recursive https://github.com/apache/datasketches-cpp.git
-cd datasketches-cpp
-python -m pip install --upgrade pip setuptools wheel numpy
-python setup.py build
-```
+## Building and Installation
 
-If you cloned without `--recursive`, you can add the submodule post-checkout using `git submodule update --init --recursive`.
+Once cloned, the library can be installed by running `python -m pip install .` which will also install the necessary dependencies, namely [pybind11](https://github.com/pybind/pybind11) and numpy.
 
-### Installing
+If you prefer to call the `setup.py` build script directly, you must first install `pybind11[global]`, as well as any other dependencies listed under the build-system section in `pyproject.toml`.
 
-Assuming you have already checked out the library and any dependent submodules, install by simply replacing the last
-line of the build command with `python setup.py install`.
-
-### Unit tests
-
-The python tests are run with `tox`. To ensure you have all the needed packages, from the package base directory run:
-```
-python -m pip install --upgrade pip setuptools wheel numpy tox
-tox
-```
+The library is also available (or soon will be) from PyPI via `python -m pip install datasketches`.
 
 ## Usage
 
@@ -44,33 +23,33 @@ Having installed the library, loading the Apache Datasketches library in Python
 ## Available Sketch Classes
 
 - KLL (Absolute Error Quantiles)
-    - `kll_ints_sketch`
-    - `kll_floats_sketch`
+  - `kll_ints_sketch`
+  - `kll_floats_sketch`
 - REQ (Relative Error Quantiles)
-    - `req_ints_sketch`
-    - `req_floats_sketch`
+  - `req_ints_sketch`
+  - `req_floats_sketch`
 - Frequent Items
-    - `frequent_strings_sketch`
-    - Error types are `frequent_items_error_type.{NO_FALSE_NEGATIVES | NO_FALSE_POSITIVES}`
+  - `frequent_strings_sketch`
+  - Error types are `frequent_items_error_type.{NO_FALSE_NEGATIVES | NO_FALSE_POSITIVES}`
 - Theta
-    - `update_theta_sketch`
-    - `compact_theta_sketch` (cannot be instantiated directly)
-    - `theta_union`
-    - `theta_intersection`
-    - `theta_a_not_b`
+  - `update_theta_sketch`
+  - `compact_theta_sketch` (cannot be instantiated directly)
+  - `theta_union`
+  - `theta_intersection`
+  - `theta_a_not_b`
 - HLL
-    - `hll_sketch`
-    - `hll_union`
-    - Target HLL types are `tgt_hll_type.{HLL_4 | HLL_6 | HLL_8}`
+  - `hll_sketch`
+  - `hll_union`
+  - Target HLL types are `tgt_hll_type.{HLL_4 | HLL_6 | HLL_8}`
 - CPC
-    - `cpc_sketch`
-    - `cpc_union`
+  - `cpc_sketch`
+  - `cpc_union`
 - VarOpt Sampling
-    - `var_opt_sketch`
-    - `var_opt_union`
+  - `var_opt_sketch`
+  - `var_opt_union`
 - Vector of KLL
-    - `vector_of_kll_ints_sketches`
-    - `vector_of_kll_floats_sketches`
+  - `vector_of_kll_ints_sketches`
+  - `vector_of_kll_floats_sketches`
 
 ## Known Differences from C++
 
@@ -79,3 +58,15 @@ The Python API largely mirrors the C++ API, with a few minor exceptions: The pri
 The Vector of KLL object is currently exclusive to python, and holds an array of independent KLL sketches. This is useful for creating a set of KLL sketches over a vector and has been designed to allow input as either a vector or a matrix of multiple vectors.
 
 We have also removed reliance on a builder class for theta sketches as Python allows named arguments to the constructor, not strictly positional arguments.
+
+## Developer Instructions
+
+The only developer-specific instructions relate to running unit tests.
+
+### Unit tests
+
+The Python unit tests are run with `tox`. To ensure you have all the needed packages, from the package base directory run:
+```
+python -m pip install --upgrade tox
+tox
+```
