Posted to commits@datasketches.apache.org by jm...@apache.org on 2021/09/21 17:34:11 UTC

[datasketches-cpp] branch update_py_readme created (now d5848ef)

This is an automated email from the ASF dual-hosted git repository.

jmalkin pushed a change to branch update_py_readme
in repository https://gitbox.apache.org/repos/asf/datasketches-cpp.git.


      at d5848ef  remove unused module file, update python readme

This branch includes the following new commits:

     new d5848ef  remove unused module file, update python readme

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


[datasketches-cpp] 01/01: remove unused module file, update python readme

Posted by jm...@apache.org.

jmalkin pushed a commit to branch update_py_readme
in repository https://gitbox.apache.org/repos/asf/datasketches-cpp.git

commit d5848ef3dab141171899735d27bfa1119fbe3c0d
Author: Jon Malkin <jm...@users.noreply.github.com>
AuthorDate: Tue Sep 21 10:33:57 2021 -0700

    remove unused module file, update python readme
---
 .gitmodules      |  0
 python/README.md | 49 ++++++++++++++++++++-----------------------------
 2 files changed, 20 insertions(+), 29 deletions(-)

diff --git a/.gitmodules b/.gitmodules
deleted file mode 100644
index e69de29..0000000
diff --git a/python/README.md b/python/README.md
index fcc595f..35c669e 100644
--- a/python/README.md
+++ b/python/README.md
@@ -1,41 +1,20 @@
 # Python Wrapper for Apache DataSketches
 
-## Installation
+This is the official version of the [Apache DataSketches](https://datasketches.apache.org) Python library.
 
-The release files do not include the needed python binding library ([pybind11](https://github.com/pybind/pybind11)). If building
-from a release package, you must ensure that the pybind11 directory points to a local copy of pybind11.
+In the analysis of big data there are often problem queries that don’t scale because they require huge compute resources and time to generate exact results. Examples include count distinct, quantiles, most-frequent items, joins, matrix computations, and graph analysis.
 
-An official pypi build is eventually planned but not yet available.
+If approximate results are acceptable, there is a class of specialized algorithms, called streaming algorithms or sketches, that can produce results orders of magnitude faster and with mathematically proven error bounds. For interactive queries there may not be other viable alternatives, and in the case of real-time analysis, sketches are the only known solution.
 
-If you instead want to take a (possibly ill-advised) gamble on the current state of the master branch being useable, you can run:
-```pip install git+https://github.com/apache/datasketches-cpp.git```
+This package provides a variety of sketches as described below. Wherever a specific type of sketch exists in Apache DataSketches packages for other languages, the sketches will be portable between languages (for platforms with the same endianness).
 
-## Developer Instructions
-
-### Building
-
-When cloning the source repository, you should include the pybind11 submodule with the `--recursive` option to the clone command:
-```
-git clone --recursive https://github.com/apache/datasketches-cpp.git
-cd datasketches-cpp
-python -m pip install --upgrade pip setuptools wheel numpy
-python setup.py build
-```
+## Building and Installation
 
-If you cloned without `--recursive`, you can add the submodule post-checkout using `git submodule update --init --recursive`.
+Once cloned, the library can be installed by running `python -m pip install .`, which will also install the necessary dependencies, namely [pybind11](https://github.com/pybind/pybind11) and numpy.
 
-### Installing
+If you prefer to call the `setup.py` build script directly, you must first install `pybind11[global]`, as well as any other dependencies listed under the build-system section in `pyproject.toml`.
 
-Assuming you have already checked out the library and any dependent submodules, install by simply replacing the last
-line of the build command with `python setup.py install`.
-
-### Unit tests
-
-The python tests are run with `tox`. To ensure you have all the needed packages, from the package base directory run:
-```
-python -m pip install --upgrade pip setuptools wheel numpy tox
-tox
-```
+The library is also available (or soon will be) from PyPI via `python -m pip install datasketches`.
 
 ## Usage
 
@@ -79,3 +58,15 @@ The Python API largely mirrors the C++ API, with a few minor exceptions: The pri
 The Vector of KLL object is currently exclusive to python, and holds an array of independent KLL sketches. This is useful for creating a set of KLL sketches over a vector and has been designed to allow input as either a vector or a matrix of multiple vectors.
 
 We have also removed reliance on a builder class for theta sketches as Python allows named arguments to the constructor, not strictly positional arguments.
+
+## Developer Instructions
+
+The only developer-specific instructions relate to running unit tests.
+
+### Unit tests
+
+The Python unit tests are run with `tox`. To ensure you have all the needed packages, from the package base directory run:
+```
+python -m pip install --upgrade tox
+tox
+```
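
The updated README text in the diff above notes that sketches are portable between languages only for platforms with the same endianness. As a minimal illustration using only the Python standard library, the following checks the local byte order before exchanging serialized bytes with another platform. The helper name `can_exchange_with` and the convention of passing the producer's byte order as a string are assumptions made for this example; they are not part of the datasketches API.

```python
import sys


def can_exchange_with(producer_byteorder: str) -> bool:
    """Return True if the producing platform's byte order matches this one.

    Hypothetical helper for illustration only; not part of datasketches.
    `producer_byteorder` must be "little" or "big", the same values that
    `sys.byteorder` reports.
    """
    if producer_byteorder not in ("little", "big"):
        raise ValueError(f"unknown byte order: {producer_byteorder!r}")
    return producer_byteorder == sys.byteorder


if __name__ == "__main__":
    # Report the local byte order and whether bytes written on a
    # little-endian machine would match it.
    print("local byte order:", sys.byteorder)
    print("matches little-endian producer:", can_exchange_with("little"))
```

How the producer's byte order is communicated (for example, alongside the serialized bytes) is left open here and would depend on the surrounding system.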
