Posted to commits@datasketches.apache.org by le...@apache.org on 2020/02/20 18:42:01 UTC

[incubator-datasketches-website] 01/03: Interim update

This is an automated email from the ASF dual-hosted git repository.

leerho pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-datasketches-website.git

commit 9648f776f57dc1629fbe3e62156e52dfff18af02
Author: Lee Rhodes <le...@users.noreply.github.com>
AuthorDate: Wed Feb 19 17:54:47 2020 -0800

    Interim update
---
 _includes/page_header.html                         |   2 +-
 community/index.md                                 |  52 ++++++++++++
 css/header.css                                     |   2 +-
 docs/Architecture.md                               |  90 +++++---------------
 docs/Architecture/Components.md                    |  94 +++++++++++++++++++++
 .../SketchesByComponent.md}                        |  90 +++++---------------
 docs/SketchCriteria.md                             |  59 +++++++++++++
 img/Yahoo_white_small.png                          | Bin 10829 -> 0 bytes
 index.md                                           |   2 +-
 .../apache/datasketches/docgen/TocGenerator.java   |   2 +-
 src/main/resources/docgen/toc.json                 |   8 +-
 11 files changed, 262 insertions(+), 139 deletions(-)

diff --git a/_includes/page_header.html b/_includes/page_header.html
index c2f0904..aa90542 100644
--- a/_includes/page_header.html
+++ b/_includes/page_header.html
@@ -31,7 +31,7 @@
             <span class="fa fa-paper-plane"></span> RESEARCH</a>
         </li>
         <li>
-          <a href="https://groups.google.com/forum/#!forum/sketches-user">
+          <a href="https://lists.apache.org/list.html?users@datasketches.apache.org">
             <span class="fa fa-comment"></span> FORUM</a>
         </li>
         <ul class="nav navbar-nav navbar-right ds-nav">
diff --git a/community/index.md b/community/index.md
new file mode 100644
index 0000000..aef3f17
--- /dev/null
+++ b/community/index.md
@@ -0,0 +1,52 @@
+---
+layout: doc_page
+dev: https://lists.apache.org/list.html?dev@datasketches.apache.org
+users: https://lists.apache.org/list.html?users@datasketches.apache.org
+---
+
+# Apache DataSketches Community
+
+## How We Communicate
+
+There are many ways that are available for our community to communicate with each other and directly with the developer team.  Please review the following for methods that meet your needs.
+
+* **[Users Mailing List]({{page.users}}):** This is a great place for all users (new and experienced) to ask general questions about the library, its general capabilities, and where to get help and find more information.  This is also a great place to give the developers general feedback about the library. If you like what you see, please give us a [Star (Java)](https://github.com/apache/incubator-datasketches-java) and/or [Star (C++/Python)](https://github.com/apache/incubator-datasketc [...]
+  
+* **[Developers Mailing List]({{page.dev}}):** This is where the developers, committers, and contributors congregate to discuss, vote and establish priorities on addressing issues and opportunities with the library. The issues discussed tend to apply across all the different components of the library (see below).
+
+* **Slack:** We have two channels on the Apache Slack Workspace. Once you [join](https://s.apache.org/slack-invite), add the following channels:
+    * **datasketches** This is for use similar to the Users Mailing List above.
+    * **datasketches-dev** This is for use similar to the Developers Mailing List.
+<br/><br/>
+* **[GitHub Components]({{site.docs_dir}}/Architecture/Components.html):** Our library is made up of components that are partitioned into GitHub repositories. If you have a specific issue or bug report that impacts only one of these components please open an issue on the respective component. If you are a developer and wish to submit a PR, please choose the appropriate repository.
+
+## Contributing
+
+We are always open to contributions from our community.  Contributions can be of many forms: documentation, testing, science as well as bug fixes, code enhancements, code reviews, feature suggestions, usability feedback, etc. Contributions usually take the form of a Pull Request (PR), but if you wish to contribute and are not sure how, please contact us on our [dev]({{page.dev}}) list.
+
+We are also open to the submission of entirely new sketch algorithms.  If you have a sketch algorithm (or a significant enhancement of our current algorithms), please read our [Sketch Criteria]({{site.docs_dir}}/SketchCriteria.html) and contact us on our [dev]({{page.dev}}) list.
+
+### What to work on
+We are in the process of developing a comprehensive TODO list that will be referenced here.
+
+### Getting your proposed changes accepted
+
+Proposed changes to the code or documentation are usually done through GitHub Pull Requests (PRs).
+
+* Simple PRs, such as simple bug fixes, typos, and documentation corrections require one approval vote (+1) from a committer.
+* Major changes to the code such as API or architectural changes or new sketch algorithms must be discussed on [dev]({{page.dev}}) or on a GitHub issue as these will require additional design and compatibility reviews. These changes must receive at least three (+1) votes from committers. If the author is already a committer, then two additional committers must vote (+1).
+
+### Becoming a committer
+We welcome anyone who is eager to continue to contribute to the DataSketches mission of providing open source, production quality sketch algorithms and become part of our team.  Please send us a message on [dev]({{page.dev}}) where we can give you some guidance.  After you have made some successful contributions, the current committers will discuss your candidacy for becoming a committer.  You can also review the [Apache policies on becoming a committer](https://community.apache.org/cont [...]
+
+
+
+## Governance
+The [Project Management Committee](https://www.apache.org/foundation/how-it-works.html) (PMC) is responsible for the administrative aspects of the DataSketches project.
+
+The basic responsibilities of the PMC include:
+
+* Approving releases
+* Nominating new committers
+* Maintaining the project's shared resources, including the GitHub account, mailing lists, websites, social media channels, etc.
+* Maintaining guidelines for the project
\ No newline at end of file
diff --git a/css/header.css b/css/header.css
index 6e68d95..b9c59dd 100644
--- a/css/header.css
+++ b/css/header.css
@@ -168,7 +168,7 @@ a:hover, a:focus {
 }
 
 .navbar-inverse .navbar-nav>li>a,.dropdown-menu>li>a {
-  color: rgba(177,186,198,.7);
+  color: rgba(227,204,255, 1); /* was 177,186,198,.7 */
   transition: color 0.1s ease;
   font-weight: 300;
   text-transform: uppercase;
diff --git a/docs/Architecture.md b/docs/Architecture.md
index e68bf80..4600486 100644
--- a/docs/Architecture.md
+++ b/docs/Architecture.md
@@ -19,7 +19,7 @@ layout: doc_page
     specific language governing permissions and limitations
     under the License.
 -->
-# Architecture
+# Sketches by Component
 
 The DataSketches Library is organized into the following repository groups:
 
@@ -57,7 +57,7 @@ This code is versioned and the latest release can be obtained from
 
 Memory Packages                | Package Description
 -------------------------------|---------------------
-org.apache.datasketches.memory               | Low level, high-performance Memory data-structure management primarily for off-heap. 
+org.apache.datasketches.memory | Low level, high-performance Memory data-structure management primarily for off-heap. 
 
 
 ### incubator-datasketches-hive
@@ -97,22 +97,34 @@ org.apache.datasketches.pig.theta       | Pig UDFs for Theta sketches
 org.apache.datasketches.pig.tuple       | Pig UDFs for Tuple sketches
 
 
-
 ### incubator-datasketches-characterization
-This relatively new repository is for code that we use to characterize the accuracy and speed performance of the sketches in 
-the library and is constantly being updated.  Examples of the job command files used for various tests can be found in the src/main/resources directory.
-Some of these tests can run for hours depending on its configuration.
+This relatively new repository is for Java and C++ code that we use to characterize the accuracy and speed performance of the sketches in 
+the library and is constantly being updated.  Examples of the job command files used for various tests can be found in the src/main/resources directory. Some of these tests can run for hours depending on their configuration.
 
 Characterization Packages                       | Package Description
 ------------------------------------------------|---------------------
 org.apache.datasketches.characterization             | Common functions and utilities
+org.apache.datasketches.characterization.concurrent  | Concurrent Theta Sketch
+org.apache.datasketches.characterization.cpc         | Compressed Probabilistic Counting Sketch
+org.apache.datasketches.characterization.fdt         | Frequent Distinct Tuples Sketch
+org.apache.datasketches.characterization.frequencies | Frequent Items Sketches
 org.apache.datasketches.characterization.hash        | Hash function performance
+org.apache.datasketches.characterization.hll         | HyperLogLog Sketch
 org.apache.datasketches.characterization.memory      | Memory performance
-org.apache.datasketches.characterization.quantiles.  | Quantiles performance
-org.apache.datasketches.characterization.uniquecount | Performance of Theta and HLL sketches
+org.apache.datasketches.characterization.quantiles   | Quantiles performance
+org.apache.datasketches.characterization.theta       | Theta Sketch
+org.apache.datasketches.characterization.uniquecount | Base Profiles for Unique Counting Sketches
+
+#### C++ Characterizations
+* CPC
+* Frequent Items
+* HLL
+* KLL
+* Theta
+
 
 ### incubator-datasketches-vector
-This is a new repository dedicated to sketches for vector and matrix operations. It is still somewhat experimental.
+This component implements the [Frequent Directions Algorithm](https://datasketches.apache.org/docs/Research.html) [GLP16].  It is still experimental in that the theoretical work has not yet supplied a suitable measure of error for production work. It can be used as is, but it will not go through a formal Apache Release until we can find a way to provide better error properties.  It has a dependence on the Memory component.
 
 
 ## C++ and Python
@@ -125,69 +137,13 @@ In other words, a sketch created and stored in C++ can be opened and read in Jav
 This site also has our Python adaptors that basically wrap the C++ implementations, 
 making the high performance C++ implementations available from Python.
 
-### incubator-datasketches-postgres
+### incubator-datasketches-postgresql
 This site provides the postgres-specific adaptors that wrap the C++ implementations making
-them available to the Postgres database users.
-
-
-## Web Site
-
-### incubator-datasketches-website
-This is the DataSketches web site source, and is constantly being updated with new material 
-and to be current with the GitHub master.
-This site is not versioned.
+them available to the PostgreSQL database users. PostgreSQL users should download the PostgreSQL extension from [pgxn.org](https://pgxn.org/dist/datasketches/).  For examples refer to the README on the component site.
 
-## Command-Line Tool
-These repositories provide a command-line tool that provides access to the following sketches:
-- Frequent Items
-- HLL
-- Quantiles
-- Reservoir Sampling
-- Theta Sketches
-- VarOpt Sampling
 
-This tool can be installed from Homebrew.
 
-### sketches-cmd
 
-### homebrew-sketches
 
-### homebrew-sketches-cmd
 
 
-## Deprecated sites
-The code in these sites are no longer maintained and will eventually be removed.
-
-### sketches-android
-This is a new repository dedicated to sketches designed to be run in a mobile client, such as a cell phone. 
-It is still in development and should be considered experimental.
-
-### experimental
-This repository is an experimental staging area for code that will eventually end up in another 
-repository. This code is not versioned.
-
-
-### sketches-misc
-Demos, command-line access, characterization testing and other code not related to production 
-deployment.
-
-This code is offered "as is" and primarily as a reference so that users can understand how some of 
-the performance characterization plots were obtained. This code has few unit tests, if any, 
-and was never intended for production use. 
-Nonetheless, some folks have found it useful. If you find it useful, go for it. 
-This code is not versioned.
-    
-Sketches-misc Packages             | Package Description
------------------------------------|---------------------
-org.apache.datasketches                 | Utility functions used by the sketches-misc packages
-org.apache.datasketches.cmd             | Support for Command Line functions **Being Redesigned**
-org.apache.datasketches.demo            | Simple demo for brute-force vs Theta and HLL sketches **Will be superceded by Command Line functions**
-org.apache.datasketches.quantiles       | Utility for computing & printing space table for Quantiles Sketches (only in the test branch)
-org.apache.datasketches.sampling        | Benchmarks and Entropy testing for sampling sketches
-
-### characterization-cpp
-This is the parallel characterization repository with a parallel objective to the Java characterization repository.
-
-### experimental-cpp
-This repository is an experimental staging area for C++ code that will eventually end up in another 
-repository.  
diff --git a/docs/Architecture/Components.md b/docs/Architecture/Components.md
new file mode 100644
index 0000000..7154f44
--- /dev/null
+++ b/docs/Architecture/Components.md
@@ -0,0 +1,94 @@
+---
+layout: doc_page
+---
+# Apache DataSketches GitHub Components
+
+Our library is made up of components that are partitioned into GitHub repositories by language and dependencies. The dependencies of the core components are kept to a bare minimum to enable flexible integration into many different environments. Meanwhile, the Hive and Pig components, for example, have major dependencies on those environments.
+
+If you have a specific issue or bug report that impacts only one of these components please open an issue on the respective component. If you are a developer and wish to submit a PR, please choose the appropriate repository.
+
+## Core Algorithms
+If you like what you see give us a **Star** on one of these two sites!
+
+* **[Java](https://github.com/apache/incubator-datasketches-java)** (Versioned, Apache Released) This is the original and the most comprehensive collection of sketch algorithms. It has a dependence on the Memory component and the Java Adaptors have a dependence on this component.
+
+* **[C++/Python](https://github.com/apache/incubator-datasketches-cpp)** (Versioned, Apache Released) This is newer and provides most of the major algorithms available in Java.  Our C++ adaptors have a dependence on this component.  The Pybind adaptors for Python are included here for all the C++ sketches.
+
+## Adapters
+Adapters integrate the core components into the aggregation APIs of specific data processing systems. Some of these adapters are available as part of the library; others may be directly integrated into the specific data processing system.
+
+### Java Adaptors
+* **[Apache Hive](https://github.com/apache/incubator-datasketches-hive)** (Versioned, Apache Released)
+    * [Theta Sketch Example]({{site.docs_dir}}/Theta/ThetaHiveUDFs.html)
+    * [Tuple Sketch Example]({{site.docs_dir}}/Tuple/TuplePigUDFs.html)
+* **[Apache Pig](https://github.com/apache/incubator-datasketches-pig)** (Versioned, Apache Released)
+    * [Theta Sketch Example]({{site.docs_dir}}/Theta/ThetaPigUDFs.html)
+    * [Tuple Sketch Example]({{site.docs_dir}}/Tuple/TuplePigUDFs.html)
+* **[Apache Druid](https://github.com/druid-io/druid/tree/master/extensions-core/datasketches)** (Apache Released as part of Druid)
+
+### C++ Adaptors
+* **[PostgreSQL](https://github.com/apache/incubator-datasketches-postgresql)** (Versioned, Apache Released)
+This site provides the postgres-specific adaptors that wrap the C++ implementations making
+them available to the PostgreSQL database users. PostgreSQL users should download the PostgreSQL extension from [pgxn.org](https://pgxn.org/dist/datasketches/).  For examples refer to the README on the component site.
+
+## Other Components
+* **[Memory](https://github.com/apache/incubator-datasketches-memory):** (Versioned, Apache Released) This is a low-level library that enables fast access to off-heap memory for Java.
+* **[Characterization](https://github.com/apache/incubator-datasketches-characterization):** This is a collection of Java and C++ code that we use for long-running studies of accuracy and speed performance over many different parameters. Feel free to run these tests to reproduce many of the graphs and charts you see on our website.
+* **[Vector (Experimental)](https://github.com/apache/incubator-datasketches-vector):** This component implements the [Frequent Directions Algorithm](https://datasketches.apache.org/docs/Research.html) [GLP16].  It is still experimental in that the theoretical work has not yet supplied a suitable measure of error for production work. It can be used as is, but it will not go through a formal Apache Release until we can find a way to provide better error properties.  It has a dependence on [...]
+* **[Website](https://github.com/apache/incubator-datasketches-website):** This repository is the home of our website and is constantly being updated with new material.
+
+
+
+## Deprecated Components
+The code in these components is no longer maintained and will eventually be removed.
+
+### sketches-android
+This is a new repository dedicated to sketches designed to be run in a mobile client, such as a cell phone. 
+It is still in development and should be considered experimental.
+
+### experimental
+This repository is an experimental staging area for code that will eventually end up in another 
+repository. This code is not versioned.
+
+
+### sketches-misc
+Demos, command-line access, characterization testing and other code not related to production 
+deployment.
+
+This code is offered "as is" and primarily as a reference so that users can understand how some of 
+the performance characterization plots were obtained. This code has few unit tests, if any, 
+and was never intended for production use. 
+Nonetheless, some folks have found it useful. If you find it useful, go for it. 
+This code is not versioned.
+    
+Sketches-misc Packages             | Package Description
+-----------------------------------|---------------------
+org.apache.datasketches                 | Utility functions used by the sketches-misc packages
+org.apache.datasketches.cmd             | Support for Command Line functions **Being Redesigned**
+org.apache.datasketches.demo            | Simple demo for brute-force vs Theta and HLL sketches **Will be superseded by Command Line functions**
+org.apache.datasketches.quantiles       | Utility for computing & printing space table for Quantiles Sketches (only in the test branch)
+org.apache.datasketches.sampling        | Benchmarks and Entropy testing for sampling sketches
+
+### characterization-cpp
+This is the parallel characterization repository with a parallel objective to the Java characterization repository.
+
+### experimental-cpp
+This repository is an experimental staging area for C++ code that will eventually end up in another 
+repository.
+
+### Command-Line Tool
+These repositories provide a command-line tool that provides access to the following sketches:
+- Frequent Items
+- HLL
+- Quantiles
+- Reservoir Sampling
+- Theta Sketches
+- VarOpt Sampling
+
+This tool can be installed from Homebrew.
+
+#### sketches-cmd
+
+#### homebrew-sketches
+
+#### homebrew-sketches-cmd
\ No newline at end of file
diff --git a/docs/Architecture.md b/docs/Architecture/SketchesByComponent.md
similarity index 68%
copy from docs/Architecture.md
copy to docs/Architecture/SketchesByComponent.md
index e68bf80..4600486 100644
--- a/docs/Architecture.md
+++ b/docs/Architecture/SketchesByComponent.md
@@ -19,7 +19,7 @@ layout: doc_page
     specific language governing permissions and limitations
     under the License.
 -->
-# Architecture
+# Sketches by Component
 
 The DataSketches Library is organized into the following repository groups:
 
@@ -57,7 +57,7 @@ This code is versioned and the latest release can be obtained from
 
 Memory Packages                | Package Description
 -------------------------------|---------------------
-org.apache.datasketches.memory               | Low level, high-performance Memory data-structure management primarily for off-heap. 
+org.apache.datasketches.memory | Low level, high-performance Memory data-structure management primarily for off-heap. 
 
 
 ### incubator-datasketches-hive
@@ -97,22 +97,34 @@ org.apache.datasketches.pig.theta       | Pig UDFs for Theta sketches
 org.apache.datasketches.pig.tuple       | Pig UDFs for Tuple sketches
 
 
-
 ### incubator-datasketches-characterization
-This relatively new repository is for code that we use to characterize the accuracy and speed performance of the sketches in 
-the library and is constantly being updated.  Examples of the job command files used for various tests can be found in the src/main/resources directory.
-Some of these tests can run for hours depending on its configuration.
+This relatively new repository is for Java and C++ code that we use to characterize the accuracy and speed performance of the sketches in 
+the library and is constantly being updated.  Examples of the job command files used for various tests can be found in the src/main/resources directory. Some of these tests can run for hours depending on their configuration.
 
 Characterization Packages                       | Package Description
 ------------------------------------------------|---------------------
 org.apache.datasketches.characterization             | Common functions and utilities
+org.apache.datasketches.characterization.concurrent  | Concurrent Theta Sketch
+org.apache.datasketches.characterization.cpc         | Compressed Probabilistic Counting Sketch
+org.apache.datasketches.characterization.fdt         | Frequent Distinct Tuples Sketch
+org.apache.datasketches.characterization.frequencies | Frequent Items Sketches
 org.apache.datasketches.characterization.hash        | Hash function performance
+org.apache.datasketches.characterization.hll         | HyperLogLog Sketch
 org.apache.datasketches.characterization.memory      | Memory performance
-org.apache.datasketches.characterization.quantiles.  | Quantiles performance
-org.apache.datasketches.characterization.uniquecount | Performance of Theta and HLL sketches
+org.apache.datasketches.characterization.quantiles   | Quantiles performance
+org.apache.datasketches.characterization.theta       | Theta Sketch
+org.apache.datasketches.characterization.uniquecount | Base Profiles for Unique Counting Sketches
+
+#### C++ Characterizations
+* CPC
+* Frequent Items
+* HLL
+* KLL
+* Theta
+
 
 ### incubator-datasketches-vector
-This is a new repository dedicated to sketches for vector and matrix operations. It is still somewhat experimental.
+This component implements the [Frequent Directions Algorithm](https://datasketches.apache.org/docs/Research.html) [GLP16].  It is still experimental in that the theoretical work has not yet supplied a suitable measure of error for production work. It can be used as is, but it will not go through a formal Apache Release until we can find a way to provide better error properties.  It has a dependence on the Memory component.
 
 
 ## C++ and Python
@@ -125,69 +137,13 @@ In other words, a sketch created and stored in C++ can be opened and read in Jav
 This site also has our Python adaptors that basically wrap the C++ implementations, 
 making the high performance C++ implementations available from Python.
 
-### incubator-datasketches-postgres
+### incubator-datasketches-postgresql
 This site provides the postgres-specific adaptors that wrap the C++ implementations making
-them available to the Postgres database users.
-
-
-## Web Site
-
-### incubator-datasketches-website
-This is the DataSketches web site source, and is constantly being updated with new material 
-and to be current with the GitHub master.
-This site is not versioned.
+them available to the PostgreSQL database users. PostgreSQL users should download the PostgreSQL extension from [pgxn.org](https://pgxn.org/dist/datasketches/).  For examples refer to the README on the component site.
 
-## Command-Line Tool
-These repositories provide a command-line tool that provides access to the following sketches:
-- Frequent Items
-- HLL
-- Quantiles
-- Reservoir Sampling
-- Theta Sketches
-- VarOpt Sampling
 
-This tool can be installed from Homebrew.
 
-### sketches-cmd
 
-### homebrew-sketches
 
-### homebrew-sketches-cmd
 
 
-## Deprecated sites
-The code in these sites are no longer maintained and will eventually be removed.
-
-### sketches-android
-This is a new repository dedicated to sketches designed to be run in a mobile client, such as a cell phone. 
-It is still in development and should be considered experimental.
-
-### experimental
-This repository is an experimental staging area for code that will eventually end up in another 
-repository. This code is not versioned.
-
-
-### sketches-misc
-Demos, command-line access, characterization testing and other code not related to production 
-deployment.
-
-This code is offered "as is" and primarily as a reference so that users can understand how some of 
-the performance characterization plots were obtained. This code has few unit tests, if any, 
-and was never intended for production use. 
-Nonetheless, some folks have found it useful. If you find it useful, go for it. 
-This code is not versioned.
-    
-Sketches-misc Packages             | Package Description
------------------------------------|---------------------
-org.apache.datasketches                 | Utility functions used by the sketches-misc packages
-org.apache.datasketches.cmd             | Support for Command Line functions **Being Redesigned**
-org.apache.datasketches.demo            | Simple demo for brute-force vs Theta and HLL sketches **Will be superceded by Command Line functions**
-org.apache.datasketches.quantiles       | Utility for computing & printing space table for Quantiles Sketches (only in the test branch)
-org.apache.datasketches.sampling        | Benchmarks and Entropy testing for sampling sketches
-
-### characterization-cpp
-This is the parallel characterization repository with a parallel objective to the Java characterization repository.
-
-### experimental-cpp
-This repository is an experimental staging area for C++ code that will eventually end up in another 
-repository.  
diff --git a/docs/SketchCriteria.md b/docs/SketchCriteria.md
new file mode 100644
index 0000000..3002fde
--- /dev/null
+++ b/docs/SketchCriteria.md
@@ -0,0 +1,59 @@
+---
+layout: doc_page
+---
+
+# Sketch Criteria for Library Inclusion
+
+There are lots of clever and useful algorithms that are sometimes called "sketches".  However, due to limited resources, in order to be included in the DataSketches library, we had to clearly define what we meant by the term "sketch".  Otherwise, we would end up with a hodgepodge of algorithms and have to answer: Why don't we include algorithm X?
+
+In order to be in our library, a *Sketch* must exhibit these properties:
+
+## Streaming / One-Touch 
+Sketches are a class of streaming algorithms by definition, which means they only touch or process each item in a stream once.  This is absolutely required for real-time applications.
+
+## Small in Size
+One of the key properties of any sketch is that it is a synopsis or summary of a much larger data set.  The whole point of a small summary is that it is faster to read and merge.  In this context, *small* means small with respect to the original data.  If the original data is terabytes in size, a single sketch of 100KB may not seem very different from a sketch of 50KB as both are very small compared to the original data.  
+
+But *small* can also be important in a systems context. If that original terabyte of data generates 10,000 sketches, each sketch consuming 100KB, that amounts to a GB of storage.  Now the total memory use starts to be a concern.  Being able to reduce that by 50% by using a smaller (and otherwise equivalent) sketch can be a big deal.
+
+Nonetheless, *small* is relevant to the specific application. Sketches can vary from a few bytes to many megabytes depending on the specific sketch and how it has been configured. Whether it is small enough is up to the system engineers to determine.
+
+## Sublinear in Size Growth
+Not only should a sketch start small, it needs to stay small as the size of the input stream grows.  Some sketches have an upper bound of size independent of the size of the input stream, which clearly makes them sublinear.  Other sketches may need to continue to increase their size as the stream grows.  For these sketches it is important that they do so very slowly. They should grow sublinearly by no more than *O(log(n))* or preferably by *O(k log(n/k))* or less.
+
+## Mergeable
+In order to be useful in large distributed computing environments, the sketches must be mergeable without additional loss of accuracy.  This is defined as
+
+<p style="text-align: center;"><i>sketch(A + B) &asymp; sketch(A) U sketch(B),</i></p>
+
+where<br>
+&nbsp;&nbsp; "+" = concatination of streams A and B,<br>
+&nbsp;&nbsp; "U" = merge or union,<br>
+&nbsp;&nbsp; "&asymp;" = approximately equal within the error bounds of the sketch.
+
+### Mergeable With Different Size Parameters
+In addition to just being mergeable, sketches used in production environments must be mergeable with different sizing parameters.
+ 
+In many production applications sketches might be stored for years because they are so much smaller than keeping the original data around, and orders-of-magnitude faster to merge.
+
+Imagine an organization that has saved its sketches for several years with one size/accuracy parameter, then changes its policy about the size/accuracy needed going forward.  Unless the two differently configured sketches can be merged successfully (even if the accuracy degrades to the lower of the two configurations), the data in the older sketches would be essentially lost. The only other alternative would require reprocessing all the old original data -- if it even exists!
+
+## Data Insensitive
+In many real production environments the data that needs to be processed is ugly. There are often missing or troublesome values in the stream.  It is naive to expect, for example, that a stream of integer time-spent values does not contain zeros or negative values.  In this case, an algorithm that makes the assumption that all the input values are always positive is a very fragile algorithm.  In a real-time streaming application, if the algorithm returns horrible answers because there ha [...]
+
+There are many types of Data Insensitivity where the sketch should return meaningful results, within the specified error bounds:
+
+* **Order Insensitive:** The sketch results should be independent of the order that the items are presented to the sketch. 
+* **Distribution Insensitive:** The sketch results should be independent of how the data within the stream is distributed.  For example, the data might be distributed as Gaussian, Zipf, lognormal, power-law, or whatever.
+* **Value Insensitive:** The sketch results should be independent of exceptional values in the stream.  For example, if the stream consists of double values, the sketch should be able to handle NaNs and Infinities in a sensible way.
+
+There are practical limits to data insensitivity. For example, a sketch designed for handling *double* values should not be expected to handle *strings* or arbitrary objects in the same stream.
+
+It is important that the sketch developers clearly document the data insensitivities that the sketch is designed to handle.
+
+## Mathematically Proven Error and Size Properties
+Sketch algorithms must have an openly published and reviewed theoretical basis for their operation including their error and merging properties.  Empirical algorithms do not qualify.
+
+### Meaningful and Usable Error Bounds
+It is not sufficient that some algorithm has been published in a scientific paper. There are published and reviewed sketch algorithms that use error definitions that may be interesting theoretically, but very misleading in practice.  For example, if the authors of a paper define their measure of error to be the average error over a distribution of values (<i>L<sub>1</sub> error</i>), this is not what most users would expect.  Given a query, the user has no idea what part of the distribut [...]
+
diff --git a/img/Yahoo_white_small.png b/img/Yahoo_white_small.png
deleted file mode 100755
index 4a85735..0000000
Binary files a/img/Yahoo_white_small.png and /dev/null differ
diff --git a/index.md b/index.md
index ce45394..f0c2441 100644
--- a/index.md
+++ b/index.md
@@ -42,7 +42,7 @@ id: home
         <a class="btn btn-lg btn-outline-inverse" href="/docs/downloads.html"><span class="fa fa-download"></span> Download</a>
         <a class="btn btn-lg btn-outline-inverse" href="https://github.com/apache?utf8=%E2%9C%93&q=datasketches"><span class="fa fa-github"></span> GitHub</a>
         <a class="btn btn-lg btn-outline-inverse" href="/docs/Research.html"><span class="fa fa-paper-plane"></span> Research</a>
-        <a class="btn btn-lg btn-outline-inverse" href="https://groups.google.com/forum/#!forum/sketches-user"><span class="fa fa-comment"></span> Forum</a>
+        <a class="btn btn-lg btn-outline-inverse" href="https://lists.apache.org/list.html?users@datasketches.apache.org"><span class="fa fa-comment"></span> Contact Us</a>
       </p>
     </div>
   </div>
diff --git a/src/main/java/org/apache/datasketches/docgen/TocGenerator.java b/src/main/java/org/apache/datasketches/docgen/TocGenerator.java
index 00be18d..539b387 100644
--- a/src/main/java/org/apache/datasketches/docgen/TocGenerator.java
+++ b/src/main/java/org/apache/datasketches/docgen/TocGenerator.java
@@ -53,7 +53,7 @@ public class TocGenerator {
    * </ol>
    * @author Lee Rhodes
    */
-    @Test
+    //@Test
     public static void runTocGenerator() {
       final String jsonSrcFile = "src/main/resources/docgen/toc.json";
       final String htmlScriptFile = "src/main/resources/docgen/tocScript.html";
diff --git a/src/main/resources/docgen/toc.json b/src/main/resources/docgen/toc.json
index e3466e7..09ab748 100644
--- a/src/main/resources/docgen/toc.json
+++ b/src/main/resources/docgen/toc.json
@@ -9,7 +9,13 @@
         {"class":"Doc",  "desc" : "Key Features",                 "dir" : "", "file": "KeyFeatures" },
         {"class":"Doc",  "desc" : "Large Scale Computing",        "dir" : "", "file": "LargeScale" },
         {"class":"Doc",  "desc" : "Architecture",                 "dir" : "", "file": "Architecture" },
-        {"class":"Doc",  "desc" : "Notes on Order Sensitivity",   "dir" : "", "file": "OrderSensitivity" },
+        { "class":"Dropdown", "desc" : "Architecture", "array":
+          [
+            {"class":"Doc",  "desc" : "Components",              "dir" : "Architecture", "file": "Components" },
+            {"class":"Doc",  "desc" : "Sketches by Component",   "dir" : "Architecture", "file": "SketchesByComponent" },
+          ]
+        },
+		{"class":"Doc",  "desc" : "Notes on Order Sensitivity",   "dir" : "", "file": "OrderSensitivity" },
         {"class":"Doc",  "desc" : "Notes on Concurrency",         "dir" : "", "file": "Concurrency" },
         {"class":"Doc",  "desc" : "Overview Slide Deck",          "dir" : "", "file": "DataSketches_deck", "pdf":"true" },
       ]

