Posted to commits@pinot.apache.org by ja...@apache.org on 2019/01/29 21:39:45 UTC
[incubator-pinot] branch master updated: Add Documents for Index Techniques (#3761)

This is an automated email from the ASF dual-hosted git repository.

jackie pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git


The following commit(s) were added to refs/heads/master by this push:
     new 4a1c373  Add Documents for Index Techniques (#3761)
4a1c373 is described below

commit 4a1c3732ced819fe277afbf298967149b50574cb
Author: Xiaotian (Jackie) Jiang <17...@users.noreply.github.com>
AuthorDate: Tue Jan 29 13:39:40 2019 -0800

    Add Documents for Index Techniques (#3761)
---
 docs/index_techniques.rst | 56 +++++++++++++++++++++++++++++++++++++++++++++++
 docs/reference.rst        |  2 +-
 2 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/docs/index_techniques.rst b/docs/index_techniques.rst
new file mode 100644
index 0000000..3202879
--- /dev/null
+++ b/docs/index_techniques.rst
@@ -0,0 +1,56 @@
+.. TODO: add more details
+
+Index Techniques
+================
+
+Pinot currently supports the following index techniques, each of which has its own advantages in different query
+scenarios.
+
+Forward Index
+-------------
+
+Dictionary-Encoded Forward Index with Bit Compression
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+For each unique value in a column, we assign an id and build a dictionary that maps the id to the value. The forward
+index then stores only the bit-compressed ids instead of the values.
+
+When a column has a small number of unique values, dictionary encoding can significantly improve storage efficiency.
+
+Raw Value Forward Index
+~~~~~~~~~~~~~~~~~~~~~~~
+
+In contrast to the dictionary-encoded forward index, the raw value forward index stores values directly instead of ids.
+
+Without the dictionary, the dictionary lookup step can be skipped for each value fetch. Also, the index can take
+advantage of the good locality of the values, thus improving the performance of scanning a large number of values.
+
+Sorted Forward Index with Run-Length Encoding
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+On top of dictionary encoding, all the values are sorted, so the sorted forward index has the advantages of both good
+compression and data locality.
+
+The sorted forward index can also be used as an inverted index.
+
+Inverted Index (only available with dictionary-encoded indexes)
+---------------------------------------------------------------
+
+Bitmap Inverted Index
+~~~~~~~~~~~~~~~~~~~~~
+
+Pinot maintains a map from each value to a bitmap, which makes value lookup constant time.
+
+Sorted Inverted Index
+~~~~~~~~~~~~~~~~~~~~~
+Because the values are sorted, the sorted forward index can be used directly as an inverted index, with constant-time
+lookup and good data locality.
+
+Advanced Index
+--------------
+
+Star-Tree Index
+~~~~~~~~~~~~~~~
+
+Unlike other index techniques, which work on a single column, the Star-Tree index is built on multiple columns and uses
+pre-aggregated results to significantly reduce the number of values to be processed, thus improving query performance.
diff --git a/docs/reference.rst b/docs/reference.rst
index 2171119..8eaccdf 100644
--- a/docs/reference.rst
+++ b/docs/reference.rst
@@ -4,7 +4,7 @@
    :maxdepth: 1
 
    pql_examples
+   index_techniques
    client_api
    management_api
    pinot_hadoop
-

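The dictionary-encoded forward index described in docs/index_techniques.rst above can be sketched roughly as follows. This is a minimal illustration, not Pinot's implementation: the function names are hypothetical, the dictionary is assumed to be sorted, and a single Python integer stands in for the bit-packed buffer.

```python
def build_dictionary_forward_index(values):
    """Return (dictionary, ids, bits_per_id) for one column (hypothetical helper)."""
    dictionary = sorted(set(values))                     # position in list = id
    value_to_id = {v: i for i, v in enumerate(dictionary)}
    ids = [value_to_id[v] for v in values]               # forward index: doc -> id
    # Smallest bit width that can represent every id (at least 1 bit).
    bits_per_id = max(1, (len(dictionary) - 1).bit_length())
    return dictionary, ids, bits_per_id


def pack_ids(ids, bits_per_id):
    """Bit-compress the ids by packing them side by side into one integer."""
    packed = 0
    for position, dict_id in enumerate(ids):
        packed |= dict_id << (position * bits_per_id)
    return packed


def read_value(dictionary, packed, bits_per_id, doc_id):
    """Fetch the value of a document: extract its id, then look it up."""
    mask = (1 << bits_per_id) - 1
    dict_id = (packed >> (doc_id * bits_per_id)) & mask
    return dictionary[dict_id]


column = ["us", "ca", "us", "mx", "us", "ca"]
dictionary, ids, bits_per_id = build_dictionary_forward_index(column)
packed = pack_ids(ids, bits_per_id)
decoded = [read_value(dictionary, packed, bits_per_id, d) for d in range(len(column))]
```

With three unique values, each document needs only 2 bits in the forward index instead of a full string, which is where the space saving for low-cardinality columns comes from.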

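The two inverted index flavors described in docs/index_techniques.rst above can be sketched in the same spirit. This is an illustrative sketch under stated assumptions: the function names are hypothetical, plain Python integers stand in for bitmaps (real systems typically use compressed bitmaps), and the sorted index is modeled as one (start, end) document range per dictionary id, which is one way to realize run-length encoding.

```python
def build_bitmap_inverted_index(ids, cardinality):
    """One bitmap per dictionary id; bit d is set when document d holds that id."""
    bitmaps = [0] * cardinality
    for doc_id, dict_id in enumerate(ids):
        bitmaps[dict_id] |= 1 << doc_id
    return bitmaps


def matching_docs(bitmap):
    """Expand a bitmap into the list of document ids whose bits are set."""
    docs = []
    doc_id = 0
    while bitmap:
        if bitmap & 1:
            docs.append(doc_id)
        bitmap >>= 1
        doc_id += 1
    return docs


def build_sorted_index(sorted_ids, cardinality):
    """For a sorted column, keep one (start, end) doc range per id, end exclusive."""
    ranges = [(0, 0)] * cardinality
    start = 0
    for doc_id in range(1, len(sorted_ids) + 1):
        # Close the current run at the end of the column or when the id changes.
        if doc_id == len(sorted_ids) or sorted_ids[doc_id] != sorted_ids[start]:
            ranges[sorted_ids[start]] = (start, doc_id)
            start = doc_id
    return ranges


bitmaps = build_bitmap_inverted_index([2, 0, 2, 1, 2, 0], cardinality=3)
ranges = build_sorted_index([0, 0, 1, 2, 2, 2], cardinality=3)
```

In both cases a lookup by dictionary id is a single list access. The sorted variant also serves as the forward index, because the ranges are enough to recover the id of every document.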
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org