Posted to commits@pinot.apache.org by jl...@apache.org on 2019/01/31 17:51:04 UTC

[incubator-pinot] branch master updated: Documentation review on Pinot Overview (#3762)

This is an automated email from the ASF dual-hosted git repository.

jlli pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git


The following commit(s) were added to refs/heads/master by this push:
     new 94b34e2  Documentation review on Pinot Overview (#3762)
94b34e2 is described below

commit 94b34e20e734769cfcc6e2ddf2bdf4ebe77131d9
Author: Jialiang Li <jl...@linkedin.com>
AuthorDate: Thu Jan 31 09:50:58 2019 -0800

    Documentation review on Pinot Overview (#3762)
    
    * Documentation review on Pinot Overview
---
 README.md | 46 +++++++++++++++++++++++-----------------------
 1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/README.md b/README.md
index 790cb5e..cb861cc 100644
--- a/README.md
+++ b/README.md
@@ -7,9 +7,9 @@ Pinot is a realtime distributed OLAP datastore, which is used at LinkedIn to del
 These presentations give an overview of Pinot and how it is used at LinkedIn:
 
 * [Pinot: Realtime Distributed OLAP Datastore (Aug 2015)](http://www.slideshare.net/KishoreGopalakrishna/pinot-realtime-distributed-olap-datastore)
-* [Introduction to Pinot (Jan 2016)](http://www.slideshare.net/jeanfrancoisim/intro-to-pinot-20160104) 
+* [Introduction to Pinot (Jan 2016)](http://www.slideshare.net/jeanfrancoisim/intro-to-pinot-20160104)
 * [Open Source Analytics Pipeline at LinkedIn (Sep 2016, covers Gobblin and Pinot)](http://www.slideshare.net/IssacBuenrostro/open-source-linkedin-analytics-pipeline-vldb-2016)
-* [Pinot: Realtime OLAP for 530 Million Users - Sigmod 2018 (June 2018)](http://www.slideshare.net/seunghyunlee1460/pinot-realtime-olap-for-530-million-users-sigmod-2018-107394584)
+* [Pinot: Realtime OLAP for 530 Million Users - Sigmod 2018 (Jun 2018)](http://www.slideshare.net/seunghyunlee1460/pinot-realtime-olap-for-530-million-users-sigmod-2018-107394584)
 
 Looking for the ThirdEye anomaly detection and root-cause analysis platform? Check out the [Pinot/ThirdEye project](https://github.com/linkedin/pinot/tree/master/thirdeye).
 
@@ -21,33 +21,33 @@ Pinot is well suited for analytical use cases on immutable append-only data that
 
 - A column-oriented database with various compression schemes such as Run Length, Fixed Bit Length
 - Pluggable indexing technologies - Sorted Index, Bitmap Index, Inverted Index
-- Ability to optimize query/execution plan based on query and segment metadata . 
+- Ability to optimize query/execution plan based on query and segment metadata
 - Near-real-time ingestion from Kafka and batch ingestion from Hadoop
-- SQL like language that supports _selection, aggregation, filtering, group by, order by, distinct_ queries on fact data.
+- SQL-like language that supports _selection, aggregation, filtering, group by, order by, distinct_ queries on fact data
 - Support for multivalued fields
 - Horizontally scalable and fault-tolerant
 
 Because of the design choices we made to achieve these goals, there are certain limitations present in Pinot:
 
 - Pinot is not a replacement for a database, i.e. it cannot be used as a source-of-truth store and cannot mutate data
-- Not a replacement for search engine i.e Full text search, relevance not supported
-- Query cannot span across multiple tables. 
+- Not a replacement for a search engine, i.e. full-text search and relevance are not supported
+- Queries cannot span multiple tables
 
 Pinot works very well for querying time-series data with many dimensions and metrics. Example: query profile views, ad campaign performance, etc. in an analytical fashion (who viewed this profile in the last few weeks, how many ads were clicked per campaign).
 
 ## Terminology
 
-Before we get to quick start, lets go over the terminology. 
+Before we get to the quick start, let's go over the terminology.
 - Table: A table is a logical abstraction that refers to a collection of related data. It consists of columns and rows (documents). The table schema defines column names and their metadata.
 - Segment: A logical table is divided into multiple physical units referred to as segments.
 
 Pinot has the following roles/components:
 
-- Pinot Controller: Manages nodes in the cluster. Responsibilities :
-  * Handles all Create, Update, Delete operations on Tables and Segments.
-  * Computes assignment of Table and its segments to Pinot Servers.  
-- Pinot Server: Hosts one or more physical segments. Responsibilities: -
-  * When assigned a pre created segment, download it and load it. If assigned a Kafka topic, start consuming from a sub set of partitions in Kafka.
+- Pinot Controller: Manages nodes in the cluster. Responsibilities:
+  * Handles all Create, Update, Delete operations on tables and segments.
+  * Computes assignment of table and its segments to Pinot Servers.
+- Pinot Server: Hosts one or more physical segments. Responsibilities:
+  * When assigned a pre-created segment, download it and load it. If assigned a Kafka topic, start consuming from a subset of partitions in Kafka.
   * Executes queries and returns the response to Pinot Broker.
 - Pinot Broker: Accepts queries from clients and routes them to multiple servers (based on the routing strategy). All responses are merged and sent back to the client.
 
@@ -73,11 +73,11 @@ chmod +x bin/*.sh
 
 ### 2: Run
 
-We will load BaseBall stats from 1878 to 2013 into Pinot and run queries against it. There are 100000 records and 15 columns ([schema](https://github.com/linkedin/pinot/blob/master/pinot-tools/src/main/resources/sample_data/baseballStats_schema.json)) in this dataset.
+We will load baseball stats from 1878 to 2013 into Pinot and run queries against it. There are 100,000 records and 15 columns ([schema](https://github.com/linkedin/pinot/blob/master/pinot-tools/src/main/resources/sample_data/baseballStats_schema.json)) in this dataset.
 
 Execute the quick-start-offline.sh script in the bin folder, which performs the following:
-- Converts Baseball data in CSV format into Pinot Index Segments.
-- Starts Pinot components, Zookeeper, Controller, Broker, Server.
+- Converts baseball data from CSV format into Pinot index segments.
+- Starts Zookeeper and the Pinot components, i.e. Controller, Broker, and Server.
 - Uploads segments to Pinot
 
 If you have Docker, run `docker run -it -p 9000:9000 linkedin/pinot-quickstart-offline`. If you have built Pinot, run `bin/quick-start-offline.sh`.
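
As a reference for what the quickstart script automates, here is a minimal sketch of starting the same components by hand with the pinot-admin.sh tool in the same bin folder. The subcommands exist in pinot-tools, but flag names can differ between Pinot versions, so treat this as illustrative rather than authoritative:

```
# Start Zookeeper and the three Pinot components individually
# (roughly what quick-start-offline.sh does; ports assumed to be the defaults)
bin/pinot-admin.sh StartZookeeper -zkPort 2181
bin/pinot-admin.sh StartController -zkAddress localhost:2181
bin/pinot-admin.sh StartBroker -zkAddress localhost:2181
bin/pinot-admin.sh StartServer -zkAddress localhost:2181
```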
@@ -116,19 +116,19 @@ select playerName, runs, homeRuns from baseballStats order by yearID limit 10
 
 ### 3: Pinot Data Explorer
 
-There are 3 ways to [interact](https://github.com/linkedin/pinot/wiki/Pinot-Client-API) with Pinot - simple web interface, REST api and java client. Open your browser and go to http://localhost:9000/query/ and run any of the queries provided above. See [Pinot Query Syntax](https://github.com/linkedin/pinot/wiki/Pinot-Query-Language-Examples) for more info.
+There are three ways to [interact](https://github.com/linkedin/pinot/wiki/Pinot-Client-API) with Pinot: a simple web interface, a REST API, and a Java client. Open your browser, go to http://localhost:9000/query/, and run any of the queries provided above. See [Pinot Query Syntax](https://github.com/linkedin/pinot/wiki/Pinot-Query-Language) for more info.
 
 *** 
 ## Realtime quick start
 
 There are two ways to ingest data into Pinot: batch and realtime. The baseball stats example above demonstrated batch ingestion. Typically these batch jobs run on Hadoop periodically (e.g. every hour/day/week/month). Data freshness depends on job granularity.
 
-Lets look at an example where we ingest data in realtime. We will subscribe to meetup.com rsvp feed and index the rsvp events in real time. 
+Let's look at an example where we ingest data in realtime. We will subscribe to the meetup.com RSVP feed and index the RSVP events in real time.
 Execute the quick-start-realtime.sh script in the bin folder, which performs the following:
-- start a kafka broker 
-- setup a meetup event listener that subscribes to meetup.com stream and publishes it to local kafka broker
-- start zookeeper, pinot controller, pinot broker, pinot-server.
-- configure the realtime source 
+- Starts a Kafka broker
+- Sets up a meetup event listener that subscribes to the meetup.com stream and publishes it to the local Kafka broker
+- Starts Zookeeper, pinot-controller, pinot-broker, and pinot-server
+- Configures the realtime source
 
 If you have Docker, run `docker run -it -p 9000:9000 linkedin/pinot-quickstart-realtime`. If you have built Pinot, run `bin/quick-start-realtime.sh`.
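
The query console is not the only way to issue queries once the quickstart is running; the REST route mentioned above works too. A minimal sketch, assuming the broker listens on its default port 8099 and accepts a JSON body with a `pql` field (verify the port against your broker config before relying on this):

```
# Count all ingested meetup RSVP events via the broker's query endpoint
curl -X POST -d '{"pql":"select count(*) from meetupRsvp"}' http://localhost:8099/query
```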
 
@@ -141,7 +141,7 @@ Realtime quick start setup complete
 Starting meetup data stream and publishing to kafka
 ```
 
-Open Pinot Query Console at http://localhost:9000/query and run queries. Here are some sample queries
+Open Pinot Query Console at http://localhost:9000/query and run queries. Here are some sample queries:
 
 ```sql
 /*Total number of documents in the table*/
@@ -160,7 +160,7 @@ select sum(rsvp_count) from meetupRsvp group by event_name top 10
 
 ## Pinot usage
 
-At LinkedIn, it powers more than 50+ applications such as  Who Viewed My Profile, Who Viewed My Jobs and many more, with interactive-level response times. Pinot ingests close to a Billion per day in real time and processes 100 million queries per day.
+At LinkedIn, it powers more than 50 applications such as Who Viewed My Profile, Who Viewed My Jobs, and many more, with interactive-level response times. Pinot ingests close to a billion events per day in real time and processes 100 million queries per day.
 
 ## Discussion Group
 Please join or post questions to this group. 

