Posted to commits@pinot.apache.org by sn...@apache.org on 2019/02/04 18:05:07 UTC

[incubator-pinot] branch master updated: Updating README.md (#3784)

This is an automated email from the ASF dual-hosted git repository.

snlee pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git


The following commit(s) were added to refs/heads/master by this push:
     new 103bc78  Updating README.md (#3784)
103bc78 is described below

commit 103bc783792c3f93be0675927c272a36879ee6ef
Author: Seunghyun Lee <sn...@linkedin.com>
AuthorDate: Mon Feb 4 10:04:59 2019 -0800

    Updating README.md (#3784)
---
 README.md                             | 167 ++++++----------------------------
 docs/trying_pinot.rst                 |   6 +-
 pinot-distribution/pinot-assembly.xml |   2 +-
 3 files changed, 33 insertions(+), 142 deletions(-)

diff --git a/README.md b/README.md
index cb861cc..898463e 100644
--- a/README.md
+++ b/README.md
@@ -1,23 +1,19 @@
-# Introduction to Pinot
+# Apache Pinot (incubating)
 
 [![Build Status](https://api.travis-ci.org/apache/incubator-pinot.svg?branch=master)](https://travis-ci.org/apache/incubator-pinot) [![codecov.io](https://codecov.io/github/linkedin/pinot/branch/master/graph/badge.svg)](https://codecov.io/github/linkedin/pinot) [![Join the chat at https://gitter.im/linkedin/pinot](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/linkedin/pinot?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) [![license](https://img.s [...]
 
-Pinot is a realtime distributed OLAP datastore, which is used at LinkedIn to deliver scalable real time analytics with low latency. It can ingest data from offline data sources (such as Hadoop and flat files) as well as online sources (such as Kafka). Pinot is designed to scale horizontally.
+Apache Pinot is a realtime distributed OLAP datastore, which is used to deliver scalable real time analytics with low latency. It can ingest data from offline data sources (such as Hadoop and flat files) as well as online sources (such as Kafka). Pinot is designed to scale horizontally.
 
-These presentations on Pinot give an overview of Pinot and how it is used at LinkedIn:
+These presentations on Pinot give an overview of Pinot:
 
 * [Pinot: Realtime Distributed OLAP Datastore (Aug 2015)](http://www.slideshare.net/KishoreGopalakrishna/pinot-realtime-distributed-olap-datastore)
 * [Introduction to Pinot (Jan 2016)](http://www.slideshare.net/jeanfrancoisim/intro-to-pinot-20160104)
 * [Open Source Analytics Pipeline at LinkedIn (Sep 2016, covers Gobblin and Pinot)](http://www.slideshare.net/IssacBuenrostro/open-source-linkedin-analytics-pipeline-vldb-2016)
 * [Pinot: Realtime OLAP for 530 Million Users - Sigmod 2018 (Jun 2018)](http://www.slideshare.net/seunghyunlee1460/pinot-realtime-olap-for-530-million-users-sigmod-2018-107394584)
 
-Looking for the ThirdEye anomaly detection and root-cause analysis platform? Check out the [Pinot/ThirdEye project](https://github.com/linkedin/pinot/tree/master/thirdeye)
+Looking for the ThirdEye anomaly detection and root-cause analysis platform? Check out the [Pinot/ThirdEye project](https://github.com/apache/incubator-pinot/tree/master/thirdeye)
 
-## What is it for (and not)?
-
-Pinot is well suited for analytical use cases on immutable append-only data that require low latency between an event being ingested and it being available to be queried. 
-
-### Key Features
+## Key Features
 
 - A column-oriented database with various compression schemes such as Run Length, Fixed Bit Length
 - Pluggable indexing technologies - Sorted Index, Bitmap Index, Inverted Index
@@ -35,139 +31,34 @@ Because of the design choices we made to achieve these goals, there are certain
 
 Pinot works very well for querying time series data with lots of Dimensions and Metrics. Example - Query (profile views, ad campaign performance, etc.) in an analytical fashion (who viewed this profile in the last weeks, how many ads were clicked per campaign). 
 
-## Terminology
-
-Before we get to quick start, let's go over the terminology.
-- Table: A table is a logical abstraction to refer to a collection of related data. It consists of columns and rows (Document). Table Schema defines column names and their metadata.
-- Segment: A logical table is divided into multiple physical units referred to as segments.
-
-Pinot has following Roles/Components:
-
-- Pinot Controller: Manages nodes in the cluster. Responsibilities:
-  * Handles all Create, Update, Delete operations on tables and segments.
-  * Computes assignment of table and its segments to Pinot Servers.
-- Pinot Server: Hosts one or more physical segments. Responsibilities:
-  * When assigned a pre-created segment, download it and load it. If assigned a Kafka topic, start consuming from a subset of partitions in Kafka.
-  * Executes queries and returns the response to Pinot Broker.
-- Pinot Broker: Accepts queries from clients and routes them to multiple servers (based of routing strategy). All responses are merged and sent back to client.
-
-Pinot leverages [Apache Helix](http://helix.apache.org) for cluster management. 
-
-For more information on Pinot Design and Architecture can be found [here](https://github.com/linkedin/pinot/wiki/Architecture)
-
-***
-
-## Quick Start 
-
-You can either build Pinot manually or use Docker to run Pinot.
-
-### 1: Build and install Pinot (optional if you have Docker installed)
-
+## Instructions to build Pinot
+More detailed instructions can be found at [Quick Demo](https://pinot.readthedocs.io/en/latest/trying_pinot.html) section in the documentation.
 ```
-git clone https://github.com/linkedin/pinot.git
-cd pinot
-mvn install package  -DskipTests
-cd pinot-distribution/target/pinot-0.016-pkg
-chmod +x bin/*.sh
-```
-
-### 2: Run
-
-We will load Baseball stats from 1878 to 2013 into Pinot and run queries against it. There are 100,000 records and 15 columns ([schema](https://github.com/linkedin/pinot/blob/master/pinot-tools/src/main/resources/sample_data/baseballStats_schema.json)) in this dataset.
+# Clone a repo
+$ git clone https://github.com/apache/incubator-pinot.git
+$ cd incubator-pinot
 
-Execute the quick-start-offline.sh script in bin folder which performs the following:
-- Converts Baseball data from CSV format into Pinot Index Segments.
-- Starts Zookeeper and Pinot components, i.e Controller, Broker, Server.
-- Uploads segment to Pinot
+# Build Pinot
+$ mvn clean install -DskipTests -Pbin-dist
 
-If you have Docker, run `docker run -it -p 9000:9000 linkedin/pinot-quickstart-offline`. If you have built Pinot, run `bin/quick-start-offline.sh`.
-
-We should see the following output.
-
-```
-Deployed Zookeeper
-Deployed controller, broker and server
-Added baseballStats schema
-Creating baseballStats table
-Built index segment for baseballStats
-Pushing segments to the controller
+# Run Quck Demo
+$ cd pinot-distribution/target/apache-pinot-incubating-<version>-SNAPSHOT-bin
+$ bin/quick-start-offline.sh
 ```
 
-At this point we can post queries. Here are some of the sample queries. 
-Sample queries:
-
-```sql
-/*Total number of documents in the table*/
-select count(*) from baseballStats
-
-/*Top 5 run scorers of all time*/ 
-select sum(runs) from baseballStats group by playerName top 5
-
-/*Top 5 run scorers of the year 2000*/
-select sum(runs) from baseballStats where yearID = 2000 group by playerName top 5
-
-/*Top 10 run scorers after 2000*/
-select sum(runs) from baseballStats where yearID >= 2000 group by playerName
-
-/*Select playerName,runs,homeRuns for 10 records from the table and order them by yearID*/
-select playerName, runs, homeRuns from baseballStats order by yearID limit 10
-
-```
-
-### 3: Pinot Data Explorer
-
-There are 3 ways to [interact](https://github.com/linkedin/pinot/wiki/Pinot-Client-API) with Pinot - simple web interface, REST api and java client. Open your browser and go to http://localhost:9000/query/ and run any of the queries provided above. See [Pinot Query Syntax](https://github.com/linkedin/pinot/wiki/Pinot-Query-Language) for more info.
-
-*** 
-## Realtime quick start
-
-There are two ways to ingest data into Pinot - batch and realtime. Previous baseball stats demonstrated ingestion in batch. Typically these batch jobs are run on Hadoop periodically (e.g every hour/day/week/month). Data freshness depends on job granularity. 
-
-Let's look at an example where we ingest data in realtime. We will subscribe to meetup.com rsvp feed and index the rsvp events in real time.
-Execute quick-start-realtime.sh script in bin folder which performs the following:
-- Starts a kafka broker
-- Setups a meetup event listener that subscribes to meetup.com stream and publishes it to local kafka broker
-- Starts zookeeper, pinot-controller, pinot-broker, pinot-server.
-- Configures the realtime source
-
-If you have Docker, run `docker run -it -p 9000:9000 linkedin/pinot-quickstart-realtime`. If you have built Pinot, run `bin/quick-start-realtime.sh`.
-
-```
-Starting Kafka
-Created topic "meetupRSVPEvents".
-Starting controller, server and broker
-Added schema and table
-Realtime quick start setup complete
-Starting meetup data stream and publishing to kafka
-```
-
-Open Pinot Query Console at http://localhost:9000/query and run queries. Here are some sample queries:
-
-```sql
-/*Total number of documents in the table*/
-select count(*) from meetupRsvp
-
-/*Top 10 cities with the most rsvp*/	
-select sum(rsvp_count) from meetupRsvp group by group_city top 10
-
-/*Show 10 most recent rsvps*/
-select * from meetupRsvp order by mtime limit 10 
-
-/*Show top 10 rsvp'ed events*/
-select sum(rsvp_count) from meetupRsvp group by event_name top 10
-
-```
-
-## Pinot usage
-
-At LinkedIn, it powers more than 50+ applications such as  Who Viewed My Profile, Who Viewed My Jobs and many more, with interactive-level response times. Pinot ingests close to a billion per day in real time and processes 100 million queries per day.
-
-## Discussion Group
-Please join or post questions to this group. 
-https://groups.google.com/forum/#!forum/pinot_users
+## Getting Involved
+ - Ask questions on [Slack](https://join.slack.com/t/apache-pinot/shared_invite/enQtNDY4NDczOTYyNjk1LTExODVjY2QxYzBkMzJjNTk0ZGQ3NThiYTU2YzdlNjE0MWI5ZjUwYjI2ZTgxNjNiYWJiNmEzYjkxMTIzMzUxNTQ)
+ - Please join Apache Pinot mailing lists  
+   dev-subscribe@pinot.apache.org (subscribe to pinot-dev mailing list)  
+   dev@pinot.apache.org (posting to pinot-dev mailing list)  
+   users-subscribe@pinot.apache.org (subscribe to pinot-user mailing list)  
+   users@pinot.apache.org (positng to pinot-user mailing list)
 
 ## Documentation
-- [Home](https://github.com/linkedin/pinot/wiki/Home)
-- [How to use Pinot](https://github.com/linkedin/pinot/wiki/How-To-Use-Pinot)
-- [Design and Architecture](https://github.com/linkedin/pinot/wiki/Architecture)
-- [Pinot Administration](https://github.com/linkedin/pinot/wiki/Pinot-Administration)
+Check out [Pinot documentation](https://pinot.readthedocs.io) for a complete description of Pinot's features.
+- [Quick Demo](https://pinot.readthedocs.io/en/latest/trying_pinot.html)
+- [Pinot Architecture](https://pinot.readthedocs.io/en/latest/architecture.html)
+- [Pinot Query Language](https://pinot.readthedocs.io/en/latest/pql_examples.html)
+
+## License
+Apache Pinot is under [Apache License, Version 2.0](http://www.apache.org/licenses/LICENSE-2.0)
\ No newline at end of file
diff --git a/docs/trying_pinot.rst b/docs/trying_pinot.rst
index 1e2e72f..bf32b50 100644
--- a/docs/trying_pinot.rst
+++ b/docs/trying_pinot.rst
@@ -24,8 +24,8 @@ One can also run the Pinot demonstration by checking out the code on GitHub, com
 Pinot requires JDK 8 or later and Apache Maven 3.
 
 #. Check out the code from GitHub (https://github.com/apache/incubator-pinot)
-#. With Maven installed, run ``mvn install package -DskipTests`` in the directory in which you checked out Pinot.
-#. Make the generated scripts executable ``cd pinot-distribution/target/pinot-0.016-pkg; chmod +x bin/*.sh``
+#. With Maven installed, run ``mvn install package -DskipTests -Pbin-dist`` in the directory in which you checked out Pinot.
+#. Make the generated scripts executable ``cd pinot-distribution/target/apache-pinot-incubating-<version>-SNAPSHOT-bin; chmod +x bin/*.sh``
 
 Trying out Offline quickstart demo
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -64,7 +64,7 @@ The full reference for the PQL query language is present in the :ref:`pql` secti
 Trying out Realtime quickstart demo
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Pinot can ingest data from streaming sources such as Kafka. 
+Pinot can ingest data from streaming sources such as Kafka.
 
 To run the demo with docker
   ``docker run -it -p 9000:9000 linkedin/pinot-quickstart-realtime``
diff --git a/pinot-distribution/pinot-assembly.xml b/pinot-distribution/pinot-assembly.xml
index 32f4d9c..3a1cfdf 100644
--- a/pinot-distribution/pinot-assembly.xml
+++ b/pinot-distribution/pinot-assembly.xml
@@ -65,7 +65,7 @@
     </file>
   </files>
   <fileSets>
-    <!-- Rename liscenses-binary directory to licenses and include it to a distribution tarbell -->
+    <!-- Rename licenses-binary directory to licenses and include it to a distribution tarbell -->
     <fileSet>
       <useDefaultExcludes>false</useDefaultExcludes>
       <directory>${pinot.root}/licenses-binary</directory>
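
Note on the build steps in the README hunk above: the distribution directory is named with a `<version>` placeholder, so the `cd` command cannot be pasted verbatim. The following is a minimal sketch, not part of this commit, of one way to locate that directory after `mvn clean install -DskipTests -Pbin-dist`. The function name `find_pinot_dist` is hypothetical, and the glob assumes the directory follows the `apache-pinot-incubating-<version>-SNAPSHOT-bin` pattern shown in the diff.

```shell
#!/bin/sh
# Hypothetical helper (not part of the commit): print the path of the
# binary distribution directory without hard-coding the version.
find_pinot_dist() {
  # $1: root of the checkout (defaults to the current directory)
  root="${1:-.}"
  for dir in "$root"/pinot-distribution/target/apache-pinot-incubating-*-bin; do
    # An unmatched glob stays literal, so confirm the directory exists
    if [ -d "$dir" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  # Nothing matched: the build has probably not been run with -Pbin-dist
  return 1
}
```

From the checkout root, `cd "$(find_pinot_dist .)" && bin/quick-start-offline.sh` would then start the offline demo, assuming the build succeeded.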

