Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/07/29 07:27:37 UTC

[GitHub] [incubator-pinot] haibow commented on a change in pull request #5731: Refreshing the Pinot project README to match docs.

haibow commented on a change in pull request #5731:
URL: https://github.com/apache/incubator-pinot/pull/5731#discussion_r461241052



##########
File path: README.md
##########
@@ -18,47 +18,71 @@
     under the License.
 
 -->
-# Apache Pinot (incubating)
+<img src="https://imgur.com/GNevDZ0.png" align="center" alt="Apache Pinot"/>
 
-[![Build Status](https://api.travis-ci.org/apache/incubator-pinot.svg?branch=master)](https://travis-ci.org/apache/incubator-pinot) 
+---------------------------------------
+
+[![Build Status](https://api.travis-ci.org/apache/incubator-pinot.svg?branch=master)](https://travis-ci.org/apache/incubator-pinot)
 [![Release](https://img.shields.io/github/release/apache/incubator-pinot/all.svg)](https://pinot.apache.org/download/)
-[![codecov.io](https://codecov.io/github/apache/incubator-pinot/branch/master/graph/badge.svg)](https://codecov.io/github/apache/incubator-pinot) 
-[![Join the chat at https://communityinviter.com/apps/apache-pinot/apache-pinot](https://img.shields.io/badge/slack-apache--pinot-brightgreen?logo=slack)](https://communityinviter.com/apps/apache-pinot/apache-pinot) 
-[![Twitter Follow](https://img.shields.io/twitter/follow/apachepinot.svg?label=Follow&style=social)](https://twitter.com/intent/follow?screen_name=apachepinot) 
+[![codecov.io](https://codecov.io/github/apache/incubator-pinot/branch/master/graph/badge.svg)](https://codecov.io/github/apache/incubator-pinot)
+[![Join the chat at https://communityinviter.com/apps/apache-pinot/apache-pinot](https://img.shields.io/badge/slack-apache--pinot-brightgreen?logo=slack)](https://communityinviter.com/apps/apache-pinot/apache-pinot)
+[![Twitter Follow](https://img.shields.io/twitter/follow/apachepinot.svg?label=Follow&style=social)](https://twitter.com/intent/follow?screen_name=apachepinot)
 [![license](https://img.shields.io/github/license/apache/pinot.svg)](LICENSE)
 
-Apache Pinot is a realtime distributed OLAP datastore, which is used to deliver scalable real time analytics with low latency. It can ingest data from offline data sources (such as Hadoop and flat files) as well as online sources (such as Kafka). Pinot is designed to scale horizontally.
+- [What is Apache Pinot?](#what-is-apache-pinot)
+- [Features](#features)
+- [When should I use Pinot?](#when-should-i-use-pinot)
+- [Building Pinot](#building-pinot)
+- [Deploying Pinot to Kubernetes](#deploying-pinot-to-kubernetes)
+- [Join the Community](#join-the-community)
+- [Documentation](#documentation)
+- [License](#license)
+
+# What is Apache Pinot?
+
+[Apache Pinot](https://pinot.apache.org) (incubating) is a real-time distributed OLAP datastore, built to deliver scalable real-time analytics with low latency. It can ingest from batch data sources (such as Hadoop HDFS, Amazon S3, Azure ADLS, Google Cloud Storage) as well as stream data sources (such as Apache Kafka).
+
+Pinot was built by engineers at LinkedIn and Uber and is designed to scale up and out with no upper bound. Performance always remains constant based on the size of your cluster and an expected query per second (QPS) threshold.
+
+For getting started guides, deployment recipes, tutorials, and more, please visit our project documentation at [https://docs.pinot.apache.org](https://docs.pinot.apache.org).
+
+<img src="https://gblobscdn.gitbook.com/assets%2F-LtH6nl58DdnZnelPdTc%2F-M69C48fK2BhCoou1REr%2F-M69DbDfcATcZOAgyX7k%2Fpinot-overview-graphic.png?alt=media&token=3552722e-8d1d-4397-972e-a81917ced182" align="center" alt="Apache Pinot"/>
+
+## Features
+
+Pinot was originally built at LinkedIn to power rich interactive real-time analytic applications such as [Who Viewed Profile](https://www.linkedin.com/me/profile-views/urn:li:wvmp:summary/),  [Company Analytics](https://www.linkedin.com/company/linkedin/insights/),  [Talent Insights](https://business.linkedin.com/talent-solutions/talent-insights), and many more. [UberEats Restaurant Manager](https://eng.uber.com/restaurant-manager/) is another example of a customer facing Analytics App. At LinkedIn, Pinot powers 50+ user-facing products, ingesting millions of events per second and serving 100k+ queries per second at millisecond latency.
+
+* **Column-oriented**: a column-oriented database with various compression schemes such as Run Length, Fixed Bit Length.
 
-These presentations on Pinot give an overview of Pinot:
+* [**Pluggable indexing**](https://docs.pinot.apache.org/basics/features/indexing): pluggable indexing technologies Sorted Index, Bitmap Index, Inverted Index.
 
-* [Building realtime applications using Pinot @ DataCouncil](https://www.youtube.com/watch?v=mOzjVRf0yt4)
-* [Pinot: Enabling Real-time Analytics Applications @ LinkedIn's Scale  - ApacheCon 2019 (Sep 2019)](https://www.slideshare.net/seunghyunlee1460/pinot-enabling-realtime-analytics-applications-linkedins-scale)
-* [Pinot: Realtime OLAP for 530 Million Users - Sigmod 2018 (Jun 2018)](http://www.slideshare.net/seunghyunlee1460/pinot-realtime-olap-for-530-million-users-sigmod-2018-107394584)
-* [Open Source Analytics Pipeline at LinkedIn (Sep 2016, covers Gobblin and Pinot)](http://www.slideshare.net/IssacBuenrostro/open-source-linkedin-analytics-pipeline-vldb-2016)
-* [Introduction to Pinot (Jan 2016)](http://www.slideshare.net/jeanfrancoisim/intro-to-pinot-20160104)
-* [Pinot: Realtime Distributed OLAP Datastore (Aug 2015)](http://www.slideshare.net/KishoreGopalakrishna/pinot-realtime-distributed-olap-datastore)
+* **Query optimization**: ability to optimize query/execution plan based on query and segment metadata.
 
-Looking for the ThirdEye anomaly detection and root-cause analysis platform? Check out the [Pinot/ThirdEye project](https://github.com/apache/incubator-pinot/tree/master/thirdeye)
+* **Stream and batch ingest**: near real time ingestion from streams and batch ingestion from Hadoop.
 
-## Key Features
+* **Query with SQL:** SQL-like language that supports selection, aggregation, filtering, group by, order by, distinct queries on data.
 
-- A column-oriented database with various compression schemes such as Run Length, Fixed Bit Length
-- Pluggable indexing technologies - Sorted Index, Bitmap Index, Inverted Index, Star-Tree Index
-- Ability to optimize query/execution plan based on query and segment metadata
-- Near real time ingestion from Kafka and batch ingestion from Hadoop
-- SQL like language that supports _selection, aggregation, filtering, group by, order by, distinct_ queries on fact data
-- Support for multivalued fields
-- Horizontally scalable and fault tolerant 
+* **Multi-valued fields:** support for multi-valued fields, allowing you to query fields as comma separated values.
 
-Because of the design choices we made to achieve these goals, there are certain limitations present in Pinot:
+* **Cloud-native on Kubernetes**: Helm chart provides a horizontally scalable and fault-tolerant clustered deployment that is easy to manage using Kubernetes.
 
-- Pinot is not a replacement for database i.e it cannot be used as source of truth store, cannot mutate data 
-- While Pinot supports text search, its not a replacement for search engine i.e relevance is not supported
-- Query cannot span across multiple tables - Use Presto-Pinot connector to achieve joins and other features
+## When should I use Pinot?
 
-Pinot works very well for querying time series data with lots of Dimensions and Metrics. Example - Query (profile views, ad campaign performance, etc.) in an analytical fashion (who viewed this profile in the last weeks, how many ads were clicked per campaign). 
+Pinot is designed to execute real-time OLAP queries with low latency on massive amounts of data and events. In addition to real-time stream ingestion, Pinot also supports batch use cases with the same low latency guarantees. It is suited in contexts where fast analytics, such as aggregations, are needed on immutable data, possibly, with real-time data ingestion. Pinot works very well for querying time series data with lots of dimensions and metrics.
 
-## Instructions to build Pinot
+Example query:
+```SQL
+SELECT sum(clicks), sum(impressions) FROM AdAnalyticsTable
+  WHERE
+       ((daysSinceEpoch >= 17849 AND daysSinceEpoch <= 17856)) AND
+       accountId IN (123456789)
+  GROUP BY
+       daysSinceEpoch TOP 100
+```
+
+Pinot is not a replacement for database i.e it cannot be used as source of truth store, cannot mutate data. While Pinot [supports text search](https://docs.pinot.apache.org/basics/features/text-search-support), it's not a replacement for a search engine. Also, Pinot queries cannot span across multiple tables by default. You can use the [Presto-Pinot connector](https://prestosql.io/blog/2020/05/25/pinot-connector.html) to achieve table joins and other features.

Review comment:
   The Presto-Pinot connector link points to prestosql.
   Please use or add the prestodb link (https://prestodb.io/docs/current/connector/pinot.html), as referenced in the Pinot Presto Helm chart: https://github.com/apache/incubator-pinot/blob/master/kubernetes/helm/presto/Chart.yaml#L29



