Posted to commits@pinot.apache.org by jl...@apache.org on 2019/02/04 19:47:08 UTC

[incubator-pinot.wiki] branch master updated: Updated Architecture (markdown)

This is an automated email from the ASF dual-hosted git repository.

jlli pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.wiki.git


The following commit(s) were added to refs/heads/master by this push:
     new 18844c9  Updated Architecture (markdown)
18844c9 is described below

commit 18844c9fa58d27644081710654c5aa1a76006928
Author: Jialiang Li <jl...@linkedin.com>
AuthorDate: Mon Feb 4 11:47:06 2019 -0800

    Updated Architecture (markdown)
---
 Architecture.md | 40 ++++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/Architecture.md b/Architecture.md
index 852c980..537dd64 100644
--- a/Architecture.md
+++ b/Architecture.md
@@ -2,7 +2,7 @@
 
 [[image2014-11-12 19-54-12.png]]
 
-Goal of Pinot is to provide analytics on any given data set. The input data set may exists either in Hadoop or Kafka. At LinkedIn, most tracking data is published into Kafka and it eventually moves to Hadoop via ETL process. In order to provide fast analytics, Pinot organizes the data into columnar format and make use of various indexing technologies such as bitmap, inverted index etc. Data on Hadoop is converted into Index Segment via Map reduce jobs. Index segments are then pushed to P [...]
+The goal of Pinot is to provide analytics on any given data set. The input data set may exist either in Hadoop or Kafka. At LinkedIn, most tracking data is published into Kafka and eventually moves to Hadoop via an ETL process. In order to provide fast analytics, Pinot organizes the data into a columnar format and makes use of various indexing technologies such as bitmap and inverted indexes. Data on Hadoop is converted into Index Segments via MapReduce jobs. Index segments are then pushed to P [...]
 
 ### Data Flow
 
@@ -20,7 +20,7 @@ Real-time flow is slightly different from the Hadoop flow. Real-time nodes direc
 
 #### Query routing
 
-From a user point of view, all queries are sent at Pinot Broker. The user does not have to worry about the real time and historical nodes. Pinot Broker is smart enough to query realtime and historical nodes separately and merge the results before sending back the response. For example, let's say user queries **select count(*) from table where time > T** time stamp, Pinot understands that time is a special dimension and breaks the query appropriately between realtime and historical. If th [...]
+From a user's point of view, all queries are sent to the Pinot Broker. The user does not have to worry about the realtime and historical nodes. The Pinot Broker is smart enough to query realtime and historical nodes separately and merge the results before sending back the response. For example, let's say a user runs **select count(*) from table where time > T** for some timestamp T. Pinot understands that time is a special dimension and breaks the query appropriately between realtime and historical nodes. If th [...]
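
To make the time split concrete, here is a minimal sketch in Java of how a broker could rewrite a time-filtered count query into a historical part and a realtime part. The class and method names and the single boundary timestamp are illustrative assumptions, not Pinot's actual routing code:

```java
// Hypothetical sketch of time-based query splitting; names and the single
// boundary timestamp are assumptions, not Pinot's actual broker code.
public final class TimeSplitRouter {

    /** The two sub-queries, one per node type. */
    public record SubQueries(String historicalSql, String realtimeSql) {}

    /**
     * Splits a "select count(*) from T where time > t" style query at the
     * boundary where historical (offline) data ends and realtime data begins.
     */
    public static SubQueries split(String table, long queryStartTime, long boundary) {
        String historical = String.format(
            "SELECT COUNT(*) FROM %s WHERE time > %d AND time < %d",
            table, queryStartTime, boundary);
        String realtime = String.format(
            "SELECT COUNT(*) FROM %s WHERE time >= %d",
            table, boundary);
        return new SubQueries(historical, realtime);
    }

    public static void main(String[] args) {
        SubQueries q = split("mytable", 1000L, 5000L);
        System.out.println(q.historicalSql());
        System.out.println(q.realtimeSql());
    }
}
```

The broker would then send each sub-query to the matching node type and sum the two partial counts before responding to the client.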
 
 ### Pinot Components Architecture
 
@@ -40,13 +40,13 @@ Pinot team provides the library for generating the segments. Pinot expects the i
 
 ##### Segment move from HDFS to NFS
 
-After Pinot segments are generated on Hadoop, they need to be transferred over to online serving cluster. This is done via Hadoop server push job provided by Pinot. This job is non map reduce java job that runs on the Hadoop gateway machine via azkaban. It reads the data file from HDFS and performs a send file HTTP POST on one of the endpoints exposed by the Pinot Controllers. The files are then saved to NFS directories mounted on the controller nodes.
+After Pinot segments are generated on Hadoop, they need to be transferred to the online serving cluster. This is done via the Hadoop push job provided by Pinot. This job is a non-MapReduce Java job that runs on the Hadoop gateway machine via Azkaban. It reads the data file from HDFS and performs an HTTP POST file upload to one of the endpoints exposed by the Pinot Controllers. The files are then saved to NFS directories mounted on the controller nodes.
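
As a rough illustration of the push step, the following sketch POSTs a segment file to a controller over HTTP. The endpoint path and host are placeholders for illustration; the real push job uses Pinot's own controller API:

```java
// Minimal sketch of a segment push; the controller endpoint is an assumed
// placeholder, not the exact Pinot API.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public class SegmentPush {
    public static void main(String[] args) throws Exception {
        Path segment = Path.of(args[0]);  // local copy of the file read from HDFS
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://controller-host:9000/segments"))  // assumed endpoint
                .POST(HttpRequest.BodyPublishers.ofFile(segment))
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("Controller responded: " + response.statusCode());
    }
}
```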
 
-After saving the segment to NFS, controller assigns the segment to one of the pinot servers. The assignment of a segment to Pinot server is maintained and managed by Helix.
+After saving the segment to NFS, the controller assigns the segment to one of the Pinot Servers. The assignment of a segment to a Pinot Server is maintained and managed by [Helix](https://helix.apache.org/).
 
 ##### Segment move from NFS to Historical Node
 
-The segment to Pinot server assignment is stored in Helix Idealstate. Helix monitors the liveness of a server, when the server starts up, Helix will notify the pinot server about this segment. The metadata about this segment contains the URI to fetch the segment and is stored in Helix Property Store backed by Zookeeper. Pinot server downloads the segment file from the controller and extracts its content on the local disk.
+The segment-to-Pinot-Server assignment is stored in the Helix IdealState. Helix monitors the liveness of a server. When the server starts up, Helix will notify the Pinot Server about this segment. The metadata about this segment contains the URI to fetch the segment and is stored in the Helix Property Store, backed by Zookeeper. The Pinot Server downloads the segment file from the controller and extracts its content on the local disk.
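
A minimal sketch of the download step follows, assuming the segment URI obtained from the Property Store points at a plain HTTP endpoint on the controller; extraction of the archive is omitted for brevity:

```java
// Sketch of the download-and-store step on a Pinot Server; the URI scheme
// and local layout are assumptions for illustration.
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SegmentFetcher {
    /** Downloads the segment archive into the server's local segment directory. */
    public static Path fetch(String segmentUri, Path localSegmentDir, String segmentName)
            throws Exception {
        Files.createDirectories(localSegmentDir);
        Path target = localSegmentDir.resolve(segmentName + ".tar.gz");
        try (InputStream in = URI.create(segmentUri).toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        // The server would then extract the archive and load the segment.
        return target;
    }
}
```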
 
 ##### Segment Loading
 
@@ -78,49 +78,49 @@ All Pinot Servers and Brokers are managed by Apache Helix. Apache Helix is a gen
 
 Helix divides nodes into 3 logical components based on their responsibilities:
 
-1.  **Participant**: The nodes that actually host the distributed resources
-2.  **Spectator**: The nodes that simply observe the current state of each Participant and routes requests accordingly. Routers, for example, need to know the instance on which a partition is hosted and its state in order to route the request to the appropriate endpoint
-3.  **Controller**: The node that observes and controls the Participant nodes. It is responsible for coordinating all transitions in the cluster and ensuring that state constraints are satisfied while maintaining cluster stability
+1.  **Participant**: The nodes that actually host the distributed resources.
+2.  **Spectator**: The nodes that simply observe the current state of each Participant and route requests accordingly. Routers, for example, need to know the instance on which a partition is hosted and its state in order to route the request to the appropriate endpoint.
+3.  **Controller**: The node that observes and controls the Participant nodes. It is responsible for coordinating all transitions in the cluster and ensuring that state constraints are satisfied while maintaining cluster stability.
 
 Pinot terminology maps to Helix concepts as follows. See [Pinot Core Concepts and Terminology](Pinot-Core-Concepts-and-Terminology)
 
 *   **Pinot Segment:** This is modeled as **_Helix Partition_**. Each Pinot Segment can have multiple copies referred to as Replicas.
 *   **Pinot Table**: Multiple Pinot Segments are grouped into a logical entity referred to as a Pinot Table. All segments belonging to a Pinot Table have the same schema.
 *   **Pinot Server**: This is modeled as a _**Helix Participant**_. A Pinot Server hosts the segments (Helix Partitions) belonging to one or more Pinot Tables (Helix Resources).
-*   **Pinot Broker**: This is modeled as a **Helix Spectator** that observes the cluster for changes in the state of segments and Pinot Server. In order support Multi tenancy in Pinot Brokers, Pinot Brokers are also modeled as Helix Participants.
+*   **Pinot Broker**: This is modeled as a **Helix Spectator** that observes the cluster for changes in the state of segments and Pinot Servers. In order to support multi-tenancy, Pinot Brokers are also modeled as Helix Participants.
 
 [[image2015-5-17 13-32-28.png]]
 
-<span style="line-height: 1.4285715;">**Zookeeper**: Zookeeper is used to store the state of the cluster. Its also used to store configuration needed for Helix and Pinot. Only dynamic configuration that is specific to a use case such as table schema, number of segments and other metadata is stored in Zookeeper. Zookeeper is also used by Helix Controller to communicate with the PARTICIPANT and SPECTATORS. Zookeeper is strongly consistent and fault tolerant. We typically run 3 or5 Zookeepe [...]
+<span style="line-height: 1.4285715;">**Zookeeper**: Zookeeper is used to store the state of the cluster. It's also used to store configuration needed for Helix and Pinot. Only dynamic configuration that is specific to a use case, such as table schema, number of segments and other metadata, is stored in Zookeeper. Zookeeper is also used by the Helix Controller to communicate with the PARTICIPANTs and SPECTATORs. Zookeeper is strongly consistent and fault tolerant. We typically run 3 or 5 Zookeep [...]
 
-<span style="line-height: 1.4285715;"></span>**Pinot Controller:** All admin commands such as creating allocating Pinot Server and Brokers for each use case, Creating New Table or Uploading New Segments go through Pinot Controller. Pinot Controller wraps Helix Controller within the same process. All Pinot Admin Commands internally get translated to Helix Commands via Helix Admin Apis.</span>
+<span style="line-height: 1.4285715;"></span>**Pinot Controller:** All admin commands, such as allocating Pinot Servers and Brokers for each use case, creating a new table, or uploading new segments, go through the Pinot Controller. The Pinot Controller wraps the Helix Controller within the same process. All Pinot Admin commands internally get translated to Helix commands via the Helix Admin APIs.</span>
 
-*   <span style="line-height: 1.4285715;">Allocate Pinot Server/Broker: This commands is run when we on board new use case or to allocate additional capacity to an existing use case. This parameter simply takes in the #1 use case name X #2\. number of Pinot Servers S and number of Brokers B needed for the use case. Pinot Controllers uses Helix Tagging Api to tag S Server Instances and B Brokers Instances in the cluster as X. This means all subsequent tables that belong to use case X will [...]
+*   <span style="line-height: 1.4285715;">Allocate Pinot Server/Broker: This command is run when we onboard a new use case or allocate additional capacity to an existing one. It takes in (1) the use case name X and (2) the number of Pinot Servers S and Brokers B needed for the use case. The Pinot Controller uses the Helix tagging API (see the sketch after this list) to tag S Server instances and B Broker instances in the cluster as X. This means all subsequent tables that belong to use case X will be [...]
 *   <span style="line-height: 1.4285715;">Create Table: This will create an empty IdealState for a Table. This table must also be tagged as X, which means all segments of this table will be allocated to instances that have the same tag X. Additional metadata such as table retention, allocation strategy, etc. is stored in Zookeeper using the Helix Property Store API.</span>
-*   <span style="line-height: 1.4285715;">Upload Segment: Pinot Controller adds an segment entry to the table IdealState. The number of entries added is according the number of replicas configured for the Table T. While its possible to let Helix decide the assignment of Segment to Pinot Server Instance by using AUTO Idealstate mode, in the current version we use the CUSTOM Idealstate mode. See [Helix Idealstate Mode](http://helix.apache.org/0.6.4-docs/tutorial_rebalance.html) for additio [...]
+*   <span style="line-height: 1.4285715;">Upload Segment: The Pinot Controller adds a segment entry to the table IdealState. The number of entries added matches the number of replicas configured for the Table T. While it's possible to have Helix decide the assignment of a Segment to a Pinot Server instance by using the AUTO IdealState mode, in the current version we use the CUSTOM IdealState mode. See [Helix Idealstate Mode](http://helix.apache.org/0.6.4-docs/tutorial_rebalance.html) for addit [...]
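
As a sketch of the tagging call mentioned above, the snippet below uses Helix's admin API directly. The cluster and instance names are placeholders, and Pinot's actual allocation code wraps these calls differently:

```java
// Sketch of the tagging step using Helix's admin API; cluster and instance
// names are placeholders, and Pinot's actual calls may differ.
import org.apache.helix.HelixAdmin;
import org.apache.helix.manager.zk.ZKHelixAdmin;

public class AllocateUseCase {
    public static void main(String[] args) {
        HelixAdmin admin = new ZKHelixAdmin("zk-host:2181");

        // Tag S server instances and B broker instances as use case "X", so
        // later tables tagged "X" are placed only on these instances.
        admin.addInstanceTag("pinotCluster", "Server_host1_8098", "X");
        admin.addInstanceTag("pinotCluster", "Broker_host2_8099", "X");

        // All instances currently serving use case X:
        System.out.println(admin.getInstancesInClusterWithTag("pinotCluster", "X"));
    }
}
```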
 
-<span style="line-height: 1.4285715;"></span>**Helix Controller:** As explained in previous section all Pinot Admin commands simply get translated into Helix Admin Commands. The Helix commands in turn update the metadata stored in Zookeeper. Helix Controller acts as the brain of the system and translates all metadata changes into a set of actions and is responsible for the execution of these actions on the respective participants. This is achieved via State Transitions. See [Helix Archit [...]
+<span style="line-height: 1.4285715;"></span>**Helix Controller:** As explained in the previous section, all Pinot Admin commands simply get translated into Helix Admin commands. The Helix commands in turn update the metadata stored in Zookeeper. The Helix Controller acts as the brain of the system: it translates all metadata changes into a set of actions and is responsible for the execution of these actions on the respective participants. This is achieved via State Transitions. See [Helix Archit [...]
 
 See [Multi tenancy in Pinot 2.0](https://github.com/linkedin/pinot/wiki/Multitenancy#multi-tenancy-in-pinot-20) for more info on how multi-tenancy is handled in Pinot 2.0.
 
 #### Broker Node
 
-The responsibility of Broker is to route a given query to appropriate Pinot Server instances, collect the responses and merge the responses into final result and send it back to the client. The two main steps involved in this are
+The responsibility of the Broker is to route a given query to the appropriate Pinot Server instances, collect and merge the responses into a final result, and send it back to the client. The two main steps involved in this are:
 
-**service discovery**: Service discovery is the mechanism of knowing what Tables are hosted in the cluster and location of the Table Segments and the time range for each segment. As explained in the previous section, this is derived from the information stored in Zookeeper via Helix library. Broker uses this information not only to compute the subset of the nodes to send the request to, it prunes the number of segments to be queries. This is achieved by looking at the time range in the q [...]
+**Service discovery**: Service discovery is the mechanism of knowing which Tables are hosted in the cluster, the location of the Table Segments, and the time range for each segment. As explained in the previous section, this is derived from the information stored in Zookeeper via the Helix library. The Broker uses this information not only to compute the subset of the nodes to send the request to, but also to prune the set of segments to be queried. This is achieved by looking at the time range in [...]
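
The pruning idea can be sketched as a simple overlap test on each segment's time range; this is illustrative only, and Pinot's real pruner handles more cases:

```java
// Illustrative-only pruning by time range: keep a segment only if its time
// range overlaps the query's range.
import java.util.List;

public final class SegmentPruner {

    public record SegmentInfo(String name, long minTime, long maxTime) {}

    /** Returns the segments whose [minTime, maxTime] overlaps [queryStart, queryEnd]. */
    public static List<SegmentInfo> prune(List<SegmentInfo> segments,
                                          long queryStart, long queryEnd) {
        return segments.stream()
                .filter(s -> s.maxTime() >= queryStart && s.minTime() <= queryEnd)
                .toList();
    }
}
```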
 
-**Scatter gather:** Once the broker computes the set of nodes to route the query, the requests are routed to the respective Pinot Server nodes. Each server nodes processes the query and returns the response, broker will merge the responses from individual Server and returns the response to the client. The merging logic is dependent on the query selection with Limit, aggregation, group by top K etc. If any of the servers fails to process the query or Time's out, broker will return partial [...]
+**Scatter gather:** Once the Broker computes the set of nodes to route the query to, the requests are routed to the respective Pinot Server nodes. Each server node processes the query and returns its response. The Broker merges the responses from the individual Servers and returns the result back to the client. The merging logic depends on the query type: selection with limit, aggregation, group by with top K, etc. If any of the servers fails to process the query or times out, the Broker will return par [...]
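
A toy version of scatter-gather for a count(*) query might look like the following, where the per-server call is stubbed out; real brokers also track partial failures and apply per-query-type merge logic:

```java
// A toy scatter-gather for count(*): query each server in parallel, then sum
// the partial counts. Failure handling and other merge types are omitted.
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

public final class ScatterGather {

    /** Sends the query to every server and sums the partial counts. */
    public static long countStar(List<String> servers,
                                 Function<String, Long> queryServer) {
        List<CompletableFuture<Long>> futures = servers.stream()
                .map(s -> CompletableFuture.supplyAsync(() -> queryServer.apply(s)))
                .toList();
        return futures.stream().mapToLong(CompletableFuture::join).sum();
    }

    public static void main(String[] args) {
        // Fake per-server counts stand in for real network calls.
        long total = countStar(List.of("server1", "server2", "server3"),
                               server -> (long) server.length());
        System.out.println("count(*) = " + total);
    }
}
```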
 
 ### Pinot Index Segment
 
 [[image2015-5-17 17-59-10.png]]
 
-Pinot Index Segment is the columnar representation of the raw data. The raw data is generally represented in a row oriented format which can be AVRO, JSON, CSV etc. Converting row oriented format into columnar can reduce storage space and allow fast scan for specific columns. Row oriented format is efficient when query is either updating or reading a specific row in the data. This is typically the case with OLTP use cases where relational databases such as Oracle, MySQL etc is used. Colu [...]
+A Pinot Index Segment is the columnar representation of the raw data. The raw data is generally represented in a row-oriented format such as Avro, JSON or CSV. Converting a row-oriented format into a columnar one can reduce storage space and allow fast scans over specific columns. A row-oriented format is efficient when a query either updates or reads a specific row in the data. This is typically the case with OLTP use cases where relational databases such as Oracle and MySQL are used. Colu [...]
 
-Columnar format also provides storage related benefits. Some of the columns contains values that are repetitive. For e.g. if the column is of type country then storing in row oriented format will require space proportional varchar(100) * number of rows. Columnar format can apply various encoding such as Fixed Bit and on top of that, apply compression algorithm to compress the data further. While these techniques can be applied for row oriented storage as well, the encoding and compressio [...]
+Columnar format also provides storage related benefits. Some of the columns contain values that are repetitive. For example, if the column is of type country, then storing it in a row-oriented format will require space proportional to varchar(100) * number of rows. A columnar format can apply various encodings such as fixed-bit and, on top of that, apply a compression algorithm to compress the data further. While these techniques can be applied to row-oriented storage as well, the encoding and compress [...]
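
To see why a repetitive country column compresses so well, here is a small illustrative example of dictionary encoding followed by fixed-bit sizing (not Pinot's actual encoder):

```java
// Sketch of dictionary encoding: map distinct values to small ids, then store
// each row as a fixed-width id instead of a full varchar.
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DictionaryEncoding {
    public static void main(String[] args) {
        List<String> column = List.of("US", "IN", "US", "US", "CN", "IN");

        // Build the dictionary of distinct values, id = insertion order.
        Map<String, Integer> dictionary = new LinkedHashMap<>();
        for (String value : column) {
            dictionary.putIfAbsent(value, dictionary.size());
        }

        // Each row now needs only enough bits for the largest id:
        // 3 distinct countries -> ids 0..2 -> 2 bits per row.
        int maxId = dictionary.size() - 1;
        int bitsPerValue = Math.max(1, 32 - Integer.numberOfLeadingZeros(maxId));
        int[] encoded = column.stream().mapToInt(dictionary::get).toArray();

        System.out.println("dictionary  = " + dictionary);
        System.out.println("bits/value  = " + bitsPerValue);
        System.out.println("encoded ids = " + Arrays.toString(encoded));
    }
}
```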
 
-Columnar formats definitely have their down sides, creating efficient columnar formats typically take time and once created they cannot be mutated easily. While this might a problem for OLTP workloads, typical OLAP use cases consists of TimeSeries data which is immutable.
+Columnar formats definitely have their downsides: creating efficient columnar formats typically takes time, and once created they cannot be mutated easily. While this might be a problem for OLTP workloads, typical OLAP use cases consist of time-series data, which is immutable.
 
 #### Anatomy of Index Segment
 

