Posted to commits@pinot.apache.org by je...@apache.org on 2018/11/27 21:32:13 UTC

[incubator-pinot] 01/01: Fixing most todos in documentation

This is an automated email from the ASF dual-hosted git repository.

jenniferdai pushed a commit to branch doctodo
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git

commit 71aa28aa9da8f046edb57b38724ec0f43366186f
Author: Jennifer Dai <jd...@linkedin.com>
AuthorDate: Tue Nov 27 13:31:54 2018 -0800

    Fixing most todos in documentation
---
 docs/architecture.rst      |   8 +-
 docs/conf.py               |   2 +
 docs/llc.rst               |   2 +-
 docs/multitenancy.rst      | 290 ++++++++++++++++++++++-----------------------
 docs/pluggable_streams.png | Bin 0 -> 136984 bytes
 docs/pluggable_streams.rst |  10 +-
 docs/znode_layout.png      | Bin 0 -> 43271 bytes
 7 files changed, 157 insertions(+), 155 deletions(-)

diff --git a/docs/architecture.rst b/docs/architecture.rst
index 9b61f2a..1fc8f88 100644
--- a/docs/architecture.rst
+++ b/docs/architecture.rst
@@ -18,7 +18,7 @@ Pinot Components
 * Pinot Server: Hosts one or more segments and serves queries from those segments
 * Pinot Broker: Accepts queries from clients and routes them to one or more servers, and returns consolidated response to the server.
 
-Pinot leverages `Apache Helix <http://helix.apache.org>`_ for cluster management. 
+Pinot leverages `Apache Helix <http://helix.apache.org>`_ for cluster management.
 Apache Helix is a generic cluster management framework to manage partitions and replicas in a distributed system. See http://helix.apache.org for additional information.
 Helix uses Zookeeper to store cluster state and metadata.
 
@@ -51,7 +51,7 @@ Tables in Pinot can be configured to be offline only, or realtime only, or a hyb
 
 Segments for offline tables are constructed outside of Pinot, typically in Hadoop via map-reduce jobs. These segments are then ingested
 into Pinot via REST API provided by the Controller. The controller looks up the table's configuration and assigns the segment
-to the servers that host the table. It may assign multiple servers for each servers depending on the number of replicas 
+to the servers that host the table. It may assign multiple servers for each servers depending on the number of replicas
 configured for that table.
 Pinot provides libraries to create Pinot segments out of input files in AVRO, JSON or CSV formats in a hadoop job, and push
 the constructed segments to the controlers via REST APIs.
@@ -73,9 +73,9 @@ to consume the next set of events from the stream.
 
 Depending on the type of consumer configured, realtime segments may be held locally in the server, or pushed the controller.
 
-**TODO Add reference to the realtime section here**
+See :doc:`realtime design <llc>` for more details.
 
-A hybrid Pinot table essentially has both realtime as well as offline tables. 
+A hybrid Pinot table essentially has both realtime as well as offline tables.
 In such a table, offline segments may be pushed periodically (say, once a day). The retention on the offline table
 can be set to a high value (say, a few years) since segments are coming in on a periodic basis, whereas the retention
 on the realtime part can be small (say, a few days). Once an offline segment is pushed to cover a recent time period,
diff --git a/docs/conf.py b/docs/conf.py
index d3ab560..9163bda 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -293,6 +293,8 @@ texinfo_documents = [
      'Miscellaneous'),
 ]
 
+extensions = ['sphinx.ext.intersphinx']
+
 # Documents to append as an appendix to all manuals.
 #texinfo_appendices = []
 
diff --git a/docs/llc.rst b/docs/llc.rst
index e9df3fa..15d2e0a 100644
--- a/docs/llc.rst
+++ b/docs/llc.rst
@@ -47,7 +47,7 @@ Important tuning parameters for Realtime Pinot
 * replicasPerPartition: This number indicates how many replicas are needed for each partition to be consumed from the stream
 * realtime.segment.flush.threshold.size: This parameter should be set to the total number of rows of a topic that a realtime consuming server can hold in memory. Default value is 5M. If the value is set to 0, then the number of rows is automatically adjusted such that the size of the segment generated is as per the setting realtime.segment.flush.desired.size
 * realtime.segment.flush.desired.size: Default value is "200M". The setting is used only if realtime.segment.flush.threshold.size is set to 0
-* realtime.segment.flush.threshold.size.llc: This parameter overrides realtime.segment.flush.threshold.size. Useful when migrating live from HLC to LLC 
+* realtime.segment.flush.threshold.size.llc: This parameter overrides realtime.segment.flush.threshold.size. Useful when migrating live from HLC to LLC
 * pinot.server.instance.realtime.alloc.offheap: Default is false. Set it to true if you want off-heap allocation for dictionaries and no-dictionary column
 * pinot.server.instance.realtime.alloc.offheap.direct: Default is false. Set it to true if you want off-heap allocation from DirectMemory (as opposed to MMAP)
 * pinot.server.instance.realtime.max.parallel.segment.builds: Default is 0 (meaning infinite). Set it to a number if you want to limit number of segment builds. Segment builds take up heap memory, so it is useful to have a max setting and limit the number of simultaneous segment builds on a single server instance JVM.
diff --git a/docs/multitenancy.rst b/docs/multitenancy.rst
index 6c52fa1..edd4bf8 100644
--- a/docs/multitenancy.rst
+++ b/docs/multitenancy.rst
@@ -80,7 +80,7 @@ Pinot Cluster creation
 
 When the cluster is created the Zookeeper ZNode layout looks as follows.
 
-**TODO**:: Is there a picture here?
+.. figure:: znode_layout.png
 
 Adding Nodes to cluster
 -----------------------
@@ -89,13 +89,13 @@ Adding node to cluster can be done in two ways, manual or automatic. This is con
 
 ::
 
-  {  
-   "id" : "PinotPerfTestCluster",  
-   "simpleFields" : {  
-   "allowParticipantAutoJoin" : "true"  
-   },  
-   "mapFields" : { },  
-   "listFields" : { }  
+  {
+   "id" : "PinotPerfTestCluster",
+   "simpleFields" : {
+   "allowParticipantAutoJoin" : "true"
+   },
+   "mapFields" : { },
+   "listFields" : { }
   }
 
 In Pinot 2.0 we will set AUTO_JOIN to true. This means after the SRE's procure the hardware they can simply deploy the Pinot war and provide the cluster name. When the nodes start up, they join the cluster and registers themselves as server_untagged or broker_untagged. This is what one would see in Helix.
@@ -104,36 +104,36 @@ The znode ``CONFIGS/PARTICIPANT/ServerInstanceName`` looks lik below:
 
 ::
 
-    {  
-     "id":"Server_localhost_8098"  
-     ,"simpleFields":{  
-     "HELIX_ENABLED":"true"  
-     ,"HELIX_HOST":"Server_localhost"  
-     ,"HELIX_PORT":"8098"     
-     }  
-     ,"listFields":{       
-     "TAG_LIST":["server_untagged"]  
-     }                
-     ,"mapFields":{                  
-     }  
+    {
+     "id":"Server_localhost_8098"
+     ,"simpleFields":{
+     "HELIX_ENABLED":"true"
+     ,"HELIX_HOST":"Server_localhost"
+     ,"HELIX_PORT":"8098"
+     }
+     ,"listFields":{
+     "TAG_LIST":["server_untagged"]
+     }
+     ,"mapFields":{
+     }
     }
 
 And the znode ``CONFIGS/PARTICIPANT/BrokerInstanceName`` looks like below:
 
 ::
 
-    {  
-     "id":"Broker_localhost_8099"  
-     ,"simpleFields":{  
-     "HELIX_ENABLED":"true"  
-     ,"HELIX_HOST":"Broker_localhost"  
-     ,"HELIX_PORT":"8099"  
-     }  
-     ,"listFields":{  
-     "TAG_LIST":["broker_untagged"]  
-     }  
-     ,"mapFields":{  
-     }  
+    {
+     "id":"Broker_localhost_8099"
+     ,"simpleFields":{
+     "HELIX_ENABLED":"true"
+     ,"HELIX_HOST":"Broker_localhost"
+     ,"HELIX_PORT":"8099"
+     }
+     ,"listFields":{
+     "TAG_LIST":["broker_untagged"]
+     }
+     ,"mapFields":{
+     }
     }
 
 Adding Resources to Cluster
@@ -145,19 +145,19 @@ There is one resource idealstate created for Broker by default called broker_res
 
 ::
 
-  {  
-   "id" : "brokerResource",  
-   "simpleFields" : {  
-   "IDEAL_STATE_MODE" : "CUSTOMIZED",  
-   "MAX_PARTITIONS_PER_INSTANCE" : "2147483647",  
-   "NUM_PARTITIONS" : "2147483647",  
-   "REBALANCE_MODE" : "CUSTOMIZED",  
-   "REPLICAS" : "2147483647",  
-   "STATE_MODEL_DEF_REF" : "BrokerResourceOnlineOfflineStateModel",  
-   "STATE_MODEL_FACTORY_NAME" : "DEFAULT"  
-   },  
-   "mapFields" : { },  
-   "listFields" : { }  
+  {
+   "id" : "brokerResource",
+   "simpleFields" : {
+   "IDEAL_STATE_MODE" : "CUSTOMIZED",
+   "MAX_PARTITIONS_PER_INSTANCE" : "2147483647",
+   "NUM_PARTITIONS" : "2147483647",
+   "REBALANCE_MODE" : "CUSTOMIZED",
+   "REPLICAS" : "2147483647",
+   "STATE_MODEL_DEF_REF" : "BrokerResourceOnlineOfflineStateModel",
+   "STATE_MODEL_FACTORY_NAME" : "DEFAULT"
+   },
+   "mapFields" : { },
+   "listFields" : { }
   }
 
 
@@ -168,7 +168,7 @@ Sample Curl request
 
 ::
 
-  curl -i -X POST -H 'Content-Type: application/json' -d '{"requestType":"create", "resourceName":"XLNT","tableName":"T1", "timeColumnName":"daysSinceEpoch", "timeType":"daysSinceEpoch","numberOfDataInstances":4,"numberOfCopies":2,"retentionTimeUnit":"DAYS", "retentionTimeValue":"700","pushFrequency":"daily", "brokerTagName":"XLNT", "numberOfBrokerInstances":1, "segmentAssignmentStrategy":"BalanceNumSegmentAssignmentStrategy", "resourceType":"OFFLINE", "metadata":{}}' 
+  curl -i -X POST -H 'Content-Type: application/json' -d '{"requestType":"create", "resourceName":"XLNT","tableName":"T1", "timeColumnName":"daysSinceEpoch", "timeType":"daysSinceEpoch","numberOfDataInstances":4,"numberOfCopies":2,"retentionTimeUnit":"DAYS", "retentionTimeValue":"700","pushFrequency":"daily", "brokerTagName":"XLNT", "numberOfBrokerInstances":1, "segmentAssignmentStrategy":"BalanceNumSegmentAssignmentStrategy", "resourceType":"OFFLINE", "metadata":{}}'
 
 This is how it looks in Helix after running the above command.
 
@@ -177,17 +177,17 @@ The znode ``CONFIGS/PARTICIPANT/Broker_localhost_8099`` looks as follows:
 
 ::
 
-    {  
-     "id":"Broker_localhost_8099"  
-     ,"simpleFields":{  
-     "HELIX_ENABLED":"true"  
-     ,"HELIX_HOST":"Broker_localhost"  
-     ,"HELIX_PORT":"8099"  
+    {
+     "id":"Broker_localhost_8099"
+     ,"simpleFields":{
+     "HELIX_ENABLED":"true"
+     ,"HELIX_HOST":"Broker_localhost"
+     ,"HELIX_PORT":"8099"
      }
-     ,"listFields":{  
-     "TAG_LIST":["broker_mirrorProfileViewOfflineEvents1"]  
+     ,"listFields":{
+     "TAG_LIST":["broker_mirrorProfileViewOfflineEvents1"]
      }
-     ,"mapFields":{  
+     ,"mapFields":{
      }
     }
 
@@ -195,22 +195,22 @@ And the znode ``IDEALSTATES/brokerResource`` looks like below after Data resourc
 
 ::
 
-    {  
-     "id":"brokerResource"  
-     ,"simpleFields":{  
-     "IDEAL_STATE_MODE":"CUSTOMIZED"  
-     ,"MAX_PARTITIONS_PER_INSTANCE":"2147483647"  
-     ,"NUM_PARTITIONS":"2147483647"  
-     ,"REBALANCE_MODE":"CUSTOMIZED"  
-     ,"REPLICAS":"2147483647"  
-     ,"STATE_MODEL_DEF_REF":"BrokerResourceOnlineOfflineStateModel"  
-     ,"STATE_MODEL_FACTORY_NAME":"DEFAULT"  
+    {
+     "id":"brokerResource"
+     ,"simpleFields":{
+     "IDEAL_STATE_MODE":"CUSTOMIZED"
+     ,"MAX_PARTITIONS_PER_INSTANCE":"2147483647"
+     ,"NUM_PARTITIONS":"2147483647"
+     ,"REBALANCE_MODE":"CUSTOMIZED"
+     ,"REPLICAS":"2147483647"
+     ,"STATE_MODEL_DEF_REF":"BrokerResourceOnlineOfflineStateModel"
+     ,"STATE_MODEL_FACTORY_NAME":"DEFAULT"
      }
-     ,"listFields":{  
+     ,"listFields":{
      }
-     ,"mapFields":{  
-     "mirrorProfileViewOfflineEvents1_O":{  
-     "Broker_localhost_8099":"ONLINE"  
+     ,"mapFields":{
+     "mirrorProfileViewOfflineEvents1_O":{
+     "Broker_localhost_8099":"ONLINE"
      }
      }
     }
@@ -222,38 +222,38 @@ The znode ``CONFIGS/PARTICIPANT/Server_localhost_8098`` looks as below
 
 ::
 
-    {  
-     "id":"Server_localhost_8098"  
-     ,"simpleFields":{  
-     "HELIX_ENABLED":"true"  
-     ,"HELIX_HOST":"Server_localhost"  
-     ,"HELIX_PORT":"8098"  
-     }  
-     ,"listFields":{  
-     "TAG_LIST":["XLNT"]  
-     }  
-     ,"mapFields":{  
-     }  
+    {
+     "id":"Server_localhost_8098"
+     ,"simpleFields":{
+     "HELIX_ENABLED":"true"
+     ,"HELIX_HOST":"Server_localhost"
+     ,"HELIX_PORT":"8098"
+     }
+     ,"listFields":{
+     "TAG_LIST":["XLNT"]
+     }
+     ,"mapFields":{
+     }
     }
 
 And the znode ``/IDEALSTATES/XLNT (XLNT Data Resource IdealState)`` looks as below:
 
 ::
 
-    {  
-     "id":"XLNT"  
-     ,"simpleFields":{  
-     "IDEAL_STATE_MODE":"CUSTOMIZED"  
-     ,"INSTANCE_GROUP_TAG":"XLNT"  
-     ,"MAX_PARTITIONS_PER_INSTANCE":"1"  
-     ,"NUM_PARTITIONS":"0"  
-     ,"REBALANCE_MODE":"CUSTOMIZED"  
-     ,"REPLICAS":"1"  
-     ,"STATE_MODEL_DEF_REF":"SegmentOnlineOfflineStateModel"  
-     ,"STATE_MODEL_FACTORY_NAME":"DEFAULT"  
-     }  
-     ,"listFields":{}  
-     ,"mapFields":{ }  
+    {
+     "id":"XLNT"
+     ,"simpleFields":{
+     "IDEAL_STATE_MODE":"CUSTOMIZED"
+     ,"INSTANCE_GROUP_TAG":"XLNT"
+     ,"MAX_PARTITIONS_PER_INSTANCE":"1"
+     ,"NUM_PARTITIONS":"0"
+     ,"REBALANCE_MODE":"CUSTOMIZED"
+     ,"REPLICAS":"1"
+     ,"STATE_MODEL_DEF_REF":"SegmentOnlineOfflineStateModel"
+     ,"STATE_MODEL_FACTORY_NAME":"DEFAULT"
+     }
+     ,"listFields":{}
+     ,"mapFields":{ }
     }
 
 
@@ -277,31 +277,31 @@ The znode ``/PROPERTYSTORE/CONFIGS/RESOURCE/XLNT`` like like:
 
 ::
 
-    {  
-     "id":"mirrorProfileViewOfflineEvents1_O"  
-     ,"simpleFields":{  
-     "brokerTagName":"broker_mirrorProfileViewOfflineEvents1"  
-     ,"numberOfBrokerInstances":"1"  
-     ,"numberOfCopies":"1"  
-     ,"numberOfDataInstances":"1"  
-     ,"pushFrequency":"daily"  
-     ,"resourceName":"mirrorProfileViewOfflineEvents1"  
-     ,"resourceType":"OFFLINE"  
-     ,"retentionTimeUnit":"DAYS"  
-     ,"retentionTimeValue":"300"  
-     ,"segmentAssignmentStrategy":"BalanceNumSegmentAssignmentStrategy"  
-     ,"timeColumnName":"daysSinceEpoch"  
-     ,"timeType":"DAYS"  
-     }  
-     ,"listFields":{  
-     "tableName":["T1"]  
-     }  
-     ,"mapFields":{  
-     "metadata":{  
-     }  
-     }  
-    }  
-    //This will change slightly when retention properties   
+    {
+     "id":"mirrorProfileViewOfflineEvents1_O"
+     ,"simpleFields":{
+     "brokerTagName":"broker_mirrorProfileViewOfflineEvents1"
+     ,"numberOfBrokerInstances":"1"
+     ,"numberOfCopies":"1"
+     ,"numberOfDataInstances":"1"
+     ,"pushFrequency":"daily"
+     ,"resourceName":"mirrorProfileViewOfflineEvents1"
+     ,"resourceType":"OFFLINE"
+     ,"retentionTimeUnit":"DAYS"
+     ,"retentionTimeValue":"300"
+     ,"segmentAssignmentStrategy":"BalanceNumSegmentAssignmentStrategy"
+     ,"timeColumnName":"daysSinceEpoch"
+     ,"timeType":"DAYS"
+     }
+     ,"listFields":{
+     "tableName":["T1"]
+     }
+     ,"mapFields":{
+     "metadata":{
+     }
+     }
+    }
+    //This will change slightly when retention properties
     //are stored at table scope </pre>
 
 
@@ -309,31 +309,31 @@ The znode ``/IDEALSTATES/XLNT (XLNT Data Resource IdealState)``
 
 ::
 
-    {  
-     "id":"XLNT_O"  
-     ,"simpleFields":{  
-     "IDEAL_STATE_MODE":"CUSTOMIZED"  
-     ,"INSTANCE_GROUP_TAG":"XLNT_O"  
-     ,"MAX_PARTITIONS_PER_INSTANCE":"1"  
-     ,"NUM_PARTITIONS":"3"  
-     ,"REBALANCE_MODE":"CUSTOMIZED"  
-     ,"REPLICAS":"1"  
-     ,"STATE_MODEL_DEF_REF":"SegmentOnlineOfflineStateModel"  
-     ,"STATE_MODEL_FACTORY_NAME":"DEFAULT"  
-     }  
-     ,"listFields":{  
-     }  
-     ,"mapFields":{  
-     "XLNT_T1_daily_2014-08-01_2014-08-01_0":{  
-     "Server_localhost_8098":"ONLINE"  
-     }  
-     ,"XLNT_T1_daily_2014-08-01_2014-08-01_1":{  
-     "Server_localhost_8098":"ONLINE"  
-     }  
-     ,"XLNT_T1_daily_2014-08-01_2014-08-01_2":{  
-     "Server_localhost_8098":"ONLINE"  
-     }  
-     }  
+    {
+     "id":"XLNT_O"
+     ,"simpleFields":{
+     "IDEAL_STATE_MODE":"CUSTOMIZED"
+     ,"INSTANCE_GROUP_TAG":"XLNT_O"
+     ,"MAX_PARTITIONS_PER_INSTANCE":"1"
+     ,"NUM_PARTITIONS":"3"
+     ,"REBALANCE_MODE":"CUSTOMIZED"
+     ,"REPLICAS":"1"
+     ,"STATE_MODEL_DEF_REF":"SegmentOnlineOfflineStateModel"
+     ,"STATE_MODEL_FACTORY_NAME":"DEFAULT"
+     }
+     ,"listFields":{
+     }
+     ,"mapFields":{
+     "XLNT_T1_daily_2014-08-01_2014-08-01_0":{
+     "Server_localhost_8098":"ONLINE"
+     }
+     ,"XLNT_T1_daily_2014-08-01_2014-08-01_1":{
+     "Server_localhost_8098":"ONLINE"
+     }
+     ,"XLNT_T1_daily_2014-08-01_2014-08-01_2":{
+     "Server_localhost_8098":"ONLINE"
+     }
+     }
     }
 
 
diff --git a/docs/pluggable_streams.png b/docs/pluggable_streams.png
new file mode 100644
index 0000000..b5a3cd7
Binary files /dev/null and b/docs/pluggable_streams.png differ
diff --git a/docs/pluggable_streams.rst b/docs/pluggable_streams.rst
index ab76fb5..876e31c 100644
--- a/docs/pluggable_streams.rst
+++ b/docs/pluggable_streams.rst
@@ -1,8 +1,8 @@
 Pluggable Streams
 =================
 
-Prior to commit `ba9f2d <https://github.com/linkedin/pinot/commit/ba9f2ddfc0faa42fadc2cc48df1d77fec6b174fb>`_, Pinot was only able to support reading 
-from `Kafka <https://kafka.apache.org/documentation/>`_ stream. 
+Prior to commit `ba9f2d <https://github.com/linkedin/pinot/commit/ba9f2ddfc0faa42fadc2cc48df1d77fec6b174fb>`_, Pinot was only able to support reading
+from `Kafka <https://kafka.apache.org/documentation/>`_ stream.
 
 Pinot now enables its users to write plug-ins to read from pub-sub streams
 other than Kafka. (Please refer to `Issue #2583 <https://github.com/linkedin/pinot/issues/2583>`_)
@@ -22,7 +22,7 @@ Pinot Stream Consumers
 ----------------------
 Pinot consumes rows from event streams and serves queries on the data consumed. Rows may be consumed either at stream level (also referred to as high level) or at partition level (also referred to as low level).
 
-**TODO**:: Refer to the pictures in the design document
+.. figure:: pluggable_streams.png
 
 .. figure:: High-level-stream.png
 
@@ -71,7 +71,7 @@ In order to add a new type of stream (say,Foo) implement the following classes:
 #. FooMetadataProvider implements `StreamMetadataProvider <https://github.com/linkedin/pinot/blob/master/pinot-core/src/main/java/com/linkedin/pinot/core/realtime/stream/StreamMetadataProvider.java>`_
 #. FooMessageDecoder implements `StreamMessageDecoder <https://github.com/linkedin/pinot/blob/master/pinot-core/src/main/java/com/linkedin/pinot/core/realtime/stream/StreamMessageDecoder.java>`_
 
-Depending on stream level or partition level, your implementation needs to include StreamLevelConsumer or PartitionLevelConsumer. 
+Depending on stream level or partition level, your implementation needs to include StreamLevelConsumer or PartitionLevelConsumer.
 
 
 The properties for the stream implementation are to be set in the table configuration, inside `streamConfigs <https://github.com/linkedin/pinot/blob/master/pinot-core/src/main/java/com/linkedin/pinot/core/realtime/stream/StreamConfig.java>`_ section.
@@ -95,7 +95,7 @@ All values should be strings. For example:
 
   "streamType" : "foo",
   "stream.foo.topic.name" : "SomeTopic",
-  "stream.foo.consumer.type": "lowlevel", 
+  "stream.foo.consumer.type": "lowlevel",
   "stream.foo.consumer.factory.class.name": "fully.qualified.pkg.ConsumerFactoryClassName",
   "stream.foo.consumer.prop.auto.offset.reset": "largest",
   "stream.foo.decoder.class.name" : "fully.qualified.pkg.DecoderClassName",
diff --git a/docs/znode_layout.png b/docs/znode_layout.png
new file mode 100644
index 0000000..81552fc
Binary files /dev/null and b/docs/znode_layout.png differ

