Posted to commits@pinot.apache.org by jl...@apache.org on 2019/03/09 00:39:05 UTC
[incubator-pinot] 01/01: Add more documents to docs folder
This is an automated email from the ASF dual-hosted git repository.
jlli pushed a commit to branch add-doc-experiment-with-pinot
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git
commit 81848aaee209bdbad809759c901c1520b1225e09
Author: jackjlli <jl...@linkedin.com>
AuthorDate: Fri Mar 8 16:38:41 2019 -0800
Add more documents to docs folder
---
docs/getting_started.rst | 7 +++++
docs/img/generate-segment.png | Bin 0 -> 218597 bytes
docs/img/list-schemas.png | Bin 0 -> 8952 bytes
docs/img/query-table.png | Bin 0 -> 35914 bytes
docs/img/upload-segment.png | Bin 0 -> 13944 bytes
docs/management_api.rst | 28 ++++++++++++++---
docs/pinot_hadoop.rst | 68 ++++++++++++++++++++++++++++++++++++++++--
7 files changed, 96 insertions(+), 7 deletions(-)
diff --git a/docs/getting_started.rst b/docs/getting_started.rst
index e3d2416..4b92f96 100644
--- a/docs/getting_started.rst
+++ b/docs/getting_started.rst
@@ -94,3 +94,10 @@ show up in Pinot.
To show new events appearing, one can run :sql:`SELECT * FROM meetupRsvp ORDER BY mtime DESC LIMIT 50` repeatedly, which shows the
last events that were ingested by Pinot.
+Experiment with Pinot
+~~~~~~~~~~~~~~~~~~~~~
+
+Here are instructions on how to add a simple table to the Pinot system you got working in the previous demo steps,
+how to add segments, and how to query the table.
+
+
diff --git a/docs/img/generate-segment.png b/docs/img/generate-segment.png
new file mode 100644
index 0000000..5848781
Binary files /dev/null and b/docs/img/generate-segment.png differ
diff --git a/docs/img/list-schemas.png b/docs/img/list-schemas.png
new file mode 100644
index 0000000..9b00855
Binary files /dev/null and b/docs/img/list-schemas.png differ
diff --git a/docs/img/query-table.png b/docs/img/query-table.png
new file mode 100644
index 0000000..5859f2b
Binary files /dev/null and b/docs/img/query-table.png differ
diff --git a/docs/img/upload-segment.png b/docs/img/upload-segment.png
new file mode 100644
index 0000000..edd348d
Binary files /dev/null and b/docs/img/upload-segment.png differ
diff --git a/docs/management_api.rst b/docs/management_api.rst
index f8483eb..6f9930a 100644
--- a/docs/management_api.rst
+++ b/docs/management_api.rst
@@ -17,11 +17,31 @@
.. under the License.
..
-Managing Pinot via REST API on the Controller
-=============================================
+Managing Pinot
+==============
-*TODO* : Remove this section altogether and find a place somewhere for a pointer to the management API. Maybe in the 'Running pinot in production' section?
+There are two ways to manage a Pinot cluster: the Pinot management console and the ``pinot-admin.sh`` script.
-There is a REST API which allows management of tables, tenants, segments and schemas. It can be accessed by going to ``http://[controller host]/help`` which offers a web UI to do these tasks, as well as document the REST API.
+Pinot Management Console
+------------------------
+
+There is a REST API which allows management of tables, tenants, segments and schemas. It can be accessed by going to
+``http://[controller host]/help`` which offers a web UI to do these tasks, as well as document the REST API.
+
+For example, list all the schemas within the Pinot cluster:
+
+.. figure:: img/list-schemas.png
+
+Upload a Pinot segment:
+
+.. figure:: img/upload-segment.png
+
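
The tasks the web UI exposes can also be scripted against the same REST endpoints. A minimal Python sketch, assuming the controller runs at the quick-start default ``localhost:9000`` (adjust host and port for your deployment); the helper names here are illustrative, not part of Pinot:

```python
import json
import urllib.request

def endpoint(controller, path):
    """Build a controller REST URL, e.g. endpoint(c, "schemas") -> c + "/schemas"."""
    return controller.rstrip("/") + "/" + path.lstrip("/")

def list_schemas(controller="http://localhost:9000"):
    """GET /schemas -- returns the schemas known to the cluster."""
    with urllib.request.urlopen(endpoint(controller, "schemas")) as resp:
        return json.loads(resp.read().decode("utf-8"))
```
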
+
+Pinot-admin.sh
+--------------
The ``pinot-admin.sh`` script can be used instead of the REST API calls to automate the creation of tables and tenants.
+
+For example, create a Pinot segment:
+
+.. figure:: img/generate-segment.png
+
+Query a table:
+
+.. figure:: img/query-table.png
diff --git a/docs/pinot_hadoop.rst b/docs/pinot_hadoop.rst
index 1611040..25f6346 100644
--- a/docs/pinot_hadoop.rst
+++ b/docs/pinot_hadoop.rst
@@ -23,7 +23,8 @@ Creating Pinot segments
=======================
Pinot segments can be created offline on Hadoop, or via command line from data files. Controller REST endpoint
-can then be used to add the segment to the table to which the segment belongs.
+can then be used to add the segment to the table to which the segment belongs. Pinot segments can also be created by
+ingesting data from realtime sources (such as Kafka).
Creating segments using hadoop
------------------------------
@@ -177,8 +178,8 @@ Sample Schema:
"schemaName" : "mySchema",
}
-Pushing segments to Pinot
-^^^^^^^^^^^^^^^^^^^^^^^^^
+Pushing offline segments to Pinot
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You can use curl to push a segment to Pinot:
@@ -196,3 +197,64 @@ Alternatively you can use the pinot-admin.sh utility to upload one or more segme
The command uploads all the segments found in ``segmentDirectoryPath``.
The segments could be either tar-compressed (in which case it is a file under ``segmentDirectoryPath``)
or uncompressed (in which case it is a directory under ``segmentDirectoryPath``).
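
The tar-or-directory convention described above can be sketched as a small helper. This is an illustrative Python sketch, not part of the Pinot tooling; the function name and the ``.tar.gz``/``.tgz`` suffixes it matches are assumptions for the example:

```python
import os

def classify_segments(segment_dir):
    """Split the entries of a segmentDirectoryPath into tar-compressed
    segment files and uncompressed segment directories, mirroring the
    two layouts the upload utility accepts."""
    compressed, uncompressed = [], []
    for name in sorted(os.listdir(segment_dir)):
        path = os.path.join(segment_dir, name)
        if os.path.isdir(path):
            uncompressed.append(name)        # an unpacked segment directory
        elif name.endswith((".tar.gz", ".tgz")):
            compressed.append(name)          # a tar-compressed segment file
    return compressed, uncompressed
```
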
+
+Realtime segment generation
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+To consume in realtime, we simply need to create a table that uses the same schema and points to the Kafka topic to
+consume from, using a table definition such as this one:
+
+.. code-block:: none
+
+ {
+ "tableName":"flights",
+ "segmentsConfig" : {
+ "retentionTimeUnit":"DAYS",
+ "retentionTimeValue":"7",
+ "segmentPushFrequency":"daily",
+ "segmentPushType":"APPEND",
+ "replication" : "1",
+ "schemaName" : "flights",
+ "timeColumnName" : "daysSinceEpoch",
+ "timeType" : "DAYS",
+ "segmentAssignmentStrategy" : "BalanceNumSegmentAssignmentStrategy"
+ },
+ "tableIndexConfig" : {
+ "invertedIndexColumns" : ["Carrier"],
+ "loadMode" : "HEAP",
+ "lazyLoad" : "false",
+ "streamConfigs": {
+ "streamType": "kafka",
+ "stream.kafka.consumer.type": "highLevel",
+ "stream.kafka.topic.name": "flights-realtime",
+ "stream.kafka.decoder.class.name": "org.apache.pinot.core.realtime.impl.kafka.KafkaJSONMessageDecoder",
+ "stream.kafka.zk.broker.url": "localhost:2181",
+ "stream.kafka.hlc.zk.connect.string": "localhost:2181"
+ }
+ },
+ "tableType":"REALTIME",
+ "tenants" : {
+ "broker":"DefaultTenant_BROKER",
+ "server":"DefaultTenant_SERVER"
+ },
+ "metadata": {
+ }
+ }
+
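
The parts of the definition that drive consumption can be pulled out programmatically. A small Python sketch over a trimmed copy of the config above; the ``kafka_topic`` helper is illustrative, not part of Pinot:

```python
import json

# A trimmed copy of the realtime table definition shown above.
TABLE_DEF = json.loads("""
{
  "tableName": "flights",
  "tableType": "REALTIME",
  "tableIndexConfig": {
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.consumer.type": "highLevel",
      "stream.kafka.topic.name": "flights-realtime"
    }
  }
}
""")

def kafka_topic(table_def):
    """Return the Kafka topic a REALTIME table consumes from, or None."""
    if table_def.get("tableType") != "REALTIME":
        return None
    stream = table_def.get("tableIndexConfig", {}).get("streamConfigs", {})
    return stream.get("stream.kafka.topic.name")
```
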
+First, we'll start a local instance of Kafka and start streaming data into it:
+
+.. code-block:: none
+
+ bin/pinot-admin.sh StartKafka &
+ bin/pinot-admin.sh StreamAvroIntoKafka -avroFile flights-2014.avro -kafkaTopic flights-realtime &
+
+This will stream one event per second from the Avro file to the Kafka topic. Then, we'll create a realtime table, which
+will start consuming from the Kafka topic.
+
+.. code-block:: none
+
+ bin/pinot-admin.sh AddTable -filePath flights-definition-realtime.json
+
+We can then query the table and see the events stream in:
+
+.. code-block:: none
+
+ {"traceInfo":{},"numDocsScanned":17,"aggregationResults":[{"function":"count_star","value":"17"}],"timeUsedMs":27,"segmentStatistics":[],"exceptions":[],"totalDocs":17}
+
+Repeating the query multiple times should show the events slowly being streamed into the table.
\ No newline at end of file
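
The response shown above is plain JSON, so watching the count grow can be scripted. An illustrative Python sketch over the fields from the sample response (the ``count_star`` helper is an assumption for the example):

```python
import json

# The sample broker response shown above.
SAMPLE = ('{"traceInfo":{},"numDocsScanned":17,"aggregationResults":'
          '[{"function":"count_star","value":"17"}],"timeUsedMs":27,'
          '"segmentStatistics":[],"exceptions":[],"totalDocs":17}')

def count_star(response_text):
    """Pull the count_star value out of a broker query response."""
    resp = json.loads(response_text)
    for agg in resp.get("aggregationResults", []):
        if agg.get("function") == "count_star":
            return int(agg["value"])
    return None
```
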