Posted to commits@pinot.apache.org by jl...@apache.org on 2019/03/19 23:32:30 UTC
[incubator-pinot] 01/01: Add experiment section in getting started

This is an automated email from the ASF dual-hosted git repository.

jlli pushed a commit to branch add-doc-for-experiment
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git

commit 3d0a897efd9aecbe423dfd3eec41b9169398e2a7
Author: jackjlli <jl...@linkedin.com>
AuthorDate: Tue Mar 19 16:32:07 2019 -0700

    Add experiment section in getting started
---
 docs/getting_started.rst | 156 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 156 insertions(+)

diff --git a/docs/getting_started.rst b/docs/getting_started.rst
index e3d2416..774d3b3 100644
--- a/docs/getting_started.rst
+++ b/docs/getting_started.rst
@@ -94,3 +94,159 @@ show up in Pinot.
 To show new events appearing, one can run :sql:`SELECT * FROM meetupRsvp ORDER BY mtime DESC LIMIT 50` repeatedly, which shows the
 last events that were ingested by Pinot.
 
+Experimenting with Pinot
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+We now have a quick-start Pinot cluster running locally. The steps below show how to add a simple table
+to the cluster, upload a segment to it, and query the data.
+
+Suppose we have a transcript in CSV format that contains each student's basic information and their score in each subject.
+
++------------+------------+-----------+-----------+-----------+-----------+
+| studentID  | firstName  | lastName  |   gender  |  subject  |   score   |
++============+============+===========+===========+===========+===========+
+|     200    |     Lucy   |   Smith   |   Female  |   Maths   |    3.8    |
++------------+------------+-----------+-----------+-----------+-----------+
+|     200    |     Lucy   |   Smith   |   Female  |  English  |    3.5    |
++------------+------------+-----------+-----------+-----------+-----------+
+|     201    |     Bob    |    King   |    Male   |   Maths   |    3.2    |
++------------+------------+-----------+-----------+-----------+-----------+
+|     202    |     Nick   |   Young   |    Male   |  Physics  |    3.6    |
++------------+------------+-----------+-----------+-----------+-----------+
+
+First, to set up a table, we need to define a schema for this transcript.
+
+.. code-block:: none
+
+  {
+    "schemaName": "transcript",
+    "dimensionFieldSpecs": [
+      {
+        "name": "studentID",
+        "dataType": "STRING"
+      },
+      {
+        "name": "firstName",
+        "dataType": "STRING"
+      },
+      {
+        "name": "lastName",
+        "dataType": "STRING"
+      },
+      {
+        "name": "gender",
+        "dataType": "STRING"
+      },
+      {
+        "name": "subject",
+        "dataType": "STRING"
+      }
+    ],
+    "metricFieldSpecs": [
+      {
+        "name": "score",
+        "dataType": "FLOAT"
+      }
+    ]
+  }
+
+To upload the schema, we can use the command below:
+
+.. code-block:: none
+
+  $ ./pinot-distribution/target/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/bin/pinot-admin.sh AddSchema -schemaFile /Users/jlli/transcript-schema.json -exec
+  Executing command: AddSchema -controllerHost 172.25.119.20 -controllerPort 9000 -schemaFilePath /Users/jlli/transcript-schema.json -exec
+  Sending request: http://172.25.119.20:9000/schemas to controller: jlli-mn2.linkedin.biz, version: 0.1.0-SNAPSHOT-2c5d42a908213122ab0ad8b7ac9524fcf390e4cb
+
+Next, we need to define the table config, which links the schema to this table:
+
+.. code-block:: none
+
+  {
+    "tableName": "transcript",
+    "segmentsConfig" : {
+      "replication" : "1",
+      "schemaName" : "transcript",
+      "segmentAssignmentStrategy" : "BalanceNumSegmentAssignmentStrategy"
+    },
+    "tenants" : {
+      "broker":"DefaultTenant",
+      "server":"DefaultTenant"
+    },
+    "tableIndexConfig" : {
+      "invertedIndexColumns" : [],
+      "loadMode"  : "HEAP",
+      "lazyLoad"  : "false"
+    },
+    "tableType":"OFFLINE",
+    "metadata": {}
+  }
+
+Then upload the table config to the Pinot cluster:
+
+.. code-block:: none
+
+  $ ./pinot-distribution/target/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/bin/pinot-admin.sh AddTable -filePath /Users/jlli/transcript-table-config.json -exec
+  Executing command: AddTable -filePath /Users/jlli/transcript-table-config.json -controllerHost 172.25.119.20 -controllerPort 9000 -exec
+  {"status":"Table transcript_OFFLINE successfully added"}
+
+To upload our data to the Pinot cluster, we first need to convert the CSV file into a Pinot segment:
+
+.. code-block:: none
+
+  $ ./pinot-distribution/target/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/bin/pinot-admin.sh CreateSegment -dataDir /Users/jlli/Desktop/test/ -format CSV -outDir /Users/jlli/Desktop/test2/ -tableName transcript -segmentName transcript_0 -overwrite -schemaFile /Users/jlli/transcript-schema.json
+  Executing command: CreateSegment  -generatorConfigFile null -dataDir /Users/jlli/Desktop/test/ -format CSV -outDir /Users/jlli/Desktop/test2/ -overwrite true -tableName transcript -segmentName transcript_0 -timeColumnName null -schemaFile /Users/jlli/transcript-schema.json -readerConfigFile null -enableStarTreeIndex false -starTreeIndexSpecFile null -hllSize 9 -hllColumns null -hllSuffix _hll -numThreads 1
+  Accepted files: [/Users/jlli/Desktop/test/Transcript.csv]
+  Finished building StatsCollector!
+  Collected stats for 4 documents
+  Created dictionary for STRING column: studentID with cardinality: 1, max length in bytes: 4, range: null to null
+  Created dictionary for STRING column: firstName with cardinality: 3, max length in bytes: 4, range: Bob to Nick
+  Created dictionary for STRING column: lastName with cardinality: 3, max length in bytes: 5, range: King to Young
+  Created dictionary for FLOAT column: score with cardinality: 4, range: 3.2 to 3.8
+  Created dictionary for STRING column: gender with cardinality: 2, max length in bytes: 6, range: Female to Male
+  Created dictionary for STRING column: subject with cardinality: 3, max length in bytes: 7, range: English to Physics
+  Start building IndexCreator!
+  Finished records indexing in IndexCreator!
+  Finished segment seal!
+  Converting segment: /Users/jlli/Desktop/test2/transcript_0_0 to v3 format
+  v3 segment location for segment: transcript_0_0 is /Users/jlli/Desktop/test2/transcript_0_0/v3
+  Deleting files in v1 segment directory: /Users/jlli/Desktop/test2/transcript_0_0
+  Driver, record read time : 1
+  Driver, stats collector time : 0
+  Driver, indexing time : 0
+
+Once we have the Pinot segment, we can upload this segment to our cluster:
+
+.. code-block:: none
+
+  $ ./pinot-distribution/target/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/bin/pinot-admin.sh UploadSegment -segmentDir /Users/jlli/Desktop/test2/
+  Executing command: UploadSegment -controllerHost 172.25.119.20 -controllerPort 9000 -segmentDir /Users/jlli/Desktop/test2/
+  Compressing segment transcript_0_0
+  Uploading segment transcript_0_0.tar.gz
+  Sending request: http://172.25.119.20:9000/v2/segments to controller: jlli-mn2.linkedin.biz, version: 0.1.0-SNAPSHOT-2c5d42a908213122ab0ad8b7ac9524fcf390e4cb
+
+You made it! Now we can query the data in Pinot.
+
+To get the total number of rows in the table:
+
+.. code-block:: none
+
+  $ ./pinot-distribution/target/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/bin/pinot-admin.sh PostQuery -brokerPort 8000 -query "select count(*) from transcript"
+  Executing command: PostQuery -brokerHost 172.25.119.20 -brokerPort 8000 -query select count(*) from transcript
+  Result: {"aggregationResults":[{"function":"count_star","value":"4"}],"exceptions":[],"numServersQueried":1,"numServersResponded":1,"numSegmentsQueried":1,"numSegmentsProcessed":1,"numSegmentsMatched":1,"numDocsScanned":4,"numEntriesScannedInFilter":0,"numEntriesScannedPostFilter":0,"numGroupsLimitReached":false,"totalDocs":4,"timeUsedMs":7,"segmentStatistics":[],"traceInfo":{}}
+
+To get the average score for the subject Maths:
+
+.. code-block:: none
+
+  $ ./pinot-distribution/target/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/bin/pinot-admin.sh PostQuery -brokerPort 8000 -query "select avg(score) from transcript where subject = \"Maths\""
+  Executing command: PostQuery -brokerHost 172.25.119.20 -brokerPort 8000 -query select avg(score) from transcript where subject = "Maths"
+  Result: {"aggregationResults":[{"function":"avg_score","value":"3.50000"}],"exceptions":[],"numServersQueried":1,"numServersResponded":1,"numSegmentsQueried":1,"numSegmentsProcessed":1,"numSegmentsMatched":1,"numDocsScanned":2,"numEntriesScannedInFilter":4,"numEntriesScannedPostFilter":2,"numGroupsLimitReached":false,"totalDocs":4,"timeUsedMs":33,"segmentStatistics":[],"traceInfo":{}}
+
+To get the average score for Lucy Smith:
+
+.. code-block:: none
+
+  $ ./pinot-distribution/target/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/bin/pinot-admin.sh PostQuery -brokerPort 8000 -query "select avg(score) from transcript where firstName = \"Lucy\" and lastName = \"Smith\""
+  Executing command: PostQuery -brokerHost 172.25.119.20 -brokerPort 8000 -query select avg(score) from transcript where firstName = "Lucy" and lastName = "Smith"
+  Result: {"aggregationResults":[{"function":"avg_score","value":"3.65000"}],"exceptions":[],"numServersQueried":1,"numServersResponded":1,"numSegmentsQueried":1,"numSegmentsProcessed":1,"numSegmentsMatched":1,"numDocsScanned":2,"numEntriesScannedInFilter":6,"numEntriesScannedPostFilter":2,"numGroupsLimitReached":false,"totalDocs":4,"timeUsedMs":67,"segmentStatistics":[],"traceInfo":{}}
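
As a sanity check on the three query results in the patch, the same numbers can be recomputed from the four sample rows in the transcript table. The following is a minimal Python sketch that sits outside the patch and does not talk to Pinot. The row literals are copied from the table, and the helper name avg_score is hypothetical. The CSV file itself is not shown in the patch, so its exact header and layout are assumptions.

```python
# Sample rows copied from the transcript table in the patch.
# The field names mirror the schema; the real CSV layout is an assumption.
ROWS = [
    {"studentID": "200", "firstName": "Lucy", "lastName": "Smith",
     "gender": "Female", "subject": "Maths", "score": 3.8},
    {"studentID": "200", "firstName": "Lucy", "lastName": "Smith",
     "gender": "Female", "subject": "English", "score": 3.5},
    {"studentID": "201", "firstName": "Bob", "lastName": "King",
     "gender": "Male", "subject": "Maths", "score": 3.2},
    {"studentID": "202", "firstName": "Nick", "lastName": "Young",
     "gender": "Male", "subject": "Physics", "score": 3.6},
]


def avg_score(rows, predicate):
    """Average the score column over the rows that satisfy predicate."""
    scores = [row["score"] for row in rows if predicate(row)]
    return sum(scores) / len(scores)


# select count(*) from transcript
total_rows = len(ROWS)

# select avg(score) from transcript where subject = "Maths"
maths_avg = avg_score(ROWS, lambda r: r["subject"] == "Maths")

# select avg(score) from transcript
#   where firstName = "Lucy" and lastName = "Smith"
lucy_avg = avg_score(
    ROWS, lambda r: r["firstName"] == "Lucy" and r["lastName"] == "Smith"
)

print(total_rows)           # 4
print(f"{maths_avg:.5f}")   # 3.50000
print(f"{lucy_avg:.5f}")    # 3.65000
```

These values agree with the count_star and avg_score values in the PostQuery responses shown in the patch.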

