Posted to dev@pinot.apache.org by GitBox <gi...@apache.org> on 2018/11/28 21:27:05 UTC

[GitHub] snleee commented on a change in pull request #3563: Re-org documentation

snleee commented on a change in pull request #3563: Re-org documentation
URL: https://github.com/apache/incubator-pinot/pull/3563#discussion_r237265111
 
 

 ##########
 File path: docs/pinot_hadoop.rst
 ##########
 @@ -1,39 +1,43 @@
-Creating Pinot segments in Hadoop
-=================================
+Creating Pinot segments
+=======================
 
-Pinot index files can be created offline on Hadoop, then pushed onto a production cluster. Because index generation does not happen on the Pinot nodes serving traffic, this means that these nodes can continue to serve traffic without impacting performance while data is being indexed. The index files are then pushed onto the Pinot cluster, where the files are distributed and loaded by the server nodes with minimal performance impact.
+Pinot segments can be created offline on Hadoop, or via command line from data files. Controller REST endpoint
+can then be used to add the segment to the table to which the segment belongs.
+
+Creating segments using hadoop
+------------------------------
 
 .. figure:: Pinot-Offline-only-flow.png
 
   Offline Pinot workflow
 
-To create index files offline  a Hadoop workflow can be created to complete the following steps:
+To create Pinot segments on Hadoop, a workflow can be created to complete the following steps:
 
-1. Pre-aggregate, clean up and prepare the data, writing it as Avro format files in a single HDFS directory
-2. Create the index files
-3. Upload the index files to the Pinot cluster
+#. Pre-aggregate, clean up and prepare the data, writing it as Avro format files in a single HDFS directory
 
 Review comment:
  I think our Hadoop job supports CSV, JSON, and THRIFT formats as well.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
