Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/08/05 18:37:28 UTC

[GitHub] [druid] loquisgon commented on a change in pull request #11541: Docs Ingestion page refactor

loquisgon commented on a change in pull request #11541:
URL: https://github.com/apache/druid/pull/11541#discussion_r683699109



##########
File path: docs/ingestion/index.md
##########
@@ -22,33 +22,24 @@ title: "Ingestion"
   ~ under the License.
   -->
 
-All data in Druid is organized into _segments_, which are data files each of which may have up to a few million rows.
-Loading data in Druid is called _ingestion_ or _indexing_, and consists of reading data from a source system and creating
-segments based on that data.
+Loading data in Druid is called _ingestion_ or _indexing_. When you ingest data into Druid, Druid reads the data from your source system and stores it in data files called _segments_. In general, segment files contain a few million rows.
 
-In most ingestion methods, the Druid [MiddleManager](../design/middlemanager.md) processes
-(or the [Indexer](../design/indexer.md) processes) load your source data. One exception is
-Hadoop-based ingestion, where this work is instead done using a Hadoop MapReduce job on YARN (although MiddleManager or Indexer
-processes are still involved in starting and monitoring the Hadoop jobs). 
+For most ingestion methods, the Druid [MiddleManager](../design/middlemanager.md) processes or the [Indexer](../design/indexer.md) processes load your source data. One exception is
+Hadoop-based ingestion, which instead uses a Hadoop MapReduce job on YARN, although MiddleManager or Indexer processes still start and monitor the Hadoop jobs.
 
-Once segments have been generated and stored in [deep storage](../dependencies/deep-storage.md), they are loaded by Historical processes. 
-For more details on how this works, see the [Storage design](../design/architecture.md#storage-design) section 
-of Druid's design documentation.
+After Druid creates segments and stores them in [deep storage](../dependencies/deep-storage.md), Historical processes load them to respond to queries. See the [Storage design](../design/architecture.md#storage-design) section of the Druid design documentation for more information.
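
As a concrete illustration of the flow described in the revised text, here is a minimal sketch of submitting a native batch ingestion task, assuming a local cluster with the Druid Router on its default port 8888; the datasource name and input directory are hypothetical. The Overlord assigns the task to a MiddleManager or Indexer process, which reads the source data and writes segments to deep storage.

```python
import requests

# Hypothetical native batch (index_parallel) spec: a MiddleManager or
# Indexer process executes the task, reading JSON files from a local
# directory and writing the resulting segments to deep storage.
ingestion_spec = {
    "type": "index_parallel",
    "spec": {
        "ioConfig": {
            "type": "index_parallel",
            "inputSource": {"type": "local", "baseDir": "/tmp/data", "filter": "*.json"},
            "inputFormat": {"type": "json"},
        },
        "dataSchema": {
            "dataSource": "example_datasource",  # hypothetical name
            "timestampSpec": {"column": "timestamp", "format": "iso"},
            "dimensionsSpec": {"dimensions": []},  # empty list = discover dimensions
            "granularitySpec": {"segmentGranularity": "day"},
        },
    },
}

# Submit the task to the Overlord through the Router's task endpoint.
resp = requests.post("http://localhost:8888/druid/indexer/v1/task", json=ingestion_spec)
resp.raise_for_status()
print("Task id:", resp.json()["task"])
```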

Review comment:
       We already said in the first paragraph that Druid creates segments. Now we are saying that they get stored in a separate phase. I would rephrase this as:
   
   "Segments created by the ingestion process get stored in [deep storage...] which in turn are loaded in Historical nodes by Historical processes in order to respond to Historical queries" 
   
    At some point the distinction has to be made between queries served by Historical processes and those served by real-time (i.e., MiddleManager/Indexer) processes. BTW, the latter only happens for streaming ingestion.
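
To illustrate that distinction, here is a sketch of the query side, again assuming the default Router port 8888 and the hypothetical `example_datasource`: the Broker fans a SQL query out to Historical processes for published segments and, for streaming ingestion, to real-time MiddleManager/Indexer tasks, then merges the results, so the client never sees the difference.

```python
import requests

# One query, one answer: the Broker routes parts of the query to
# Historicals (published segments) and, for streaming datasources, to
# real-time tasks, then merges the partial results transparently.
resp = requests.post(
    "http://localhost:8888/druid/v2/sql",
    json={"query": "SELECT COUNT(*) AS row_count FROM example_datasource"},
)
resp.raise_for_status()
print(resp.json())  # e.g. [{"row_count": 1234}]
```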




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


