Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/01/22 19:28:31 UTC

[GitHub] [druid] jihoonson commented on a change in pull request #10788: suggest index parallel for native batch reindexing > 1GB

jihoonson commented on a change in pull request #10788:
URL: https://github.com/apache/druid/pull/10788#discussion_r562859292



##########
File path: docs/ingestion/data-management.md
##########
@@ -232,11 +232,7 @@ There are other types of `inputSpec` to enable reindexing and delta ingestion.
 
 ### Reindexing with Native Batch Ingestion
 
-This section assumes the reader understands how to do batch ingestion without Hadoop using [native batch indexing](../ingestion/native-batch.md),
-which uses an `inputSource` to know where and how to read the input data. The [`DruidInputSource`](native-batch.md#druid-input-source)
-can be used to read data from segments inside Druid. Note that IndexTask is to be used for prototyping purposes only as
-it has to do all processing inside a single process and can't scale. Please use Hadoop batch ingestion for production
-scenarios dealing with more than 1GB of data.
+This section assumes you understand how to do batch ingestion without Hadoop using [native batch indexing](../ingestion/native-batch.md). Native batch indexing uses an `inputSource` to know where and how to read the input data. You can use the [`DruidInputSource`](native-batch.md#druid-input-source) to read data from segments inside Druid. Use the Index task (`index`) for prototyping purposes because it relies on a single process and can't scale. Use Parallel task (`index_parallel`) to ingest more than 1GB of data.

Review comment:
       `index_parallel` behaves almost the same as `index` when `maxNumConcurrentSubTasks` is 1. So I think we can suggest always using `index_parallel` and adjusting `maxNumConcurrentSubTasks` based on the data size.
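Here is a rough sketch of this suggestion (not taken from the PR). The Python below builds a reindexing task spec that always uses `index_parallel`. It sets `maxNumConcurrentSubTasks` to 1 for inputs up to about 1GB, matching the threshold in the proposed docs text, and raises it for larger inputs. Everything named here for illustration is hypothetical: the helper `reindex_task_spec`, its `input_bytes` and `max_subtasks` parameters, the default of 4 sub-tasks, and the `wikipedia` datasource. The `dataSchema` is abbreviated and would need its remaining fields before the spec could be submitted.

```python
import json

# Threshold taken from the docs text under review: the plain `index` task
# is suggested only for prototyping with up to roughly 1GB of input.
ONE_GB = 1024 ** 3


def reindex_task_spec(data_source, interval, input_bytes, max_subtasks=4):
    """Build a hypothetical index_parallel reindexing spec.

    Assumption: input_bytes is a caller-supplied estimate of the data being
    reindexed, and max_subtasks is the parallelism chosen for larger inputs.
    Both are illustrative parameters, not Druid settings.
    """
    # With a single sub-task, index_parallel behaves almost like `index`.
    subtasks = 1 if input_bytes <= ONE_GB else max_subtasks
    return {
        "type": "index_parallel",
        "spec": {
            # Abbreviated: a real dataSchema also needs timestampSpec,
            # dimensionsSpec, granularitySpec, and so on.
            "dataSchema": {"dataSource": data_source},
            "ioConfig": {
                "type": "index_parallel",
                # DruidInputSource reads existing segments of a datasource.
                "inputSource": {
                    "type": "druid",
                    "dataSource": data_source,
                    "interval": interval,
                },
            },
            "tuningConfig": {
                "type": "index_parallel",
                "maxNumConcurrentSubTasks": subtasks,
            },
        },
    }


if __name__ == "__main__":
    spec = reindex_task_spec("wikipedia", "2013-01-01/2013-01-02", 5 * ONE_GB)
    print(json.dumps(spec, indent=2))
```

With this approach, the docs only need to describe a single task type. The size estimate affects just one tuning field, not the choice between two task types.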



