Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/07/28 07:10:39 UTC

[GitHub] [druid] petermarshallio opened a new pull request #11506: Docs - note when partitioning using concatenated dimensions

petermarshallio opened a new pull request #11506:
URL: https://github.com/apache/druid/pull/11506


   Prompted by a discussion in the community Slack (https://the-asf.slack.com/archives/CJ8D1JTB8/p1595434977062400), this PR makes the following changes:
   - Note of caution when doing `single_dim` with concatenated dimensions.
   - Links to subheadings for each partitioning type
   - Slight wording changes
   
   This PR has:
   - [x] been self-reviewed.
   - [ ] been tested in a test Druid cluster.
   
   @techdocsmith


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] techdocsmith commented on a change in pull request #11506: Docs - note when partitioning using concatenated dimensions

Posted by GitBox <gi...@apache.org>.

techdocsmith commented on a change in pull request #11506:
URL: https://github.com/apache/druid/pull/11506#discussion_r680269754



##########
File path: docs/ingestion/native-batch.md
##########
@@ -366,8 +366,15 @@ Druid currently supports only one partition function.
 #### Single-dimension range partitioning
 
 > Single dimension range partitioning is currently not supported in the sequential mode of the Parallel task.
+
 The Parallel task will use one subtask when you set `maxNumConcurrentSubTasks` to 1.
 
+> Be aware that, with this technique, segment sizes could be skewed if your chosen `partitionDimension` is also skewed in source data.
+
+> While it is technically possible to concatenate multiple dimensions into a single new dimension
+> that you go on to specify in `partitionDimension`, remember that you _must_ then use this newly concatenated dimension at query time
+> in order for segment pruning to be effective.

Review comment:
       ```suggestion
   > It is possible to concatenate multiple dimensions into a single new dimension to use for the  `partitionDimension`. To take advantage of the performance improvements of segment pruning in this case, filter on the newly concatenated dimension at query time.
   ```
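
To illustrate why the filter has to reference the concatenated dimension, here is a minimal Python sketch of range-based pruning. It is not Druid code: the `Segment` class, the `prune` function, the `country_device` dimension, and the range boundaries are all hypothetical, and only equality filters are modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Segment:
    """Hypothetical segment covering [start, end) on the partition dimension."""
    name: str
    start: Optional[str]  # None means unbounded below
    end: Optional[str]    # None means unbounded above


def may_contain(segment: Segment, value: str) -> bool:
    """True if the value falls inside the segment's range."""
    lower_ok = segment.start is None or segment.start <= value
    upper_ok = segment.end is None or value < segment.end
    return lower_ok and upper_ok


def prune(segments: List[Segment], filters: Dict[str, str],
          partition_dimension: str) -> List[Segment]:
    """Return the segments that still need scanning for equality filters."""
    if partition_dimension not in filters:
        # No filter on the partition dimension: nothing can be ruled out.
        return list(segments)
    value = filters[partition_dimension]
    return [s for s in segments if may_contain(s, value)]


# Assumed layout: data partitioned on a concatenated "country_device" dimension.
segments = [
    Segment("seg0", None, "DE|mobile"),
    Segment("seg1", "DE|mobile", "US|desktop"),
    Segment("seg2", "US|desktop", None),
]

# Filtering on the concatenated dimension narrows the scan to one segment.
pruned = prune(segments, {"country_device": "FR|tablet"}, "country_device")

# Filtering on only one of the source dimensions leaves every segment in play.
unpruned = prune(segments, {"country": "FR"}, "country_device")
```

In this sketch a filter on `country` alone cannot be matched against the range boundaries, so all three segments are scanned.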





[GitHub] [druid] techdocsmith commented on a change in pull request #11506: Docs - note when partitioning using concatenated dimensions

Posted by GitBox <gi...@apache.org>.

techdocsmith commented on a change in pull request #11506:
URL: https://github.com/apache/druid/pull/11506#discussion_r688035224



##########
File path: docs/ingestion/native-batch.md
##########
@@ -369,11 +369,16 @@ Druid currently supports only one partition function.
 
 The Parallel task will use one subtask when you set `maxNumConcurrentSubTasks` to 1.
 
-> Be aware that, with this technique, segment sizes could be skewed if your chosen `partitionDimension` is also skewed in source data.
-
-> While it is technically possible to concatenate multiple dimensions into a single new dimension
-> that you go on to specify in `partitionDimension`, remember that you _must_ then use this newly concatenated dimension at query time
-> in order for segment pruning to be effective.
+> When using this technique to partition your data, segment sizes may be unequally distributed if the data
+> in your `partitionDimension` is also unequally distributed.  Therefore, avoid imbalance in data layout and
+> review the distribution of values in your source data before deciding on a partitioning strategy.
+
+> In order for segment pruning to be effective and translate into better query performance, you _must_ use
+> the `partitionDimension` at query time.  While it is possible to concatenate values from multiple
+> dimensions into a single new dimension that you then opt to specify in `partitionDimension`, remember that you
+> must use that new `partitionDimension` dimension in your
+> [native filter](https://druid.apache.org/docs/latest/querying/filters.html) /

Review comment:
       ```suggestion
   > [native filter](../querying/filters.md) /
   ```
   Use the relative path to the `.md` for links

##########
File path: docs/ingestion/native-batch.md
##########
@@ -369,11 +369,16 @@ Druid currently supports only one partition function.
 
 The Parallel task will use one subtask when you set `maxNumConcurrentSubTasks` to 1.
 
-> Be aware that, with this technique, segment sizes could be skewed if your chosen `partitionDimension` is also skewed in source data.
-
-> While it is technically possible to concatenate multiple dimensions into a single new dimension
-> that you go on to specify in `partitionDimension`, remember that you _must_ then use this newly concatenated dimension at query time
-> in order for segment pruning to be effective.
+> When using this technique to partition your data, segment sizes may be unequally distributed if the data
+> in your `partitionDimension` is also unequally distributed.  Therefore, avoid imbalance in data layout and
+> review the distribution of values in your source data before deciding on a partitioning strategy.
+
+> In order for segment pruning to be effective and translate into better query performance, you _must_ use
+> the `partitionDimension` at query time.  While it is possible to concatenate values from multiple

Review comment:
       ```suggestion
   > the `partitionDimension` at query time.  You can concatenate values from multiple
   ```

##########
File path: docs/ingestion/native-batch.md
##########
@@ -369,11 +369,16 @@ Druid currently supports only one partition function.
 
 The Parallel task will use one subtask when you set `maxNumConcurrentSubTasks` to 1.
 
-> Be aware that, with this technique, segment sizes could be skewed if your chosen `partitionDimension` is also skewed in source data.
-
-> While it is technically possible to concatenate multiple dimensions into a single new dimension
-> that you go on to specify in `partitionDimension`, remember that you _must_ then use this newly concatenated dimension at query time
-> in order for segment pruning to be effective.
+> When using this technique to partition your data, segment sizes may be unequally distributed if the data
+> in your `partitionDimension` is also unequally distributed.  Therefore, avoid imbalance in data layout and
+> review the distribution of values in your source data before deciding on a partitioning strategy.
+
+> In order for segment pruning to be effective and translate into better query performance, you _must_ use
+> the `partitionDimension` at query time.  While it is possible to concatenate values from multiple
+> dimensions into a single new dimension that you then opt to specify in `partitionDimension`, remember that you

Review comment:
       ```suggestion
   > dimensions into a new dimension to use as the `partitionDimension`. In this case, you
   ```

##########
File path: docs/ingestion/native-batch.md
##########
@@ -369,11 +369,16 @@ Druid currently supports only one partition function.
 
 The Parallel task will use one subtask when you set `maxNumConcurrentSubTasks` to 1.
 
-> Be aware that, with this technique, segment sizes could be skewed if your chosen `partitionDimension` is also skewed in source data.
-
-> While it is technically possible to concatenate multiple dimensions into a single new dimension
-> that you go on to specify in `partitionDimension`, remember that you _must_ then use this newly concatenated dimension at query time
-> in order for segment pruning to be effective.
+> When using this technique to partition your data, segment sizes may be unequally distributed if the data
+> in your `partitionDimension` is also unequally distributed.  Therefore, avoid imbalance in data layout and
+> review the distribution of values in your source data before deciding on a partitioning strategy.
+
+> In order for segment pruning to be effective and translate into better query performance, you _must_ use
+> the `partitionDimension` at query time.  While it is possible to concatenate values from multiple
+> dimensions into a single new dimension that you then opt to specify in `partitionDimension`, remember that you
+> must use that new `partitionDimension` dimension in your
+> [native filter](https://druid.apache.org/docs/latest/querying/filters.html) /
+> [WHERE clause](https://druid.apache.org/docs/latest/querying/sql.html#where).

Review comment:
       ```suggestion
   > [WHERE clause](../querying/sql.md#where).
   ```
   Is this link necessary?





[GitHub] [druid] techdocsmith commented on a change in pull request #11506: Docs - note when partitioning using concatenated dimensions

Posted by GitBox <gi...@apache.org>.

techdocsmith commented on a change in pull request #11506:
URL: https://github.com/apache/druid/pull/11506#discussion_r698651218



##########
File path: docs/ingestion/native-batch.md
##########
@@ -369,11 +369,16 @@ Druid currently supports only one partition function.
 
 The Parallel task will use one subtask when you set `maxNumConcurrentSubTasks` to 1.
 
-> Be aware that, with this technique, segment sizes could be skewed if your chosen `partitionDimension` is also skewed in source data.
-
-> While it is technically possible to concatenate multiple dimensions into a single new dimension
-> that you go on to specify in `partitionDimension`, remember that you _must_ then use this newly concatenated dimension at query time
-> in order for segment pruning to be effective.
+When you use this technique to partition your data, segment sizes may be unequally distributed if the data
+in your `partitionDimension` is also unequally distributed.  Therefore, to avoid imbalance in data layout, 
+ review the distribution of values in your source data before deciding on a partitioning strategy.
+
+For segment pruning to be effective and translate into better query performance, you must use
+the `partitionDimension` at query time.  You can concatenate values from multiple
+dimensions into a new dimension to use as the `partitionDimension`. In this case, you
+must use that new dimension in your native filter `WHERE` clause.
+> [native filter](../querying/filters.md) /

Review comment:
       ```suggestion
   ```





[GitHub] [druid] petermarshallio commented on pull request #11506: Docs - note when partitioning using concatenated dimensions

Posted by GitBox <gi...@apache.org>.

petermarshallio commented on pull request #11506:
URL: https://github.com/apache/druid/pull/11506#issuecomment-895963406


   @techdocsmith I have merged our proposals – switched up the grammar and added some links, too.



[GitHub] [druid] petermarshallio closed pull request #11506: Docs - note when partitioning using concatenated dimensions

Posted by GitBox <gi...@apache.org>.

petermarshallio closed pull request #11506:
URL: https://github.com/apache/druid/pull/11506


   




[GitHub] [druid] techdocsmith merged pull request #11506: Docs - note when partitioning using concatenated dimensions

Posted by GitBox <gi...@apache.org>.

techdocsmith merged pull request #11506:
URL: https://github.com/apache/druid/pull/11506


   





[GitHub] [druid] techdocsmith commented on a change in pull request #11506: Docs - note when partitioning using concatenated dimensions

Posted by GitBox <gi...@apache.org>.

techdocsmith commented on a change in pull request #11506:
URL: https://github.com/apache/druid/pull/11506#discussion_r698651402



##########
File path: docs/ingestion/native-batch.md
##########
@@ -369,11 +369,16 @@ Druid currently supports only one partition function.
 
 The Parallel task will use one subtask when you set `maxNumConcurrentSubTasks` to 1.
 
-> Be aware that, with this technique, segment sizes could be skewed if your chosen `partitionDimension` is also skewed in source data.
-
-> While it is technically possible to concatenate multiple dimensions into a single new dimension
-> that you go on to specify in `partitionDimension`, remember that you _must_ then use this newly concatenated dimension at query time
-> in order for segment pruning to be effective.
+When you use this technique to partition your data, segment sizes may be unequally distributed if the data
+in your `partitionDimension` is also unequally distributed.  Therefore, to avoid imbalance in data layout, 
+ review the distribution of values in your source data before deciding on a partitioning strategy.
+
+For segment pruning to be effective and translate into better query performance, you must use
+the `partitionDimension` at query time.  You can concatenate values from multiple
+dimensions into a new dimension to use as the `partitionDimension`. In this case, you
+must use that new dimension in your native filter `WHERE` clause.
+> [native filter](../querying/filters.md) /
+> [WHERE clause](../querying/sql.md#where).

Review comment:
       ```suggestion
   ```





[GitHub] [druid] techdocsmith commented on a change in pull request #11506: Docs - note when partitioning using concatenated dimensions

Posted by GitBox <gi...@apache.org>.

techdocsmith commented on a change in pull request #11506:
URL: https://github.com/apache/druid/pull/11506#discussion_r680269609



##########
File path: docs/ingestion/native-batch.md
##########
@@ -366,8 +366,15 @@ Druid currently supports only one partition function.
 #### Single-dimension range partitioning
 
 > Single dimension range partitioning is currently not supported in the sequential mode of the Parallel task.
+
 The Parallel task will use one subtask when you set `maxNumConcurrentSubTasks` to 1.
 
+> Be aware that, with this technique, segment sizes could be skewed if your chosen `partitionDimension` is also skewed in source data.

Review comment:
       ```suggestion
   If you use this technique to partition your data, you can wind up with varying segment sizes if the values for `partitionDimension` from your original data are unequally distributed.
   ```
   @petermarshallio, is there a recommended action in this case? Select another dimension? Can autocompaction help?
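
One way to review the distribution before settling on a `partitionDimension` is to measure how concentrated its values are in a sample of the source data. The Python sketch below is only an illustration: `value_shares` and the sample values are hypothetical, and it assumes that all rows sharing a value land in the same range, so a dominant value points to an oversized segment.

```python
from collections import Counter
from typing import Dict, Iterable


def value_shares(values: Iterable[str]) -> Dict[str, float]:
    """Return each value's share of all rows, most frequent first."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.most_common()}


# Hypothetical sample of a candidate partition dimension.
sample = ["US"] * 8 + ["DE"] * 1 + ["FR"] * 1

shares = value_shares(sample)
top_value, top_share = next(iter(shares.items()))
```

Here a single value accounts for most of the sample, which suggests trying a different dimension or a concatenated one with a more even spread.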

##########
File path: docs/ingestion/native-batch.md
##########
@@ -366,8 +366,15 @@ Druid currently supports only one partition function.
 #### Single-dimension range partitioning
 
 > Single dimension range partitioning is currently not supported in the sequential mode of the Parallel task.
+
 The Parallel task will use one subtask when you set `maxNumConcurrentSubTasks` to 1.
 
+> Be aware that, with this technique, segment sizes could be skewed if your chosen `partitionDimension` is also skewed in source data.
+
+> While it is technically possible to concatenate multiple dimensions into a single new dimension

Review comment:
       ```suggestion
   ```

##########
File path: docs/ingestion/native-batch.md
##########
@@ -366,8 +366,15 @@ Druid currently supports only one partition function.
 #### Single-dimension range partitioning
 
 > Single dimension range partitioning is currently not supported in the sequential mode of the Parallel task.
+
 The Parallel task will use one subtask when you set `maxNumConcurrentSubTasks` to 1.
 
+> Be aware that, with this technique, segment sizes could be skewed if your chosen `partitionDimension` is also skewed in source data.
+
+> While it is technically possible to concatenate multiple dimensions into a single new dimension
+> that you go on to specify in `partitionDimension`, remember that you _must_ then use this newly concatenated dimension at query time
+> in order for segment pruning to be effective.

Review comment:
       ```suggestion
   > It is possible to concatenate multiple dimensions into a single new dimension to use for the  `partitionDimension`. To take advantage of the performance improvements of segment pruning in this case, you must use the newly concatenated dimension at query time.
   ```

##########
File path: docs/ingestion/native-batch.md
##########
@@ -366,8 +366,15 @@ Druid currently supports only one partition function.
 #### Single-dimension range partitioning
 
 > Single dimension range partitioning is currently not supported in the sequential mode of the Parallel task.
+
 The Parallel task will use one subtask when you set `maxNumConcurrentSubTasks` to 1.
 
+> Be aware that, with this technique, segment sizes could be skewed if your chosen `partitionDimension` is also skewed in source data.
+
+> While it is technically possible to concatenate multiple dimensions into a single new dimension
+> that you go on to specify in `partitionDimension`, remember that you _must_ then use this newly concatenated dimension at query time

Review comment:
       ```suggestion
   ```





[GitHub] [druid] techdocsmith commented on a change in pull request #11506: Docs - note when partitioning using concatenated dimensions

Posted by GitBox <gi...@apache.org>.

techdocsmith commented on a change in pull request #11506:
URL: https://github.com/apache/druid/pull/11506#discussion_r694382528



##########
File path: docs/ingestion/native-batch.md
##########
@@ -369,11 +369,16 @@ Druid currently supports only one partition function.
 
 The Parallel task will use one subtask when you set `maxNumConcurrentSubTasks` to 1.
 
-> Be aware that, with this technique, segment sizes could be skewed if your chosen `partitionDimension` is also skewed in source data.
-
-> While it is technically possible to concatenate multiple dimensions into a single new dimension
-> that you go on to specify in `partitionDimension`, remember that you _must_ then use this newly concatenated dimension at query time
-> in order for segment pruning to be effective.
+> When using this technique to partition your data, segment sizes may be unequally distributed if the data
+> in your `partitionDimension` is also unequally distributed.  Therefore, avoid imbalance in data layout and

Review comment:
       ```suggestion
   in your `partitionDimension` is also unequally distributed.  Therefore, to avoid imbalance in data layout, 
   ```

##########
File path: docs/ingestion/native-batch.md
##########
@@ -369,11 +369,16 @@ Druid currently supports only one partition function.
 
 The Parallel task will use one subtask when you set `maxNumConcurrentSubTasks` to 1.
 
-> Be aware that, with this technique, segment sizes could be skewed if your chosen `partitionDimension` is also skewed in source data.
-
-> While it is technically possible to concatenate multiple dimensions into a single new dimension
-> that you go on to specify in `partitionDimension`, remember that you _must_ then use this newly concatenated dimension at query time
-> in order for segment pruning to be effective.
+> When using this technique to partition your data, segment sizes may be unequally distributed if the data
+> in your `partitionDimension` is also unequally distributed.  Therefore, avoid imbalance in data layout and
+> review the distribution of values in your source data before deciding on a partitioning strategy.
+
+> In order for segment pruning to be effective and translate into better query performance, you _must_ use
+> the `partitionDimension` at query time.  You can concatenate values from multiple

Review comment:
       ```suggestion
   the `partitionDimension` at query time.  You can concatenate values from multiple
   ```

##########
File path: docs/ingestion/native-batch.md
##########
@@ -369,11 +369,16 @@ Druid currently supports only one partition function.
 
 The Parallel task will use one subtask when you set `maxNumConcurrentSubTasks` to 1.
 
-> Be aware that, with this technique, segment sizes could be skewed if your chosen `partitionDimension` is also skewed in source data.
-
-> While it is technically possible to concatenate multiple dimensions into a single new dimension
-> that you go on to specify in `partitionDimension`, remember that you _must_ then use this newly concatenated dimension at query time
-> in order for segment pruning to be effective.
+> When using this technique to partition your data, segment sizes may be unequally distributed if the data
+> in your `partitionDimension` is also unequally distributed.  Therefore, avoid imbalance in data layout and
+> review the distribution of values in your source data before deciding on a partitioning strategy.
+
+> In order for segment pruning to be effective and translate into better query performance, you _must_ use
+> the `partitionDimension` at query time.  You can concatenate values from multiple
+> dimensions into a new dimension to use as the `partitionDimension`. In this case, you

Review comment:
       ```suggestion
   dimensions into a new dimension to use as the `partitionDimension`. In this case, you
   ```

##########
File path: docs/ingestion/native-batch.md
##########
@@ -369,11 +369,16 @@ Druid currently supports only one partition function.
 
 The Parallel task will use one subtask when you set `maxNumConcurrentSubTasks` to 1.
 
-> Be aware that, with this technique, segment sizes could be skewed if your chosen `partitionDimension` is also skewed in source data.
-
-> While it is technically possible to concatenate multiple dimensions into a single new dimension
-> that you go on to specify in `partitionDimension`, remember that you _must_ then use this newly concatenated dimension at query time
-> in order for segment pruning to be effective.
+> When using this technique to partition your data, segment sizes may be unequally distributed if the data
+> in your `partitionDimension` is also unequally distributed.  Therefore, avoid imbalance in data layout and
+> review the distribution of values in your source data before deciding on a partitioning strategy.
+
+> In order for segment pruning to be effective and translate into better query performance, you _must_ use
+> the `partitionDimension` at query time.  You can concatenate values from multiple
+> dimensions into a new dimension to use as the `partitionDimension`. In this case, you
+> must use that new `partitionDimension` dimension in your

Review comment:
       ```suggestion
   must use that new dimension in your native filter `WHERE` clause.
   ```
   I'm not convinced the links are doing much work here (maybe the one to filters).

##########
File path: docs/ingestion/native-batch.md
##########
@@ -369,11 +369,16 @@ Druid currently supports only one partition function.
 
 The Parallel task will use one subtask when you set `maxNumConcurrentSubTasks` to 1.
 
-> Be aware that, with this technique, segment sizes could be skewed if your chosen `partitionDimension` is also skewed in source data.
-
-> While it is technically possible to concatenate multiple dimensions into a single new dimension
-> that you go on to specify in `partitionDimension`, remember that you _must_ then use this newly concatenated dimension at query time
-> in order for segment pruning to be effective.
+> When using this technique to partition your data, segment sizes may be unequally distributed if the data
+> in your `partitionDimension` is also unequally distributed.  Therefore, avoid imbalance in data layout and
+> review the distribution of values in your source data before deciding on a partitioning strategy.

Review comment:
       ```suggestion
    review the distribution of values in your source data before deciding on a partitioning strategy.
   ```

##########
File path: docs/ingestion/native-batch.md
##########
@@ -369,11 +369,16 @@ Druid currently supports only one partition function.
 
 The Parallel task will use one subtask when you set `maxNumConcurrentSubTasks` to 1.
 
-> Be aware that, with this technique, segment sizes could be skewed if your chosen `partitionDimension` is also skewed in source data.
-
-> While it is technically possible to concatenate multiple dimensions into a single new dimension
-> that you go on to specify in `partitionDimension`, remember that you _must_ then use this newly concatenated dimension at query time
-> in order for segment pruning to be effective.
+> When using this technique to partition your data, segment sizes may be unequally distributed if the data

Review comment:
       ```suggestion
   When you use this technique to partition your data, segment sizes may be unequally distributed if the data
   ```

##########
File path: docs/ingestion/native-batch.md
##########
@@ -369,11 +369,16 @@ Druid currently supports only one partition function.
 
 The Parallel task will use one subtask when you set `maxNumConcurrentSubTasks` to 1.
 
-> Be aware that, with this technique, segment sizes could be skewed if your chosen `partitionDimension` is also skewed in source data.
-
-> While it is technically possible to concatenate multiple dimensions into a single new dimension
-> that you go on to specify in `partitionDimension`, remember that you _must_ then use this newly concatenated dimension at query time
-> in order for segment pruning to be effective.
+> When using this technique to partition your data, segment sizes may be unequally distributed if the data
+> in your `partitionDimension` is also unequally distributed.  Therefore, avoid imbalance in data layout and
+> review the distribution of values in your source data before deciding on a partitioning strategy.
+
+> In order for segment pruning to be effective and translate into better query performance, you _must_ use

Review comment:
       ```suggestion
   For segment pruning to be effective and translate into better query performance, you must use
   ```
   We don't use italics for emphasis; "must" is emphasis enough.



