Posted to commits@druid.apache.org by "ektravel (via GitHub)" <gi...@apache.org> on 2023/05/18 15:35:46 UTC

[GitHub] [druid] ektravel commented on a diff in pull request #14298: Updated default value of maxTotalRows to reflect the value in the code

ektravel commented on code in PR #14298:
URL: https://github.com/apache/druid/pull/14298#discussion_r1197973284


##########
docs/development/extensions-core/kafka-supervisor-reference.md:
##########
@@ -198,7 +198,7 @@ The `tuningConfig` is optional and default parameters will be used if no `tuning
 | `maxRowsInMemory`                 | Integer        | The number of rows to aggregate before persisting. This number is the post-aggregation rows, so it is not equivalent to the number of input events, but the number of aggregated rows that those events result in. This is used to manage the required JVM heap size. Maximum heap memory usage for indexing scales with `maxRowsInMemory` * (2 + `maxPendingPersists`). Normally user does not need to set this, but depending on the nature of data, if rows are short in terms of bytes, user may not want to store a million rows in memory and this value should be set.                                                                           | no (default == 1000000)                                                                                      |
 | `maxBytesInMemory`                | Long           | The number of bytes to aggregate in heap memory before persisting. This is based on a rough estimate of memory usage and not actual usage. Normally this is computed internally and user does not need to set it. The maximum heap memory usage for indexing is `maxBytesInMemory` * (2 + `maxPendingPersists`).                                                                                                                                                                                                                                                                                                                                        | no (default == One-sixth of max JVM memory)                                                                  |
 | `maxRowsPerSegment`               | Integer        | The number of rows to aggregate into a segment; this number is post-aggregation rows. Handoff will happen either if `maxRowsPerSegment` or `maxTotalRows` is hit or every `intermediateHandoffPeriod`, whichever happens earlier.                                                                                                                                                                                                                                                                                                                                                                                                                   | no (default == 5000000)                                                                                      |
-| `maxTotalRows`                    | Long           | The number of rows to aggregate across all segments; this number is post-aggregation rows. Handoff will happen either if `maxRowsPerSegment` or `maxTotalRows` is hit or every `intermediateHandoffPeriod`, whichever happens earlier.                                                                                                                                                                                                                                                                                                                                                                                                              | no (default == unlimited)                                                                                    |
+| `maxTotalRows`                    | Long           | The number of rows to aggregate across all segments; this number is post-aggregation rows. Handoff will happen either if `maxRowsPerSegment` or `maxTotalRows` is hit or every `intermediateHandoffPeriod`, whichever happens earlier.                                                                                                                                                                                                                                                                                                                                                                                                              | no (default == 20000000)                                                                                    |

Review Comment:
   ```suggestion
   | `maxTotalRows`                    | Long           | The number of post-aggregation rows to aggregate across all segments. Handoff happens when either `maxRowsPerSegment` or `maxTotalRows` is hit or every `intermediateHandoffPeriod`, whichever happens first.                                                                                                                                                                                                                                                                                                                                                                                                              | no (default == 20000000)                                                                                    |
   ```
   What does "this number is post-aggregation rows" mean in this context? Can we say "The number of post-aggregation rows to aggregate across all segments" instead?
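
For readers unsure how the three handoff triggers in this row combine, the following is a minimal sketch, not Druid's actual implementation. The function `should_hand_off` and its parameter names are hypothetical; only the two row limits and the name `intermediateHandoffPeriod` come from the table above. It assumes "hit" means "reached or exceeded" and that all row counts are post-aggregation.

```python
from datetime import timedelta
from typing import Optional

# Defaults quoted in the table above (20000000 is the value proposed in this PR).
MAX_ROWS_PER_SEGMENT = 5_000_000
MAX_TOTAL_ROWS = 20_000_000


def should_hand_off(
    rows_in_segment: int,
    total_rows: int,
    since_last_handoff: timedelta,
    intermediate_handoff_period: Optional[timedelta] = None,
    max_rows_per_segment: int = MAX_ROWS_PER_SEGMENT,
    max_total_rows: int = MAX_TOTAL_ROWS,
) -> bool:
    """Hypothetical helper: True as soon as any one trigger fires.

    Row counts are post-aggregation rows, not raw input events.
    """
    if rows_in_segment >= max_rows_per_segment:
        return True
    if total_rows >= max_total_rows:
        return True
    # Assumption: a period of None means no time-based trigger is configured.
    if (
        intermediate_handoff_period is not None
        and since_last_handoff >= intermediate_handoff_period
    ):
        return True
    return False


# No single segment is full, but the total across segments has reached the limit.
print(should_hand_off(4_000_000, 20_000_000, timedelta(minutes=5)))
```

Because the first trigger to fire wins, the order of the checks does not change the result.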



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

