You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by "kfaraz (via GitHub)" <gi...@apache.org> on 2023/05/31 02:14:24 UTC
[GitHub] [druid] kfaraz commented on a diff in pull request #14356: Minor cleanups from working on sampling bugfix

kfaraz commented on code in PR #14356:
URL: https://github.com/apache/druid/pull/14356#discussion_r1210994575


##########
docs/development/extensions-core/kinesis-ingestion.md:
##########
@@ -284,7 +284,7 @@ The `tuningConfig` is optional. If no `tuningConfig` is specified, default param
 |`indexSpecForIntermediatePersists`|Object|Defines segment storage format options to be used at indexing time for intermediate persisted temporary segments. This can be used to disable dimension/metric compression on intermediate segments to reduce memory required for final merging. However, disabling compression on intermediate segments might increase page cache use while they are used before getting merged into final segment published, see [IndexSpec](#indexspec) for possible values.| no (default = same as `indexSpec`)|
 |`reportParseExceptions`|Boolean|If true, exceptions encountered during parsing will be thrown and will halt ingestion; if false, unparseable rows and fields will be skipped.|no (default == false)|
 |`handoffConditionTimeout`|Long| Milliseconds to wait for segment handoff. It must be >= 0, where 0 means to wait forever.| no (default == 0)|
-|`resetOffsetAutomatically`|Boolean|Controls behavior when Druid needs to read Kinesis messages that are no longer available.<br/><br/>If false, the exception will bubble up, which will cause your tasks to fail and ingestion to halt. If this occurs, manual intervention is required to correct the situation; potentially using the [Reset Supervisor API](../../api-reference/api-reference.md#supervisors). This mode is useful for production, since it will make you aware of issues with ingestion.<br/><br/>If true, Druid will automatically reset to the earlier or latest sequence number available in Kinesis, based on the value of the `useEarliestSequenceNumber` property (earliest if true, latest if false). Please note that this can lead to data being _DROPPED_ (if `useEarliestSequenceNumber` is false) or _DUPLICATED_ (if `useEarliestSequenceNumber` is true) without your knowledge. Messages will be logged indicating that a reset has occurred, but ingestion will continue. This mode is useful f
 or non-production situations, since it will make Druid attempt to recover from problems automatically, even if they lead to quiet dropping or duplicating of data.|no (default == false)|
+|`resetOffsetAutomatically`|Boolean|Controls behavior when Druid needs to read Kinesis messages that are no longer available.<br/><br/>If false, the exception will bubble up, which will cause your tasks to fail and ingestion to halt. If this occurs, manual intervention is required to correct the situation; potentially using the [Reset Supervisor API](../../api-reference/api-reference.md#supervisors). This mode is useful for production, since it will make you aware of issues with ingestion.<br/><br/>If true, Druid will automatically reset to the earlier or latest sequence number available in Kinesis, based on the value of the `useEarliestSequenceNumber` property (earliest if true, latest if false). Please note that this can lead to data being *DROPPED* (if `useEarliestSequenceNumber` is false) or *DUPLICATED* (if `useEarliestSequenceNumber` is true) without your knowledge. Messages will be logged indicating that a reset has occurred, but ingestion will continue. This mode is useful f
 or non-production situations, since it will make Druid attempt to recover from problems automatically, even if they lead to quiet dropping or duplicating of data.|no (default == false)|

Review Comment:
   Could you please indicate what has changed in this line? The diff doesn't show it clearly. The only change I could find is `_DROPPED_` to `*DROPPED*` which is equivalent markdown afaik. If that is the case, best revert this change as it affects neither readability nor the rendered HTML.



##########
docs/development/extensions-core/kinesis-ingestion.md:
##########
@@ -352,14 +353,14 @@ look for credentials set in environment variables, via [Web Identity Token](http
 profile provider (in this order).
 
 To ingest data from Kinesis, ensure that the policy attached to your IAM role contains the necessary permissions.
-The permissions needed depend on the value of `useListShards`.
+The required permissions depend on the value of `useListShards`.
 
 If the `useListShards` flag is set to `true`, you need following permissions:
 
-* `ListStreams`: required to list your data streams
-* `Get*`: required for `GetShardIterator`
-* `GetRecords`: required to get data records from a data stream's shard
-* `ListShards` : required to get the shards for a stream of interest
+- `ListStreams`: required to list your data streams

Review Comment:
   Both `*` and `-` can be used for lists. We can probably skip this change. Other places in the docs might be using `*` for lists too, so I don't think consistency is a real requirement here.



##########
docs/development/extensions-core/kinesis-ingestion.md:
##########
@@ -380,14 +381,14 @@ If the `useListShards` flag is set to `true`, you need following permissions:
 
 If the `useListShards` flag is set to `false`, you need following permissions:
 
-* `ListStreams`: required to list your data streams
-* `Get*`: required for `GetShardIterator`
-* `GetRecords`: required to get data records from a data stream's shard
-* `DescribeStream`: required to describe the specified data stream
+- `ListStreams`: required to list your data streams
+- `Get*`: required for `GetShardIterator`
+- `GetRecords`: required to get data records from a data stream's shard
+- `DescribeStream`: required to describe the specified data stream
 
 **Example policy**
 
-```    
+```

Review Comment:
   Use ` ```json ` maybe?



##########
docs/development/extensions-core/kinesis-ingestion.md:
##########
@@ -302,7 +302,7 @@ The `tuningConfig` is optional. If no `tuningConfig` is specified, default param
 |`maxParseExceptions`|Integer|The maximum number of parse exceptions that can occur before the task halts ingestion and fails. Overridden if `reportParseExceptions` is set.|no, unlimited default|
 |`maxSavedParseExceptions`|Integer|When a parse exception occurs, Druid can keep track of the most recent parse exceptions. "maxSavedParseExceptions" limits how many exception instances will be saved. These saved exceptions will be made available after the task finishes in the [task completion report](../../ingestion/tasks.md#task-reports). Overridden if `reportParseExceptions` is set.|no, default == 0|
 |`maxRecordsPerPoll`|Integer|The maximum number of records/events to be fetched from buffer per poll. The actual maximum will be `Max(maxRecordsPerPoll, Max(bufferSize, 1))`|no (see [Determining fetch settings](#determining-fetch-settings) for defaults)|
-|`repartitionTransitionDuration`|ISO8601 Period|When shards are split or merged, the supervisor will recompute shard -> task group mappings, and signal any running tasks created under the old mappings to stop early at (current time + `repartitionTransitionDuration`). Stopping the tasks early allows Druid to begin reading from the new shards more quickly. The repartition transition wait time controlled by this property gives the stream additional time to write records to the new shards after the split/merge, which helps avoid the issues with empty shard handling described at https://github.com/apache/druid/issues/7600.|no, (default == PT2M)|
+|`repartitionTransitionDuration`|ISO8601 Period|When shards are split or merged, the supervisor will recompute shard -> task group mappings, and signal any running tasks created under the old mappings to stop early at (current time + `repartitionTransitionDuration`). Stopping the tasks early allows Druid to begin reading from the new shards more quickly. The repartition transition wait time controlled by this property gives the stream additional time to write records to the new shards after the split/merge, which helps avoid the issues with empty shard handling described at <https://github.com/apache/druid/issues/7600>.|no, (default == PT2M)|

Review Comment:
   ```suggestion
   |`repartitionTransitionDuration`|ISO8601 Period|When shards are split or merged, the supervisor will recompute shard -> task group mappings, and signal any running tasks created under the old mappings to stop early at (current time + `repartitionTransitionDuration`). Stopping the tasks early allows Druid to begin reading from the new shards more quickly. The repartition transition wait time controlled by this property gives the stream additional time to write records to the new shards after the split/merge, which helps avoid the issues with empty shard handling described [in this issue](https://github.com/apache/druid/issues/7600).|no, (default == PT2M)|
   ```
   
   Maybe use some text which is hyperlinked to the url.



##########
docs/development/extensions-core/kinesis-ingestion.md:
##########
@@ -477,25 +479,25 @@ it will just ensure that no indexing tasks are running until the supervisor is r
 
 ### Resetting Supervisors
 
-The `POST /druid/indexer/v1/supervisor/<supervisorId>/reset` operation clears stored 
-sequence numbers, causing the supervisor to start reading from either the earliest or 
-latest sequence numbers in Kinesis (depending on the value of `useEarliestSequenceNumber`). 
-After clearing stored sequence numbers, the supervisor kills and recreates active tasks, 
-so that tasks begin reading from valid sequence numbers. 
+The `POST /druid/indexer/v1/supervisor/<supervisorId>/reset` operation clears stored

Review Comment:
   I would suggest leaving out this change as it is only removing trailing spaces. It would be better to do this whenever we update the content in this section.



##########
docs/development/extensions-core/kinesis-ingestion.md:
##########
@@ -635,10 +639,11 @@ To enable this feature, set `deaggregate` to true in your `ioConfig` when submit
 
 When changing the shard count for a Kinesis stream, there will be a window of time around the resharding operation with early shutdown of Kinesis ingestion tasks and possible task failures.
 
-The early shutdowns and task failures are expected, and they occur because the supervisor will update the shard -> task group mappings as shards are closed and fully read, to ensure that tasks are not running 
-with an assignment of closed shards that have been fully read and to ensure a balanced distribution of active shards across tasks. 
+The early shutdowns and task failures are expected, and they occur because the supervisor will update the shard -> task group mappings as shards are closed and fully read, to ensure that tasks are not running

Review Comment:
   Same comment about trailing spaces.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org