Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/03/05 21:52:56 UTC

[GitHub] [druid] 2bethere commented on a change in pull request #10935: First refactor of compaction

2bethere commented on a change in pull request #10935:
URL: https://github.com/apache/druid/pull/10935#discussion_r588735377



##########
File path: docs/configuration/index.md
##########
@@ -820,24 +820,24 @@ A description of the compaction config is:
 |`taskPriority`|[Priority](../ingestion/tasks.md#priority) of compaction task.|no (default = 25)|
 |`inputSegmentSizeBytes`|Maximum number of total segment bytes processed per compaction task. Since a time chunk must be processed in its entirety, if the segments for a particular time chunk have a total size in bytes greater than this parameter, compaction will not run for that time chunk. Because each compaction task runs with a single thread, setting this value too far above 1–2GB will result in compaction tasks taking an excessive amount of time.|no (default = 419430400)|
 |`maxRowsPerSegment`|Max number of rows per segment after compaction.|no|
-|`skipOffsetFromLatest`|The offset for searching segments to be compacted. Strongly recommended to set for realtime dataSources. |no (default = "P1D")|
+|`skipOffsetFromLatest`|The offset for searching segments to be compacted in [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) duration format. Strongly recommended to set for realtime dataSources. See [Data handling with compaction](../ingestion/compaction.md#data-handling-with-compaction)|no (default = "P1D")|
 |`tuningConfig`|Tuning config for compaction tasks. See below [Compaction Task TuningConfig](#compaction-tuningconfig).|no|
 |`taskContext`|[Task context](../ingestion/tasks.md#context) for compaction tasks.|no|
+|`granularitySpec`|Custom `granularitySpec` to describe the `segmentGranularity` for the compacted segments.|no|
 
 An example of compaction config is:
 
 ```json
 {
-  "dataSource": "wikiticker"
+  "dataSource": "wikiticker",
+  "granularitySpec" : {
+    "segmentGranularity" : "none"
+  }
 }
 ```
 
-Note that compaction tasks can fail if their locks are revoked by other tasks of higher priorities.
-Since realtime tasks have a higher priority than compaction task by default,
-it can be problematic if there are frequent conflicts between compaction tasks and realtime tasks.
-If this is the case, the coordinator's automatic compaction might get stuck because of frequent compaction task failures.
-This kind of problem may happen especially in Kafka/Kinesis indexing systems which allow late data arrival.
-If you see this problem, it's recommended to set `skipOffsetFromLatest` to some large enough value to avoid such conflicts between compaction tasks and realtime tasks.
+Compaction tasks fail when higher priority tasks cause Druid to revoke their locks. By default, realtime tasks like ingestion have a higher priority than compaction tasks. Therefore, frequent conflicts between compaction tasks and realtime tasks can cause the coordinator's automatic compaction to get stuck.

Review comment:
       Nice! This is way clearer.
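
For reference, a compaction config that combines the fields touched in this hunk might look like the sketch below. The values are illustrative assumptions, not taken from the PR: `P1D` and `25` simply restate the defaults documented in the table for `skipOffsetFromLatest` and `taskPriority`, and `DAY` is an assumed segment granularity.

```json
{
  "dataSource": "wikiticker",
  "taskPriority": 25,
  "skipOffsetFromLatest": "P1D",
  "granularitySpec": {
    "segmentGranularity": "DAY"
  }
}
```

With `skipOffsetFromLatest` set to an ISO 8601 duration such as `P1D`, the most recent day of data is skipped when searching for segments to compact, which is the window where realtime tasks are most likely to hold conflicting locks.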



