You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by GitBox <gi...@apache.org> on 2021/10/05 07:16:28 UTC
[GitHub] [kafka] kowshik commented on pull request #11345: Allow empty last segment to have missing offset index during recovery

kowshik commented on pull request #11345:
URL: https://github.com/apache/kafka/pull/11345#issuecomment-934133766


   @ccding During a call to `log.flush()`, we remember [here](https://github.com/apache/kafka/blob/1d3b96389b325520648d29b6363941f50e5b6d35/core/src/main/scala/kafka/log/UnifiedLog.scala#L1522) the offset upto which the log was flushed. So, a subsequent `flush()` will be a no-op [due to this check](https://github.com/apache/kafka/blob/1d3b96389b325520648d29b6363941f50e5b6d35/core/src/main/scala/kafka/log/UnifiedLog.scala#L1517) unless the `logEndOffset` has advanced. This means that empty active segments shouldn't add additional burden to the `log.flush()` operation during each call, unless new empty active segments are generated in between 2 calls but that's quite uncommon (see [relevant comment](https://github.com/apache/kafka/pull/11345#discussion_r719132048)).
   
   Furthermore, we typically don't configure to flush the log periodically or during appends. Even then, when we call `log.flush()` without passing an offset parameter, the intent was always to flush all data written to the log. It is just a corner case that "all data written to the log" needs to also include an empty active segment for correctness reasons since the `logEndOffset` is derived from it [during recovery](https://github.com/apache/kafka/blob/db1f581da7f3440cfd5be93800b4a9a2d7327a35/core/src/main/scala/kafka/log/LogLoader.scala#L401). Some related details on documentation below:
   
   (1) Periodic flush: https://github.com/apache/kafka/blob/0fe4e24c095f57606a9c33a3135f2f81a1fb0285/core/src/main/scala/kafka/log/LogManager.scala#L1244-L1246
   
   Here, `log.flush()` executes only when the `timeSinceLastFlush >= log.config.flushMs`. The `flush.ms` configuration is documented [here](https://github.com/apache/kafka/blob/a46b82bea9abbd08e550d985f87e79a6d912a9ab/clients/src/main/java/org/apache/kafka/common/config/TopicConfig.java#L58-L63) and it has a default value of `Long.MaxValue` (very high!) defined [here](https://github.com/apache/kafka/blob/85548acafbc799ee371b531802580fcde170ddc1/core/src/main/scala/kafka/server/KafkaConfig.scala#L135). The doc recommends us not to override the config unless needed.
   
   (2) Flush during appends:
   
   https://github.com/apache/kafka/blob/db42afd6e24ef4291390b4d1c1f10758beedefed/core/src/main/scala/kafka/log/UnifiedLog.scala#L933
   
   Here, `flush()` executes only when `unflushedMessages >= config.flushInterval`. Similar explanation to the above. The `flush.messages` configuration is documented [here](https://github.com/apache/kafka/blob/a46b82bea9abbd08e550d985f87e79a6d912a9ab/clients/src/main/java/org/apache/kafka/common/config/TopicConfig.java#L50-L56) and it has a default value of `Long.MaxValue` (very high!) defined [here](https://github.com/apache/kafka/blob/85548acafbc799ee371b531802580fcde170ddc1/core/src/main/scala/kafka/server/KafkaConfig.scala#L133). The doc recommends us not to override the config unless needed.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org