Posted to jira@kafka.apache.org by "huxihx (JIRA)" <ji...@apache.org> on 2018/02/02 03:42:00 UTC

[jira] [Commented] (KAFKA-6425) Calculating cleanBytes in LogToClean might not be correct

    [ https://issues.apache.org/jira/browse/KAFKA-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349734#comment-16349734 ] 

huxihx commented on KAFKA-6425:
-------------------------------

Hi all, are there any updates on this JIRA?

> Calculating cleanBytes in LogToClean might not be correct
> ---------------------------------------------------------
>
>                 Key: KAFKA-6425
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6425
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.0.0
>            Reporter: huxihx
>            Priority: Major
>
> In class `LogToClean`, the calculation for `cleanBytes` is as below:
> {code:java}
> val cleanBytes = log.logSegments(-1, firstDirtyOffset).map(_.size.toLong).sum
> {code}
> Most of the time, `firstDirtyOffset` is the base offset of the active segment, which works well with `log.logSegments`: `cleanBytes` can be computed by safely summing the sizes of all log segments whose base offset is less than `firstDirtyOffset`.
> However, things changed after `firstUnstableOffset` was introduced. Users can indirectly move this offset to a non-base offset (by changing the log start offset, for instance). In that case, it is not correct to add the total size of the log segment that contains it. Instead, we should only count the bytes between that segment's base offset and `firstUnstableOffset`.
> Let me show an example:
> Say I have three log segments, shown as below:
> 0L    --> log segment1, size: 1000 bytes
> 1234L --> log segment2, size: 1000 bytes
> 4567L --> active log segment, current size: 500 bytes
> Based on the current code, if `firstUnstableOffset` is deliberately set to 2000L (this is possible, since it is lower-bounded by the log start offset and a user can explicitly change the log start offset), then `cleanBytes` is calculated as 2000 bytes, which is wrong. The expected value is 1000 + (bytes between offsets 1234L and 2000L).
> [~junrao] [~ijuma] Does all of this make sense?
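
To make the three-segment example above concrete, here is a minimal Scala sketch that compares the whole-segment sum with a partial sum. It does not use Kafka's classes: `Segment`, `nextOffset`, `bytesBefore` and both `...CleanBytes` helpers are hypothetical names introduced for illustration, and the end offset of the active segment is made up. The even spread of bytes across offsets is also only an assumption; a real fix would need to look up the actual file position of the offset inside the segment.

```scala
// Hypothetical stand-in for a log segment; this is NOT Kafka's LogSegment.
final case class Segment(baseOffset: Long, sizeInBytes: Long, nextOffset: Long) {
  // Assumption for illustration only: bytes are spread evenly across offsets.
  def bytesBefore(offset: Long): Long =
    if (offset <= baseOffset) 0L
    else if (offset >= nextOffset) sizeInBytes
    else sizeInBytes * (offset - baseOffset) / (nextOffset - baseOffset)
}

object CleanBytesSketch {
  // The three segments from the example in the issue description.
  val segments: Seq[Segment] = Seq(
    Segment(baseOffset = 0L, sizeInBytes = 1000L, nextOffset = 1234L),
    Segment(baseOffset = 1234L, sizeInBytes = 1000L, nextOffset = 4567L),
    // Active segment; its end offset (5000L) is invented for this sketch.
    Segment(baseOffset = 4567L, sizeInBytes = 500L, nextOffset = 5000L)
  )

  // Behaviour described in the issue: every segment whose base offset is
  // below firstDirtyOffset is counted in full.
  def wholeSegmentCleanBytes(firstDirtyOffset: Long): Long =
    segments.filter(_.baseOffset < firstDirtyOffset).map(_.sizeInBytes).sum

  // Proposed behaviour: count only the bytes that lie below firstDirtyOffset.
  def partialCleanBytes(firstDirtyOffset: Long): Long =
    segments.map(_.bytesBefore(firstDirtyOffset)).sum
}
```

With `firstDirtyOffset = 2000L`, the whole-segment sum counts both of the first two segments in full, while the partial sum counts all of segment1 and only part of segment2. When the offset falls exactly on a segment's base offset (for example 1234L), the two helpers agree, which matches the observation that the existing calculation works when the offset is a base offset.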



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)