Posted to jira@kafka.apache.org by "José Armando García Sancio (Jira)" <ji...@apache.org> on 2023/04/24 21:16:00 UTC

[jira] [Updated] (KAFKA-14932) Heuristic for increasing the log start offset after replicas are caught up

     [ https://issues.apache.org/jira/browse/KAFKA-14932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

José Armando García Sancio updated KAFKA-14932:
-----------------------------------------------
    Description: 
The implementation in [https://github.com/apache/kafka/pull/9816] increases the log start offset as soon as a snapshot greater than the log start offset is created. This is correct, but it is inefficient in some cases:
 # Any follower (voter or observer) whose end offset is between the leader's log start offset and the leader's latest snapshot will be invalidated. This forces the follower to fetch the new snapshot and reload its state machine.
 # Any {{Listener}} or state machine whose {{nextExpectedOffset()}} is less than the latest snapshot's offset will be invalidated. This forces the state machine to reload its state from the latest snapshot.
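Both cases reduce to the same check: a reader (a follower's log end offset, or a {{Listener}}'s {{nextExpectedOffset()}}) can keep reading the log only while its position is at or after the log start offset. The sketch below is illustrative only; {{SnapshotInvalidation}} and its methods are hypothetical names, and it assumes offsets are exclusive end offsets, so a reader positioned exactly at the new log start offset is not invalidated.

```java
/**
 * Hypothetical helper (not part of Kafka): does advancing the log start offset
 * force a reader to reload from a snapshot? "position" stands for a follower's
 * log end offset or a Listener's nextExpectedOffset().
 */
final class SnapshotInvalidation {
    private SnapshotInvalidation() {}

    /** A reader can keep reading the log while its position is at or after the log start offset. */
    static boolean canReadFromLog(long position, long logStartOffset) {
        return position >= logStartOffset;
    }

    /**
     * True if the reader could read the log before the log start offset moved
     * from oldLogStartOffset to newLogStartOffset, but must load a snapshot after.
     */
    static boolean newlyInvalidated(long position, long oldLogStartOffset, long newLogStartOffset) {
        return canReadFromLog(position, oldLogStartOffset)
            && !canReadFromLog(position, newLogStartOffset);
    }
}
```

For example, moving the log start offset from 100 to 200 invalidates a reader at 150, but not one at 200 (still readable) or one at 50 (which already needed a snapshot).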

To minimize the frequency of these reloads, KIP-630 proposes adding the following configuration:
 * {{metadata.start.offset.lag.time.max.ms}} - The maximum amount of time that the leader will wait for an offset to be replicated to all of the live replicas before advancing the {{LogStartOffset}}. See the section “When to Increase the LogStartOffset” in KIP-630. The default is 7 days.
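A minimal sketch of how a leader might apply this setting, assuming the wait is measured from the moment the candidate offset (for example, a new snapshot's end offset) became available. {{LeaderStartOffsetLag}} and {{mayAdvance}} are hypothetical names, not Kafka APIs; only the 7-day default comes from the description above.

```java
import java.time.Duration;
import java.util.Collection;

/** Hypothetical leader-side check modelled on metadata.start.offset.lag.time.max.ms. */
final class LeaderStartOffsetLag {
    // Default of metadata.start.offset.lag.time.max.ms as described in the issue: 7 days.
    static final long DEFAULT_MAX_LAG_TIME_MS = Duration.ofDays(7).toMillis();

    private final long maxLagTimeMs;

    LeaderStartOffsetLag(long maxLagTimeMs) {
        this.maxLagTimeMs = maxLagTimeMs;
    }

    /**
     * Returns true if the leader may move its log start offset to candidateOffset.
     * candidateSinceMs is when the candidate became available (an assumption of this sketch).
     */
    boolean mayAdvance(long candidateOffset,
                       long candidateSinceMs,
                       Collection<Long> liveReplicaEndOffsets,
                       long nowMs) {
        // Every live replica has replicated at least up to the candidate offset.
        boolean allReplicated = liveReplicaEndOffsets.stream()
            .allMatch(endOffset -> endOffset >= candidateOffset);
        // Or the leader has already waited the maximum allowed time.
        boolean waitedLongEnough = nowMs - candidateSinceMs >= maxLagTimeMs;
        return allReplicated || waitedLongEnough;
    }
}
```

The timeout bounds how long a single slow or stuck replica can keep the leader from trimming its log.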

This description and implementation should be extended to also apply to the state machine, or {{Listener}}. The local log start offset should be increased only when the {{nextExpectedOffset()}} of every {{ListenerContext}} is greater than the offset of the latest snapshot.

I should point out that this logic is slightly different when the replica is a leader versus a follower.
 # The leader should only advance the log start offset if:
 ## All of the followers have fetched past a snapshot, and
 ## All of the Listeners have read past that snapshot,
 ## Or a timeout has expired.
 # Followers should only advance the log start offset to the leader's log start offset if:
 ## There is a local snapshot greater than the leader's log start offset, and
 ## All of the Listeners have read past the leader's log start offset.
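The two rule sets above can be sketched as a pair of predicates. This is a hypothetical illustration, not the Kafka implementation: {{LogStartOffsetRules}} and its parameters are invented names, the callers are assumed to have already computed the minimum follower fetch offset and the minimum listener {{nextExpectedOffset()}}, and snapshot offsets are treated as exclusive end offsets, so "read past" is modelled as "at or beyond".

```java
/** Hypothetical encoding of the leader and follower rules described in this issue. */
final class LogStartOffsetRules {
    private LogStartOffsetRules() {}

    /**
     * Leader: advance to the latest snapshot's end offset if every follower and every
     * Listener is at or past it, or if the lag timeout has expired.
     */
    static boolean leaderMayAdvance(long latestSnapshotEndOffset,
                                    long minFollowerFetchOffset,
                                    long minListenerNextOffset,
                                    boolean lagTimeExpired) {
        boolean followersPast = minFollowerFetchOffset >= latestSnapshotEndOffset;
        boolean listenersPast = minListenerNextOffset >= latestSnapshotEndOffset;
        return (followersPast && listenersPast) || lagTimeExpired;
    }

    /**
     * Follower: advance to the leader's log start offset only if a local snapshot
     * covers it and every Listener is at or past it. There is no timeout on this path.
     */
    static boolean followerMayAdvance(long leaderLogStartOffset,
                                      long latestLocalSnapshotEndOffset,
                                      long minListenerNextOffset) {
        boolean snapshotCovers = latestLocalSnapshotEndOffset >= leaderLogStartOffset;
        boolean listenersPast = minListenerNextOffset >= leaderLogStartOffset;
        return snapshotCovers && listenersPast;
    }
}
```

Keeping the follower path free of a timeout reflects the description: a follower only ever follows the leader's log start offset, so the leader's timeout already bounds how long the log is retained.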

Another requirement is that the log start offset must always be either zero or equal to the end offset of another snapshot. This is needed so that the raft client can know the epoch of the offset immediately before the log start offset. In practice, this means that the topic partition log will have two snapshots when the log start offset is greater than 0.
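One way to honour this constraint is to only ever move the log start offset to the end offset of an existing snapshot, and to keep that snapshot alongside any newer ones. The sketch below assumes a hypothetical {{SnapshotId}} (exclusive end offset plus the epoch of the record at {{endOffset - 1}}) and a precomputed "safe" offset that every reader has reached; none of these names come from Kafka.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical selection of a log start offset that is 0 or a snapshot's end offset. */
final class StartOffsetSelection {
    private StartOffsetSelection() {}

    /** Assumed snapshot id: exclusive end offset and the epoch of the record at endOffset - 1. */
    record SnapshotId(long endOffset, int epoch) {}

    /**
     * Returns the end offset of the newest snapshot with endOffset <= safeOffset,
     * or 0 if there is none. sortedSnapshots must be ordered by ascending endOffset.
     */
    static long selectLogStartOffset(List<SnapshotId> sortedSnapshots, long safeOffset) {
        long start = 0;
        for (SnapshotId snapshot : sortedSnapshots) {
            if (snapshot.endOffset() > safeOffset) {
                break;
            }
            start = snapshot.endOffset();
        }
        return start;
    }

    /** Snapshots to keep: the one ending at the log start offset plus every newer one. */
    static List<SnapshotId> snapshotsToRetain(List<SnapshotId> sortedSnapshots, long logStartOffset) {
        List<SnapshotId> retained = new ArrayList<>();
        for (SnapshotId snapshot : sortedSnapshots) {
            if (snapshot.endOffset() >= logStartOffset) {
                retained.add(snapshot);
            }
        }
        return retained;
    }
}
```

With snapshots ending at 100, 250 and 400 and a safe offset of 300, the log start offset becomes 250, and the retained snapshots are the ones ending at 250 and 400. That is the two-snapshot state described above.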

This can be implemented by changing {{ReplicatedLog::startOffset}} to:
{code:java}
OffsetAndEpoch startOffsetAndEpoch();
{code}
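A self-contained sketch of what the proposed accessor could look like. It is not Kafka's {{ReplicatedLog}}: {{LogStartView}} and {{InMemoryLogStart}} are hypothetical, {{OffsetAndEpoch}} here is a local record standing in for the type named above, and reporting epoch 0 when the log starts at offset 0 is an assumption.

```java
/** Local stand-in for the OffsetAndEpoch type named in the issue (not imported from Kafka). */
record OffsetAndEpoch(long offset, int epoch) {}

/** Hypothetical, trimmed-down view of a replicated log's start; not Kafka's ReplicatedLog. */
interface LogStartView {
    /** Proposed accessor: the log start offset plus the epoch of the record just before it. */
    OffsetAndEpoch startOffsetAndEpoch();

    /** The offset alone can still be derived from the new accessor. */
    default long startOffset() {
        return startOffsetAndEpoch().offset();
    }
}

/** In-memory sketch: the start is 0 or the end offset of a retained snapshot. */
final class InMemoryLogStart implements LogStartView {
    // Id of the snapshot whose end offset is the log start offset; null while the log starts at 0.
    private OffsetAndEpoch startSnapshotId;

    /** Moves the log start offset to the end offset of the given snapshot. */
    void advanceTo(OffsetAndEpoch snapshotId) {
        if (startSnapshotId != null && snapshotId.offset() < startSnapshotId.offset()) {
            throw new IllegalArgumentException("log start offset cannot move backwards");
        }
        startSnapshotId = snapshotId;
    }

    @Override
    public OffsetAndEpoch startOffsetAndEpoch() {
        // Assumption: epoch 0 is reported when nothing precedes offset 0.
        return startSnapshotId == null ? new OffsetAndEpoch(0L, 0) : startSnapshotId;
    }
}
```

Because the log start offset always equals a snapshot's end offset, the epoch comes directly from that snapshot's id instead of from a log record that has already been deleted.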

  was:
The implementation in [https://github.com/apache/kafka/pull/9816] increases the log start offset as soon as a snapshot greater than the log start offset is created. This is correct, but it is inefficient in some cases:
 # Any follower (voter or observer) whose end offset is between the leader's log start offset and the leader's latest snapshot will be invalidated. This forces the follower to fetch the new snapshot and reload its state machine.
 # Any {{Listener}} or state machine whose {{nextExpectedOffset()}} is less than the latest snapshot's offset will be invalidated. This forces the state machine to reload its state from the latest snapshot.

To minimize the frequency of these reloads, KIP-630 proposes adding the following configuration:
 * {{metadata.start.offset.lag.time.max.ms}} - The maximum amount of time that the leader will wait for an offset to be replicated to all of the live replicas before advancing the {{LogStartOffset}}. See the section “When to Increase the LogStartOffset” in KIP-630. The default is 7 days.

This description and implementation should be extended to also apply to the state machine, or {{Listener}}. The local log start offset should be increased only when the {{nextExpectedOffset()}} of every {{ListenerContext}} is greater than the offset of the latest snapshot.


> Heuristic for increasing the log start offset after replicas are caught up
> --------------------------------------------------------------------------
>
>                 Key: KAFKA-14932
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14932
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: José Armando García Sancio
>            Assignee: José Armando García Sancio
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)