Posted to jira@kafka.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/05/30 00:51:00 UTC
[jira] [Commented] (KAFKA-8001) Fetch from future replica stalls when local replica becomes a leader

    [ https://issues.apache.org/jira/browse/KAFKA-8001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16851436#comment-16851436 ] 

ASF GitHub Bot commented on KAFKA-8001:
---------------------------------------

soondenana commented on pull request #6839: KAFKA-8001: Move log from replica into partition
URL: https://github.com/apache/kafka/pull/6839
 
 
   A partition object contains one or more replica objects. A replica
   object in turn holds the "log" if the replica corresponds to the
   local node. All the code in Partition or ReplicaManager peeks into the
   replica object to fetch the log when it needs to operate on it. As a
   replica object can represent a local replica or a remote one, this
   leads to a bunch of "if-else" code in the log fetch and offset update code.
   
   NOTE: In addition to the "log" that is in use during normal operation, if
   an alter log directory command is issued, we also create a future log
   object. This object catches up with the local log and then we switch the
   log directory. So a Partition can temporarily have two local logs. Before
   this change, both logs were inside replica objects.
   
   This change is an attempt to untangle this relationship. In particular,
   it moves "log" from the replica object to Partition. So a partition contains
   a local log to which all writes go, and it maintains a list of replicas
   for the offset and "caught up time" data that it uses for the replication
   protocol. The replica corresponding to the local node still contains a log
   object, but the object is now read-only and no code except Replica and test
   code uses it. Every other part of the code in Partition and ReplicaManager
   uses the log object stored in Partition. This decouples the replica-log
   relation, and all the "if-else" code goes away. A couple more structural
   changes are made in this change:
   1. Two subclasses of Replica are introduced: LocalReplica and
   RemoteReplica. This makes it clear what each replica stores and is
   capable of.
   2. The "log" in Partition is also wrapped in a LogInfo wrapper, which
   encapsulates all the code that either operated on "log" or maintained
   its state.
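
The ownership change described above can be sketched roughly as follows. This is a simplified illustration and not the code from the pull request: the `Log` stand-in, the method names, and the `PartitionSketch` wrapper class are all hypothetical, and the LogInfo wrapper and future log are left out to keep the example short. Only the names Partition, LocalReplica, and RemoteReplica come from the description.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartitionSketch {

    /** Hypothetical stand-in for a log: an append-only list of records. */
    static class Log {
        private final List<String> records = new ArrayList<>();
        void append(String record) { records.add(record); }
        long logEndOffset() { return records.size(); }
    }

    /** Common replica state used by the replication protocol. */
    static abstract class Replica {
        final int brokerId;
        Replica(int brokerId) { this.brokerId = brokerId; }
        abstract long logEndOffset();
    }

    /** Replica on this broker: reads its offset from the log, never writes to it. */
    static class LocalReplica extends Replica {
        private final Log log;
        LocalReplica(int brokerId, Log log) { super(brokerId); this.log = log; }
        @Override long logEndOffset() { return log.logEndOffset(); }
    }

    /** Replica on another broker: only tracks the offset reported by its fetches. */
    static class RemoteReplica extends Replica {
        private long logEndOffset = 0L;
        RemoteReplica(int brokerId) { super(brokerId); }
        void updateFetchState(long fetchedUpTo) { this.logEndOffset = fetchedUpTo; }
        @Override long logEndOffset() { return logEndOffset; }
    }

    /** The partition owns the log; callers no longer reach through a replica. */
    static class Partition {
        private final Log log = new Log();
        private final Map<Integer, Replica> replicas = new HashMap<>();

        Partition(int localBrokerId) {
            replicas.put(localBrokerId, new LocalReplica(localBrokerId, log));
        }
        void addRemoteReplica(int brokerId) {
            replicas.put(brokerId, new RemoteReplica(brokerId));
        }
        // All writes go to the partition's own log, with no local/remote branching.
        void append(String record) { log.append(record); }
        Replica replica(int brokerId) { return replicas.get(brokerId); }
    }
}
```

In this sketch the write path never asks whether a replica is local, which is the kind of "if-else" branching the description says was removed.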
   
   Unit tests have been updated to account for the change in hierarchy.
   Tested by running multiple brokers and producing and consuming data. Also
   changed the log directory back and forth to make sure that the alter log
   directory use case works.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 


> Fetch from future replica stalls when local replica becomes a leader
> --------------------------------------------------------------------
>
>                 Key: KAFKA-8001
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8001
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 2.1.0, 2.1.1
>            Reporter: Anna Povzner
>            Assignee: Vikas Singh
>            Priority: Critical
>
> With KIP-320, fetch from follower / future replica returns FENCED_LEADER_EPOCH if current leader epoch in the request is lower than the leader epoch known to the leader (or local replica in case of future replica fetching). In case of future replica fetching from the local replica, if local replica becomes the leader of the partition, the next fetch from future replica fails with FENCED_LEADER_EPOCH and fetching from future replica is stopped until the next leader change. 
> Proposed solution: on local replica leader change, future replica should "become a follower" again, and go through the truncation phase. Or we could optimize it, and just update partition state of the future replica to reflect the updated current leader epoch. 
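
The stall described in the issue, and the second ("optimized") option from the proposed solution, can be illustrated with a small model. This is a sketch under stated assumptions, not Kafka's implementation: the class and method names are hypothetical, and only the FENCED_LEADER_EPOCH case from the description is modeled.

```java
public class EpochFencingSketch {

    enum FetchError { NONE, FENCED_LEADER_EPOCH }

    /** Hypothetical model of the local replica, which tracks the current leader epoch. */
    static class LocalReplicaState {
        private int leaderEpoch;
        LocalReplicaState(int leaderEpoch) { this.leaderEpoch = leaderEpoch; }
        int leaderEpoch() { return leaderEpoch; }
        void becomeLeader(int newEpoch) { this.leaderEpoch = newEpoch; }

        // A fetch carrying an older epoch than the one known locally is fenced.
        FetchError handleFetch(int requestLeaderEpoch) {
            return requestLeaderEpoch < leaderEpoch
                    ? FetchError.FENCED_LEADER_EPOCH
                    : FetchError.NONE;
        }
    }

    /** Hypothetical model of the future replica fetching from the local replica. */
    static class FutureReplicaFetcher {
        private int currentLeaderEpoch;
        FutureReplicaFetcher(int currentLeaderEpoch) { this.currentLeaderEpoch = currentLeaderEpoch; }

        FetchError fetchFrom(LocalReplicaState local) {
            return local.handleFetch(currentLeaderEpoch);
        }

        // The "optimized" option: on a local leader change, just refresh the epoch.
        void onLocalLeaderChange(int newEpoch) { this.currentLeaderEpoch = newEpoch; }
    }
}
```

Without the `onLocalLeaderChange` call, every fetch after `becomeLeader` keeps returning FENCED_LEADER_EPOCH in this model, which mirrors the stall that lasts until the next leader change.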


