Posted to jira@kafka.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/05/30 00:51:00 UTC
[jira] [Commented] (KAFKA-8001) Fetch from future replica stalls
when local replica becomes a leader
[ https://issues.apache.org/jira/browse/KAFKA-8001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16851436#comment-16851436 ]
ASF GitHub Bot commented on KAFKA-8001:
---------------------------------------
soondenana commented on pull request #6839: KAFKA-8001: Move log from replica into partition
URL: https://github.com/apache/kafka/pull/6839
A partition object contains one or more replica objects. A replica
object in turn holds the "log" if the replica corresponds to the
local node. All the code in Partition or ReplicaManager peeks into the
replica object to fetch the log when it needs to operate on it. Because a
replica object can represent either a local replica or a remote one, this
leads to a lot of "if-else" code in the log fetch and offset update paths.
NOTE: In addition to the "log" that is in use during normal operation, if
an alter log directory command is issued, we also create a future log
object. This object catches up with the local log, and then we switch the
log directory. So a Partition can temporarily have two local logs. Before
this change, both logs lived inside replica objects.
This change is an attempt to untangle this relationship. In particular,
it moves the "log" from the replica object to Partition. So a partition
contains a local log to which all writes go, and it maintains a list of
replicas for the offset and "caught up time" data that it uses for the
replication protocol. The replica corresponding to the local node still
contains a log object, but the object is now read-only and no code except
Replica and test code uses it. Every other part of the code in Partition and
ReplicaManager uses the log object stored in Partition. This decouples the
replica-log relation, and all the "if-else" code goes away. A couple more
structural changes are made in this change:
1. Two subclasses of Replica are introduced: LocalReplica and
RemoteReplica. This makes it clear what each replica stores and is
capable of.
2. The "log" in Partition is also wrapped in a LogInfo wrapper, which
encapsulates all the code that either operated on the "log" or maintained
its state.
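The ownership model described above can be sketched roughly as follows.
This is a minimal illustration in Java, not the code from the pull request:
every class name here (SketchLog, SketchReplica, SketchLocalReplica,
SketchRemoteReplica, SketchPartition) is hypothetical, the LogInfo wrapper
and the future log are left out, and the real Kafka classes carry far more
state.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a log; NOT Kafka's actual Log class.
final class SketchLog {
    private final List<String> records = new ArrayList<>();

    // Appends a record and returns the new log end offset.
    long append(String record) {
        records.add(record);
        return records.size();
    }

    long logEndOffset() {
        return records.size();
    }
}

// A replica carries only the state the replication protocol needs.
abstract class SketchReplica {
    final int brokerId;

    SketchReplica(int brokerId) {
        this.brokerId = brokerId;
    }

    abstract long logEndOffset();
}

// Local replica: reads its offset from the log, but never writes to it.
final class SketchLocalReplica extends SketchReplica {
    private final SketchLog log;

    SketchLocalReplica(int brokerId, SketchLog log) {
        super(brokerId);
        this.log = log;
    }

    @Override
    long logEndOffset() {
        return log.logEndOffset();
    }
}

// Remote replica: has no log, only offsets learned from its fetches.
final class SketchRemoteReplica extends SketchReplica {
    private long logEndOffset = 0L;
    private long lastCaughtUpTimeMs = -1L;

    SketchRemoteReplica(int brokerId) {
        super(brokerId);
    }

    // Records the follower's offset; marks it caught up if it reached the leader's end offset.
    void updateFetchState(long followerEndOffset, long leaderEndOffset, long nowMs) {
        logEndOffset = followerEndOffset;
        if (followerEndOffset >= leaderEndOffset) {
            lastCaughtUpTimeMs = nowMs;
        }
    }

    @Override
    long logEndOffset() {
        return logEndOffset;
    }

    long lastCaughtUpTimeMs() {
        return lastCaughtUpTimeMs;
    }
}

// The partition owns the log; all writes go through the partition.
final class SketchPartition {
    private final SketchLog log = new SketchLog();
    private final Map<Integer, SketchReplica> replicas = new LinkedHashMap<>();

    SketchPartition(int localBrokerId) {
        replicas.put(localBrokerId, new SketchLocalReplica(localBrokerId, log));
    }

    SketchRemoteReplica addRemoteReplica(int brokerId) {
        SketchRemoteReplica replica = new SketchRemoteReplica(brokerId);
        replicas.put(brokerId, replica);
        return replica;
    }

    long append(String record) {
        return log.append(record);
    }

    long replicaEndOffset(int brokerId) {
        return replicas.get(brokerId).logEndOffset();
    }
}
```

In this sketch, callers never ask a replica "do you have a log?". Writes go
to SketchPartition.append, and the two replica subclasses answer
logEndOffset() in their own way, which is where the "if-else" branching
would otherwise have been.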
Unit tests have been updated to account for the change in hierarchy.
Tested by running multiple brokers and producing and consuming data. Also
changed the log directory back and forth to make sure that the alter log
directory use case works.
*More detailed description of your change,
if necessary. The PR title and PR message become
the squashed commit message, so use a separate
comment to ping reviewers.*
*Summary of testing strategy (including rationale)
for the feature or bug fix. Unit and/or integration
tests are expected for any behaviour change and
system tests should be considered for larger changes.*
### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
> Fetch from future replica stalls when local replica becomes a leader
> --------------------------------------------------------------------
>
> Key: KAFKA-8001
> URL: https://issues.apache.org/jira/browse/KAFKA-8001
> Project: Kafka
> Issue Type: Bug
> Components: core
> Affects Versions: 2.1.0, 2.1.1
> Reporter: Anna Povzner
> Assignee: Vikas Singh
> Priority: Critical
>
> With KIP-320, fetch from follower / future replica returns FENCED_LEADER_EPOCH if current leader epoch in the request is lower than the leader epoch known to the leader (or local replica in case of future replica fetching). In case of future replica fetching from the local replica, if local replica becomes the leader of the partition, the next fetch from future replica fails with FENCED_LEADER_EPOCH and fetching from future replica is stopped until the next leader change.
> Proposed solution: on local replica leader change, future replica should "become a follower" again, and go through the truncation phase. Or we could optimize it, and just update partition state of the future replica to reflect the updated current leader epoch.
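The stall described in the issue, and the second ("optimized") option from
the proposed solution, can be illustrated with a small sketch. It is an
assumption-laden model, not Kafka code: EpochCheckResult, EpochCheckSketch
and FutureFetchStateSketch are hypothetical names, and the only behavior
taken from the issue text is that a fetch carrying a leader epoch lower than
the one known locally is rejected with FENCED_LEADER_EPOCH.

```java
// Hypothetical result type for this sketch; Kafka has its own error codes.
enum EpochCheckResult { NONE, FENCED_LEADER_EPOCH, UNKNOWN_LEADER_EPOCH }

final class EpochCheckSketch {
    // Compares the epoch sent in the fetch with the epoch known to the local replica.
    static EpochCheckResult validate(int requestLeaderEpoch, int localLeaderEpoch) {
        if (requestLeaderEpoch < localLeaderEpoch) {
            return EpochCheckResult.FENCED_LEADER_EPOCH;   // fetcher is behind
        }
        if (requestLeaderEpoch > localLeaderEpoch) {
            return EpochCheckResult.UNKNOWN_LEADER_EPOCH;  // fetcher is ahead
        }
        return EpochCheckResult.NONE;
    }
}

// Hypothetical fetch state kept for the future replica.
final class FutureFetchStateSketch {
    private int currentLeaderEpoch;

    FutureFetchStateSketch(int currentLeaderEpoch) {
        this.currentLeaderEpoch = currentLeaderEpoch;
    }

    EpochCheckResult fetchFrom(int localLeaderEpoch) {
        return EpochCheckSketch.validate(currentLeaderEpoch, localLeaderEpoch);
    }

    // The "optimized" fix from the issue: refresh the epoch on a local leader change.
    void onLocalLeaderChange(int newLeaderEpoch) {
        currentLeaderEpoch = newLeaderEpoch;
    }
}
```

For example, with a future replica created at epoch 5, fetchFrom(5)
succeeds; once the local replica becomes leader at epoch 6, fetchFrom(6) is
fenced on every attempt until onLocalLeaderChange(6) is called. The sketch
does not cover the other option, in which the future replica becomes a
follower again and goes through truncation.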
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)