Posted to jira@kafka.apache.org by GitBox <gi...@apache.org> on 2022/12/13 21:45:14 UTC

[GitHub] [kafka] cmccabe commented on a diff in pull request #12983: MINOR: ControllerServer should use the new metadata loader and snapshot generator

cmccabe commented on code in PR #12983:
URL: https://github.com/apache/kafka/pull/12983#discussion_r1047783202


##########
metadata/src/main/java/org/apache/kafka/image/publisher/MetadataPublisher.java:
##########
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kafka.image.publisher;
+
+import org.apache.kafka.image.MetadataDelta;
+import org.apache.kafka.image.MetadataImage;
+import org.apache.kafka.image.loader.LogDeltaManifest;
+import org.apache.kafka.image.loader.SnapshotManifest;
+
+
+/**
+ * Publishes metadata deltas which we have loaded from the log and snapshots.
+ *
+ * Publishers receive a stream of callbacks from the metadata loader which keeps them notified
+ * of the latest cluster metadata. This interface abstracts away some of the complications of
+ * following the cluster metadata. For example, if the loader needs to read a snapshot, it will
+ * present the contents of the snapshot in the form of a delta from the previous state.
+ */
+public interface MetadataPublisher extends AutoCloseable {

Review Comment:
   The idea behind calling them "publishers" is that they publish metadata to the rest of the system. "Listener" is already a very overloaded name -- for example, the `MetadataLoader` is a `RaftClient.Listener`. I think it would be confusing to have another listener.
   
   The thing about the publishers is that metadata is NOT published immediately after we get it. We roughly have this nomenclature:
   
   1. `listener`: gets the raw update stream from the Raft layer. This includes a lot of junk that we don't care about and it has some messy differences between snapshots and log batches, so most code doesn't want to deal with it. Ultimately `QuorumController` and `MetadataLoader` will probably be the two pieces of code that do.
   2. `loader`: translates the raw update stream into delta and image objects. Handles metadata transactions and snapshot application.
   3. `publisher`: publishes images to the rest of the system and to external places (like the filesystem).
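
   As a rough illustration of these three layers, here is a minimal sketch. Every type in it (`RawListener`, `Loader`, `Publisher`, and the `Map<String, String>` standing in for an image) is a hypothetical simplification, not one of the real Kafka classes. It only shows the shape: the loader is the one place that knows batches and snapshots differ, and publishers see nothing but a delta and the resulting image.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical, simplified types for illustration; not Kafka's real classes.
class LayersSketch {
    // Layer 1: the raw update stream, with separate callbacks for log
    // batches and snapshots.
    interface RawListener {
        void handleBatch(Map<String, String> records);
        void handleSnapshot(Map<String, String> fullState);
    }

    // Layer 3: only ever sees a delta plus the resulting image.
    interface Publisher {
        void publish(Map<String, String> delta, Map<String, String> image);
    }

    // Layer 2: translates both kinds of raw input into (delta, image) pairs.
    static class Loader implements RawListener {
        private Map<String, String> image = new HashMap<>();
        private final List<Publisher> publishers = new ArrayList<>();

        void addPublisher(Publisher publisher) {
            publishers.add(publisher);
        }

        @Override
        public void handleBatch(Map<String, String> records) {
            Map<String, String> next = new HashMap<>(image);
            next.putAll(records);
            emit(new HashMap<>(records), next);
        }

        @Override
        public void handleSnapshot(Map<String, String> fullState) {
            // Express the snapshot as a delta from the previous image.
            // Simplification: keys missing from the snapshot are dropped from
            // the image but are not listed in the delta.
            Map<String, String> delta = new HashMap<>();
            fullState.forEach((key, value) -> {
                if (!Objects.equals(value, image.get(key))) {
                    delta.put(key, value);
                }
            });
            emit(delta, new HashMap<>(fullState));
        }

        private void emit(Map<String, String> delta, Map<String, String> next) {
            image = next;
            for (Publisher publisher : publishers) {
                publisher.publish(delta, next);
            }
        }
    }

    public static void main(String[] args) {
        Loader loader = new Loader();
        loader.addPublisher((delta, image) ->
            System.out.println("delta=" + delta + " image=" + image));
        loader.handleBatch(Map.of("topic-a", "v1"));
        loader.handleSnapshot(Map.of("topic-a", "v1", "topic-b", "v1"));
    }
}
```

   The publisher lambda in `main` never learns whether an update came from a log batch or a snapshot, which is the point of the layering.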
   
   Loader errors are clearly very bad, because they mean you've failed to translate the updates correctly. Publisher errors might not be that bad. One could just mean that you couldn't reinitialize a thread pool when a config changed, for example. So we have separate metrics for these two things on the broker. To be fair, the publisher error metric is named `metadata-apply-error-count` rather than `metadata-publish-error-count`. But `Applier` isn't a great class name, and `Applier#apply` doesn't seem better than `Publisher#publish`... well, to me, at least.
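
   To make the distinction between the two kinds of error concrete, here is a small sketch that counts them separately. The class, the counter fields, and the `handle` method are all hypothetical and are not the broker's actual metrics code. It also assumes that a failing publisher should not stop the other publishers from running, which is a choice made for this example rather than something stated above.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical sketch; names are illustrative, not Kafka's metrics classes.
class ErrorAccountingSketch {
    final AtomicLong loadErrors = new AtomicLong();
    final AtomicLong publishErrors = new AtomicLong();

    // translate plays the role of the loader; publishers are the third layer.
    <R, I> void handle(R raw, Function<R, I> translate, List<Consumer<I>> publishers) {
        I image;
        try {
            image = translate.apply(raw);
        } catch (RuntimeException e) {
            // The translation itself failed, so there is nothing
            // trustworthy to hand to any publisher.
            loadErrors.incrementAndGet();
            return;
        }
        for (Consumer<I> publisher : publishers) {
            try {
                publisher.accept(image);
            } catch (RuntimeException e) {
                // Assumption: one publisher failing is counted, and the
                // remaining publishers still get the update.
                publishErrors.incrementAndGet();
            }
        }
    }

    public static void main(String[] args) {
        ErrorAccountingSketch sketch = new ErrorAccountingSketch();
        List<Consumer<String>> publishers = List.of(
            image -> { throw new IllegalStateException("could not resize thread pool"); },
            image -> System.out.println("published " + image));
        sketch.handle("raw-batch", raw -> raw + "-image", publishers);
        System.out.println("loadErrors=" + sketch.loadErrors
            + " publishErrors=" + sketch.publishErrors);
    }
}
```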
   
   The idea is that most people are operating at the 3rd layer (publisher) and don't need to deal with messy translation and loading issues. They can register their own publisher and just handle the incoming stream of update objects. A big part of this is breaking up the single big publisher into a bunch of smaller ones to facilitate this modularity. It's not as trivial as just making N callbacks instead of 1, because with multiple publishers you have to deal with cases like "this publisher never saw any metadata before, so now it needs to get a full update."
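
   One possible way to handle the "never saw any metadata before" case is sketched below. It assumes the loader keeps the current image and, when a publisher is installed late, first sends it that whole image as a delta from the empty state. The `Loader`, `Publisher`, `install`, and `apply` names are hypothetical and are not the API in this PR.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the loader implementation from this PR.
class CatchUpSketch {
    interface Publisher {
        void publish(Map<String, String> delta, Map<String, String> image);
    }

    static class Loader {
        private final Map<String, String> image = new HashMap<>();
        private final List<Publisher> publishers = new ArrayList<>();

        // Installing a publisher after metadata has already been loaded:
        // bring it up to date before it starts receiving regular deltas.
        void install(Publisher publisher) {
            if (!image.isEmpty()) {
                // The full update is the delta from the empty state,
                // which is simply the whole current image.
                publisher.publish(new HashMap<>(image), new HashMap<>(image));
            }
            publishers.add(publisher);
        }

        // A regular update: fold the delta into the image, then notify
        // every installed publisher with copies it may keep.
        void apply(Map<String, String> delta) {
            image.putAll(delta);
            for (Publisher publisher : publishers) {
                publisher.publish(new HashMap<>(delta), new HashMap<>(image));
            }
        }
    }

    public static void main(String[] args) {
        Loader loader = new Loader();
        loader.apply(Map.of("topic-a", "v1"));
        // This publisher registers late and still sees topic-a.
        loader.install((delta, image) ->
            System.out.println("delta=" + delta + " image=" + image));
        loader.apply(Map.of("topic-b", "v1"));
    }
}
```

   With this shape, a publisher written against `publish` does not need a separate code path for its first update.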


