Posted to jira@kafka.apache.org by GitBox <gi...@apache.org> on 2021/05/18 18:04:09 UTC
[GitHub] [kafka] guozhangwang commented on a change in pull request #10609: KAFKA-12648: Pt. 1 - Add NamedTopology to protocol and state directory structure

guozhangwang commented on a change in pull request #10609:
URL: https://github.com/apache/kafka/pull/10609#discussion_r634626047



##########
File path: streams/src/main/java/org/apache/kafka/streams/processor/internals/StateDirectory.java
##########
@@ -411,19 +442,37 @@ private void cleanRemovedTasksCalledByCleanerThread(final long cleanupDelayMs) {
                 }
             }
         }
+        maybeCleanEmptyNamedTopologyDirs();

Review comment:
       Should we move this into the try/catch IOException block as well (ditto below)?
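The following is a minimal sketch of the suggestion. It assumes the empty-directory sweep can throw `IOException` the same way per-task cleanup does. Running both steps inside one try/catch keeps a filesystem error in the sweep from propagating out of the cleanup method. `CleanerSketch`, `cleanTasks`, `sweepEmptyDirs`, and the in-memory `log` are hypothetical stand-ins, not Kafka's actual methods or logger.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class CleanerSketch {
    // Hypothetical in-memory log standing in for a real logger.
    static final List<String> log = new ArrayList<>();

    // Hypothetical stand-in for the per-task cleanup loop.
    static void cleanTasks() throws IOException {
        log.add("cleaned tasks");
    }

    // Hypothetical stand-in for maybeCleanEmptyNamedTopologyDirs(); fails on demand.
    static void sweepEmptyDirs(final boolean fail) throws IOException {
        if (fail) {
            throw new IOException("simulated delete failure");
        }
        log.add("swept empty dirs");
    }

    // Both steps share one try/catch, so an IOException from either one is
    // logged here instead of propagating to the caller.
    static boolean runCleanup(final boolean failSweep) {
        try {
            cleanTasks();
            sweepEmptyDirs(failSweep);
            return true;
        } catch (final IOException e) {
            log.add("cleanup failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(final String[] args) {
        System.out.println(runCleanup(false)); // true
        System.out.println(runCleanup(true));  // false
        System.out.println(log);
    }
}
```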

##########
File path: streams/src/main/java/org/apache/kafka/streams/processor/internals/StateDirectory.java
##########
@@ -462,39 +512,49 @@ private void cleanRemovedTasksCalledByUser() throws Exception {
      * List all of the task directories that are non-empty
      * @return The list of all the non-empty local directories for stream tasks
      */
-    File[] listNonEmptyTaskDirectories() {
-        final File[] taskDirectories;
-        if (!hasPersistentStores || !stateDir.exists()) {
-            taskDirectories = new File[0];
-        } else {
-            taskDirectories =
-                stateDir.listFiles(pathname -> {
-                    if (!pathname.isDirectory() || !TASK_DIR_PATH_NAME.matcher(pathname.getName()).matches()) {
-                        return false;
-                    } else {
-                        return !taskDirIsEmpty(pathname);
-                    }
-                });
-        }
-
-        return taskDirectories == null ? new File[0] : taskDirectories;
+    List<TaskDirectory> listNonEmptyTaskDirectories() {
+        return listTaskDirectories(pathname -> {
+            if (!pathname.isDirectory() || !TASK_DIR_PATH_NAME.matcher(pathname.getName()).matches()) {
+                return false;
+            } else {
+                return !taskDirIsEmpty(pathname);
+            }
+        });
     }
 
     /**
-     * List all of the task directories
+     * List all of the task directories along with their parent directory if they belong to a named topology
      * @return The list of all the existing local directories for stream tasks
      */
-    File[] listAllTaskDirectories() {
-        final File[] taskDirectories;
-        if (!hasPersistentStores || !stateDir.exists()) {
-            taskDirectories = new File[0];
-        } else {
-            taskDirectories =
-                stateDir.listFiles(pathname -> pathname.isDirectory()
-                                                   && TASK_DIR_PATH_NAME.matcher(pathname.getName()).matches());
+    List<TaskDirectory> listAllTaskDirectories() {
+        return listTaskDirectories(pathname -> pathname.isDirectory() && TASK_DIR_PATH_NAME.matcher(pathname.getName()).matches());
+    }
+
+    private List<TaskDirectory> listTaskDirectories(final FileFilter filter) {
+        final List<TaskDirectory> taskDirectories = new ArrayList<>();
+        if (hasPersistentStores && stateDir.exists()) {
+            if (hasNamedTopologies) {

Review comment:
       Is it possible for named-topology state dirs and unnamed (original) state dirs to coexist here?
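If both layouts could coexist, the listing would need to find task directories at the top level of the state dir as well as inside each named-topology subdirectory. Branching on a single `hasNamedTopologies` flag would not be enough. The following is a minimal sketch built on two assumptions:

* Task directories match `\d+_\d+`, modeled on `TASK_DIR_PATH_NAME`.
* Named-topology directories use a hypothetical `__name__` wrapper.

`TaskDir` and `listTaskDirs` are illustrative names, not the PR's `TaskDirectory` or `listTaskDirectories`.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TaskDirListingSketch {
    // Assumed task directory name format, e.g. "0_1" (modeled on TASK_DIR_PATH_NAME).
    static final Pattern TASK_DIR = Pattern.compile("\\d+_\\d+");
    // Hypothetical marker for a named-topology directory, e.g. "__orders__".
    static final Pattern NAMED_TOPOLOGY_DIR = Pattern.compile("__(.+)__");

    // A task directory plus the named topology it belongs to (null for the unnamed layout).
    static final class TaskDir {
        final File dir;
        final String namedTopology;

        TaskDir(final File dir, final String namedTopology) {
            this.dir = dir;
            this.namedTopology = namedTopology;
        }
    }

    // Collects task dirs from both layouts in one pass over stateDir.
    static List<TaskDir> listTaskDirs(final File stateDir) {
        final List<TaskDir> result = new ArrayList<>();
        final File[] children = stateDir.listFiles(File::isDirectory);
        if (children == null) {
            return result; // stateDir is missing or unreadable
        }
        for (final File child : children) {
            if (TASK_DIR.matcher(child.getName()).matches()) {
                // Unnamed (original) layout: <stateDir>/<taskId>
                result.add(new TaskDir(child, null));
                continue;
            }
            final Matcher named = NAMED_TOPOLOGY_DIR.matcher(child.getName());
            if (!named.matches()) {
                continue; // neither layout
            }
            // Named layout: <stateDir>/__<topology>__/<taskId>
            final File[] tasks = child.listFiles(
                f -> f.isDirectory() && TASK_DIR.matcher(f.getName()).matches());
            if (tasks != null) {
                for (final File task : tasks) {
                    result.add(new TaskDir(task, named.group(1)));
                }
            }
        }
        return result;
    }

    public static void main(final String[] args) throws IOException {
        final Path root = Files.createTempDirectory("state");
        Files.createDirectories(root.resolve("0_0"));
        Files.createDirectories(root.resolve("__orders__").resolve("1_2"));
        for (final TaskDir t : listTaskDirs(root.toFile())) {
            System.out.println(t.namedTopology + " -> " + t.dir.getName());
        }
    }
}
```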

##########
File path: streams/src/main/java/org/apache/kafka/streams/processor/internals/StateDirectory.java
##########
@@ -411,19 +442,37 @@ private void cleanRemovedTasksCalledByCleanerThread(final long cleanupDelayMs) {
                 }
             }
         }
+        maybeCleanEmptyNamedTopologyDirs();
+    }
+
+    private void maybeCleanEmptyNamedTopologyDirs() {

Review comment:
       Could we just remove empty named topology dirs along the way instead of doing that in a second pass at the end?

   EDIT: never mind; after some thought, I feel that would be more complicated, not simpler.
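The diff uses a second pass: `maybeCleanEmptyNamedTopologyDirs()` runs after the task loop. That pass can be sketched as a sweep that deletes any named-topology directory with no entries left. The following is a minimal sketch, again assuming a hypothetical `__name__` naming convention. `EmptyTopologyDirSweep` and `sweepEmptyNamedTopologyDirs` are illustrative names, not the PR's implementation. Keeping the sweep as a separate pass means the per-task loop never has to track when a topology's last task directory is deleted.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;

public class EmptyTopologyDirSweep {
    // Hypothetical marker for a named-topology directory, e.g. "__orders__".
    static final Pattern NAMED_TOPOLOGY_DIR = Pattern.compile("__.+__");

    // Second pass: delete each named-topology dir under stateDir that has no entries left.
    // Returns how many directories were removed.
    static int sweepEmptyNamedTopologyDirs(final File stateDir) throws IOException {
        final File[] topologyDirs = stateDir.listFiles(
            f -> f.isDirectory() && NAMED_TOPOLOGY_DIR.matcher(f.getName()).matches());
        if (topologyDirs == null) {
            return 0; // stateDir is missing or unreadable
        }
        int removed = 0;
        for (final File dir : topologyDirs) {
            final String[] contents = dir.list();
            if (contents != null && contents.length == 0) {
                // Throws DirectoryNotEmptyException if a task dir appeared in the meantime.
                Files.delete(dir.toPath());
                removed++;
            }
        }
        return removed;
    }

    public static void main(final String[] args) throws IOException {
        final Path root = Files.createTempDirectory("state");
        Files.createDirectories(root.resolve("__empty__"));
        Files.createDirectories(root.resolve("__busy__").resolve("0_1"));
        System.out.println(sweepEmptyNamedTopologyDirs(root.toFile())); // 1
    }
}
```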

##########
File path: streams/src/main/java/org/apache/kafka/streams/processor/internals/assignment/SubscriptionInfo.java
##########
@@ -125,6 +130,29 @@ public int errorCode() {
         return data.errorCode();
     }
 
+    // For version > MIN_NAMED_TOPOLOGY_VERSION
+    private void setTaskOffsetSumDataWithNamedTopologiesFromTaskOffsetSumMap(final Map<TaskId, Long> taskOffsetSums) {
+        final Map<Integer, List<SubscriptionInfoData.PartitionToOffsetSum>> topicGroupIdToPartitionOffsetSum = new HashMap<>();
+        for (final Map.Entry<TaskId, Long> taskEntry : taskOffsetSums.entrySet()) {
+            final TaskId task = taskEntry.getKey();
+            topicGroupIdToPartitionOffsetSum.computeIfAbsent(task.topicGroupId, t -> new ArrayList<>()).add(
+                    new SubscriptionInfoData.PartitionToOffsetSum()
+                            .setPartition(task.partition)
+                            .setOffsetSum(taskEntry.getValue()));
+        }
+
+        data.setTaskOffsetSums(taskOffsetSums.entrySet().stream().map(t -> {
+            final SubscriptionInfoData.TaskOffsetSum taskOffsetSum = new SubscriptionInfoData.TaskOffsetSum();
+            final TaskId task = t.getKey();
+            taskOffsetSum.setTopicGroupId(task.topicGroupId);
+            taskOffsetSum.setPartition(task.partition);

Review comment:
       Could you remind me why we want to include the partition id in the new version as well?
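The diff builds two shapes:

* A map grouped by `topicGroupId` (`topicGroupIdToPartitionOffsetSum`).
* A flat per-task list in which each `TaskOffsetSum` gets its own `setTopicGroupId` and `setPartition`.

The following minimal sketch contrasts the two using plain Java collections. `Task`, `grouped`, and `flat` are hypothetical stand-ins for `TaskId` and the generated `SubscriptionInfoData` classes. In the grouped shape, a partition is only meaningful under its group key. In the flat shape, each entry must carry the partition to identify its task on its own.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

public class OffsetSumEncodingSketch {
    // Hypothetical stand-in for TaskId, reduced to topicGroupId and partition.
    static final class Task {
        final int topicGroupId;
        final int partition;

        Task(final int topicGroupId, final int partition) {
            this.topicGroupId = topicGroupId;
            this.partition = partition;
        }

        @Override
        public boolean equals(final Object o) {
            if (!(o instanceof Task)) {
                return false;
            }
            final Task other = (Task) o;
            return topicGroupId == other.topicGroupId && partition == other.partition;
        }

        @Override
        public int hashCode() {
            return Objects.hash(topicGroupId, partition);
        }
    }

    // Grouped shape: topicGroupId -> (partition -> offsetSum).
    // A partition is only meaningful together with its enclosing group key.
    static Map<Integer, Map<Integer, Long>> grouped(final Map<Task, Long> offsetSums) {
        final Map<Integer, Map<Integer, Long>> out = new TreeMap<>();
        for (final Map.Entry<Task, Long> e : offsetSums.entrySet()) {
            out.computeIfAbsent(e.getKey().topicGroupId, g -> new TreeMap<>())
               .put(e.getKey().partition, e.getValue());
        }
        return out;
    }

    // Flat shape: one entry per task, each carrying topicGroupId AND partition,
    // since the entry alone must identify the task.
    static List<String> flat(final Map<Task, Long> offsetSums) {
        final List<String> out = new ArrayList<>();
        for (final Map.Entry<Task, Long> e : offsetSums.entrySet()) {
            out.add(e.getKey().topicGroupId + "_" + e.getKey().partition + "=" + e.getValue());
        }
        return out;
    }

    public static void main(final String[] args) {
        final Map<Task, Long> sums = new LinkedHashMap<>();
        sums.put(new Task(0, 0), 5L);
        sums.put(new Task(0, 1), 7L);
        sums.put(new Task(1, 0), 3L);
        System.out.println(grouped(sums)); // {0={0=5, 1=7}, 1={0=3}}
        System.out.println(flat(sums));    // [0_0=5, 0_1=7, 1_0=3]
    }
}
```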



