Posted to jira@kafka.apache.org by GitBox <gi...@apache.org> on 2021/03/26 21:29:23 UTC

[GitHub] [kafka] ableegoldman commented on a change in pull request #10417: HOTFIX: wrap StreamThread#runLoop in outer catch block

ableegoldman commented on a change in pull request #10417:
URL: https://github.com/apache/kafka/pull/10417#discussion_r602595373



##########
File path: streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamThread.java
##########
@@ -565,59 +565,57 @@ public void run() {
      * @throws StreamsException      if the store's change log does not contain the partition
      */
     boolean runLoop() {
-        subscribeConsumer();
+        try {
+            subscribeConsumer();
 
-        // if the thread is still in the middle of a rebalance, we should keep polling
-        // until the rebalance is completed before we close and commit the tasks
-        while (isRunning() || taskManager.isRebalanceInProgress()) {
-            try {
-                if (assignmentErrorCode.get() == AssignorError.SHUTDOWN_REQUESTED.code()) {
-                    log.warn("Detected that shutdown was requested. " +
-                            "All clients in this app will now begin to shutdown");
-                    mainConsumer.enforceRebalance();
-                }
-                final Long size = cacheResizeSize.getAndSet(-1L);
-                if (size != -1L) {
-                    cacheResizer.accept(size);
-                }
-                runOnce();
-                if (nextProbingRebalanceMs.get() < time.milliseconds()) {
-                    log.info("Triggering the followup rebalance scheduled for {} ms.", nextProbingRebalanceMs.get());
-                    mainConsumer.enforceRebalance();
-                    nextProbingRebalanceMs.set(Long.MAX_VALUE);
-                }
-            } catch (final TaskCorruptedException e) {
-                log.warn("Detected the states of tasks " + e.corruptedTasks() + " are corrupted. " +
-                        "Will close the task as dirty and re-create and bootstrap from scratch.", e);
+            // if the thread is still in the middle of a rebalance, we should keep polling
+            // until the rebalance is completed before we close and commit the tasks
+            while (isRunning() || taskManager.isRebalanceInProgress()) {
                 try {
-                    taskManager.handleCorruption(e.corruptedTasks());
-                } catch (final TaskMigratedException taskMigrated) {
-                    handleTaskMigrated(taskMigrated);
-                }
-            } catch (final TaskMigratedException e) {
-                handleTaskMigrated(e);
-            } catch (final UnsupportedVersionException e) {
-                final String errorMessage = e.getMessage();
-                if (errorMessage != null &&
+                    if (assignmentErrorCode.get() == AssignorError.SHUTDOWN_REQUESTED.code()) {
+                        log.warn("Detected that shutdown was requested. " +
+                                     "All clients in this app will now begin to shutdown");
+                        mainConsumer.enforceRebalance();
+                    }
+                    final Long size = cacheResizeSize.getAndSet(-1L);
+                    if (size != -1L) {
+                        cacheResizer.accept(size);
+                    }
+                    runOnce();
+                    if (nextProbingRebalanceMs.get() < time.milliseconds()) {
+                        log.info("Triggering the followup rebalance scheduled for {} ms.", nextProbingRebalanceMs.get());
+                        mainConsumer.enforceRebalance();
+                        nextProbingRebalanceMs.set(Long.MAX_VALUE);
+                    }
+                } catch (final TaskCorruptedException e) {
+                    log.warn("Detected the states of tasks " + e.corruptedTasks() + " are corrupted. " +
+                                 "Will close the task as dirty and re-create and bootstrap from scratch.", e);
+                    try {
+                        taskManager.handleCorruption(e.corruptedTasks());
+                    } catch (final TaskMigratedException taskMigrated) {
+                        handleTaskMigrated(taskMigrated);
+                    }
+                } catch (final TaskMigratedException e) {
+                    handleTaskMigrated(e);
+                } catch (final UnsupportedVersionException e) {
+                    final String errorMessage = e.getMessage();
+                    if (errorMessage != null &&
                         errorMessage.startsWith("Broker unexpectedly doesn't support requireStable flag on version ")) {
 
-                    log.error("Shutting down because the Kafka cluster seems to be on a too old version. " +
-                                    "Setting {}=\"{}\" requires broker version 2.5 or higher.",
-                            StreamsConfig.PROCESSING_GUARANTEE_CONFIG,
-                            EXACTLY_ONCE_BETA);
-                }
-                failedStreamThreadSensor.record();
-                this.streamsUncaughtExceptionHandler.accept(e);
-                if (processingMode == ProcessingMode.EXACTLY_ONCE_ALPHA || processingMode == ProcessingMode.EXACTLY_ONCE_BETA) {

Review comment:
       > I removed it in the UnsupportedVersion case because that’s always EOS anyway. I removed it for the general catch Throwable case after our earlier discussion a while back: always returning false will not force ALOS to clean up the checkpoint, and it should not write an additional checkpoint if it hit an exception. Plus, it would ultimately have closed as dirty anyway; the only difference is that it made the code harder to follow in the ALOS case.
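
For readers without the full diff, the control flow under discussion can be sketched roughly as follows. This is a simplified, standalone illustration and not the actual StreamThread code: the class `RunLoopSketch`, the `RecoverableException` type, and the queue of iterations are hypothetical stand-ins. It assumes only what the comment describes, namely that recoverable errors are handled inside the loop, while anything else reaches a general catch that notifies the handler and returns false regardless of processing mode.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

class RunLoopSketch {
    /** Hypothetical stand-in for a recoverable error such as a migrated task. */
    static class RecoverableException extends RuntimeException {
        RecoverableException(String message) { super(message); }
    }

    private final Deque<Runnable> iterations;          // stand-in for repeated runOnce() calls
    private final Consumer<Throwable> uncaughtHandler; // stand-in for the uncaught exception handler
    int recovered = 0;

    RunLoopSketch(Deque<Runnable> iterations, Consumer<Throwable> uncaughtHandler) {
        this.iterations = iterations;
        this.uncaughtHandler = uncaughtHandler;
    }

    /** Returns true on a clean exit, false if the loop died on an unexpected error. */
    boolean runLoop() {
        try {
            // Anything placed before the loop is also covered by the outer catch.
            while (!iterations.isEmpty()) {
                try {
                    iterations.poll().run();
                } catch (RecoverableException e) {
                    recovered++; // handled in place, the loop keeps going
                }
            }
        } catch (Throwable t) {
            uncaughtHandler.accept(t);
            // No branch on processing mode: every unexpected failure is reported as not clean.
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Deque<Runnable> work = new ArrayDeque<>();
        work.add(() -> { throw new RecoverableException("task migrated"); });
        work.add(() -> { throw new IllegalStateException("unexpected"); });
        work.add(() -> System.out.println("never reached"));
        RunLoopSketch sketch = new RunLoopSketch(work, t -> System.out.println("handler saw: " + t.getMessage()));
        System.out.println("clean exit: " + sketch.runLoop());
    }
}
```

In this sketch the caller decides how to close tasks based solely on the boolean, which mirrors the point above that a single unconditional `return false` is easier to follow than a per-mode branch.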



