You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/07/20 23:52:29 UTC

[GitHub] [pulsar] zliang-min opened a new pull request #11401: [pulsar-function] Stop calling the deprated method Thread.stop when stopping the function thread in ThreadRuntime.

zliang-min opened a new pull request #11401:
URL: https://github.com/apache/pulsar/pull/11401


   ### Motivation
   
   Currently, when the `ThreadRuntime` tries to stop a function instance, it will call the `stop` method on the `fnThread` if the thread is still alive after 10 seconds since it interrupts the thread, see [here](https://github.com/apache/pulsar/blob/d81b5f8b8e6cb17f307ec830accaf9dd95d7643b/pulsar-functions/runtime/src/main/java/org/apache/pulsar/functions/runtime/thread/ThreadRuntime.java#L209).
   
   And Thread.stop is a deprecated method, and the issue is clearly documented in its [doc](https://docs.oracle.com/javase/8/docs/api/java/lang/Thread.html#stop--), and I quote:
   
   > This method is inherently unsafe. Stopping a thread with Thread.stop causes it to unlock all of the monitors that it has locked (as a natural consequence of the unchecked ThreadDeath exception propagating up the stack). If any of the objects previously protected by these monitors were in an inconsistent state, the damaged objects become visible to other threads, potentially resulting in arbitrary behavior.
   
   And this behavior exactly caused an issue with `BatchSourceExecutor`. `BatchSourceExecutor` terminates the `discoveryThread` when it stops, and waits 10 seconds for the termination to complete, see [here](https://github.com/apache/pulsar/blob/6704f12104219611164aa2bb5bbdfc929613f1bf/pulsar-functions/instance/src/main/java/org/apache/pulsar/functions/source/batch/BatchSourceExecutor.java#L228). So, if the `discoveryThread` took long enough to terminate, the `awaitTermination` method could throw an `IllegalMonitorStateException` because of `fnThread.stop` is called. Below is the backtrace stack I got for this case:
   ```
   15:41:16,333 INFO [public/default/gimi-test-0] [instance: 0] ThreadPoolTaskScheduler - Shutting down ExecutorService
   15:41:26,334 ERROR [public/default/gimi-test-0] [instance: 0] JavaInstanceRunnable - Failed to close source org.apach
   e.pulsar.functions.source.batch.BatchSourceExecutor
   java.lang.IllegalMonitorStateException: null
           at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:149) ~[?:?]
           at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1302) ~[?:?]
           at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:439) ~[?:?]
           at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1458) ~[?:?]
           at java.util.concurrent.Executors$DelegatedExecutorService.awaitTermination(Executors.java:709) ~[?:?]
           at org.apache.pulsar.functions.source.batch.BatchSourceExecutor.stop(BatchSourceExecutor.java:228) ~[pulsar-f
   unctions-instance.jar:2.9.0-SNAPSHOT]
           at org.apache.pulsar.functions.source.batch.BatchSourceExecutor.close(BatchSourceExecutor.java:210) ~[pulsar-
   functions-instance.jar:2.9.0-SNAPSHOT]
           at org.apache.pulsar.functions.instance.JavaInstanceRunnable.close(JavaInstanceRunnable.java:411) [pulsar-fun
   ctions-instance.jar:2.9.0-SNAPSHOT]
           at org.apache.pulsar.functions.instance.JavaInstanceRunnable.run(JavaInstanceRunnable.java:298) [pulsar-funct
   ions-instance.jar:2.9.0-SNAPSHOT]
           at java.lang.Thread.run(Thread.java:829) [?:?]
   ```
   The same issue has been discussed [here](https://stackoverflow.com/questions/29431344/illegalmonitorstateexception-on-awaittermination-function) as well.
   
   And the consequence of this error is `BatchSourceExecutor` will lost the chance to clean up the resources like the consumer of the intermediate topic, so it will leak a consumer that still consume from the topic without processing the in-come messages.
   
   This PR is to solve this issue.
   
   ### Modifications
   
   The change is very simple, I just removed the `fnThread.stop()` method call because it's deprecated, and replace it with a warning log. So we just let the function instance take its time to clean things up.
   
   ### Verifying this change
   
   I manually verified this change by:
   1. Updated the `BatchDataGeneratorSource` to add a sleep inside the `discover` method to make it run a long time, see [here](https://www.notion.so/temp-180a67ac3e704dcaa12d3b7efb5f9489#5e99da3fc8824d64b00ba635a092a42d), and build the connector.
   2. Start the above connector in cluster mode, below is the source config file I used to start the source:
     ```yaml
   tenant: "public"
   namespace: "default"
   name: "gimi-test"
   # archive: builtin://batch-data-generator
   archive: file:///tmp/pulsar/connectors/pulsar-io-batch-data-generator-2.9.0-SNAPSHOT.jar
   classname: org.apache.pulsar.io.batchdatagenerator.BatchDataGeneratorSource
   topicName: "persistent://public/default/gimi"
   parallelism: 1
   batchSourceConfig:
     discoveryTriggererClassName: org.apache.pulsar.io.batchdiscovery.CronTriggerer
     discoveryTriggererConfig:
       __CRON__: "*/30 * * * * *"
     ```
   3. Once discover starts, call the stop source API to stop the function instance, e.g. `http POST localhost:8080/admin/v3/sources/public/default/gimi-test/stop`.
   4. Check the function log, before the fix, you would see the `IllegalMonitorStateException` error. But after the fix, it won't happen anymore.
   5. Call the stats API on the intermediate topic, e.g. `http localhost:8080/admin/v2/persistent/public/default/gimi-test-intermediate/stats`. Before the fix, you would find that the consumer from the function instance is still there. And after the fix, you would see that the consumer got cleaned up.
   
   ### Does this pull request potentially affect one of the following parts:
   
   *If `yes` was chosen, please highlight the changes*
   
     - Dependencies (does it add or upgrade a dependency): no
     - The public API: no
     - The schema: no
     - The default values of configurations: no
     - The wire protocol: no
     - The rest endpoints: no
     - The admin cli options: no
     - Anything that affects deployment: no
   
   ### Documentation
   
   No doc change is needed, since it's an internal change.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] Anonymitaet commented on pull request #11401: [pulsar-function] Stop calling the deprated method Thread.stop when stopping the function thread in ThreadRuntime.

Posted by GitBox <gi...@apache.org>.
Anonymitaet commented on pull request #11401:
URL: https://github.com/apache/pulsar/pull/11401#issuecomment-883782142


   Thanks for providing doc-related info!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] zliang-min commented on a change in pull request #11401: [pulsar-function] Stop calling the deprated method Thread.stop when stopping the function thread in ThreadRuntime.

Posted by GitBox <gi...@apache.org>.
zliang-min commented on a change in pull request #11401:
URL: https://github.com/apache/pulsar/pull/11401#discussion_r674071987



##########
File path: pulsar-functions/runtime/src/main/java/org/apache/pulsar/functions/runtime/thread/ThreadRuntime.java
##########
@@ -206,7 +206,7 @@ public void stop() {
                 // kill the thread
                 fnThread.join(THREAD_SHUTDOWN_TIMEOUT_MILLIS, 0);
                 if (fnThread.isAlive()) {
-                    fnThread.stop();
+                    log.warn("The function instance thread is still alive after {} milliseconds, giving up waiting and moving forward.", THREAD_SHUTDOWN_TIMEOUT_MILLIS);

Review comment:
       Updated.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] jerrypeng commented on pull request #11401: [pulsar-function] Stop calling the deprated method Thread.stop when stopping the function thread in ThreadRuntime.

Posted by GitBox <gi...@apache.org>.
jerrypeng commented on pull request #11401:
URL: https://github.com/apache/pulsar/pull/11401#issuecomment-886864131


   /pulsarbot rerun-checks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] jerrypeng commented on a change in pull request #11401: [pulsar-function] Stop calling the deprated method Thread.stop when stopping the function thread in ThreadRuntime.

Posted by GitBox <gi...@apache.org>.
jerrypeng commented on a change in pull request #11401:
URL: https://github.com/apache/pulsar/pull/11401#discussion_r673575520



##########
File path: pulsar-functions/runtime/src/main/java/org/apache/pulsar/functions/runtime/thread/ThreadRuntime.java
##########
@@ -206,7 +206,7 @@ public void stop() {
                 // kill the thread
                 fnThread.join(THREAD_SHUTDOWN_TIMEOUT_MILLIS, 0);
                 if (fnThread.isAlive()) {
-                    fnThread.stop();
+                    log.warn("The function instance thread is still alive after {} milliseconds, giving up waiting and moving forward.", THREAD_SHUTDOWN_TIMEOUT_MILLIS);

Review comment:
       Nit. Can we change the message to:
   
   "The function instance thread is still alive after {} milliseconds. Giving up waiting and moving forward to close function."




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] zliang-min commented on pull request #11401: [pulsar-function] Stop calling the deprated method Thread.stop when stopping the function thread in ThreadRuntime.

Posted by GitBox <gi...@apache.org>.
zliang-min commented on pull request #11401:
URL: https://github.com/apache/pulsar/pull/11401#issuecomment-883779133


   @jerrypeng please review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] nlu90 commented on pull request #11401: [pulsar-function] Stop calling the deprated method Thread.stop when stopping the function thread in ThreadRuntime.

Posted by GitBox <gi...@apache.org>.
nlu90 commented on pull request #11401:
URL: https://github.com/apache/pulsar/pull/11401#issuecomment-885837919


   @jerrypeng Your suggested changes are safer, may be we can add the changes in this PR?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] jerrypeng commented on pull request #11401: [pulsar-function] Stop calling the deprated method Thread.stop when stopping the function thread in ThreadRuntime.

Posted by GitBox <gi...@apache.org>.
jerrypeng commented on pull request #11401:
URL: https://github.com/apache/pulsar/pull/11401#issuecomment-886865959


   /pulsarbot rerun-checks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] jerrypeng merged pull request #11401: [pulsar-function] Stop calling the deprated method Thread.stop when stopping the function thread in ThreadRuntime.

Posted by GitBox <gi...@apache.org>.
jerrypeng merged pull request #11401:
URL: https://github.com/apache/pulsar/pull/11401


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] nlu90 commented on pull request #11401: [pulsar-function] Stop calling the deprated method Thread.stop when stopping the function thread in ThreadRuntime.

Posted by GitBox <gi...@apache.org>.
nlu90 commented on pull request #11401:
URL: https://github.com/apache/pulsar/pull/11401#issuecomment-884393814


   With this change, will there be any dangling thread in the heap dump that's not cleaned up?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] jerrypeng commented on pull request #11401: [pulsar-function] Stop calling the deprated method Thread.stop when stopping the function thread in ThreadRuntime.

Posted by GitBox <gi...@apache.org>.
jerrypeng commented on pull request #11401:
URL: https://github.com/apache/pulsar/pull/11401#issuecomment-885967481






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] jerrypeng commented on pull request #11401: [pulsar-function] Stop calling the deprated method Thread.stop when stopping the function thread in ThreadRuntime.

Posted by GitBox <gi...@apache.org>.
jerrypeng commented on pull request #11401:
URL: https://github.com/apache/pulsar/pull/11401#issuecomment-884715161


   @nlu90 In java there is always is no guaranteed way to kill a thread.  The thread has to cleanly exit.  To make sure threads are not leaked, I suggested followup work in my previous comment


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] zliang-min commented on pull request #11401: [pulsar-function] Stop calling the deprated method Thread.stop when stopping the function thread in ThreadRuntime.

Posted by GitBox <gi...@apache.org>.
zliang-min commented on pull request #11401:
URL: https://github.com/apache/pulsar/pull/11401#issuecomment-885245738


   /pulsarbot rerun-checks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org