Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/04/14 14:53:59 UTC

[GitHub] [spark] attilapiros opened a new pull request #32169: [SPARK-35009][CORE] Avoid creating multiple worker monitor threads for the same worker and same task context

attilapiros opened a new pull request #32169:
URL: https://github.com/apache/spark/pull/32169


   
   ### What changes were proposed in this pull request?
   
   With this PR Spark avoids creating multiple monitor threads for the same worker and same task context.
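
A minimal sketch of the idea, written in Python rather than the Scala of the actual change: monitor threads are keyed by (worker, task attempt id), and a new one is started only when the key is not yet registered. The names `MonitorRegistry`, `try_register`, `unregister`, and `count_monitors_needed` are hypothetical and chosen for illustration; the diff under review only shows a `PythonRunner.runningMonitorThreads` collection and a key of `(worker, context.taskAttemptId)`.

```python
import threading


class MonitorRegistry:
    """Tracks which (worker, task_attempt_id) pairs already have a monitor."""

    def __init__(self):
        self._lock = threading.Lock()
        self._running = set()

    def try_register(self, worker, task_attempt_id):
        """Return True only for the first caller that registers this key."""
        key = (worker, task_attempt_id)
        with self._lock:
            if key in self._running:
                return False
            self._running.add(key)
            return True

    def unregister(self, worker, task_attempt_id):
        """Forget the key, for example when the task completes."""
        with self._lock:
            self._running.discard((worker, task_attempt_id))


def count_monitors_needed(registry, computations):
    """Count how many monitor threads would be started for a sequence of
    (worker, task_attempt_id) computations."""
    return sum(
        1
        for worker, attempt in computations
        if registry.try_register(worker, attempt)
    )
```

When the same reused worker serves many computations inside one task attempt, only the first call registers, so only one monitor thread would be started for that pair.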
   
   ### Why are the changes needed?
   
   Without this change, unnecessary threads are created. This can even cause a job failure, for example when a coalesce (without shuffle) goes from a high partition number to a very low one. The following exception comes from exactly such a run:
   
   ```
   py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
   : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (192.168.1.210 executor driver): java.lang.OutOfMemoryError: unable to create new native thread
   	at java.lang.Thread.start0(Native Method)
   	at java.lang.Thread.start(Thread.java:717)
   	at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:166)
   	at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
   	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
   	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
   	at org.apache.spark.rdd.CoalescedRDD.$anonfun$compute$1(CoalescedRDD.scala:99)
   	at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
   	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
   	at scala.collection.Iterator.foreach(Iterator.scala:941)
   	at scala.collection.Iterator.foreach$(Iterator.scala:941)
   	at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
   	at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
   	at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
   	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
   	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
   	at scala.collection.TraversableOnce.to(TraversableOnce.scala:315)
   	at scala.collection.TraversableOnce.to$(TraversableOnce.scala:313)
   	at scala.collection.AbstractIterator.to(Iterator.scala:1429)
   	at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:307)
   	at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:307)
   	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1429)
   	at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:294)
   	at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:288)
   	at scala.collection.AbstractIterator.toArray(Iterator.scala:1429)
   	at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1030)
   	at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2260)
   	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
   	at org.apache.spark.scheduler.Task.run(Task.scala:131)
   	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:498)
   	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1437)
   	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:501)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:748)
   
   Driver stacktrace:
   	at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2262)
   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2211)
   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2210)
   	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
   	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
   	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
   	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2210)
   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1083)
   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1083)
   	at scala.Option.foreach(Option.scala:407)
   	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1083)
   	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2449)
   	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2391)
   	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2380)
   	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
   	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:872)
   	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2220)
   	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2241)
   	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2260)
   	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2285)
   	at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1030)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
   	at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
   	at org.apache.spark.rdd.RDD.collect(RDD.scala:1029)
   	at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:180)
   	at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
   	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
   	at py4j.Gateway.invoke(Gateway.java:282)
   	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
   	at py4j.commands.CallCommand.execute(CallCommand.java:79)
   	at py4j.GatewayConnection.run(GatewayConnection.java:238)
   	at java.lang.Thread.run(Thread.java:748)
   Caused by: java.lang.OutOfMemoryError: unable to create new native thread
   	at java.lang.Thread.start0(Native Method)
   	at java.lang.Thread.start(Thread.java:717)
   	at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:166)
   	at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
   	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
   	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
   	at org.apache.spark.rdd.CoalescedRDD.$anonfun$compute$1(CoalescedRDD.scala:99)
   	at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
   	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
   	at scala.collection.Iterator.foreach(Iterator.scala:941)
   	at scala.collection.Iterator.foreach$(Iterator.scala:941)
   	at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
   	at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
   	at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
   	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
   	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
   	at scala.collection.TraversableOnce.to(TraversableOnce.scala:315)
   	at scala.collection.TraversableOnce.to$(TraversableOnce.scala:313)
   	at scala.collection.AbstractIterator.to(Iterator.scala:1429)
   	at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:307)
   	at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:307)
   	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1429)
   	at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:294)
   	at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:288)
   	at scala.collection.AbstractIterator.toArray(Iterator.scala:1429)
   	at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1030)
   	at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2260)
   	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
   	at org.apache.spark.scheduler.Task.run(Task.scala:131)
   	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:498)
   	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1437)
   	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:501)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	... 1 more
   ```
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Manually, using the following Python script (`reproduce-SPARK-35009.py`):
   
   ```
   import pyspark
   
   conf = pyspark.SparkConf().setMaster("local[*]").setAppName("Test1")
   sc = pyspark.SparkContext.getOrCreate(conf)
   
   rows = 70000
   data = list(range(rows))
   rdd = sc.parallelize(data, rows)
   assert rdd.getNumPartitions() == rows
   rdd0 = rdd.filter(lambda x: False)
   data = rdd0.coalesce(1).collect()
   assert data == []
   ```
   
   Spark submit:
   ```
   $ ./bin/spark-submit reproduce-SPARK-35009.py
   ```
   
   #### With this change
   
   Checking the number of monitor threads with jcmd:
   ```
   $ jcmd
   85273 sun.tools.jcmd.JCmd
   85227 org.apache.spark.deploy.SparkSubmit reproduce-SPARK-35009.py
   41020 scala.tools.nsc.MainGenericRunner
   $ jcmd 85227 Thread.print | grep -c "Monitor for python"
   2
   $ jcmd 85227 Thread.print | grep -c "Monitor for python"
   2
   ...
   $ jcmd 85227 Thread.print | grep -c "Monitor for python"
   2
   $ jcmd 85227 Thread.print | grep -c "Monitor for python"
   2
   $ jcmd 85227 Thread.print | grep -c "Monitor for python"
   2
   $ jcmd 85227 Thread.print | grep -c "Monitor for python"
   2
   ```
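
The check above can also be scripted. The following is a sketch only: `count_monitor_threads` and `dump_threads` are hypothetical helper names, and the only external command assumed is the same `jcmd <pid> Thread.print` used in the shell session.

```python
import subprocess


def count_monitor_threads(thread_dump, marker="Monitor for python"):
    """Count the lines of a JVM thread dump that contain the marker,
    mirroring `grep -c`."""
    return sum(1 for line in thread_dump.splitlines() if marker in line)


def dump_threads(pid):
    """Return the text printed by `jcmd <pid> Thread.print`."""
    result = subprocess.run(
        ["jcmd", str(pid), "Thread.print"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `count_monitor_threads(dump_threads(pid))` repeatedly while the job runs gives the same kind of series as the `grep -c` invocations.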
   
   #### Without this change
   
   ```
   ...
   $ jcmd 90052 Thread.print | grep -c "Monitor for python"
   5645
   ..
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-820323913


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41987/
   




[GitHub] [spark] SparkQA commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-820133344








[GitHub] [spark] attilapiros commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
attilapiros commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-820199894


   The Jenkins result is strange, as the test passes locally even with multiple Python versions:
   ```
   $ python/run-tests --testnames pyspark.tests.test_taskcontext
   Running PySpark tests. Output is in /Users/attilazsoltpiros/git/attilapiros/spark/python/unit-tests.log
   Will test against the following Python executables: ['python3.6', 'pypy3']
   Will test the following Python tests: ['pyspark.tests.test_taskcontext']
   python3.6 python_implementation is CPython
   python3.6 version is: Python 3.6.13
   pypy3 python_implementation is PyPy
   pypy3 version is: Python 3.6.12 (db1e853f94de, Nov 18 2020, 09:49:36)
   [PyPy 7.3.3 with GCC Apple LLVM 12.0.0 (clang-1200.0.32.27)]
   Starting test(python3.6): pyspark.tests.test_taskcontext
   Starting test(pypy3): pyspark.tests.test_taskcontext
   Finished test(python3.6): pyspark.tests.test_taskcontext (49s)
   Finished test(pypy3): pyspark.tests.test_taskcontext (62s)
   Tests passed in 62 seconds
   ```




[GitHub] [spark] SparkQA commented on pull request #32169: [WIP][SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-819633820


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41933/
   




[GitHub] [spark] attilapiros closed pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
attilapiros closed pull request #32169:
URL: https://github.com/apache/spark/pull/32169


   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-820362281


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137411/
   




[GitHub] [spark] attilapiros commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
attilapiros commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-826049783


   Although the test failure is unrelated, this is a good opportunity to rebase on master.




[GitHub] [spark] SparkQA removed a comment on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-820089733


   **[Test build #137397 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137397/testReport)** for PR 32169 at commit [`c4a5e2d`](https://github.com/apache/spark/commit/c4a5e2dfa38d754f92ea6f4b98f549b7d6108639).




[GitHub] [spark] HyukjinKwon commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-820295940


   I'm okay with this fix, but it would be great to have more people look at it. cc @ueshin too




[GitHub] [spark] HyukjinKwon commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-820083032


   @attilapiros, there was an unexpected infra issue with the GA build. Can you sync to (or rebase onto) the latest master branch?




[GitHub] [spark] AmplabJenkins commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-821303386


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137488/
   




[GitHub] [spark] AmplabJenkins commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-821228353


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42063/
   




[GitHub] [spark] SparkQA commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-821283106


   **[Test build #137488 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137488/testReport)** for PR 32169 at commit [`c4a5e2d`](https://github.com/apache/spark/commit/c4a5e2dfa38d754f92ea6f4b98f549b7d6108639).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] srowen commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
srowen commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-821163278


   I buy it. Let's retest.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-826042291


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42403/
   




[GitHub] [spark] HyukjinKwon commented on a change in pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #32169:
URL: https://github.com/apache/spark/pull/32169#discussion_r613924852



##########
File path: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala
##########
@@ -161,10 +162,21 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
             logWarning("Failed to close worker socket", e)
         }
       }
+      if (reuseWorker) {
+        val key = (worker, context.taskAttemptId)
+        PythonRunner.runningMonitorThreads.remove(key)

Review comment:
       Hmm, I don't like this very much because we're adding more possibility for leaking. This task completion listener itself is not something that is guaranteed, in fact, IIRC.
   
   BTW, it would be great to elaborate on what's going on here with a comment. So the problem is that there are multiple monitor threads waiting for the same (forked) Python worker with the same socket, right?
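
One way to picture the leak concern, sketched in Python rather than Scala: if a key is removed only from a task completion callback, a callback that never runs leaves the key behind. A hypothetical alternative is to let the monitor thread remove its own key in a `finally` block. The names `running_monitors`, `running_monitors_lock`, and `start_monitor` are illustrative assumptions, and this is not necessarily what the PR does.

```python
import threading

running_monitors = set()  # keys of (worker, task_attempt_id)
running_monitors_lock = threading.Lock()


def start_monitor(worker, task_attempt_id, monitor_fn):
    """Start a monitor thread unless one already exists for this key.

    Returns the started thread, or None if a monitor is already running.
    """
    key = (worker, task_attempt_id)
    with running_monitors_lock:
        if key in running_monitors:
            return None
        running_monitors.add(key)

    def run():
        try:
            monitor_fn()
        finally:
            # The key is removed even if monitor_fn raises, so cleanup
            # does not depend on a separate completion callback.
            with running_monitors_lock:
                running_monitors.discard(key)

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread
```

The trade-off in this sketch is that the key lives exactly as long as the monitor thread, so a second monitor for the same pair can be started once the first one has exited.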






[GitHub] [spark] attilapiros commented on a change in pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
attilapiros commented on a change in pull request #32169:
URL: https://github.com/apache/spark/pull/32169#discussion_r619604269



##########
File path: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala
##########
@@ -161,10 +162,21 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
             logWarning("Failed to close worker socket", e)
         }
       }
+      if (reuseWorker) {
+        val key = (worker, context.taskAttemptId)
+        PythonRunner.runningMonitorThreads.remove(key)

Review comment:
       @HyukjinKwon your comment is addressed in https://github.com/apache/spark/pull/32169/commits/7ed511066f7be831f8ea52ac9f93a3687baecc8d
   
   

##########
File path: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala
##########
@@ -161,10 +162,21 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
             logWarning("Failed to close worker socket", e)
         }
       }
+      if (reuseWorker) {
+        val key = (worker, context.taskAttemptId)
+        PythonRunner.runningMonitorThreads.remove(key)

Review comment:
       @HyukjinKwon your comment is addressed in https://github.com/apache/spark/pull/32169/commits/61248eed6d90eaf9af088f0dd2ec4349c4cb628c






[GitHub] [spark] SparkQA commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-821216936


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42063/
   




[GitHub] [spark] attilapiros commented on pull request #32169: [WIP][SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
attilapiros commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-819623662


   Locally all the python tests are successful:
   ```
   ...
   Starting test(/Users/attilazsoltpiros/.pyenv/versions/3.8.1/bin/python3): pyspark.tests.test_daemon
   Finished test(/Users/attilazsoltpiros/.pyenv/versions/3.8.1/bin/python3): pyspark.tests.test_daemon (5s)
   ...
   Tests passed in 949 seconds
   ```




[GitHub] [spark] attilapiros edited a comment on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
attilapiros edited a comment on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-820199894


   The Jenkins result is strange, as the test passes locally even with multiple Python versions:
   ```
   $ python/run-tests --testnames pyspark.tests.test_worker
   Running PySpark tests. Output is in /Users/attilazsoltpiros/git/attilapiros/spark/python/unit-tests.log
   Will test against the following Python executables: ['python3.6', 'pypy3']
   Will test the following Python tests: ['pyspark.tests.test_worker']
   python3.6 python_implementation is CPython
   python3.6 version is: Python 3.6.13
   pypy3 python_implementation is PyPy
   pypy3 version is: Python 3.6.12 (db1e853f94de, Nov 18 2020, 09:49:36)
   [PyPy 7.3.3 with GCC Apple LLVM 12.0.0 (clang-1200.0.32.27)]
   Starting test(python3.6): pyspark.tests.test_worker
   Starting test(pypy3): pyspark.tests.test_worker
   Finished test(python3.6): pyspark.tests.test_worker (14s)
   Finished test(pypy3): pyspark.tests.test_worker (15s)
   Tests passed in 15 seconds
   ```




[GitHub] [spark] SparkQA removed a comment on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-821188983


   **[Test build #137488 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137488/testReport)** for PR 32169 at commit [`c4a5e2d`](https://github.com/apache/spark/commit/c4a5e2dfa38d754f92ea6f4b98f549b7d6108639).




[GitHub] [spark] SparkQA commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-820089733


   **[Test build #137397 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137397/testReport)** for PR 32169 at commit [`c4a5e2d`](https://github.com/apache/spark/commit/c4a5e2dfa38d754f92ea6f4b98f549b7d6108639).




[GitHub] [spark] attilapiros commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
attilapiros commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-819741657


   OK, so the Python tests were successful on Jenkins too:
   https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137355/testReport/pyspark.tests.test_daemon/DaemonTests/




[GitHub] [spark] SparkQA commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-820316887








[GitHub] [spark] SparkQA commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-826056157


   **[Test build #137879 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137879/testReport)** for PR 32169 at commit [`61248ee`](https://github.com/apache/spark/commit/61248eed6d90eaf9af088f0dd2ec4349c4cb628c).




[GitHub] [spark] SparkQA removed a comment on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-820285433


   **[Test build #137411 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137411/testReport)** for PR 32169 at commit [`c4a5e2d`](https://github.com/apache/spark/commit/c4a5e2dfa38d754f92ea6f4b98f549b7d6108639).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-826055998


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137873/
   




[GitHub] [spark] SparkQA commented on pull request #32169: [WIP][SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-819723871


   **[Test build #137355 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137355/testReport)** for PR 32169 at commit [`4f1b687`](https://github.com/apache/spark/commit/4f1b687c1a6f6f0d3e0dabca0d88a78e4315f9d6).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA removed a comment on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-826056157


   **[Test build #137879 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137879/testReport)** for PR 32169 at commit [`61248ee`](https://github.com/apache/spark/commit/61248eed6d90eaf9af088f0dd2ec4349c4cb628c).




[GitHub] [spark] srowen commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
srowen commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-821163360


   Jenkins test this please




[GitHub] [spark] AmplabJenkins commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-826077278


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137879/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-826063001


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42408/
   




[GitHub] [spark] AmplabJenkins commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-820362281


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137411/
   




[GitHub] [spark] SparkQA commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-821219694


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42063/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-819745007


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137355/
   




[GitHub] [spark] SparkQA commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-826051729








[GitHub] [spark] SparkQA commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-826072647


   **[Test build #137879 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137879/testReport)** for PR 32169 at commit [`61248ee`](https://github.com/apache/spark/commit/61248eed6d90eaf9af088f0dd2ec4349c4cb628c).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-820361135


   **[Test build #137411 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137411/testReport)** for PR 32169 at commit [`c4a5e2d`](https://github.com/apache/spark/commit/c4a5e2dfa38d754f92ea6f4b98f549b7d6108639).
    * This patch **fails PySpark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] HyukjinKwon commented on a change in pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #32169:
URL: https://github.com/apache/spark/pull/32169#discussion_r613924852



##########
File path: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala
##########
@@ -161,10 +162,21 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
             logWarning("Failed to close worker socket", e)
         }
       }
+      if (reuseWorker) {
+        val key = (worker, context.taskAttemptId)
+        PythonRunner.runningMonitorThreads.remove(key)

Review comment:
       Hmm, I don't like this very much because it adds one more possibility for a leak: IIRC, the task completion listener itself is not something that is guaranteed. 
   
   BTW, it would be great to elaborate on what's going on here. So the problem is that there are multiple monitor threads waiting for the same (forked) Python worker with the same socket, right?
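
To make the question above concrete, here is a minimal sketch of the idea behind the diff: keep a registry keyed by (worker, task attempt id), start a monitor thread only for a key that is not yet registered, and drop the key when the task completes. It is written in Python for illustration only and is not the code in `PythonRunner.scala`; the names `MonitorRegistry`, `start_monitor_once` and `on_task_completion` are hypothetical, and the string `"worker-1"` merely stands in for a worker socket.

```python
import threading


class MonitorRegistry:
    """Hypothetical registry of (worker, task_attempt_id) keys that already have a monitor."""

    def __init__(self):
        self._lock = threading.Lock()
        self._running = set()
        self.started = 0  # how many monitor threads were actually started

    def start_monitor_once(self, worker, task_attempt_id, target):
        """Start `target` in a daemon thread unless this key is already registered."""
        key = (worker, task_attempt_id)
        with self._lock:
            if key in self._running:
                return None  # a monitor already exists for this worker and task attempt
            self._running.add(key)
            self.started += 1
        thread = threading.Thread(target=target, daemon=True)
        thread.start()
        return thread

    def on_task_completion(self, worker, task_attempt_id):
        """Counterpart of the `remove(key)` call in the diff: forget the key."""
        with self._lock:
            self._running.discard((worker, task_attempt_id))


registry = MonitorRegistry()
done = threading.Event()

# Simulate one task that computes many parent partitions (as in a coalesce
# without shuffle) and asks for a monitor each time with the same key.
threads = [registry.start_monitor_once("worker-1", 0, done.wait) for _ in range(1000)]

done.set()  # let the single monitor thread finish
registry.on_task_completion("worker-1", 0)
print(registry.started)
```

In this sketch only the first of the 1000 requests starts a thread. It also shows the leak concern: if `on_task_completion` is never invoked, the key stays in the set indefinitely.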






[GitHub] [spark] HyukjinKwon commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-820276068


   retest this please




[GitHub] [spark] SparkQA removed a comment on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-826036819








[GitHub] [spark] attilapiros commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
attilapiros commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-826049783


   Although the test failure is unrelated, this is a good opportunity to rebase on master.




[GitHub] [spark] SparkQA commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-821188983


   **[Test build #137488 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137488/testReport)** for PR 32169 at commit [`c4a5e2d`](https://github.com/apache/spark/commit/c4a5e2dfa38d754f92ea6f4b98f549b7d6108639).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-826077278


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137879/
   




[GitHub] [spark] SparkQA commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-819584328


   **[Test build #137355 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137355/testReport)** for PR 32169 at commit [`4f1b687`](https://github.com/apache/spark/commit/4f1b687c1a6f6f0d3e0dabca0d88a78e4315f9d6).




[GitHub] [spark] SparkQA commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-826041092


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42403/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-821228353


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42063/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-820169471


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137397/
   




[GitHub] [spark] SparkQA removed a comment on pull request #32169: [WIP][SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-819584328


   **[Test build #137355 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137355/testReport)** for PR 32169 at commit [`4f1b687`](https://github.com/apache/spark/commit/4f1b687c1a6f6f0d3e0dabca0d88a78e4315f9d6).




[GitHub] [spark] AmplabJenkins commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-820169471


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137397/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-820133369


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41973/
   




[GitHub] [spark] AmplabJenkins commented on pull request #32169: [WIP][SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-819658089


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41933/
   




[GitHub] [spark] attilapiros edited a comment on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
attilapiros edited a comment on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-819741657


   OK, so the Python tests were successful on Jenkins too (and locally, but they failed in my GitHub Actions run):
   https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137355/testReport/pyspark.tests.test_daemon/DaemonTests/




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32169: [WIP][SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-819658089


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41933/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-826055998








[GitHub] [spark] SparkQA removed a comment on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-826036819


   **[Test build #137873 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137873/testReport)** for PR 32169 at commit [`7ed5110`](https://github.com/apache/spark/commit/7ed511066f7be831f8ea52ac9f93a3687baecc8d).




[GitHub] [spark] srowen commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
srowen commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-819749376


   Is there any downside or problem that this could cause? I don't know enough about what a monitor thread does to really evaluate it. The change itself looks clean and does what it says.




[GitHub] [spark] SparkQA commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-820285433


   **[Test build #137411 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137411/testReport)** for PR 32169 at commit [`c4a5e2d`](https://github.com/apache/spark/commit/c4a5e2dfa38d754f92ea6f4b98f549b7d6108639).




[GitHub] [spark] attilapiros commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
attilapiros commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-823073547


   Thanks @srowen ! 
   There is one more comment to address: https://github.com/apache/spark/pull/32169#discussion_r613924852
   Let me think about it.




[GitHub] [spark] AmplabJenkins commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-826055998








[GitHub] [spark] github-actions[bot] commented on pull request #32169: [WIP][SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-819692843


   **[Test build #748664690](https://github.com/attilapiros/spark/actions/runs/748664690)** for PR 32169 at commit [`4f1b687`](https://github.com/attilapiros/spark/commit/4f1b687c1a6f6f0d3e0dabca0d88a78e4315f9d6).




[GitHub] [spark] attilapiros commented on a change in pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
attilapiros commented on a change in pull request #32169:
URL: https://github.com/apache/spark/pull/32169#discussion_r619604269



##########
File path: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala
##########
@@ -161,10 +162,21 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
             logWarning("Failed to close worker socket", e)
         }
       }
+      if (reuseWorker) {
+        val key = (worker, context.taskAttemptId)
+        PythonRunner.runningMonitorThreads.remove(key)

Review comment:
       @HyukjinKwon your comment is addressed in https://github.com/apache/spark/pull/32169/commits/7ed511066f7be831f8ea52ac9f93a3687baecc8d
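
The diff above removes a `(worker, context.taskAttemptId)` key from `runningMonitorThreads` when the task finishes. A minimal sketch of that bookkeeping idea follows, written in Python purely for illustration; it is not the Scala code from the PR. `MonitorRegistry`, `try_register`, `unregister`, and `start_monitor_once` are hypothetical names, and the lock-protected set is an assumption about how such a structure could be kept thread-safe.

```python
import threading


class MonitorRegistry:
    """Remembers which (worker, task attempt) pairs already have a monitor."""

    def __init__(self):
        self._lock = threading.Lock()
        self._running = set()

    def try_register(self, worker, task_attempt_id):
        # Atomic check-and-add: only the first caller for a key gets True.
        key = (worker, task_attempt_id)
        with self._lock:
            if key in self._running:
                return False
            self._running.add(key)
            return True

    def unregister(self, worker, task_attempt_id):
        # Called when the task finishes, mirroring the remove(key) in the diff.
        with self._lock:
            self._running.discard((worker, task_attempt_id))


def start_monitor_once(registry, worker, task_attempt_id, start_monitor):
    """Call start_monitor() only if no monitor exists yet for this key."""
    if registry.try_register(worker, task_attempt_id):
        start_monitor()
        return True
    return False
```

With this shape, repeated `compute` calls for the same reused worker and the same task attempt would start at most one monitor, while a different task attempt on the same worker still gets its own.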
   
   






[GitHub] [spark] AmplabJenkins commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-826055998


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137873/
   




[GitHub] [spark] AmplabJenkins commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-826042291


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42403/
   




[GitHub] [spark] SparkQA commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-826036819


   **[Test build #137873 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137873/testReport)** for PR 32169 at commit [`7ed5110`](https://github.com/apache/spark/commit/7ed511066f7be831f8ea52ac9f93a3687baecc8d).




[GitHub] [spark] AmplabJenkins commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-826063001


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42408/
   




[GitHub] [spark] attilapiros commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
attilapiros commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-820111567


   @mridulm you mean using the `TaskCompletionListener`, right?
   
   As I read the code of `MonitorThread`, one of its responsibilities is to handle task interruption:
   https://github.com/apache/spark/blob/c4a5e2dfa38d754f92ea6f4b98f549b7d6108639/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala#L582-L584
   
   The code there describes what to do when the task is interrupted but not completed.
   
   But a task interruption is not a completion: as you can see, when the task is flagged as interrupted, no listener is informed:
   https://github.com/apache/spark/blob/c4a5e2dfa38d754f92ea6f4b98f549b7d6108639/core/src/main/scala/org/apache/spark/TaskContextImpl.scala#L149-L151 
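
To illustrate the point about interruption versus completion, here is a small Python sketch. It is a toy model rather than Spark's `TaskContextImpl`: `SketchTaskContext`, `monitor`, and the polling parameters are all hypothetical. It only shows why a completion listener alone would miss an interruption, and why something has to poll the interrupted flag.

```python
import time


class SketchTaskContext:
    """Toy task context: completion notifies listeners, interruption does not."""

    def __init__(self):
        self.completed = False
        self.interrupted = False
        self._listeners = []

    def add_completion_listener(self, listener):
        self._listeners.append(listener)

    def mark_interrupted(self):
        # Only a flag is set here; no listener is informed.
        self.interrupted = True

    def mark_completed(self):
        self.completed = True
        for listener in self._listeners:
            listener(self)


def monitor(ctx, on_interrupt, poll_seconds=0.01, max_polls=100):
    """Poll the context; run on_interrupt() if the task was interrupted."""
    for _ in range(max_polls):
        if ctx.completed:
            return "completed"  # nothing to do, the monitor just stops
        if ctx.interrupted:
            on_interrupt()
            return "interrupted"
        time.sleep(poll_seconds)
    return "timeout"
```

In this model a listener registered with `add_completion_listener` never fires after `mark_interrupted()`, so the cleanup has to come from the polling side.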
   
   




[GitHub] [spark] SparkQA commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-820168402


   **[Test build #137397 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137397/testReport)** for PR 32169 at commit [`c4a5e2d`](https://github.com/apache/spark/commit/c4a5e2dfa38d754f92ea6f4b98f549b7d6108639).
    * This patch **fails PySpark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] attilapiros commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
attilapiros commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-826599747


   I'll leave it open for 4 more days so that others can review it, and if no issue comes up I'll merge it (assuming it's still passing CI and no review is in progress).




[GitHub] [spark] AmplabJenkins commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-819745007


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137355/
   




[GitHub] [spark] SparkQA commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-826060591


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42408/
   




[GitHub] [spark] attilapiros commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
attilapiros commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-820395723


   @srowen 
   
   In `MonitorThread` this is the interesting part for us:
   https://github.com/apache/spark/blob/4f1b687c1a6f6f0d3e0dabca0d88a78e4315f9d6/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala#L582-L597
   
   So when the task is completed, the monitor thread does nothing interesting; it just stops.
   Its main purpose is to handle task interruptions.
   
   So before this PR, when the task was interrupted, multiple `MonitorThread`s called `destroyPythonWorker` for the same socket, which delegates to
   https://github.com/apache/spark/blob/c4a5e2dfa38d754f92ea6f4b98f549b7d6108639/core/src/main/scala/org/apache/spark/SparkEnv.scala#L128
   
   We know the key is the same here because the socket is the same.
   
   Going further down the road we reach `stopWorker`:
   https://github.com/apache/spark/blob/c4a5e2dfa38d754f92ea6f4b98f549b7d6108639/core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala#L344-L361
   
   As I see it, `useDaemon` must be true, because sockets are only reused when `useDaemon` is true:
   https://github.com/apache/spark/blob/c4a5e2dfa38d754f92ea6f4b98f549b7d6108639/core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala#L102
   
   Now back to `stopWorker`. We can see it just sends a PID, and that PID comes from a `HashMap` where the socket is the key:
   https://github.com/apache/spark/blob/c4a5e2dfa38d754f92ea6f4b98f549b7d6108639/core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala#L83
   
   So multiple monitor threads for the same socket will just send the same PID multiple times via `daemon.getOutputStream`, which is the stdin of the daemon process. (The task context/`taskAttemptId` is only needed for my `runningMonitorThreads`, so that the interruption of each task is monitored separately.)
   
   Let's see what happens on the daemon side:
   https://github.com/apache/spark/blob/c4a5e2dfa38d754f92ea6f4b98f549b7d6108639/python/pyspark/daemon.py#L126-L135
   
   So we just send a SIGKILL to the PID that arrived via stdin.
   This is really redundant for the same PID. Those errors (sending a kill to a nonexistent PID) are ignored by `except OSError:`.
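
The daemon-side behaviour described above can be sketched as follows. This is a simplified illustration, not the code in `daemon.py`: `kill_worker` is a hypothetical helper, and a POSIX system is assumed (`signal.SIGKILL` is not available on Windows). It shows that a second kill request for an already dead PID is simply swallowed by `except OSError:`.

```python
import os
import signal


def kill_worker(pid):
    """Send SIGKILL to pid; return False if the process no longer exists."""
    try:
        os.kill(pid, signal.SIGKILL)
        return True
    except OSError:
        # The process already died (for example, an earlier monitor thread
        # requested the same kill), so there is nothing left to do.
        return False
```

Under these assumptions, duplicate kill requests for one PID are harmless but wasted work, which is consistent with the argument that one monitor thread per worker and task attempt is enough.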




[GitHub] [spark] HyukjinKwon commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-820289588


   cc @zsxwing FYI




[GitHub] [spark] AmplabJenkins commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-820133369


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41973/
   




[GitHub] [spark] SparkQA commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-826040677


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42403/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-821303386


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137488/
   




[GitHub] [spark] SparkQA commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-826051729


   **[Test build #137873 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137873/testReport)** for PR 32169 at commit [`7ed5110`](https://github.com/apache/spark/commit/7ed511066f7be831f8ea52ac9f93a3687baecc8d).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] attilapiros commented on a change in pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
attilapiros commented on a change in pull request #32169:
URL: https://github.com/apache/spark/pull/32169#discussion_r619604269



##########
File path: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala
##########
@@ -161,10 +162,21 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
             logWarning("Failed to close worker socket", e)
         }
       }
+      if (reuseWorker) {
+        val key = (worker, context.taskAttemptId)
+        PythonRunner.runningMonitorThreads.remove(key)

Review comment:
       @HyukjinKwon your comment is addressed in https://github.com/apache/spark/pull/32169/commits/61248eed6d90eaf9af088f0dd2ec4349c4cb628c






[GitHub] [spark] mridulm commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
mridulm commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-820094334


   @attilapiros I don't have much context on the Python runner, but I'm curious whether `MonitorThread` can follow the same pattern/lifecycle as `writerThread` in that method?




[GitHub] [spark] github-actions[bot] commented on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-820087911


   **[Test build #750728539](https://github.com/attilapiros/spark/actions/runs/750728539)** for PR 32169 at commit [`c4a5e2d`](https://github.com/attilapiros/spark/commit/c4a5e2dfa38d754f92ea6f4b98f549b7d6108639).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32169: [SPARK-35009][CORE] Avoid creating multiple python worker monitor threads for the same worker and same task context

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32169:
URL: https://github.com/apache/spark/pull/32169#issuecomment-820323913


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41987/
   

