Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/10/03 15:42:23 UTC
[GitHub] [spark] wankunde commented on pull request #32866: [SPARK-35713]Bug fix for thread leak in JobCancellationSuite
URL: https://github.com/apache/spark/pull/32866#issuecomment-932975741
Hi, @Ngone51 @LuciferYang
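In our production environment, some executors fail to kill tasks: the task reaper keeps reporting that a killed task is still running (more than 37 minutes after the kill in the log below), while the task thread stays RUNNABLE inside a sort. Could you help me look into this?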
Reaper thread log:
```
21/09/27 23:44:24,882 WARN [Task reaper-1745] executor.Executor:69 : Thread dump from task 768777879:
21/09/27 23:44:34,882 WARN [Task reaper-1745] executor.Executor:69 : Killed task 768777879 is still running after 2240871 ms
21/09/27 23:44:34,885 WARN [Task reaper-1745] executor.Executor:69 : Thread dump from task 768777879:
21/09/27 23:44:44,885 WARN [Task reaper-1745] executor.Executor:69 : Killed task 768777879 is still running after 2250874 ms
21/09/27 23:44:44,888 WARN [Task reaper-1745] executor.Executor:69 : Thread dump from task 768777879:
21/09/27 23:44:54,888 WARN [Task reaper-1745] executor.Executor:69 : Killed task 768777879 is still running after 2260877 ms
21/09/27 23:44:54,891 WARN [Task reaper-1745] executor.Executor:69 : Thread dump from task 768777879:
21/09/27 23:45:04,891 WARN [Task reaper-1745] executor.Executor:69 : Killed task 768777879 is still running after 2270880 ms
21/09/27 23:45:04,894 WARN [Task reaper-1745] executor.Executor:69 : Thread dump from task 768777879:
21/09/27 23:45:14,894 WARN [Task reaper-1745] executor.Executor:69 : Killed task 768777879 is still running after 2280883 ms
21/09/27 23:45:14,896 WARN [Task reaper-1745] executor.Executor:69 : Thread dump from task 768777879:
21/09/27 23:45:24,897 WARN [Task reaper-1745] executor.Executor:69 : Killed task 768777879 is still running after 2290886 ms
21/09/27 23:45:24,899 WARN [Task reaper-1745] executor.Executor:69 : Thread dump from task 768777879:
21/09/27 23:45:34,899 WARN [Task reaper-1745] executor.Executor:69 : Killed task 768777879 is still running after 2300888 ms
21/09/27 23:45:34,902 WARN [Task reaper-1745] executor.Executor:69 : Thread dump from task 768777879:
21/09/27 23:45:44,902 WARN [Task reaper-1745] executor.Executor:69 : Killed task 768777879 is still running after 2310891 ms
21/09/27 23:45:44,904 WARN [Task reaper-1745] executor.Executor:69 : Thread dump from task 768777879:
```
Task Thread stack:
```
"Executor 553 task launch worker for task 768777879, task 26.0 in stage 1285726.0 of app application_1630907351152_13315" #1106477 daemon prio=5 os_prio=0 tid=0x000000002a6b2000 nid=0x20b9f runnable [0x00007f87a9039000]
java.lang.Thread.State: RUNNABLE
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering.compare_0_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering.compare(Unknown Source)
at org.apache.spark.sql.execution.UnsafeKVExternalSorter$KVComparator.compare(UnsafeKVExternalSorter.java:272)
at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter$SortComparator.compare(UnsafeInMemorySorter.java:70)
at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter$SortComparator.compare(UnsafeInMemorySorter.java:44)
at org.apache.spark.util.collection.TimSort$SortState.gallopRight(TimSort.java:638)
at org.apache.spark.util.collection.TimSort$SortState.mergeHi(TimSort.java:887)
at org.apache.spark.util.collection.TimSort$SortState.mergeAt(TimSort.java:536)
at org.apache.spark.util.collection.TimSort$SortState.mergeCollapse(TimSort.java:462)
at org.apache.spark.util.collection.TimSort$SortState.access$200(TimSort.java:325)
at org.apache.spark.util.collection.TimSort.sort(TimSort.java:153)
at org.apache.spark.util.collection.Sorter.sort(Sorter.scala:37)
at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.getSortedIterator(UnsafeInMemorySorter.java:364)
at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:221)
at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.createWithExistingInMemorySorter(UnsafeExternalSorter.java:111)
at org.apache.spark.sql.execution.UnsafeKVExternalSorter.<init>(UnsafeKVExternalSorter.java:158)
at org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.destructAndCreateExternalSorter(UnsafeFixedWidthAggregationMap.java:248)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doConsume_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doAggregateWithKeys_1$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doAggregateWithKeys_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:50)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:730)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at org.apache.spark.rdd.RDD$$anon$2.hasNext(RDD.scala:332)
at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:176)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
at org.apache.spark.scheduler.Task.run(Task.scala:127)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:486)
at org.apache.spark.executor.Executor$TaskRunner$$Lambda$533/2066049817.apply(Unknown Source)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1379)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:489)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- <0x00007f8c72788150> (a java.util.concurrent.ThreadPoolExecutor$Worker)
```
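One possible reading of this stack, offered as an assumption rather than a confirmed diagnosis: the task thread is RUNNABLE inside the sort comparator, and in Java `Thread.interrupt()` only sets a flag, so a CPU-bound loop that never checks that flag keeps running until it finishes on its own. The sketch below shows only that JVM behaviour. It does not use Spark code, and the class and method names (`InterruptDemo`, `stillRunningAfterInterrupt`, and so on) are hypothetical.

```java
public class InterruptDemo {

    // Hypothetical stand-in for a long CPU-bound sort: spins until the deadline
    // and never looks at the thread's interrupt flag.
    public static void spinIgnoringInterrupt(long deadlineNanos) {
        while (System.nanoTime() < deadlineNanos) {
            // busy work only
        }
    }

    // Same loop, but cooperative: it polls the interrupt flag and returns early.
    public static void spinCheckingInterrupt(long deadlineNanos) {
        while (System.nanoTime() < deadlineNanos) {
            if (Thread.currentThread().isInterrupted()) {
                return;
            }
        }
    }

    // Starts a worker that spins for up to runMillis, interrupts it right away,
    // waits up to waitMillis, and reports whether the worker is still alive.
    public static boolean stillRunningAfterInterrupt(
            boolean cooperative, long runMillis, long waitMillis) throws InterruptedException {
        long deadline = System.nanoTime() + runMillis * 1_000_000L;
        Thread worker = new Thread(() -> {
            if (cooperative) {
                spinCheckingInterrupt(deadline);
            } else {
                spinIgnoringInterrupt(deadline);
            }
        });
        worker.setDaemon(true);
        worker.start();
        worker.interrupt();               // only sets the interrupt flag
        worker.join(waitMillis);          // give the worker a chance to stop
        boolean alive = worker.isAlive();
        worker.join();                    // let the loop reach its deadline before returning
        return alive;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("non-cooperative loop still running: "
                + stillRunningAfterInterrupt(false, 2000, 200));
        System.out.println("cooperative loop still running: "
                + stillRunningAfterInterrupt(true, 2000, 200));
    }
}
```

If the same thing is happening here, the reaper would keep printing "still running" until the sort completes, because nothing on that code path observes the kill request. Whether the Spark sort path in this stack actually checks for a kill is not established by the log above.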
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org