Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/10/03 15:42:23 UTC

[GitHub] [spark] wankunde commented on pull request #32866: [SPARK-35713]Bug fix for thread leak in JobCancellationSuite

wankunde commented on pull request #32866:
URL: https://github.com/apache/spark/pull/32866#issuecomment-932975741


   Hi, @Ngone51 @LuciferYang 
   
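   In our production environment, some executors fail to kill tasks: the task reaper reports that a killed task is still running 2240871 ms (about 37 minutes) after being killed, and it keeps running. Could you help me look into this?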
   
   Reaper thread log:
   
   ```
   21/09/27 23:44:24,882 WARN [Task reaper-1745] executor.Executor:69 : Thread dump from task 768777879:
   21/09/27 23:44:34,882 WARN [Task reaper-1745] executor.Executor:69 : Killed task 768777879 is still running after 2240871 ms
   21/09/27 23:44:34,885 WARN [Task reaper-1745] executor.Executor:69 : Thread dump from task 768777879:
   21/09/27 23:44:44,885 WARN [Task reaper-1745] executor.Executor:69 : Killed task 768777879 is still running after 2250874 ms
   21/09/27 23:44:44,888 WARN [Task reaper-1745] executor.Executor:69 : Thread dump from task 768777879:
   21/09/27 23:44:54,888 WARN [Task reaper-1745] executor.Executor:69 : Killed task 768777879 is still running after 2260877 ms
   21/09/27 23:44:54,891 WARN [Task reaper-1745] executor.Executor:69 : Thread dump from task 768777879:
   21/09/27 23:45:04,891 WARN [Task reaper-1745] executor.Executor:69 : Killed task 768777879 is still running after 2270880 ms
   21/09/27 23:45:04,894 WARN [Task reaper-1745] executor.Executor:69 : Thread dump from task 768777879:
   21/09/27 23:45:14,894 WARN [Task reaper-1745] executor.Executor:69 : Killed task 768777879 is still running after 2280883 ms
   21/09/27 23:45:14,896 WARN [Task reaper-1745] executor.Executor:69 : Thread dump from task 768777879:
   21/09/27 23:45:24,897 WARN [Task reaper-1745] executor.Executor:69 : Killed task 768777879 is still running after 2290886 ms
   21/09/27 23:45:24,899 WARN [Task reaper-1745] executor.Executor:69 : Thread dump from task 768777879:
   21/09/27 23:45:34,899 WARN [Task reaper-1745] executor.Executor:69 : Killed task 768777879 is still running after 2300888 ms
   21/09/27 23:45:34,902 WARN [Task reaper-1745] executor.Executor:69 : Thread dump from task 768777879:
   21/09/27 23:45:44,902 WARN [Task reaper-1745] executor.Executor:69 : Killed task 768777879 is still running after 2310891 ms
   21/09/27 23:45:44,904 WARN [Task reaper-1745] executor.Executor:69 : Thread dump from task 768777879:
   ```
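   
   For reference, the reaper reports arrive about every 10 seconds, which appears consistent with the default `spark.task.reaper.pollingInterval` of 10s, and the reported elapsed time grows from 2240871 ms (about 37.3 minutes) to 2310891 ms (about 38.5 minutes). The helper below is a minimal sketch, assuming log lines in the format shown above. `ReaperLogScan` is a hypothetical name, not a Spark class. It extracts the elapsed times from `Killed task ... is still running after N ms` lines and computes the gaps between consecutive reports.
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.regex.Matcher;
   import java.util.regex.Pattern;
   
   // Hypothetical helper (not part of Spark): scans executor log lines written by the
   // task reaper and extracts the "still running after <N> ms" values.
   final class ReaperLogScan {
       private static final Pattern STILL_RUNNING =
           Pattern.compile("Killed task (\\d+) is still running after (\\d+) ms");
   
       private ReaperLogScan() {}
   
       /** Elapsed milliseconds from every matching line, in log order. */
       static List<Long> elapsedMillis(List<String> logLines) {
           List<Long> out = new ArrayList<>();
           for (String line : logLines) {
               Matcher m = STILL_RUNNING.matcher(line);
               if (m.find()) {
                   out.add(Long.parseLong(m.group(2)));
               }
           }
           return out;
       }
   
       /** Differences between consecutive reports, i.e. the observed polling interval. */
       static List<Long> gaps(List<Long> elapsed) {
           List<Long> out = new ArrayList<>();
           for (int i = 1; i < elapsed.size(); i++) {
               out.add(elapsed.get(i) - elapsed.get(i - 1));
           }
           return out;
       }
   }
   ```
   
   Applied to the eight `Killed task` lines above, `gaps` yields values of 10002 to 10003 ms, i.e. the reaper keeps polling while the task thread never exits.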
   
   Task thread stack:
   ```
   "Executor 553 task launch worker for task 768777879, task 26.0 in stage 1285726.0 of app application_1630907351152_13315" #1106477 daemon prio=5 os_prio=0 tid=0x000000002a6b2000 nid=0x20b9f runnable [0x00007f87a9039000]
      java.lang.Thread.State: RUNNABLE
           at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering.compare_0_0$(Unknown Source)
           at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering.compare(Unknown Source)
           at org.apache.spark.sql.execution.UnsafeKVExternalSorter$KVComparator.compare(UnsafeKVExternalSorter.java:272)
           at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter$SortComparator.compare(UnsafeInMemorySorter.java:70)
           at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter$SortComparator.compare(UnsafeInMemorySorter.java:44)
           at org.apache.spark.util.collection.TimSort$SortState.gallopRight(TimSort.java:638)
           at org.apache.spark.util.collection.TimSort$SortState.mergeHi(TimSort.java:887)
           at org.apache.spark.util.collection.TimSort$SortState.mergeAt(TimSort.java:536)
           at org.apache.spark.util.collection.TimSort$SortState.mergeCollapse(TimSort.java:462)
           at org.apache.spark.util.collection.TimSort$SortState.access$200(TimSort.java:325)
           at org.apache.spark.util.collection.TimSort.sort(TimSort.java:153)
           at org.apache.spark.util.collection.Sorter.sort(Sorter.scala:37)
           at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.getSortedIterator(UnsafeInMemorySorter.java:364)
           at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:221)
           at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.createWithExistingInMemorySorter(UnsafeExternalSorter.java:111)
           at org.apache.spark.sql.execution.UnsafeKVExternalSorter.<init>(UnsafeKVExternalSorter.java:158)
           at org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.destructAndCreateExternalSorter(UnsafeFixedWidthAggregationMap.java:248)
           at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doConsume_0$(Unknown Source)
           at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doAggregateWithKeys_1$(Unknown Source)
           at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doAggregateWithKeys_0$(Unknown Source)
           at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
           at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:50)
           at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:730)
           at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
           at org.apache.spark.rdd.RDD$$anon$2.hasNext(RDD.scala:332)
           at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:176)
           at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
           at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
           at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
           at org.apache.spark.scheduler.Task.run(Task.scala:127)
           at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:486)
           at org.apache.spark.executor.Executor$TaskRunner$$Lambda$533/2066049817.apply(Unknown Source)
           at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1379)
           at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:489)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:748)
   
      Locked ownable synchronizers:
           - <0x00007f8c72788150> (a java.util.concurrent.ThreadPoolExecutor$Worker)
   ```
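   
   The stack shows the task thread `RUNNABLE` inside `TimSort.mergeHi`/`gallopRight`, calling the generated `SpecificOrdering.compare`, while `UnsafeFixedWidthAggregationMap.destructAndCreateExternalSorter` spills the in-memory sort. One possible (unconfirmed) reading is that Java thread interruption is cooperative: if killing the task depends on the task thread noticing an interrupt or kill flag, a thread that stays inside a long CPU-bound sort with no such check keeps running. The sketch below illustrates only that general Java behavior. `InterruptDemo` is hypothetical, not Spark code. The worker is interrupted before its loop starts, and only the variant that polls `Thread.isInterrupted()` stops early.
   
   ```java
   import java.util.concurrent.atomic.AtomicBoolean;
   import java.util.concurrent.atomic.AtomicLong;
   
   // Hypothetical demo (not Spark code) of cooperative interruption in Java.
   final class InterruptDemo {
       private InterruptDemo() {}
   
       /**
        * Starts a worker that performs `steps` units of CPU-bound work, interrupts it
        * before the work begins, and returns how many units actually ran.
        */
       static long run(long steps, boolean checksInterrupt) {
           AtomicLong done = new AtomicLong();
           AtomicBoolean go = new AtomicBoolean(false);
           Thread worker = new Thread(() -> {
               // Busy-wait (not interruptible) until the interrupt has been delivered.
               while (!go.get()) {
                   Thread.yield();
               }
               for (long i = 0; i < steps; i++) {
                   // A cooperative loop polls the interrupt flag; a tight compare loop may not.
                   if (checksInterrupt && Thread.currentThread().isInterrupted()) {
                       return;
                   }
                   done.incrementAndGet(); // stand-in for one comparator call
               }
           });
           worker.start();
           worker.interrupt(); // only sets the worker's interrupt flag
           go.set(true);       // now let the worker start its loop
           try {
               worker.join();
           } catch (InterruptedException e) {
               Thread.currentThread().interrupt();
               throw new IllegalStateException("interrupted while waiting for worker", e);
           }
           return done.get();
       }
   }
   ```
   
   `InterruptDemo.run(1_000_000L, false)` returns 1000000 even though the worker was interrupted, while `InterruptDemo.run(1_000_000L, true)` returns 0. The interrupt is set before the `go` flag is published, so the outcome does not depend on thread scheduling.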


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
For additional commands, e-mail: reviews-help@spark.apache.org