Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2017/06/12 09:12:00 UTC

[jira] [Resolved] (SPARK-12240) FileNotFoundException: (Too many open files) when using multiple groupby on DataFrames

     [ https://issues.apache.org/jira/browse/SPARK-12240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-12240.
----------------------------------
    Resolution: Cannot Reproduce

I can't reproduce this with the code below:

{code}
from pyspark.sql import functions as F

df = spark.createDataFrame([[1, 2, 3], [4, 5, 6]], ["col1", "col2", "max_cum_rank"])
df_t = (df.filter(df['max_cum_rank'] == 0)
        .select(['col1', 'col2'])
        .groupby('col1').agg(F.min('col2'))
        .groupby('min(col2)').agg(F.countDistinct('col1'))
        .toPandas())
{code}

Please reopen this if I have misunderstood or if there is a self-contained reproducer.

I guess this is related to the open-file ulimit.
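
For what it's worth, here is a minimal sketch of how to check the per-process open-file limit from a Python session. Note this is only an assumption about where the limit is hit: {{RLIMIT_NOFILE}} here reflects the process it runs in (e.g. the driver), while the limit that matters for the shuffle failures above is the one in effect for each executor process.

{code}
import resource

# Soft/hard limits on open file descriptors for the current process.
# Shuffle-heavy jobs can exceed the soft limit; raising it (e.g. via
# `ulimit -n` or limits.conf on the executor hosts) is the usual fix.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("open-file limit: soft=%d hard=%d" % (soft, hard))
{code}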

> FileNotFoundException: (Too many open files) when using multiple groupby on DataFrames
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-12240
>                 URL: https://issues.apache.org/jira/browse/SPARK-12240
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 1.5.0
>         Environment: Debian 3.2.68-1+deb7u6 x86_64 GNU/Linux
>            Reporter: Shubhanshu Mishra
>              Labels: dataframe, grouping, pyspark
>
> Whenever I try to do multiple groupings using DataFrames, my job crashes with a FileNotFoundException whose message is "Too many open files".
> I can perform these groupings easily using the RDD API, but when I use the DataFrame operations I see these failures.
> The code I am running:
> ```
> df_t = df.filter(df['max_cum_rank'] == 0).select(['col1','col2']).groupby('col1').agg(F.min('col2')).groupby('min(col2)').agg(F.countDistinct('col1')).toPandas()
> ```
> In [151]: df_t = df.filter(df['max_cum_rank'] == 0).select(['col1','col2']).groupby('col1').agg(F.min('col2')).groupby('min(col2)').agg(F.countDistinct('col1')).toPandas()
> [Stage 27:=====================================================>(415 + 1) / 416]15/12/09 06:36:36 ERROR DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/22/temp_shuffle_1abbf917-842c-41ef-b113-ed60ee22e675
> java.io.FileNotFoundException: /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/22/temp_shuffle_1abbf917-842c-41ef-b113-ed60ee22e675 (Too many open files)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.spark.storage.DiskBlockObjectWriter.revertPartialWritesAndClose(DiskBlockObjectWriter.scala:160)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.stop(BypassMergeSortShuffleWriter.java:174)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.stop(SortShuffleWriter.scala:104)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 15/12/09 06:36:36 ERROR DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/29/temp_shuffle_e35e6e28-fdbf-4775-a32d-d0f5fd882e9e
> java.io.FileNotFoundException: /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/29/temp_shuffle_e35e6e28-fdbf-4775-a32d-d0f5fd882e9e (Too many open files)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.spark.storage.DiskBlockObjectWriter.revertPartialWritesAndClose(DiskBlockObjectWriter.scala:160)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.stop(BypassMergeSortShuffleWriter.java:174)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.stop(SortShuffleWriter.scala:104)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 15/12/09 06:36:36 ERROR DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/18/temp_shuffle_2d26adcb-e3bb-4a01-8998-7428ebe5544d
> java.io.FileNotFoundException: /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/18/temp_shuffle_2d26adcb-e3bb-4a01-8998-7428ebe5544d (Too many open files)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.spark.storage.DiskBlockObjectWriter.revertPartialWritesAndClose(DiskBlockObjectWriter.scala:160)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.stop(BypassMergeSortShuffleWriter.java:174)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.stop(SortShuffleWriter.scala:104)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 15/12/09 06:36:36 ERROR DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/33/temp_shuffle_e82d6779-eb0e-465f-9bbc-8375f5dc1030
> java.io.FileNotFoundException: /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/33/temp_shuffle_e82d6779-eb0e-465f-9bbc-8375f5dc1030 (Too many open files)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.spark.storage.DiskBlockObjectWriter.revertPartialWritesAndClose(DiskBlockObjectWriter.scala:160)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.stop(BypassMergeSortShuffleWriter.java:174)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.stop(SortShuffleWriter.scala:104)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 15/12/09 06:36:36 ERROR DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/0d/temp_shuffle_5de10ef5-7709-4c2e-a3c1-5e58b3dcef33
> java.io.FileNotFoundException: /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/0d/temp_shuffle_5de10ef5-7709-4c2e-a3c1-5e58b3dcef33 (Too many open files)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.spark.storage.DiskBlockObjectWriter.revertPartialWritesAndClose(DiskBlockObjectWriter.scala:160)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.stop(BypassMergeSortShuffleWriter.java:174)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.stop(SortShuffleWriter.scala:104)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 15/12/09 06:36:36 ERROR Executor: Managed memory leak detected; size = 16777216 bytes, TID = 6060
> 15/12/09 06:36:36 ERROR Executor: Exception in task 14.0 in stage 28.0 (TID 6060)
> java.io.FileNotFoundException: /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/06/temp_shuffle_4d7ca6f5-c2b0-4b6e-8054-df53f0cbc54e (Too many open files)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:110)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 15/12/09 06:36:36 ERROR Executor: Managed memory leak detected; size = 16777216 bytes, TID = 6047
> 15/12/09 06:36:36 ERROR Executor: Exception in task 1.0 in stage 28.0 (TID 6047)
> java.io.FileNotFoundException: /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/30/temp_shuffle_7db91926-013b-4a1e-b640-8d34049336cc (Too many open files)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:110)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 15/12/09 06:36:36 ERROR Executor: Managed memory leak detected; size = 16777216 bytes, TID = 6061
> 15/12/09 06:36:36 ERROR Executor: Exception in task 15.0 in stage 28.0 (TID 6061)
> java.io.FileNotFoundException: /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/28/temp_shuffle_41e174db-d29d-4129-a652-2a9f739f1b59 (Too many open files)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:110)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 15/12/09 06:36:36 ERROR Executor: Managed memory leak detected; size = 16777216 bytes, TID = 6063
> 15/12/09 06:36:36 ERROR Executor: Exception in task 17.0 in stage 28.0 (TID 6063)
> java.io.FileNotFoundException: /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/23/temp_shuffle_38a8bbc9-5094-48c1-a96d-502478d887d7 (Too many open files)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:110)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 15/12/09 06:36:36 ERROR Executor: Managed memory leak detected; size = 16777216 bytes, TID = 6062
> 15/12/09 06:36:36 ERROR Executor: Exception in task 16.0 in stage 28.0 (TID 6062)
> java.io.FileNotFoundException: /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/13/temp_shuffle_36e1cb5a-cd8d-446e-bc3f-0b39da891aa5 (Too many open files)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:110)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 15/12/09 06:36:36 ERROR Executor: Managed memory leak detected; size = 16777216 bytes, TID = 6053
> 15/12/09 06:36:36 ERROR Executor: Exception in task 7.0 in stage 28.0 (TID 6053)
> java.io.FileNotFoundException: /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/00/temp_shuffle_0f9c515e-d4fd-43b7-9868-f4a18c092608 (Too many open files)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:110)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 15/12/09 06:36:36 ERROR TaskSetManager: Task 16 in stage 28.0 failed 1 times; aborting job
> 15/12/09 06:36:36 ERROR Executor: Managed memory leak detected; size = 16777216 bytes, TID = 6049
> 15/12/09 06:36:36 ERROR Executor: Managed memory leak detected; size = 16777216 bytes, TID = 6074
> 15/12/09 06:36:36 ERROR Executor: Exception in task 3.0 in stage 28.0 (TID 6049)
> java.io.FileNotFoundException: /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/21/temp_shuffle_150f2b2e-e262-45b3-8660-5cc347747405 (Too many open files)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:110)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 15/12/09 06:36:36 ERROR Executor: Managed memory leak detected; size = 16777216 bytes, TID = 6070
> 15/12/09 06:36:36 ERROR Executor: Exception in task 28.0 in stage 28.0 (TID 6074)
> java.io.FileNotFoundException: /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/3b/temp_shuffle_5e59f4d6-3be9-43a6-a54f-8c035d057acb (Too many open files)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:110)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 15/12/09 06:36:36 ERROR Executor: Exception in task 24.0 in stage 28.0 (TID 6070)
> java.io.FileNotFoundException: /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/00/temp_shuffle_fed61d01-6e05-40f6-8378-936db6e5d22f (Too many open files)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:110)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 15/12/09 06:36:36 ERROR Executor: Managed memory leak detected; size = 16777216 bytes, TID = 6051
> 15/12/09 06:36:36 ERROR Executor: Exception in task 5.0 in stage 28.0 (TID 6051)
> java.io.FileNotFoundException: /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/2d/temp_shuffle_3b9f5d64-7cf2-4106-90c7-0b4dc99a8287 (Too many open files)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:110)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 15/12/09 06:36:36 ERROR Executor: Managed memory leak detected; size = 16777216 bytes, TID = 6056
> 15/12/09 06:36:36 ERROR Executor: Managed memory leak detected; size = 16777216 bytes, TID = 6067
> 15/12/09 06:36:36 ERROR Executor: Managed memory leak detected; size = 16777216 bytes, TID = 6052
> 15/12/09 06:36:36 ERROR Executor: Exception in task 6.0 in stage 28.0 (TID 6052)
> java.io.FileNotFoundException: /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/05/temp_shuffle_9f0996ab-d960-4091-be1d-82e6598dd901 (Too many open files)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:110)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 15/12/09 06:36:36 ERROR Executor: Managed memory leak detected; size = 16777216 bytes, TID = 6046
> 15/12/09 06:36:36 ERROR Executor: Exception in task 21.0 in stage 28.0 (TID 6067)
> java.io.FileNotFoundException: /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/07/temp_shuffle_cef223a8-42fe-4f31-be3e-3e00825848d5 (Too many open files)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:110)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 15/12/09 06:36:36 ERROR Executor: Exception in task 10.0 in stage 28.0 (TID 6056)
> java.io.FileNotFoundException: /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/1f/temp_shuffle_ab66c3d0-3770-4018-8bb3-ae22e67ebfcd (Too many open files)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:110)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 15/12/09 06:36:36 ERROR Executor: Managed memory leak detected; size = 16777216 bytes, TID = 6075
> 15/12/09 06:36:36 ERROR Executor: Managed memory leak detected; size = 16777216 bytes, TID = 6069
> 15/12/09 06:36:36 ERROR Executor: Exception in task 0.0 in stage 28.0 (TID 6046)
> java.io.FileNotFoundException: /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/08/temp_shuffle_5d5e9a47-4d2b-4b40-8ecf-1bae5e97e500 (Too many open files)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:110)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 15/12/09 06:36:36 ERROR Executor: Exception in task 23.0 in stage 28.0 (TID 6069)
> java.io.FileNotFoundException: /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/33/temp_shuffle_0fecf52c-b590-4764-b7fc-5953e08bd044 (Too many open files)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:110)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 15/12/09 06:36:36 ERROR Executor: Exception in task 29.0 in stage 28.0 (TID 6075)
> java.io.FileNotFoundException: /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/07/temp_shuffle_c0c6e399-7143-4224-a5ab-6f6259f4837f (Too many open files)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:110)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 15/12/09 06:36:36 ERROR Executor: Managed memory leak detected; size = 16777216 bytes, TID = 6057
> 15/12/09 06:36:36 ERROR Executor: Managed memory leak detected; size = 16777216 bytes, TID = 6048
> 15/12/09 06:36:36 ERROR Executor: Exception in task 11.0 in stage 28.0 (TID 6057)
> java.io.FileNotFoundException: /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/35/temp_shuffle_56640593-78bd-4764-89ae-19c6d6f05490 (Too many open files)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:110)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 15/12/09 06:36:36 ERROR Executor: Exception in task 2.0 in stage 28.0 (TID 6048)
> java.io.FileNotFoundException: /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/05/temp_shuffle_ea528ebe-a201-4e2b-87fd-9391e29411ed (Too many open files)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:110)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 15/12/09 06:36:36 ERROR Executor: Managed memory leak detected; size = 16777216 bytes, TID = 6066
> 15/12/09 06:36:36 ERROR Executor: Managed memory leak detected; size = 16777216 bytes, TID = 6058
> 15/12/09 06:36:36 ERROR Executor: Exception in task 20.0 in stage 28.0 (TID 6066)
> java.io.FileNotFoundException: /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/2b/temp_shuffle_5618ff6c-86f1-4116-811a-0ba663b9b0d0 (Too many open files)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:110)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 15/12/09 06:36:36 ERROR Executor: Managed memory leak detected; size = 16777216 bytes, TID = 6059
> 15/12/09 06:36:36 ERROR Executor: Managed memory leak detected; size = 16777216 bytes, TID = 6065
> 15/12/09 06:36:36 ERROR Executor: Exception in task 12.0 in stage 28.0 (TID 6058)
> java.io.FileNotFoundException: /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/02/temp_shuffle_83491db1-9489-499f-b700-7d336d4935d0 (Too many open files)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:110)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 15/12/09 06:36:36 ERROR Executor: Managed memory leak detected; size = 33554432 bytes, TID = 6077
> 15/12/09 06:36:36 ERROR Executor: Managed memory leak detected; size = 16777216 bytes, TID = 6064
> 15/12/09 06:36:36 ERROR Executor: Exception in task 19.0 in stage 28.0 (TID 6065)
> java.io.FileNotFoundException: /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/00/temp_shuffle_c8ae819f-2352-4a5f-901b-24d8f0902e6d (Too many open files)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:110)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 15/12/09 06:36:36 ERROR Executor: Exception in task 13.0 in stage 28.0 (TID 6059)
> java.io.FileNotFoundException: /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/0a/temp_shuffle_e4f1a947-1d61-429c-8dfb-dde7b2732b37 (Too many open files)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:110)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 15/12/09 06:36:36 ERROR Executor: Managed memory leak detected; size = 16777216 bytes, TID = 6073
> 15/12/09 06:36:36 ERROR Executor: Managed memory leak detected; size = 16777216 bytes, TID = 6050
> 15/12/09 06:36:36 ERROR Executor: Exception in task 18.0 in stage 28.0 (TID 6064)
> java.io.FileNotFoundException: /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/0a/temp_shuffle_5e4c5ffc-8168-4b33-8b3e-3ae57bdf9cee (Too many open files)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:110)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 15/12/09 06:36:36 ERROR Executor: Managed memory leak detected; size = 16777216 bytes, TID = 6071
> 15/12/09 06:36:36 ERROR Executor: Managed memory leak detected; size = 16777216 bytes, TID = 6068
> 15/12/09 06:36:36 ERROR Executor: Exception in task 25.0 in stage 28.0 (TID 6071)
> java.io.FileNotFoundException: /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/34/temp_shuffle_fcea2161-b7e9-4ce0-af5e-6147cd2b3a4c (Too many open files)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:110)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 15/12/09 06:36:36 ERROR Executor: Managed memory leak detected; size = 16777216 bytes, TID = 6072
> 15/12/09 06:36:36 ERROR Executor: Managed memory leak detected; size = 16777216 bytes, TID = 6055
> 15/12/09 06:36:36 ERROR Executor: Exception in task 4.0 in stage 28.0 (TID 6050)
> java.io.FileNotFoundException: /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/38/temp_shuffle_a7275df0-f59a-49a0-b2b2-29b1e30d6d3c (Too many open files)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:110)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 15/12/09 06:36:36 ERROR Executor: Exception in task 27.0 in stage 28.0 (TID 6073)
> java.io.FileNotFoundException: /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/19/temp_shuffle_4e719ea3-2bbb-4bde-9238-2aaf988f64ed (Too many open files)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:110)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 15/12/09 06:36:36 ERROR Executor: Managed memory leak detected; size = 33554432 bytes, TID = 6080
> 15/12/09 06:36:36 ERROR Executor: Managed memory leak detected; size = 16777216 bytes, TID = 6054
> 15/12/09 06:36:36 ERROR Executor: Managed memory leak detected; size = 33554432 bytes, TID = 6076
> 15/12/09 06:36:36 ERROR Executor: Managed memory leak detected; size = 33554432 bytes, TID = 6078
> 15/12/09 06:36:36 ERROR Executor: Exception in task 9.0 in stage 28.0 (TID 6055)
> java.io.FileNotFoundException: /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/09/temp_shuffle_8e4e3189-ae82-48ef-b86d-caeef7706741 (Too many open files)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:110)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 15/12/09 06:36:36 ERROR Executor: Exception in task 26.0 in stage 28.0 (TID 6072)
> java.io.FileNotFoundException: /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/32/temp_shuffle_36fbaa0d-82e2-4328-90ad-50a9b3e0ee83 (Too many open files)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:110)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 15/12/09 06:36:36 ERROR Executor: Managed memory leak detected; size = 33554432 bytes, TID = 6079
> 15/12/09 06:36:36 ERROR Executor: Exception in task 22.0 in stage 28.0 (TID 6068)
> java.io.FileNotFoundException: /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/2e/temp_shuffle_9d1e1d71-7aac-4ab9-ada7-7020471623a3 (Too many open files)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:110)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 15/12/09 06:36:36 ERROR Executor: Exception in task 8.0 in stage 28.0 (TID 6054)
> java.io.FileNotFoundException: /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/25/temp_shuffle_4896feb2-f011-49ce-bdaa-4a6d20b1b306 (Too many open files)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:110)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> ---------------------------------------------------------------------------
> Py4JJavaError                             Traceback (most recent call last)
> <ipython-input-151-a63780470b3c> in <module>()
> ----> 1 df_t = df.filter(df['max_cum_rank'] == 0).select(['col1','col2']).groupby('col1').agg(F.min('col2')).groupby('min(col2)').agg(F.countDistinct('col1')).toPandas()
> /pathSW/SOFTWARE/spark-1.5.0/python/pyspark/sql/dataframe.pyc in toPandas(self)
>    1269         """
>    1270         import pandas as pd
> -> 1271         return pd.DataFrame.from_records(self.collect(), columns=self.columns)
>    1272 
>    1273     ##########################################################################################
> /pathSW/SOFTWARE/spark-1.5.0/python/pyspark/sql/dataframe.pyc in collect(self)
>     277         """
>     278         with SCCallSiteSync(self._sc) as css:
> --> 279             port = self._sc._jvm.PythonRDD.collectAndServe(self._jdf.javaToPython().rdd())
>     280         return list(_load_from_socket(port, BatchedSerializer(PickleSerializer())))
>     281 
> /pathSW/SOFTWARE/spark-1.5.0/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
>     536         answer = self.gateway_client.send_command(command)
>     537         return_value = get_return_value(answer, self.gateway_client,
> --> 538                 self.target_id, self.name)
>     539 
>     540         for temp_arg in temp_args:
> /pathSW/SOFTWARE/spark-1.5.0/python/pyspark/sql/utils.pyc in deco(*a, **kw)
>      34     def deco(*a, **kw):
>      35         try:
> ---> 36             return f(*a, **kw)
>      37         except py4j.protocol.Py4JJavaError as e:
>      38             s = e.java_exception.toString()
> /pathSW/SOFTWARE/spark-1.5.0/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
>     298                 raise Py4JJavaError(
>     299                     'An error occurred while calling {0}{1}{2}.\n'.
> --> 300                     format(target_id, '.', name), value)
>     301             else:
>     302                 raise Py4JError(
> Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 16 in stage 28.0 failed 1 times, most recent failure: Lost task 16.0 in stage 28.0 (TID 6062, localhost): java.io.FileNotFoundException: /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/13/temp_shuffle_36e1cb5a-cd8d-446e-bc3f-0b39da891aa5 (Too many open files)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:110)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Driver stacktrace:
>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1280)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1268)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1267)
>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>         at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1267)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
>         at scala.Option.foreach(Option.scala:236)
>         at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1493)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1455)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1444)
>         at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>         at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
>         at org.apache.spark.SparkContext.runJob(SparkContext.scala:1813)
>         at org.apache.spark.SparkContext.runJob(SparkContext.scala:1826)
>         at org.apache.spark.SparkContext.runJob(SparkContext.scala:1839)
>         at org.apache.spark.SparkContext.runJob(SparkContext.scala:1910)
>         at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:905)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
>         at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
>         at org.apache.spark.rdd.RDD.collect(RDD.scala:904)
>         at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:373)
>         at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>         at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>         at py4j.Gateway.invoke(Gateway.java:259)
>         at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>         at py4j.commands.CallCommand.execute(CallCommand.java:79)
>         at py4j.GatewayConnection.run(GatewayConnection.java:207)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.FileNotFoundException: /path/tmp/blockmgr-fde0f618-e443-4841-96c4-54c5e5b8fa0f/13/temp_shuffle_36e1cb5a-cd8d-446e-bc3f-0b39da891aa5 (Too many open files)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:110)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         ... 1 more


