Posted to user@spark.apache.org by Imran Akbar <sk...@gmail.com> on 2016/05/13 14:43:55 UTC

memory leak exception

I'm trying to save a table with the following code in PySpark 1.6.1:

prices = sqlContext.sql(
    "SELECT AVG(amount) AS mean_price, country FROM src GROUP BY country")
prices.collect()
prices.write.saveAsTable('prices', format='parquet', mode='overwrite',
                         path='/mnt/bigdisk/tables')
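
(Side note: the collect() isn't actually needed for the write; it pulls the
whole aggregated result back to the driver, while saveAsTable triggers the
job on its own. A minimal variant, assuming the same sqlContext and src
table, would be:)

prices = sqlContext.sql(
    "SELECT AVG(amount) AS mean_price, country FROM src GROUP BY country")
# The write itself runs the job; no driver-side collect() is required.
prices.write.saveAsTable('prices', format='parquet', mode='overwrite',
                         path='/mnt/bigdisk/tables')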

When I run this, I get the following error:

16/05/13 02:04:24 INFO HadoopRDD: Input split: file:/mnt/bigdisk/src.csv:100663296+33554432
16/05/13 02:04:33 WARN TaskMemoryManager: leak 68.0 MB memory from org.apache.spark.unsafe.map.BytesToBytesMap@f9f1b5e
16/05/13 02:04:33 ERROR Executor: Managed memory leak detected; size = 71303168 bytes, TID = 4085
16/05/13 02:04:33 ERROR Executor: Exception in task 2.0 in stage 35.0 (TID 4085)
java.io.FileNotFoundException: /mnt/bigdisk/spark_tmp/blockmgr-69da47e4-3a75-4244-80d3-9c7c0943e7f8/25/temp_shuffle_77078209-a2c5-466c-bba1-ff1a700f257c (No such file or directory)
        at java.io.FileOutputStream.open(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
        at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
        at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)

Any ideas what could be wrong?


thanks,

imran