Posted to user@spark.apache.org by Imran Akbar <sk...@gmail.com> on 2016/05/13 14:43:55 UTC
memory leak exception
I'm trying to save a table using this code in PySpark with Spark 1.6.1:
prices = sqlContext.sql("SELECT AVG(amount) AS mean_price, country FROM src GROUP BY country")
prices.collect()
prices.write.saveAsTable('prices', format='parquet', mode='overwrite', path='/mnt/bigdisk/tables')
but I'm getting this error:
16/05/13 02:04:24 INFO HadoopRDD: Input split: file:/mnt/bigdisk/src.csv:100663296+33554432
16/05/13 02:04:33 WARN TaskMemoryManager: leak 68.0 MB memory from org.apache.spark.unsafe.map.BytesToBytesMap@f9f1b5e
16/05/13 02:04:33 ERROR Executor: Managed memory leak detected; size = 71303168 bytes, TID = 4085
16/05/13 02:04:33 ERROR Executor: Exception in task 2.0 in stage 35.0 (TID 4085)
java.io.FileNotFoundException: /mnt/bigdisk/spark_tmp/blockmgr-69da47e4-3a75-4244-80d3-9c7c0943e7f8/25/temp_shuffle_77078209-a2c5-466c-bba1-ff1a700f257c (No such file or directory)
    at java.io.FileOutputStream.open(Native Method)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
    at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
Any ideas what could be wrong?
thanks,
imran