Posted to user@spark.apache.org by "kramer2009@126.com" <kr...@126.com> on 2016/05/17 03:09:10 UTC

Will spark swap memory out to disk if the memory is not enough?

I know the cache operation can cache data in memory/disk...

But I would like to know whether other operations do the same.
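
What I mean by the cache operation is something like this (just a rough sketch; df here stands for any existing DataFrame):

from pyspark import StorageLevel

# ask Spark to keep cached partitions in memory and spill the rest to disk
df.persist(StorageLevel.MEMORY_AND_DISK)
df.count()  # run an action so the cache is actually materialized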

For example, I created a DataFrame called df. The df is large, so when I run
an action like:

df.sort(column_name).show()
df.collect()

It throws an error like:
	16/05/17 10:53:36 ERROR Executor: Managed memory leak detected; size = 2359296 bytes, TID = 15
	16/05/17 10:53:36 ERROR Executor: Exception in task 0.0 in stage 12.0 (TID 15)
	org.apache.spark.api.python.PythonException: Traceback (most recent call last):
	  File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
	    process()
	  File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
	    serializer.dump_stream(func(split_index, iterator), outfile)
	  File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
	    vs = list(itertools.islice(iterator, batch))
	  File "<stdin>", line 1, in <lambda>
	IndexError: list index out of range


I want to know: is there any way or configuration to let Spark swap memory
out to disk in this situation?
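
For example, I do not know if these are the right knobs, but I was imagining something along these lines (just a guess on my side, the values are made up):

from pyspark import SparkConf, SparkContext

# guesses at settings that might let Spark spill more to disk instead of failing
conf = (SparkConf()
        .set("spark.executor.memory", "4g")           # more heap per executor
        .set("spark.memory.fraction", "0.75")         # share of heap for execution/storage (Spark 1.6+)
        .set("spark.local.dir", "/tmp/spark-spill"))  # directory used for spilled/shuffle data
sc = SparkContext(conf=conf)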





Re: Will spark swap memory out to disk if the memory is not enough?

Posted by Ted Yu <yu...@gmail.com>.
Have you seen this thread?

http://search-hadoop.com/m/q3RTtRbEiIXuOOS&subj=Re+PySpark+issue+with+sortByKey+IndexError+list+index+out+of+range+

which led to SPARK-4384
