Posted to user@spark.apache.org by Divyanshu Kumar <co...@gmail.com> on 2021/01/21 12:15:31 UTC

Facing memory leak with Pyarrow enabled and toPandas()

Hi, I am facing this issue when using toPandas() with PyArrow enabled.

pandas - toPandas() giving memory leak error with PySpark arrow enabled -
Stack Overflow
<https://stackoverflow.com/questions/65824894/topandas-giving-memory-leak-error-with-pyspark-arrow-enabled>
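For context, toPandas() collects the entire DataFrame onto the driver, so Arrow-related "memory leak" warnings during the call often reflect driver memory pressure rather than a true leak. One hedged starting point (not confirmed as the fix for this specific report) is to raise the driver limits and cap the Arrow batch size, e.g. via spark-defaults.conf. The values below are illustrative; the spark.sql.execution.arrow.pyspark.enabled name is the Spark 3.x spelling, while Spark 2.x used spark.sql.execution.arrow.enabled:

```
# spark-defaults.conf -- illustrative values, tune for your cluster
spark.driver.memory                           8g
spark.driver.maxResultSize                    4g
spark.sql.execution.arrow.pyspark.enabled     true
spark.sql.execution.arrow.maxRecordsPerBatch  10000
```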

Re: Facing memory leak with Pyarrow enabled and toPandas()

Posted by Gourav Sengupta <go...@gmail.com>.
Hi
Can you please mention the Spark version, share the code you use to set up
the Spark session, and describe the operation you are running? It would
also be good to know how much memory your system has and the number of
executors you are using per system.
In general, I have faced similar issues when doing a group by or running
aggregates over datasets larger than 2 GB on a system with less RAM than
that.
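The principle behind that advice (never materialize more rows at once than the machine's RAM allows) can be sketched without Spark at all. Below is a stdlib-only illustration; the `batched` helper is hypothetical, not a PySpark API, and it plays roughly the role that `spark.sql.execution.arrow.maxRecordsPerBatch` plays for Arrow transfers:

```python
from itertools import islice

def batched(iterable, size):
    """Yield lists of at most `size` items, so only one batch is in memory."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# Running sum over a large stream without holding it all in memory at once.
total = 0
for batch in batched(range(1_000_000), 10_000):
    total += sum(batch)

print(total)  # 499999500000
```

The same idea applies on the Spark side: pulling results incrementally (or capping batch sizes) trades a single large driver-side allocation for many small ones.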

Regards
Gourav

On Thu, 21 Jan 2021, 12:24 Divyanshu Kumar, <co...@gmail.com> wrote:

> Hi, I am facing this issue while using toPandas() and Pyarrow
> simultaneously.
>
> pandas - toPandas() giving memory leak error with PySpark arrow enabled -
> Stack Overflow
> <https://stackoverflow.com/questions/65824894/topandas-giving-memory-leak-error-with-pyspark-arrow-enabled>
>