Posted to user@spark.apache.org by lev <ka...@gmail.com> on 2016/10/17 09:02:03 UTC

Possible memory leak after closing spark context in v2.0.1

Hello,

I'm in the process of migrating my application to Spark 2.0.1,
and I think there is a memory leak related to broadcast joins.

The application has many unit tests,
and each individual test suite passes, but when running them all together, it
fails with OOM errors.

At the beginning of each suite I create a new Spark session with the session
builder:
val spark = sessionBuilder.getOrCreate()
and at the end of each suite, I call the stop method:
spark.stop()
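
For reference, each suite follows roughly this shape (assuming ScalaTest; the class name, master setting, and test body are only illustrative, not the actual code):

import org.apache.spark.sql.SparkSession
import org.scalatest.{BeforeAndAfterAll, FunSuite}

class ExampleSuite extends FunSuite with BeforeAndAfterAll {

  @transient var spark: SparkSession = _

  override def beforeAll(): Unit = {
    super.beforeAll()
    // a new session is built at the beginning of the suite
    spark = SparkSession.builder()
      .master("local[2]")
      .appName("example-suite")
      .getOrCreate()
  }

  override def afterAll(): Unit = {
    // the session is stopped at the end of the suite
    spark.stop()
    super.afterAll()
  }

  test("placeholder") {
    assert(spark.range(10).count() == 10L)
  }
}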

I attached a profiler to the application, and it looks like the broadcast objects
are taking most of the memory:
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n27910/profiler.png> 

Since each test suite passes when run by itself,
I think that the broadcasts are leaking between the test suites.

Any suggestions on how to resolve this?

thanks 





--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Possible-memory-leak-after-closing-spark-context-in-v2-0-1-tp27910.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: Possible memory leak after closing spark context in v2.0.1

Posted by Lev Katzav <ka...@gmail.com>.
I don't have any explicit object broadcasting in my code.
I do have broadcast join hints (df1.join(broadcast(df2))).
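
A minimal, self-contained sketch of that hint usage (the toy data and the column name "key" are made up for illustration):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

val spark = SparkSession.builder().master("local[2]").getOrCreate()
import spark.implicits._

val df1 = Seq((1, "a"), (2, "b")).toDF("key", "v1")
val df2 = Seq((1, "x")).toDF("key", "v2")

// broadcast() only hints the planner to prefer a broadcast hash join;
// no Broadcast variable is created explicitly in user code
val joined = df1.join(broadcast(df2), "key")
joined.show()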

I tried starting and stopping the Spark context for every test (and not
once per suite), and that stopped the OOM errors, so I guess there is no
leakage after the context is stopped.
Removing the broadcast hint also stopped the errors.
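
The per-test variant looks roughly like this (again a sketch with illustrative names; the only point is that the session is rebuilt before each test and stopped right after it):

import org.apache.spark.sql.SparkSession
import org.scalatest.{BeforeAndAfterEach, FunSuite}

class PerTestSessionSuite extends FunSuite with BeforeAndAfterEach {

  @transient var spark: SparkSession = _

  override def beforeEach(): Unit = {
    super.beforeEach()
    // fresh session for every test
    spark = SparkSession.builder().master("local[2]").getOrCreate()
  }

  override def afterEach(): Unit = {
    // stopped again right after the test
    spark.stop()
    super.afterEach()
  }
}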

So perhaps DataFrames that were broadcast are never released?
I would have assumed that broadcasts are evicted when there is no more free
memory, the same way cached RDDs are. Is that incorrect?

Thanks


On Mon, Oct 17, 2016 at 4:52 PM, Sean Owen <so...@cloudera.com> wrote:

> Did you unpersist the broadcast objects?

Re: Possible memory leak after closing spark context in v2.0.1

Posted by Sean Owen <so...@cloudera.com>.
Did you unpersist the broadcast objects?
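
(For a broadcast variable created explicitly through SparkContext, unpersisting looks roughly like this; the variable name and data are purely illustrative:)

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[2]").getOrCreate()
val lookup = spark.sparkContext.broadcast(Map(1 -> "a", 2 -> "b"))

// ... use lookup.value inside transformations ...

lookup.unpersist()   // drop the copies cached on the executors
lookup.destroy()     // also release the driver-side data; the broadcast cannot be reused afterwards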
