Posted to user@spark.apache.org by Sa...@wellsfargo.com on 2016/01/12 19:15:16 UTC

Big data job only finishes with Legacy memory management

Hello,

I am tinkering with Spark 1.6. I have a dataset of about 1.5 billion rows, to which I apply several window functions such as lag, first, etc. The job is quite expensive; I am running a small cluster whose executors have 70 GB of RAM each.
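To give an idea of the shape of the job, it boils down to something like the following (column names here are made up for illustration, not my real schema):

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.{lag, first}

    // Illustrative window spec; the real partition/order columns differ.
    val w = Window.partitionBy("accountId").orderBy("eventDate")

    val result = df
      .withColumn("prevValue", lag("value", 1).over(w))   // value from the previous row in the window
      .withColumn("firstValue", first("value").over(w))   // first value of the partition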

With the new memory management system, the job fails around the middle with a heap memory limit exceeded error. I also tried tinkering with several of the new memory settings, with no success. 70 GB * 4 nodes should be plenty of resources for this kind of job.
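For reference, the kind of knobs I played with looked roughly like this (the values shown are only examples, not exactly what I used):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.executor.memory", "70g")
      // Unified (new) memory manager settings; Spark 1.6 defaults are 0.75 and 0.5.
      .set("spark.memory.fraction", "0.75")
      .set("spark.memory.storageFraction", "0.5")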

Legacy memory management mode runs this job successfully with default memory settings.
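That is, simply switching back to the old memory manager and leaving everything else at its defaults lets the job complete (sketch below, same illustrative executor size as above):

    val legacyConf = new SparkConf()
      .set("spark.executor.memory", "70g")
      .set("spark.memory.useLegacyMode", "true")
      // Legacy fractions left at their defaults:
      // spark.shuffle.memoryFraction = 0.2, spark.storage.memoryFraction = 0.6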

How could I analyze this problem further, so I can provide better diagnostics and help track it down?
The whole job goes through the DataFrame API, with nothing unusual (no UDFs or custom operations).

Saif