Posted to user@spark.apache.org by Arthur Li <li...@126.com> on 2021/12/23 14:10:45 UTC

How to estimate the executor memory size according to the data

Dear experts,

Recently some of my demo jobs, which consume data from the Hive database, have been hitting OOM errors. I know I can increase the executor memory size to eliminate the OOM error, but I don’t know how to assess the executor memory requirement, nor how to automatically adapt the executor memory size to the data size.
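
For reference, a minimal sketch of how the executor memory is set when the session is created (the sizes below are placeholders, not my real values):

import org.apache.spark.sql.SparkSession

// Sizes are examples only, not recommendations.
val spark = SparkSession.builder()
  .appName("demo-job")
  .enableHiveSupport()
  .config("spark.executor.memory", "4g")          // executor JVM heap
  .config("spark.executor.memoryOverhead", "1g")  // off-heap overhead per executor
  .getOrCreate()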

Any advice is appreciated.
Arthur Li

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: How to estimate the executor memory size according to the data

Posted by Gourav Sengupta <go...@gmail.com>.
Hi,

just trying to understand:
1. Are you using JDBC to consume the data from Hive?
2. Or are you reading the data directly from S3, and using the Hive Metastore
in Spark only to find out where the table is stored and its metadata?
(Roughly the two patterns sketched below.)
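
For example, the two patterns would look roughly like this (a sketch only; the URL and table names are placeholders):

// Given an existing SparkSession `spark`.

// 1. JDBC: rows are pulled through a HiveServer2 JDBC connection
//    (needs the Hive JDBC driver on the classpath).
val viaJdbc = spark.read
  .format("jdbc")
  .option("url", "jdbc:hive2://hive-host:10000/default") // placeholder URL
  .option("dbtable", "my_table")                         // placeholder table
  .load()

// 2. Metastore only: Spark reads the underlying files (e.g. on S3) directly,
//    using the Hive Metastore just for the table location and schema.
val viaMetastore = spark.table("default.my_table")       // placeholder table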

Regards,
Gourav Sengupta


RE: How to estimate the executor memory size according to the data

Posted by Luca Canali <lu...@cern.ch>.
Hi Arthur,

If you are using Spark 3.x, you can use the executor metrics for memory instrumentation.
The metrics are available on the Web UI, see https://spark.apache.org/docs/latest/web-ui.html#stage-detail (search for "Peak execution memory").
The executor memory metrics are also available in the REST API and in the Spark metrics system, see https://spark.apache.org/docs/latest/monitoring.html
Further information on the topic at https://db-blog.web.cern.ch/blog/luca-canali/2020-08-spark3-memory-monitoring
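
For example, a minimal sketch of pulling the per-executor peak memory values from the REST API (the host, port and application id are placeholders; substitute your own):

import scala.io.Source

// The endpoint is documented in the monitoring guide linked above.
// In Spark 3.x each executor entry in the response includes a
// peakMemoryMetrics section, which you can compare against spark.executor.memory.
val url = "http://driver-host:4040/api/v1/applications/app-20211223141045-0000/executors"
println(Source.fromURL(url).mkString)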
  
Best,
Luca


