Posted to dev@spark.apache.org by heng xi <ix...@gmail.com> on 2022/03/22 06:57:33 UTC

About a Spark SQL engine driver memory release issue

I made some changes to *DataSourceScanExec*, as shown below:

[image: screenshot of the DataSourceScanExec change]

I ran the sample SQL (which throws an exception) several times; this drove
the driver to high memory usage that was never released:

[image: screenshot of the driver memory usage]
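
Roughly, the reproduction pattern is the following. This is only a minimal
sketch: the table name and the query are placeholders, not the actual
workload.

    import org.apache.spark.sql.SparkSession

    object DriverMemoryRepro {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("driver-memory-repro")
          .getOrCreate()

        // Repeatedly scan a partitioned table; each run lists the table's
        // files on the driver through InMemoryFileIndex.
        for (i <- 1 to 50) {
          try {
            // Placeholder query; the original SQL is not shown in this post.
            spark.sql("SELECT * FROM some_partitioned_table WHERE dt = '2022-03-22'").count()
          } catch {
            case e: Exception =>
              // The sample SQL throws an exception, yet driver memory
              // keeps growing across runs.
              println(s"run $i failed: ${e.getMessage}")
          }
        }
        spark.stop()
      }
    }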

Using *jmap* to inspect the composition of the driver's memory:
[image: jmap output screenshot]
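
For reference, this kind of inspection can be done with the standard jmap
commands against the driver JVM's process id:

    # Class histogram of live objects (forces a full GC first)
    jmap -histo:live <driver-pid> | head -n 30

    # Or dump the live heap for offline analysis, e.g. in Eclipse MAT
    jmap -dump:live,format=b,file=driver-heap.hprof <driver-pid>
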
After analysis, I traced many char[] and String instances back to
LocatedFileStatus objects held by InMemoryFileIndex:
[image: heap analysis screenshot]
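
That retention chain is consistent with InMemoryFileIndex caching the full
file listing on the driver: every leaf file is kept as a FileStatus, and each
status holds a Path whose string form stays on the heap. A simplified sketch
of the idea (not the actual Spark source; class and field names here are
illustrative only):

    import scala.collection.mutable
    import org.apache.hadoop.fs.{FileStatus, Path}

    // Simplified model of the per-table listing cache kept on the driver.
    class LeafFileCache {
      // One entry per leaf file; each FileStatus holds a Path, and each
      // Path holds the full URI string of the file.
      val leafFiles: mutable.LinkedHashMap[Path, FileStatus] =
        mutable.LinkedHashMap.empty

      // While this cache is reachable from a live plan or catalog entry,
      // every one of those path strings stays on the heap.
      def add(status: FileStatus): Unit =
        leafFiles += (status.getPath -> status)
    }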

Then, on top of the original modification, I made some changes to
*InMemoryFileIndex* and *DataSourceScanExec*, as shown below:
*InMemoryFileIndex*
[image: screenshot of the InMemoryFileIndex change]

*DataSourceScanExec*
[image: two screenshots of the DataSourceScanExec change]
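
Since the exact patch is only visible in the screenshots above, one related
existing knob is worth noting for readers hitting the same symptom: Spark's
catalog refresh drops a table's cached file-listing metadata. This is a
coarser workaround, not the change proposed here; the table name is a
placeholder.

    // Existing public API: invalidates cached metadata for the table,
    // including its cached file index; the listing is rebuilt lazily on
    // the next access.
    spark.catalog.refreshTable("some_partitioned_table")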

After running the sample SQL the same number of times, the driver memory is
shown below:
[image: screenshot of the driver memory after the changes]

This indicates that the driver now releases the unused partition metadata in
memory quickly.
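
A simple way to confirm this kind of improvement from inside the driver is to
sample the used heap after a forced GC between runs. A rough sketch
(System.gc() is only a hint to the JVM, so treat the numbers as approximate):

    // Rough driver-side heap sampling between query runs.
    def usedHeapMb(): Long = {
      val rt = Runtime.getRuntime
      System.gc() // hint only; the JVM may ignore it
      (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024)
    }

    println(s"used heap after run: ${usedHeapMb()} MB")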

I look forward to suggestions for improvement from the users of this mailing
list. Thank you!