Posted to dev@airflow.apache.org by Tornike Gurgenidze <to...@freeuni.edu.ge> on 2022/07/28 21:23:20 UTC

SparkSubmitHook - high memory consumption in YARN cluster mode

Hi all,

I opened a ticket (https://github.com/apache/airflow/issues/24171) a while
back, and I just want to make sure it went stale deservedly :)

We used to have an issue with memory consumption on Airflow Celery workers
where tasks were often killed by the OOM killer. Most of our workload was
running Spark jobs in YARN cluster mode using SparkSubmitHook. The main
driver of the high memory consumption was the spark-submit processes, which
took about 500 MB of memory each even though in YARN cluster mode they do
essentially nothing. We changed the hook to kill the spark-submit process
right after YARN accepts the application and to track the status with "yarn
application -status" calls instead, similar to how Spark standalone mode is
tracked today, and the OOM issues went away.
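
To make the idea concrete, here is a minimal sketch of that kind of
tracking loop (illustrative only, not our actual patch; it assumes the
yarn CLI is on the PATH and that the status report contains a
"Final-State" line):

    import re
    import subprocess
    import time

    def track_yarn_application(application_id, poll_interval=10):
        # Poll "yarn application -status" until the app reaches a
        # terminal state. Error handling and kerberos setup omitted.
        while True:
            report = subprocess.check_output(
                ["yarn", "application", "-status", application_id],
                universal_newlines=True,
            )
            match = re.search(r"Final-State\s*:\s*(\S+)", report)
            final_state = match.group(1) if match else "UNDEFINED"
            if final_state in ("SUCCEEDED", "FAILED", "KILLED"):
                return final_state
            time.sleep(poll_interval)

(The application id itself can be parsed from the spark-submit output
before the process exits.)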

It seems like an issue that lots of other users with a similar usage
pattern should be experiencing, unless they have unnecessarily large
amounts of memory allocated to their Airflow workers. Has anyone else had
a similar experience? Is it worth working on getting our fix into the
upstream repo? Or has everyone else already switched to managed Spark
services and it's just us? :)
--
Tornike

Re: SparkSubmitHook - high memory consumption in YARN cluster mode

Posted by Tornike Gurgenidze <to...@freeuni.edu.ge>.
You're right. That will probably make the code a lot simpler. Thank you.

On Fri, Jul 29, 2022 at 11:59 AM Jeff Zhang <zj...@gmail.com> wrote:

> You don't need to kill the spark-submit process yourself; just set the
> Spark conf spark.yarn.submit.waitAppCompletion to false, and the
> spark-submit process will exit right after YARN accepts the application.

--
Tornike

Re: SparkSubmitHook - high memory consumption in YARN cluster mode

Posted by Jeff Zhang <zj...@gmail.com>.
You don't need to kill the spark-submit process yourself; just set the
Spark conf spark.yarn.submit.waitAppCompletion to false, and the
spark-submit process will exit right after YARN accepts the application.
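
For example, via the operator's conf parameter (a sketch; the task id,
application path, and connection id below are placeholders):

    from airflow.providers.apache.spark.operators.spark_submit import (
        SparkSubmitOperator,
    )

    spark_task = SparkSubmitOperator(
        task_id="my_spark_job",           # placeholder
        application="/path/to/job.py",    # placeholder
        conn_id="spark_default",
        conf={
            # spark-submit exits as soon as YARN accepts the application
            "spark.yarn.submit.waitAppCompletion": "false",
        },
    )

or directly on the command line with
--conf spark.yarn.submit.waitAppCompletion=false. Note that the hook
would then still need the "yarn application -status" polling to learn
the final state, since spark-submit no longer waits for it.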

-- 
Best Regards

Jeff Zhang