Posted to dev@airflow.apache.org by an...@gmail.com, an...@gmail.com on 2018/12/06 14:25:33 UTC

Large number of SQL Connection from airflow scheduler

Hi,

I have a question, and it would be great if somebody in the community could clarify it.

For each dag_processing process, I can see the ORM being reconfigured and then disposed towards the end. This opens and closes a lot of connections on the MySQL server, which crashes when there are many new DAGs to process.

Is this the intended behavior, and if so, why? Also, what improvements could be planned in this area so that the connection pool can be reused across these processes?

Thanks,
Anand

Re: Large number of SQL Connection from airflow scheduler

Posted by Jarek Potiuk <Ja...@polidea.com>.
Another possible cause of a large number of connections is using Airflow
Variables at the top level of the parsed DAGs. An earlier discussion on the
dev list (
https://lists.apache.org/thread.html/d56088211663cefc0c0311ca2b980b79e0234f9cad20082da5bc6358@%3Cdev.airflow.apache.org%3E)
noted that a new database connection is established for each variable. So
maybe that's the root cause of the problem?
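
To illustrate the point about top-level Variables, here is a minimal sketch that uses only the Python standard library. It does not call Airflow: FakeMetadataDb, get_variable and the two parse functions are hypothetical stand-ins, and the one-connection-per-lookup behavior is an assumption taken from the discussion above, not something measured here.

```python
class FakeMetadataDb:
    """Hypothetical stand-in for the metadata database; counts connections."""

    def __init__(self):
        self.connections_opened = 0

    def get_variable(self, key):
        # Assumption from the thread: every variable lookup opens a connection.
        self.connections_opened += 1
        return "value-of-" + key


DB = FakeMetadataDb()


def parse_dag_file_top_level():
    """Simulates a DAG file that reads variables while it is being parsed."""
    bucket = DB.get_variable("bucket")
    owner = DB.get_variable("owner")

    def task():
        return bucket, owner

    return task


def parse_dag_file_deferred():
    """Simulates a DAG file that reads variables only when the task runs."""

    def task():
        return DB.get_variable("bucket"), DB.get_variable("owner")

    return task


def connections_for(parse_fn, parses):
    """Counts the connections opened by parsing the same file `parses` times."""
    DB.connections_opened = 0
    for _ in range(parses):
        parse_fn()  # the task itself is never executed here
    return DB.connections_opened


top_level_count = connections_for(parse_dag_file_top_level, 100)
deferred_count = connections_for(parse_dag_file_deferred, 100)
print(top_level_count, deferred_count)
```

In this toy model the cost of a top-level lookup grows with the number of parses, while the deferred version costs nothing until a task actually runs.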

J.

On Fri, Dec 7, 2018 at 4:20 AM Kevin Yang <yr...@gmail.com> wrote:

> Hi Anand,
> In my experience, reusing a connection pool across those parsing
> processes would be dangerous (I recall getting connection corruption errors
> because of that). If you find there are too many connections, you can
> probably lower max_threads
> <https://github.com/apache/incubator-airflow/blob/master/airflow/config_templates/default_airflow.cfg#L507>
> to reduce the number of parsing processes, or make the scheduler parse
> less often by raising min_file_process_interval
> <https://github.com/apache/incubator-airflow/blob/master/airflow/config_templates/default_airflow.cfg#L464>.
>
> Cheers,
> Kevin Y



Re: Large number of SQL Connection from airflow scheduler

Posted by Kevin Yang <yr...@gmail.com>.
Hi Anand,
In my experience, reusing a connection pool across those parsing
processes would be dangerous (I recall getting connection corruption errors
because of that). If you find there are too many connections, you can
probably lower max_threads
<https://github.com/apache/incubator-airflow/blob/master/airflow/config_templates/default_airflow.cfg#L507>
to reduce the number of parsing processes, or make the scheduler parse
less often by raising min_file_process_interval
<https://github.com/apache/incubator-airflow/blob/master/airflow/config_templates/default_airflow.cfg#L464>.
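
As a rough illustration of how the two settings mentioned above might bound connection usage, here is a back-of-the-envelope sketch. It assumes both keys live in the [scheduler] section of airflow.cfg, that each parsing process opens a fixed number of connections, and that each DAG file is parsed at most once per interval. The function name, the sample values and the connections_per_parse parameter are hypothetical, and the numbers are not recommendations.

```python
import configparser

# Hypothetical airflow.cfg fragment; values are illustrative only.
CFG = """
[scheduler]
max_threads = 2
min_file_process_interval = 30
"""


def connection_bounds(cfg_text, dag_files, connections_per_parse=1):
    """Returns (max concurrent connections, max new connections per second).

    A simplified model: at most max_threads parsing processes run at once,
    and each file is parsed at most once every min_file_process_interval
    seconds.
    """
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    max_threads = parser.getint("scheduler", "max_threads")
    interval = parser.getint("scheduler", "min_file_process_interval")

    concurrent = max_threads * connections_per_parse
    if interval > 0:
        per_second = dag_files * connections_per_parse / interval
    else:
        # No throttle configured; the rate is limited only by parsing speed.
        per_second = None
    return concurrent, per_second


bounds = connection_bounds(CFG, dag_files=300)
print(bounds)
```

Under this model, lowering max_threads reduces the number of simultaneous connections, while raising min_file_process_interval reduces how quickly connections are opened and closed.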

Cheers,
Kevin Y
