Posted to dev@airflow.apache.org by Arkadiusz Kołodziejski <ar...@gmail.com> on 2017/08/30 14:53:16 UTC

Airflow + MsSQL database hangs

Hello,

I tried to use Airflow 1.8.2RC2 and 1.8.2RC4 with an MSSQL database.
Unfortunately, the Airflow scheduler always hangs, a few minutes after the
'airflow scheduler' process starts. I would like to share the results of my
investigation of this problem.



Here is an example stack trace from the hung process:

Current thread 0x00007f08faed0700 (most recent call first):
  File "/home/administrator/code/wf/workflow-airflow/venv/lib/python3.5/site-packages/sqlalchemy/engine/default.py", line 440 in do_rollback
  File "/home/administrator/code/wf/workflow-airflow/venv/lib/python3.5/site-packages/sqlalchemy/pool.py", line 829 in _reset
  File "/home/administrator/code/wf/workflow-airflow/venv/lib/python3.5/site-packages/sqlalchemy/pool.py", line 687 in _finalize_fairy
  File "/home/administrator/code/wf/workflow-airflow/venv/lib/python3.5/site-packages/sqlalchemy/pool.py", line 811 in _checkin
  File "/home/administrator/code/wf/workflow-airflow/venv/lib/python3.5/site-packages/sqlalchemy/pool.py", line 960 in close
  File "/home/administrator/code/wf/workflow-airflow/venv/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 859 in close
  File "/home/administrator/code/wf/workflow-airflow/venv/lib/python3.5/site-packages/sqlalchemy/orm/session.py", line 542 in close
  File "/home/administrator/code/wf/workflow-airflow/venv/lib/python3.5/site-packages/sqlalchemy/orm/session.py", line 473 in commit
  File "/home/administrator/code/wf/workflow-airflow/venv/lib/python3.5/site-packages/sqlalchemy/orm/session.py", line 906 in commit
  File "/home/administrator/code/wf/workflow-airflow/venv/src/apache-airflow/airflow/jobs.py", line 161 in heartbeat
  File "/home/administrator/code/wf/workflow-airflow/venv/src/apache-airflow/airflow/jobs.py", line 1454 in _execute_helper
  File "/home/administrator/code/wf/workflow-airflow/venv/src/apache-airflow/airflow/jobs.py", line 1311 in _execute
  File "/home/administrator/code/wf/workflow-airflow/venv/src/apache-airflow/airflow/jobs.py", line 201 in run
  File "/home/administrator/code/wf/workflow-airflow/venv/src/apache-airflow/airflow/bin/cli.py", line 882 in scheduler
  File "/home/administrator/code/wf/workflow-airflow/venv/src/apache-airflow/airflow/bin/airflow", line 28 in <module>
  File "/home/administrator/code/wf/workflow-airflow/venv/bin/airflow", line 6 in <module>
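
The "Current thread ... (most recent call first)" header is the format that
Python's standard faulthandler module prints. The thread does not say how the
trace was captured, so the following is only a minimal sketch of one way to get
a comparable dump from a hung process. The function names and the choice of
SIGUSR1 are assumptions made for illustration, not part of Airflow.

```python
import faulthandler
import signal
import tempfile


def install_stack_dump_handler(signum=None):
    """Dump the stacks of all threads to stderr when `signum` arrives.

    Hypothetical helper: SIGUSR1 is an arbitrary default, and
    faulthandler.register is not available on Windows.
    """
    if signum is None:
        signum = signal.SIGUSR1
    faulthandler.register(signum, all_threads=True)


def dump_current_stacks():
    """Return the stacks of all threads as text, in faulthandler's format."""
    # faulthandler writes straight to a file descriptor, so a real file
    # (not an in-memory buffer) is required.
    with tempfile.TemporaryFile(mode="w+") as out:
        faulthandler.dump_traceback(file=out, all_threads=True)
        out.seek(0)
        return out.read()
```

With the handler installed early in the process, sending the chosen signal to
the hung scheduler (for example with `kill -USR1 <pid>`) should print where
each thread is blocked without stopping the process.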



But sometimes it hangs in other cursor methods.



Facts:

   - Hangs with an MSSQL DB, but not with PostgreSQL
   - Hangs on remote DB connections; I do not observe this on local DB
   connections
   - Hangs with pymssql and pyodbc dialects
   - Hangs in 1.8.2RC2 and 1.8.2RC4
   - Hangs with SQLAlchemy Engine StaticPool, SingletonThreadPool and
   QueuePool
   - *Works with NullPool* (a new connection is opened on every checkout
   from the pool)



I tried using NullPool (by changing Airflow), but creating over 1000
connections within minutes is too high a time overhead.
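
To make the connection-count difference concrete, here is a toy model that
uses only the standard library. It is not SQLAlchemy's implementation: the
classes CountingFactory, ReusePool and NoPool are hypothetical, and an
in-memory SQLite database stands in for the remote MSSQL server. The rollback
in ReusePool.checkin mirrors the reset-on-return step visible in the trace
above (pool.py _reset calling do_rollback).

```python
import sqlite3


class CountingFactory:
    """Opens connections and counts how many were opened."""

    def __init__(self):
        self.opened = 0

    def connect(self):
        self.opened += 1
        return sqlite3.connect(":memory:")


class ReusePool:
    """Hands idle connections out again, roughly like a pooled engine."""

    def __init__(self, factory):
        self._factory = factory
        self._idle = []

    def checkout(self):
        return self._idle.pop() if self._idle else self._factory.connect()

    def checkin(self, conn):
        conn.rollback()  # reset the connection before it is reused
        self._idle.append(conn)


class NoPool:
    """Opens a connection per checkout and closes it on checkin."""

    def __init__(self, factory):
        self._factory = factory

    def checkout(self):
        return self._factory.connect()

    def checkin(self, conn):
        conn.close()


def run_heartbeats(pool, count):
    """Simulate `count` short database round trips through `pool`."""
    for _ in range(count):
        conn = pool.checkout()
        conn.execute("SELECT 1").fetchone()
        pool.checkin(conn)
```

Running 1000 heartbeats through ReusePool opens a single connection, while the
same loop through NoPool opens 1000. Against a remote server each of those
opens pays for a network handshake and a login, which is the overhead
described above.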



Has anyone faced this kind of problem with an MSSQL DB?


Thanks,

Arek


================
 I am an Intel employee. All comments and opinions are my own and do not
represent the views of Intel.

Re: Airflow + MsSQL database hangs

Posted by Arkadiusz Kołodziejski <ar...@gmail.com>.
I tried decreasing sql_alchemy_pool_size, parallelism, and dag_concurrency to 1,
and it still hangs.
I tried row versioning-based isolation levels on the DB, and it still hangs.

NullPool works, but its performance is not acceptable.
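
For anyone trying to reproduce this, the reduced settings can be written as an
airflow.cfg fragment and checked with the standard configparser module. This
is a sketch under assumptions: the three options are taken to live under
[core], as in the 1.8-era default config, and the connection string is a
made-up placeholder.

```python
import configparser

# Hypothetical airflow.cfg fragment; host, credentials and database name
# in sql_alchemy_conn are placeholders, not values from this thread.
AIRFLOW_CFG_FRAGMENT = """
[core]
sql_alchemy_conn = mssql+pymssql://user:password@dbhost:1433/airflow
sql_alchemy_pool_size = 1
parallelism = 1
dag_concurrency = 1
"""


def load_core_settings(text):
    """Parse the fragment and return the three concurrency settings as ints."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    core = parser["core"]
    return {
        "sql_alchemy_pool_size": core.getint("sql_alchemy_pool_size"),
        "parallelism": core.getint("parallelism"),
        "dag_concurrency": core.getint("dag_concurrency"),
    }
```

With every value at 1 the scheduler works through a single pooled connection,
so a hang in this configuration is unlikely to be caused by pool exhaustion or
by many concurrent writers.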

Thanks,
Arek


2017-08-30 23:10 GMT+02:00 Ruslan Dautkhanov <da...@gmail.com>:

> *to start the tuning process )

Re: Airflow + MsSQL database hangs

Posted by Ruslan Dautkhanov <da...@gmail.com>.
*to start the tuning process )



-- 
Ruslan Dautkhanov

On Wed, Aug 30, 2017 at 3:10 PM, Ruslan Dautkhanov <da...@gmail.com>
wrote:

> >> I tried to use NullPool ( change in Airflow) but creating over 1000
> >> connections in minutes is to high time overhead.
>
> It might be a tuning problem with your backend mssql database.
> Try decreasing sql_alchemy_pool_size, parallelism, dag_concurrency to a
> fairly small values to see if you can reproduce this issue?
>
> Out of the box SQL Server not necessarily scales well (unlike Oracle for
> example).
> SQL Server defaults to blocking selects on uncommitted data.
> It seems in that exception stack hanging happens in commits.
> For highly concurrent workloads it's recommended row versioning-based
> isolation levels to start the runing process
> https://msdn.microsoft.com/en-us/library/ms175095.aspx

Re: Airflow + MsSQL database hangs

Posted by Ruslan Dautkhanov <da...@gmail.com>.
>> I tried using NullPool (by changing Airflow), but creating over 1000
>> connections within minutes is too high a time overhead.

It might be a tuning problem with your backend MSSQL database.
Could you try decreasing sql_alchemy_pool_size, parallelism, and
dag_concurrency to fairly small values to see if you can still reproduce
this issue?

Out of the box, SQL Server does not necessarily scale well (unlike Oracle, for
example).
SQL Server defaults to blocking selects on uncommitted data.
Judging by that stack trace, the hang seems to happen in commits.
For highly concurrent workloads, row versioning-based isolation levels are
recommended as a way to start the tuning process:
https://msdn.microsoft.com/en-us/library/ms175095.aspx
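
As a rough sketch of what enabling row versioning involves, the helper below
builds the two ALTER DATABASE statements that switch the options on. The
function name is hypothetical and "airflow" in the usage note is an assumed
database name; the statements would normally be run by a database
administrator rather than by Airflow itself.

```python
def row_versioning_statements(database):
    """Build T-SQL that enables row versioning-based isolation for `database`."""
    # Bracket-quote the identifier; a literal "]" is escaped by doubling it.
    quoted = "[" + database.replace("]", "]]") + "]"
    return [
        # Allows transactions to request SNAPSHOT isolation explicitly.
        "ALTER DATABASE {} SET ALLOW_SNAPSHOT_ISOLATION ON;".format(quoted),
        # Makes the default READ COMMITTED level read row versions
        # instead of waiting on locks held by writers.
        "ALTER DATABASE {} SET READ_COMMITTED_SNAPSHOT ON;".format(quoted),
    ]
```

Calling row_versioning_statements("airflow") returns the two statements as
strings, ready to be pasted into a SQL client.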



-- 
Ruslan Dautkhanov
