Posted to dev@airflow.apache.org by siddharth anand <sa...@apache.org> on 2016/10/01 08:03:39 UTC

Re: Airflow bugs but stays running

Hi Renaud,
I've never encountered this issue, though I do run Postgres & LocalExecutor
and am running 1.7.1.3 in all of my environments.

I'm running master on my local dev machine. To try to reproduce this, I
changed the valid setting
sql_alchemy_conn = postgresql://siddharth@localhost:5432/airflow
to the invalid
sql_alchemy_conn = postgresql://siddharth@localhost:5432/airflowaaaaa

When trying to start the scheduler and webserver, both exited immediately
with:
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) FATAL: database "airflowaaaaa" does not exist

It looks like your problem is that the scheduler keeps trying to
reestablish a connection and expects the problem to be transient. Why would
restarting the process via supervisord solve your problem? Also, isn't the
flaky DNS resolver issue your core concern? You can open a JIRA to track
this, but more information is needed to

-s
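
A minimal sketch of how the DNS concern above could be checked from outside the scheduler, assuming an external health check is acceptable in this setup (the thread does not confirm that). It only tests whether the database host name resolves, which is the failure shown in the traceback below. The names db_host_resolves and healthcheck are hypothetical helpers, not Airflow or supervisord APIs; port 5432 is taken from the connection string quoted above, and "db-airflow" is the host name from the traceback.

```python
import socket


def db_host_resolves(host, port=5432, resolver=socket.getaddrinfo):
    """Return True if `host` resolves to at least one address, else False.

    `resolver` defaults to socket.getaddrinfo and is injectable so the
    check can be exercised without touching the network.
    """
    try:
        return len(resolver(host, port)) > 0
    except socket.gaierror:
        # Raised for failures such as "Name does not resolve".
        return False


def healthcheck(host, port=5432, resolver=socket.getaddrinfo):
    """Map the resolution check to a process exit code (0 = ok, 1 = failed)."""
    return 0 if db_host_resolves(host, port, resolver) else 1
```

A wrapper script could call sys.exit(healthcheck("db-airflow")) on a timer and let whatever monitors it react to a non-zero exit. This does not make the scheduler itself exit; it only surfaces the resolver failure.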

On Fri, Sep 30, 2016 at 8:07 AM, Renaud Grisoni <re...@gmail.com>
wrote:

> Hi all,
>
> I use Airflow v1.7.1.3 with the local scheduler and I have run into a
> problem with the scheduler:
> For some reason, the Airflow database is no longer accessible, so the
> scheduler displays the OperationalError below. My problem is that the
> scheduler does not exit after this error: it keeps running but no longer
> runs any DAGs. I cannot automatically restart it with Supervisor because
> its process is always reported as running. Each time I have a network
> error, Airflow displays this error and enters this "zombie" mode, and my
> DAGs are not processed.
>
> Have you heard of this problem? Any suggestions?
>
>
>
> 29/09/2016 21:09:53Traceback (most recent call last):
> 29/09/2016 21:09:53  File "/usr/bin/airflow", line 15, in <module>
> 29/09/2016 21:09:53    args.func(args)
> 29/09/2016 21:09:53  File "/usr/lib/python2.7/site-packages/airflow/bin/cli.py", line 455, in scheduler
> 29/09/2016 21:09:53    job.run()
> 29/09/2016 21:09:53  File "/usr/lib/python2.7/site-packages/airflow/jobs.py", line 173, in run
> 29/09/2016 21:09:53    self._execute()
> 29/09/2016 21:09:53  File "/usr/lib/python2.7/site-packages/airflow/jobs.py", line 712, in _execute
> 29/09/2016 21:09:53    paused_dag_ids = dagbag.paused_dags()
> 29/09/2016 21:09:53  File "/usr/lib/python2.7/site-packages/airflow/models.py", line 429, in paused_dags
> 29/09/2016 21:09:53    DagModel.is_paused == True)]
> 29/09/2016 21:09:53  File "/usr/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2761, in __iter__
> 29/09/2016 21:09:53    return self._execute_and_instances(context)
> 29/09/2016 21:09:53  File "/usr/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2774, in _execute_and_instances
> 29/09/2016 21:09:53    close_with_result=True)
> 29/09/2016 21:09:53  File "/usr/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2765, in _connection_from_session
> 29/09/2016 21:09:53    **kw)
> 29/09/2016 21:09:53  File "/usr/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 893, in connection
> 29/09/2016 21:09:53    execution_options=execution_options)
> 29/09/2016 21:09:53  File "/usr/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 898, in _connection_for_bind
> 29/09/2016 21:09:53    engine, execution_options)
> 29/09/2016 21:09:53  File "/usr/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 334, in _connection_for_bind
> 29/09/2016 21:09:53    conn = bind.contextual_connect()
> 29/09/2016 21:09:53  File "/usr/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2039, in contextual_connect
> 29/09/2016 21:09:53    self._wrap_pool_connect(self.pool.connect, None),
> 29/09/2016 21:09:53  File "/usr/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2078, in _wrap_pool_connect
> 29/09/2016 21:09:53    e, dialect, self)
> 29/09/2016 21:09:53  File "/usr/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1405, in _handle_dbapi_exception_noconnection
> 29/09/2016 21:09:53    exc_info
> 29/09/2016 21:09:53  File "/usr/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 202, in raise_from_cause
> 29/09/2016 21:09:53    reraise(type(exception), exception, tb=exc_tb, cause=cause)
> 29/09/2016 21:09:53  File "/usr/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2074, in _wrap_pool_connect
> 29/09/2016 21:09:53    return fn()
> 29/09/2016 21:09:53  File "/usr/lib/python2.7/site-packages/sqlalchemy/pool.py", line 376, in connect
> 29/09/2016 21:09:53    return _ConnectionFairy._checkout(self)
> 29/09/2016 21:09:53  File "/usr/lib/python2.7/site-packages/sqlalchemy/pool.py", line 713, in _checkout
> 29/09/2016 21:09:53    fairy = _ConnectionRecord.checkout(pool)
> 29/09/2016 21:09:53  File "/usr/lib/python2.7/site-packages/sqlalchemy/pool.py", line 485, in checkout
> 29/09/2016 21:09:53    rec.checkin()
> 29/09/2016 21:09:53  File "/usr/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__
> 29/09/2016 21:09:53    compat.reraise(exc_type, exc_value, exc_tb)
> 29/09/2016 21:09:53  File "/usr/lib/python2.7/site-packages/sqlalchemy/pool.py", line 482, in checkout
> 29/09/2016 21:09:53    dbapi_connection = rec.get_connection()
> 29/09/2016 21:09:53  File "/usr/lib/python2.7/site-packages/sqlalchemy/pool.py", line 563, in get_connection
> 29/09/2016 21:09:53    self.connection = self.__connect()
> 29/09/2016 21:09:53  File "/usr/lib/python2.7/site-packages/sqlalchemy/pool.py", line 607, in __connect
> 29/09/2016 21:09:53    connection = self.__pool._invoke_creator(self)
> 29/09/2016 21:09:53  File "/usr/lib/python2.7/site-packages/sqlalchemy/engine/strategies.py", line 97, in connect
> 29/09/2016 21:09:53    return dialect.connect(*cargs, **cparams)
> 29/09/2016 21:09:53  File "/usr/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 385, in connect
> 29/09/2016 21:09:53    return self.dbapi.connect(*cargs, **cparams)
> 29/09/2016 21:09:53  File "/usr/lib/python2.7/site-packages/psycopg2/__init__.py", line 164, in connect
> 29/09/2016 21:09:53    conn = _connect(dsn, connection_factory=connection_factory, async=async)
> 29/09/2016 21:09:53sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) could not translate host name "db-airflow" to address: Name does not resolve
>

Re: Airflow bugs but stays running

Posted by siddharth anand <sa...@apache.org>.
... sent too soon...

...but more information is needed to reproduce this on our side. What
version of Postgres are you running, and what is your environment (e.g.
cloud)?
