Posted to commits@airflow.apache.org by "Bolke de Bruin (JIRA)" <ji...@apache.org> on 2016/06/10 17:47:21 UTC

[jira] [Commented] (AIRFLOW-233) Detached DagRun error in scheduler loop

    [ https://issues.apache.org/jira/browse/AIRFLOW-233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15324943#comment-15324943 ] 

Bolke de Bruin commented on AIRFLOW-233:
----------------------------------------

The PR from #224 resolves this. There is some odd SQLAlchemy behavior going on that I can't seem to debug, as it does not happen under the debugger. I suspect the order in which the queries are run leaves the objects in a detached state. By using make_transient the whole issue is gone, and it actually reduces the number of queries to the db.
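For anyone unfamiliar with make_transient: a minimal sketch of the idea, not the actual Airflow change from #224. The DagRun model, column names, and session setup here are stand-ins; the point is only that make_transient severs the object's link to the session, so later attribute access no longer tries a DB refresh (which is what blows up when the session is gone).

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker, make_transient

Base = declarative_base()

class DagRun(Base):  # stand-in model, NOT the real airflow.models.DagRun
    __tablename__ = 'dag_run'
    id = Column(Integer, primary_key=True)
    dag_id = Column(String)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

run = DagRun(dag_id='dp_test')
session.add(run)
session.commit()       # expire_on_commit=True: attributes now marked expired
session.refresh(run)   # reload them while the session is still alive
make_transient(run)    # keep the loaded values, sever the session link
session.close()

print(run.dag_id)      # safe: no refresh is attempted
```

The refresh-before-make_transient step matters: attributes still expired at the moment of detachment are simply lost, so the state has to be loaded first.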



> Detached DagRun error in scheduler loop
> ---------------------------------------
>
>                 Key: AIRFLOW-233
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-233
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: DagRun, scheduler
>         Environment: Airflow master (git log below), Postgres backend, LocalExecutor
> {code}
> b7def7f1f9a97d584e9076cdad48287e652a2d41 [AIRFLOW-142] setup_env.sh doesn't download hive tarball if hdp is specified as distro
> 0bd5515a42f7912b0d4ac8bf33dec2f01539b555 [AIRFLOW-218] Added option to enable webserver gunicorn access/err logs
> 80210b2bd768668e55e498995a3820900d9119ba Merge pull request #1569 from mistercrunch/docs
> {code}
>            Reporter: Jeremiah Lowin
>            Assignee: Bolke de Bruin
>
> Running Airflow master, every scheduler loop has at least one detached DagRun error. This is the output:
> {code}
> [2016-06-10 09:41:54,772] {jobs.py:669} ERROR - Instance <DagRun at 0x10ab80dd8> is not bound to a Session; attribute refresh operation cannot proceed
> Traceback (most recent call last):
>   File "/Users/jlowin/git/airflow/airflow/jobs.py", line 666, in _do_dags
>     self.process_dag(dag, tis_out)
>   File "/Users/jlowin/git/airflow/airflow/jobs.py", line 524, in process_dag
>     State.UP_FOR_RETRY))
>   File "/Users/jlowin/git/airflow/airflow/utils/db.py", line 53, in wrapper
>     result = func(*args, **kwargs)
>   File "/Users/jlowin/git/airflow/airflow/models.py", line 3387, in get_task_instances
>     TI.dag_id == self.dag_id,
>   File "/Users/jlowin/anaconda3/lib/python3.5/site-packages/sqlalchemy/orm/attributes.py", line 237, in __get__
>     return self.impl.get(instance_state(instance), dict_)
>   File "/Users/jlowin/anaconda3/lib/python3.5/site-packages/sqlalchemy/orm/attributes.py", line 578, in get
>     value = state._load_expired(state, passive)
>   File "/Users/jlowin/anaconda3/lib/python3.5/site-packages/sqlalchemy/orm/state.py", line 474, in _load_expired
>     self.manager.deferred_scalar_loader(self, toload)
>   File "/Users/jlowin/anaconda3/lib/python3.5/site-packages/sqlalchemy/orm/loading.py", line 610, in load_scalar_attributes
>     (state_str(state)))
> sqlalchemy.orm.exc.DetachedInstanceError: Instance <DagRun at 0x10ab80dd8> is not bound to a Session; attribute refresh operation cannot proceed
> {code}
> This is the test DAG in question:
> {code}
> from airflow import DAG
> from airflow.operators import PythonOperator
> from datetime import datetime
> import logging
> import time
>
> default_args = {
>     'owner': 'airflow',
>     'depends_on_past': False,
>     'start_date': datetime(2016, 4, 24),
> }
>
> dag_name = 'dp_test'
> dag = DAG(
>     dag_name,
>     default_args=default_args,
>     schedule_interval='*/2 * * * *')
>
> def cb(**kw):
>     time.sleep(2)
>     logging.info('Done %s' % kw['ds'])
>
> d = PythonOperator(task_id="delay", provide_context=True, python_callable=cb, dag=dag)
> {code}
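For reference, the DetachedInstanceError in the quoted traceback can be reproduced outside Airflow. A hedged sketch with a stand-in model (not the real airflow.models.DagRun): committing expires the instance's attributes, closing the session detaches it, and the next attribute access triggers the refresh that can no longer proceed.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker
from sqlalchemy.orm.exc import DetachedInstanceError

Base = declarative_base()

class DagRun(Base):  # stand-in model, NOT the real airflow.models.DagRun
    __tablename__ = 'dag_run'
    id = Column(Integer, primary_key=True)
    dag_id = Column(String)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

run = DagRun(dag_id='dp_test')
session.add(run)
session.commit()   # default expire_on_commit=True: attributes need a refresh
session.close()    # instance is now detached; a refresh is impossible

try:
    run.dag_id     # attribute access triggers the failed refresh
except DetachedInstanceError as exc:
    print(exc)     # same error class as in the scheduler traceback above
```

This is also why the bug is timing-sensitive and hard to catch in a debugger: it only appears when an attribute happens to be expired at the moment the owning session goes away.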



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)