Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/01/26 17:57:25 UTC

[GitHub] [airflow] MatthewRBruce commented on issue #13685: scheduler dies with "sqlalchemy.exc.IntegrityError: (MySQLdb._exceptions.IntegrityError) (1062, "Duplicate entry 'huge_demo13499411352-2021-01-15 01:04:00.000000' for key 'dag_run.dag_id'")"

MatthewRBruce commented on issue #13685:
URL: https://github.com/apache/airflow/issues/13685#issuecomment-767719815


   Sam and I have been looking at this, and so far here’s my theory:
   
   It appears this happens when the scheduler is restarted or killed while a DagRun is active.  Based on the scheduling code and the logs, if a DAG is at its max_active_runs limit (indicated by a log line like: {scheduler_job.py:1598} INFO - DAG airflow-utils.send-airflow-heartbeat is at (or above) max_active_runs (1 of 1), not creating any more runs), then the next_dagrun_create_after field for that DAG (on its DagModel row) is set to None/NULL here:
   https://github.com/apache/airflow/blob/2.0.0/airflow/jobs/scheduler_job.py#L1604
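   
   Roughly what the code around that line does in 2.0.0 (this is my paraphrase of the linked lines, not the verbatim source; names like active_runs_of_dags come from my reading of it):
   
       # airflow/jobs/scheduler_job.py (2.0.0), paraphrased sketch, not verbatim
       # active_runs_of_dags: {dag_id: count of RUNNING, non-externally-triggered DagRuns},
       # computed just above this loop with a grouped query
       for dag_model in dag_models:
           dag = self.dagbag.get_dag(dag_model.dag_id, session=session)
           active_runs_of_dag = active_runs_of_dags.get(dag.dag_id, 0)
           if dag.max_active_runs and active_runs_of_dag >= dag.max_active_runs:
               self.log.info(
                   "DAG %s is at (or above) max_active_runs (%d of %d), not creating any more runs",
                   dag.dag_id, active_runs_of_dag, dag.max_active_runs,
               )
               # The line referenced above: once the limit is hit, the DagModel is
               # marked so the scheduler stops creating new runs for this DAG.
               dag_model.next_dagrun_create_after = None
           else:
               dag_model.next_dagrun, dag_model.next_dagrun_create_after = dag.next_dagrun_info(
                   dag_model.next_dagrun
               )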
   
   Now, if the scheduler is restarted or killed before that DagRun finishes and then starts back up, the DagModel will be returned by `DagModel.dags_needing_dagruns(session)` here:
   https://github.com/apache/airflow/blob/2.0.0/airflow/jobs/scheduler_job.py#L1473-L1474
   and then a new DagRun will be created for it, violating the unique key constraint on the dag_run table (dag_id + execution_date).
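   
   To make that last step concrete, here is a minimal, self-contained reproduction of the failure mode (hypothetical toy code, not Airflow's; it uses SQLite/SQLAlchemy purely for illustration while the report above is against MySQL). The dag_run model in 2.0.0 carries a unique constraint on (dag_id, execution_date), which is presumably the key the MySQL error refers to, since the duplicate entry value is the dag_id followed by the execution date:
   
       # Hypothetical toy model mirroring dag_run's (dag_id, execution_date) unique constraint.
       from datetime import datetime
       
       from sqlalchemy import Column, DateTime, Integer, String, UniqueConstraint, create_engine
       from sqlalchemy.exc import IntegrityError
       # import path for the SQLAlchemy 1.3 series used around Airflow 2.0
       from sqlalchemy.ext.declarative import declarative_base
       from sqlalchemy.orm import Session
       
       Base = declarative_base()
       
       class ToyDagRun(Base):
           __tablename__ = "toy_dag_run"
           id = Column(Integer, primary_key=True)
           dag_id = Column(String(250))
           execution_date = Column(DateTime)
           __table_args__ = (UniqueConstraint("dag_id", "execution_date"),)
       
       engine = create_engine("sqlite://")
       Base.metadata.create_all(engine)
       session = Session(bind=engine)
       
       # First scheduler loop creates the run for this logical date.
       session.add(ToyDagRun(dag_id="huge_demo", execution_date=datetime(2021, 1, 15, 1, 4)))
       session.commit()
       
       # After a restart, a scheduler that has lost track of that run tries to create it again.
       session.add(ToyDagRun(dag_id="huge_demo", execution_date=datetime(2021, 1, 15, 1, 4)))
       try:
           session.commit()
       except IntegrityError as exc:
           # Same class of error as in the report, raised by SQLite instead of MySQLdb.
           print("duplicate run rejected:", exc.orig)
           session.rollback()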
   
   

