Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/12/16 04:08:42 UTC
[GitHub] [airflow] JCoder01 opened a new issue #13099: Unable to start scheduler after stopped
JCoder01 opened a new issue #13099:
URL: https://github.com/apache/airflow/issues/13099
**Apache Airflow version**: 2.0.0rc3
**Kubernetes version (if you are using kubernetes)** (use `kubectl version`):
**Environment**:
- **Cloud provider or hardware configuration**: Linux
- **OS** (e.g. from /etc/os-release): Ubuntu
- **Kernel** (e.g. `uname -a`):
- **Install tools**:
- **Others**:
**What happened**:
After shutting down the scheduler while tasks were in the running state, trying to restart the scheduler results in primary key violations.
```
[2020-12-15 22:43:29,673] {scheduler_job.py:1293} ERROR - Exception when executing SchedulerJob._run_scheduler_loop
Traceback (most recent call last):
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context
cursor, statement, parameters, context
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
cursor.execute(statement, parameters)
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "dag_run_dag_id_run_id_key"
DETAIL: Key (dag_id, run_id)=(example_task_group, scheduled__2020-12-14T04:31:00+00:00) already exists.
```
**What you expected to happen**:
Scheduler restarts and picks up where it left off.
**How to reproduce it**:
1. Set an example DAG (I used task_group) to a `schedule_interval` of `* * * * *`, start the scheduler, and let it run for a few minutes.
2. Shut down the scheduler.
3. Attempt to restart the scheduler.
**Anything else we need to know**:
I came across this while testing with the LocalExecutor in a virtual env. If no one else is able to reproduce it, I'll try again in a clean virtual env.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] JCoder01 commented on issue #13099: Unable to start scheduler after stopped
JCoder01 commented on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-748034037
I'm still trying to work through this and am slowly moving DAGs over from the original environment where the error occurred to the clean environment. While I can't seem to get the database into the troubled state on its own, I can force it into that state by stopping the scheduler and running the SQL below. Then when you start the scheduler, you get the pk violation.
```
update dag set next_dagrun = (select max(execution_date) from dag_run where dag_id = 'example_task_group')
where dag_id = 'example_task_group2'
```
I think what is happening is: if you have a very small number of DAGs, in the time it takes for the scheduler to throw the error, one of the parser processes updates the backend db with the correct `next_dagrun`, and starting the scheduler again works fine. As the number of DAGs grows, the chance that the problematic DAG gets updated before the scheduler shuts down due to the pk violation decreases, so the error persists until you are lucky enough to get the problematic DAG parsed.
So while it's not clear how the database can get _into_ this state, would it make sense to add some "self healing" to the scheduler start-up that reparses all the DAGs? Or, rather than bailing out, could the scheduler handle a pk violation by reparsing the DAG and retrying the scheduling?
[GitHub] [airflow] JCoder01 edited a comment on issue #13099: Unable to start scheduler after stopped
JCoder01 edited a comment on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-746173327
Here's the whole thing.
```
ERROR - Exception when executing SchedulerJob._run_scheduler_loop
Traceback (most recent call last):
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context
cursor, statement, parameters, context
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
cursor.execute(statement, parameters)
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "dag_run_dag_id_run_id_key"
DETAIL: Key (dag_id, run_id)=(example_task_group, scheduled__2020-12-14T04:32:00+00:00) already exists.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1275, in _execute
self._run_scheduler_loop()
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1377, in _run_scheduler_loop
num_queued_tis = self._do_scheduling(session)
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1474, in _do_scheduling
self._create_dag_runs(query.all(), session)
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1567, in _create_dag_runs
creating_job_id=self.id,
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/airflow/utils/session.py", line 62, in wrapper
return func(*args, **kwargs)
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/airflow/models/dag.py", line 1776, in create_dagrun
session.flush()
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2536, in flush
self._flush(objects)
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2678, in _flush
transaction.rollback(_capture_exception=True)
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
with_traceback=exc_tb,
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2638, in _flush
flush_context.execute()
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/sqlalchemy/orm/unitofwork.py", line 422, in execute
rec.execute(self)
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/sqlalchemy/orm/unitofwork.py", line 589, in execute
uow,
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/sqlalchemy/orm/persistence.py", line 245, in save_obj
insert,
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/sqlalchemy/orm/persistence.py", line 1136, in _emit_insert_statements
statement, params
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1011, in execute
return meth(self, multiparams, params)
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1130, in _execute_clauseelement
distilled_params,
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1317, in _execute_context
e, statement, parameters, cursor, context
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1511, in _handle_dbapi_exception
sqlalchemy_exception, with_traceback=exc_info[2], from_=e
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context
cursor, statement, parameters, context
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.IntegrityError: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "dag_run_dag_id_run_id_key"
DETAIL: Key (dag_id, run_id)=(example_task_group, scheduled__2020-12-14T04:32:00+00:00) already exists.
[SQL: INSERT INTO dag_run (dag_id, execution_date, start_date, end_date, state, run_id, creating_job_id, external_trigger, run_type, conf, last_scheduling_decision, dag_hash) VALUES (%(dag_id)s, %(execution_date)s, %(start_date)s, %(end_date)s, %(state)s, %(run_id)s, %(creating_job_id)s, %(external_trigger)s, %(run_type)s, %(conf)s, %(last_scheduling_decision)s, %(dag_hash)s) RETURNING dag_run.id]
[parameters: {'dag_id': 'example_task_group', 'execution_date': datetime.datetime(2020, 12, 14, 4, 32, tzinfo=Timezone('UTC')), 'start_date': datetime.datetime(2020, 12, 16, 11, 5, 22, 127201, tzinfo=Timezone('UTC')), 'end_date': None, 'state': 'running', 'run_id': 'scheduled__2020-12-14T04:32:00+00:00', 'creating_job_id': 701, 'external_trigger': False, 'run_type': <DagRunType.SCHEDULED: 'scheduled'>, 'conf': <psycopg2.extensions.Binary object at 0x7fc17b8ee4e0>, 'last_scheduling_decision': None, 'dag_hash': '1628bc132c765d9b68a06841eedfbc7d'}]
(Background on this error at: http://sqlalche.me/e/13/gkpj)
```
[GitHub] [airflow] kaxil edited a comment on issue #13099: Unable to start scheduler after stopped
kaxil edited a comment on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-768315621
Looks like this is where the Scheduler runs Plugin code:
https://github.com/apache/airflow/blob/65e49fc56f32b3e815fdf4a17be6b4e1c1e43c11/airflow/jobs/scheduler_job.py#L1275
https://github.com/apache/airflow/blob/65e49fc56f32b3e815fdf4a17be6b4e1c1e43c11/airflow/jobs/scheduler_job.py#L1377
https://github.com/apache/airflow/blob/65e49fc56f32b3e815fdf4a17be6b4e1c1e43c11/airflow/jobs/scheduler_job.py#L1439
https://github.com/apache/airflow/blob/65e49fc56f32b3e815fdf4a17be6b4e1c1e43c11/airflow/jobs/scheduler_job.py#L1474
https://github.com/apache/airflow/blob/65e49fc56f32b3e815fdf4a17be6b4e1c1e43c11/airflow/jobs/scheduler_job.py#L1568
https://github.com/apache/airflow/blob/65e49fc56f32b3e815fdf4a17be6b4e1c1e43c11/airflow/models/dagbag.py#L171
https://github.com/apache/airflow/blob/65e49fc56f32b3e815fdf4a17be6b4e1c1e43c11/airflow/models/dagbag.py#L229
https://github.com/apache/airflow/blob/65e49fc56f32b3e815fdf4a17be6b4e1c1e43c11/airflow/models/serialized_dag.py#L167
https://github.com/apache/airflow/blob/65e49fc56f32b3e815fdf4a17be6b4e1c1e43c11/airflow/serialization/serialized_objects.py#L658
https://github.com/apache/airflow/blob/65e49fc56f32b3e815fdf4a17be6b4e1c1e43c11/airflow/serialization/serialized_objects.py#L412
https://github.com/apache/airflow/blob/65e49fc56f32b3e815fdf4a17be6b4e1c1e43c11/airflow/plugins_manager.py#L346
https://github.com/apache/airflow/blob/65e49fc56f32b3e815fdf4a17be6b4e1c1e43c11/airflow/plugins_manager.py#L276
https://github.com/apache/airflow/blob/65e49fc56f32b3e815fdf4a17be6b4e1c1e43c11/airflow/plugins_manager.py#L222
[GitHub] [airflow] satishcb commented on issue #13099: Unable to start scheduler after stopped
satishcb commented on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-747758471
We are using Airflow 1.10.9. We are calling the dag REST endpoint to trigger runs. Intermittently we see the call fail. In the postgres logs we see:
```
ERROR: duplicate key value violates unique constraint "dag_run_dag_id_run_id_key"
DETAIL: Key (dag_id, run_id)=(ext_trigger_create_cbid_profile_for_duns, manual__2020-12-17T22:39:06.726412+00:00) already exists.
STATEMENT: INSERT INTO dag_run (dag_id, execution_date, start_date, end_date, state, run_id, external_trigger, conf) VALUES ('ext_trigger_create_cbid_profile_for_duns', '2020-12-17T22:39:06.726412+00:00'::timestamptz, '2020-12-17T22:39:06.962088+00:00'::timestamptz, NULL, 'running', 'manual__2020-12-17T22:39:06.726412+00:00', true, ....
```
Since it is related to this error, I am reporting it here.
[GitHub] [airflow] ashb commented on issue #13099: Unable to start scheduler after stopped
ashb commented on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-748065707
Yeah, I think at the very least we should catch the unique constraint violation and ignore it/carry on.
That said, I'd like to work out _how_ this is possible, as I thought every update to the DAG model was behind a row-level lock, so this sort of failure wasn't meant to be possible.
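As a rough illustration of that approach (this is not Airflow's actual code; a stand-in sqlite3 table plays the role of `dag_run`), the scheduler could treat a duplicate-key insert as "this run already exists" and carry on instead of crashing the loop:

```python
import sqlite3

# Stand-in for the dag_run table with the same unique constraint on
# (dag_id, run_id) that the Postgres schema enforces.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dag_run (dag_id TEXT, run_id TEXT, "
    "UNIQUE (dag_id, run_id))"
)

def create_dag_run(dag_id, run_id):
    """Insert a run; skip it if the (dag_id, run_id) pair already exists."""
    try:
        conn.execute(
            "INSERT INTO dag_run (dag_id, run_id) VALUES (?, ?)",
            (dag_id, run_id),
        )
        return True
    except sqlite3.IntegrityError:
        # The run was already created (e.g. before the scheduler was
        # stopped); ignore the duplicate instead of aborting the loop.
        return False

assert create_dag_run("example_task_group", "scheduled__2020-12-14T04:31:00+00:00")
assert not create_dag_run("example_task_group", "scheduled__2020-12-14T04:31:00+00:00")
```

The same pattern would apply with SQLAlchemy by catching `sqlalchemy.exc.IntegrityError` around the flush, though as noted above the row-level locking is supposed to make the conflict impossible in the first place.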
[GitHub] [airflow] ashb commented on issue #13099: Unable to start scheduler after stopped
ashb commented on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-748273956
Oh interesting. Just by putting that in a plugin things break?
[GitHub] [airflow] kaxil removed a comment on issue #13099: Unable to start scheduler after stopped
kaxil removed a comment on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-767163680
> ```
> airflow scheduler
> ```
I was not able to reproduce it with Postgres 9.6 and 12
[GitHub] [airflow] JCoder01 commented on issue #13099: Unable to start scheduler after stopped
JCoder01 commented on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-754589174
Happy New Year @ashb
There is a little bit of both. There is an outstanding issue where use of Variables (or probably anything that interacts with the backend db in the global scope) raises the `UNEXPECTED COMMIT` RuntimeError.
The problem I reported with being unable to restart the scheduler seems to have been caused by wrapping my `Variable.get` in an overly generous try/except, hiding the runtime error and causing things in the db to get out of sync.
Running the below should recreate the issue; I did this on Python 3.7.
```
mkdir airflow_scheduler_test
cd airflow_scheduler_test
export AIRFLOW_HOME=$(pwd)
virtualenv pyenv
source pyenv/bin/activate
pip install apache-airflow==2.0.0
airflow db init
airflow variables set test TEST
mkdir plugins
cat << EOF > plugins/test.py
from airflow.models.variable import Variable
print(Variable.get('test'))
EOF
airflow dags unpause example_bash_operator
airflow scheduler
```
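One way to avoid the underlying problem is to defer the database access until the value is first needed, rather than running `Variable.get` at module import time while the scheduler is loading plugins. Below is a minimal, Airflow-free sketch of that deferred-lookup pattern; `LazyVariable` and `fake_variable_get` are illustrative names, with the fake fetcher standing in for `Variable.get('test')`:

```python
class LazyVariable:
    """Resolve a value on first access instead of at plugin import time."""

    def __init__(self, fetch):
        self._fetch = fetch          # e.g. lambda: Variable.get('test')
        self._value = None
        self._loaded = False

    def get(self):
        if not self._loaded:         # fetch only once, on first use
            self._value = self._fetch()
            self._loaded = True
        return self._value

calls = []

def fake_variable_get():
    # Stand-in for Variable.get('test'); records that a DB hit happened.
    calls.append("db hit")
    return "TEST"

# At module import time nothing is fetched yet:
test_var = LazyVariable(fake_variable_get)
assert calls == []

# The lookup only runs on first access, e.g. inside a view method:
assert test_var.get() == "TEST"
assert test_var.get() == "TEST"
assert calls == ["db hit"]
```

With this shape, importing the plugin module no longer touches the metadata database, so the scheduler's plugin loading cannot trigger the `UNEXPECTED COMMIT` error described above.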
[GitHub] [airflow] potiuk commented on issue #13099: Unable to start scheduler after stopped
potiuk commented on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-746129511
@JCoder01 A few more questions. Maybe you can provide as much information as you can about the deployment you have.
1) which version of Postgres are you using?
2) what kind of deployment do you have - is the database managed/on-premise/remote?
3) do you use any Database HA?
4) was it a single-scheduler or multiple-scheduler case?
5) Did you use our image or a container, or did you install airflow manually?
6) Is there a possibility you still had, for example, another (old?) version of airflow accessing the database at the same time?
7) Did you migrate from airflow 1.10 or was it a fresh installation?
Generally - as much information as possible about your case would be tremendously useful to investigate it!
[GitHub] [airflow] potiuk edited a comment on issue #13099: Unable to start scheduler after stopped
potiuk edited a comment on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-746129511
@JCoder01 A few more questions. Maybe you can provide as much information as you can about the deployment you have.
1) which version of Postgres are you using?
2) what kind of deployment do you have - is the database managed/on-premise/remote?
3) do you use any Database HA? Active/Passive?
4) was it a single-scheduler or multiple-scheduler case?
5) Did you use our image or another container, or did you install airflow manually in a host/virtualenv?
6) Is there a possibility you still had, for example, another (old?) version of airflow accessing the database at the same time?
7) Did you migrate from airflow 1.10 or was it a fresh installation?
Generally - as much information as possible about your case would be tremendously useful to investigate it!
[GitHub] [airflow] ashb commented on issue #13099: Unable to start scheduler after stopped
ashb commented on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-748067032
> We are using Airflow 1.10.9. We are calling the dag REST endpoint to trigger. Intermittently we see the call fails. In the postgres logs we see
> `ERROR: duplicate key value violates unique constraint "dag_run_dag_id_run_id_key" DETAIL: Key (dag_id, run_id)=(ext_trigger_create_cbid_profile_for_duns, manual__2020-12-17T22:39:06.726412+00:00) already exists. STATEMENT: INSERT INTO dag_run (dag_id, execution_date, start_date, end_date, state, run_id, external_trigger, conf) VALUES ('ext_trigger_create_cbid_profile_for_duns', '2020-12-17T22:39:06.726412+00:00'::timestamptz, '2020-12-17T22:39:06.962088+00:00'::timestamptz, NULL, 'running', 'manual__2020-12-17T22:39:06.726412+00:00', true, ....`
>
> Since it is related to error reporting it here.
@satishcb This is something else -- That error is saying that the pair `(ext_trigger_create_cbid_profile_for_duns, manual__2020-12-17T22:39:06.726412+00:00)` already exists in the DB. Since that is from the API that likely means you have already triggered that exact pair before.
[GitHub] [airflow] ashb commented on issue #13099: Unable to start scheduler after stopped
ashb commented on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-746168673
Do you happen to have the full stack trace (there might be more above or below the part you shared) so I can see where in the scheduler it is happening?
[GitHub] [airflow] JCoder01 commented on issue #13099: Unable to start scheduler after stopped
JCoder01 commented on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-746276730
Of course, sorry about that.
```
[2020-12-15 22:43:15,642] {scheduler_job.py:181} INFO - Started process (PID=46978) to work on /home/jcoder/git/airflow_2.0/macs-dags/macs-dags/dags/example_task_group.py
[2020-12-15 22:43:15,644] {scheduler_job.py:629} INFO - Processing file /home/jcoder/git/airflow_2.0/macs-dags/macs-dags/dags/example_task_group.py for tasks to queue
[2020-12-15 22:43:15,645] {logging_mixin.py:103} INFO - [2020-12-15 22:43:15,644] {dagbag.py:440} INFO - Filling up the DagBag from /home/jcoder/git/airflow_2.0/macs-dags/macs-dags/dags/example_task_group.py
[2020-12-15 22:43:15,656] {scheduler_job.py:639} INFO - DAG(s) dict_keys(['example_task_group']) retrieved from /home/jcoder/git/airflow_2.0/macs-dags/macs-dags/dags/example_task_group.py
[2020-12-15 22:43:15,658] {logging_mixin.py:103} INFO - [2020-12-15 22:43:15,658] {dag.py:1813} INFO - Sync 1 DAGs
[2020-12-15 22:43:15,690] {logging_mixin.py:103} INFO - [2020-12-15 22:43:15,690] {dag.py:2266} INFO - Setting next_dagrun for example_task_group to 2020-12-14T04:21:00+00:00
[2020-12-15 22:43:15,708] {scheduler_job.py:189} INFO - Processing /home/jcoder/git/airflow_2.0/macs-dags/macs-dags/dags/example_task_group.py took 0.070 seconds
[2020-12-15 22:43:16,449] {scheduler_job.py:181} INFO - Started process (PID=47049) to work on /home/jcoder/git/airflow_2.0/macs-dags/macs-dags/dags/example_task_group.py
[2020-12-15 22:43:16,450] {scheduler_job.py:629} INFO - Processing file /home/jcoder/git/airflow_2.0/macs-dags/macs-dags/dags/example_task_group.py for tasks to queue
[2020-12-15 22:43:16,451] {logging_mixin.py:103} INFO - [2020-12-15 22:43:16,451] {dagbag.py:440} INFO - Filling up the DagBag from /home/jcoder/git/airflow_2.0/macs-dags/macs-dags/dags/example_task_group.py
[2020-12-15 22:43:16,460] {scheduler_job.py:639} INFO - DAG(s) dict_keys(['example_task_group']) retrieved from /home/jcoder/git/airflow_2.0/macs-dags/macs-dags/dags/example_task_group.py
[2020-12-15 22:43:16,481] {logging_mixin.py:103} INFO - [2020-12-15 22:43:16,481] {dag.py:1813} INFO - Sync 1 DAGs
[2020-12-15 22:43:16,501] {logging_mixin.py:103} INFO - [2020-12-15 22:43:16,501] {dag.py:2266} INFO - Setting next_dagrun for example_task_group to 2020-12-14T04:21:00+00:00
[2020-12-15 22:43:16,522] {scheduler_job.py:189} INFO - Processing /home/jcoder/git/airflow_2.0/macs-dags/macs-dags/dags/example_task_group.py took 0.078 seconds
[2020-12-15 22:43:16,524] {scheduler_job.py:181} INFO - Started process (PID=47075) to work on /home/jcoder/git/airflow_2.0/macs-dags/macs-dags/dags/example_task_group.py
[2020-12-15 22:43:16,526] {scheduler_job.py:629} INFO - Processing file /home/jcoder/git/airflow_2.0/macs-dags/macs-dags/dags/example_task_group.py for tasks to queue
[2020-12-15 22:43:16,527] {logging_mixin.py:103} INFO - [2020-12-15 22:43:16,527] {dagbag.py:440} INFO - Filling up the DagBag from /home/jcoder/git/airflow_2.0/macs-dags/macs-dags/dags/example_task_group.py
[2020-12-15 22:43:16,538] {scheduler_job.py:639} INFO - DAG(s) dict_keys(['example_task_group']) retrieved from /home/jcoder/git/airflow_2.0/macs-dags/macs-dags/dags/example_task_group.py
[2020-12-15 22:43:16,580] {logging_mixin.py:103} INFO - [2020-12-15 22:43:16,580] {dag.py:1813} INFO - Sync 1 DAGs
[2020-12-15 22:43:16,606] {logging_mixin.py:103} INFO - [2020-12-15 22:43:16,606] {dag.py:2266} INFO - Setting next_dagrun for example_task_group to 2020-12-14T04:21:00+00:00
[2020-12-15 22:43:16,628] {scheduler_job.py:189} INFO - Processing /home/jcoder/git/airflow_2.0/macs-dags/macs-dags/dags/example_task_group.py took 0.108 seconds
[2020-12-15 22:43:18,346] {scheduler_job.py:181} INFO - Started process (PID=47308) to work on /home/jcoder/git/airflow_2.0/macs-dags/macs-dags/dags/example_task_group.py
[2020-12-15 22:43:18,348] {scheduler_job.py:629} INFO - Processing file /home/jcoder/git/airflow_2.0/macs-dags/macs-dags/dags/example_task_group.py for tasks to queue
[2020-12-15 22:43:18,348] {logging_mixin.py:103} INFO - [2020-12-15 22:43:18,348] {dagbag.py:440} INFO - Filling up the DagBag from /home/jcoder/git/airflow_2.0/macs-dags/macs-dags/dags/example_task_group.py
[2020-12-15 22:43:18,363] {scheduler_job.py:639} INFO - DAG(s) dict_keys(['example_task_group']) retrieved from /home/jcoder/git/airflow_2.0/macs-dags/macs-dags/dags/example_task_group.py
[2020-12-15 22:43:18,365] {logging_mixin.py:103} INFO - [2020-12-15 22:43:18,365] {dag.py:1813} INFO - Sync 1 DAGs
[2020-12-15 22:43:18,402] {logging_mixin.py:103} INFO - [2020-12-15 22:43:18,401] {dag.py:2266} INFO - Setting next_dagrun for example_task_group to 2020-12-14T04:30:00+00:00
[2020-12-15 22:43:18,427] {scheduler_job.py:189} INFO - Processing /home/jcoder/git/airflow_2.0/macs-dags/macs-dags/dags/example_task_group.py took 0.085 seconds
```
[GitHub] [airflow] JCoder01 commented on issue #13099: Unable to start scheduler after stopped
JCoder01 commented on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-748295089
But both happen when the scheduler loads plugins.
[GitHub] [airflow] JCoder01 commented on issue #13099: Unable to start scheduler after stopped
JCoder01 commented on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-748294931
In this simple example, it gets called when the scheduler is initializing and loads plugins.
In my actual plugin, the call to Variables is made when the view gets initialized, like this: `v_appbuilder_view = TestAppBuilderBaseView()`.
[GitHub] [airflow] JCoder01 commented on issue #13099: Unable to start scheduler after stopped
JCoder01 commented on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-746182448
Here is what I can see in the database. For dag_run:
|id|dag_id|execution_date|state|
|--|------|--------------|-----|
|353|example_task_group|2020-12-13 23:32:00|running|
and in dag, it looks like the scheduler didn't pick up that run?
|dag_id|next_dagrun|next_dagrun_create_after|
|------|-----------|------------------------|
|example_task_group|2020-12-13 23:32:00|2020-12-13 23:33:00|
[GitHub] [airflow] potiuk commented on issue #13099: Unable to start scheduler after stopped
URL: https://github.com/apache/airflow/issues/13099#issuecomment-745932599
Looks serious. If we can confirm that one, I am afraid it might lead to RC4 @kaxil @ashb if this is easily triggerable.
----------------------------------------------------------------
[GitHub] [airflow] kaxil edited a comment on issue #13099: Unable to start scheduler after stopped
URL: https://github.com/apache/airflow/issues/13099#issuecomment-767163927
> Happy New Year @ashb
> There is a little bit of both. There is an outstanding issue where use of Variables (or probably anything that interacts with the backend db in the global scope) in a plugin raises the `UNEXPECTED COMMIT` RuntimeError.
> The problem I reported with being unable to restart the scheduler seems to have been caused by wrapping my `Variable.get` in an overly generous try/except, hiding the runtime error and causing things in the db to get out of sync.
> Running the below should recreate the issue, I did this on python 3.7
>
> ```
> mkdir airflow_scheduler_test
> cd airflow_scheduler_test
> export AIRFLOW_HOME=$(pwd)
> virtualenv pyenv
> source pyenv/bin/activate
> pip install apache-airflow==2.0.0
>
> airflow db init
> airflow variables set test TEST
> mkdir plugins
> cat << EOF > plugins/test.py
> from airflow.models.variable import Variable
>
> print(Variable.get('test'))
> EOF
>
> airflow dags unpause example_bash_operator
> airflow scheduler
> ```
I was not able to reproduce it with Postgres 9.6 and 12, and haven't tried it with 13.1.
----------------------------------------------------------------
[GitHub] [airflow] JCoder01 commented on issue #13099: Unable to start scheduler after stopped
URL: https://github.com/apache/airflow/issues/13099#issuecomment-747083535
I spent a bit more time on this today, and while I can occasionally recreate it in the original environment, I can't recreate it in a fresh install.
----------------------------------------------------------------
[GitHub] [airflow] SamWheating commented on issue #13099: Unable to start scheduler after stopped
URL: https://github.com/apache/airflow/issues/13099#issuecomment-767714797
This looks like the same issue as https://github.com/apache/airflow/issues/13685
----------------------------------------------------------------
[GitHub] [airflow] kaxil closed issue #13099: Unable to start scheduler after stopped
URL: https://github.com/apache/airflow/issues/13099
----------------------------------------------------------------
[GitHub] [airflow] JCoder01 commented on issue #13099: Unable to start scheduler after stopped
URL: https://github.com/apache/airflow/issues/13099#issuecomment-746143343
1.) looks like PostgreSQL 13.1
2.) on prem, in docker, used only for personal testing
3.) No.
4.) single-scheduler
5.) manual install in virtual env
6.) as far as I know, no, there were no other versions accessing the database.
7.) Fresh install
----------------------------------------------------------------
[GitHub] [airflow] kaxil removed a comment on issue #13099: Unable to start scheduler after stopped
URL: https://github.com/apache/airflow/issues/13099#issuecomment-767163927
I was not able to reproduce it with Postgres 9.6 and 12, haven't tried it with 13.1
**EDIT**: I was able to reproduce it with 2.0.0 and SQLite
```
❯ airflow scheduler
____________ _____________
____ |__( )_________ __/__ /________ __
____ /| |_ /__ ___/_ /_ __ /_ __ \_ | /| / /
___ ___ | / _ / _ __/ _ / / /_/ /_ |/ |/ /
_/_/ |_/_/ /_/ /_/ /_/ \____/____/|__/
[2021-01-26 12:46:09,634] {scheduler_job.py:1241} INFO - Starting the scheduler
[2021-01-26 12:46:09,634] {scheduler_job.py:1246} INFO - Processing each file at most -1 times
[2021-01-26 12:46:09,639] {dag_processing.py:250} INFO - Launched DagFileProcessorManager with pid: 4646
[2021-01-26 12:46:09,640] {scheduler_job.py:1751} INFO - Resetting orphaned tasks for active dag runs
[2021-01-26 12:46:09,645] {settings.py:52} INFO - Configured default timezone Timezone('UTC')
[2021-01-26 12:46:09,657] {scheduler_job.py:1805} INFO - Reset the following 3 orphaned TaskInstances:
<TaskInstance: scenario1_case2_01.tasks__1_of_10 2021-01-25 00:00:00+00:00 [queued]>
<TaskInstance: scenario4_case2_1_40.tasks__1_of_10 2021-01-25 00:00:00+00:00 [queued]>
<TaskInstance: scenario1_case2_02.tasks__2_of_10 2021-01-25 00:00:00+00:00 [scheduled]>
[2021-01-26 12:46:09,661] {dag_processing.py:518} WARNING - Because we cannot use more than 1 thread (parsing_processes = 2 ) when using sqlite. So we set parallelism to 1.
[2021-01-26 12:46:09,845] {plugins_manager.py:231} ERROR - UNEXPECTED COMMIT - THIS WILL BREAK HA LOCKS!
Traceback (most recent call last):
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/plugins_manager.py", line 222, in load_plugins_from_plugin_directory
loader.exec_module(mod)
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/Users/kaxilnaik/airflow/plugins/test.py", line 3, in <module>
print(Variable.get('test'))
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/models/variable.py", line 123, in get
var_val = Variable.get_variable_from_secrets(key=key)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/models/variable.py", line 181, in get_variable_from_secrets
var_val = secrets_backend.get_variable(key=key)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/utils/session.py", line 65, in wrapper
return func(*args, session=session, **kwargs)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/contextlib.py", line 119, in __exit__
next(self.gen)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/utils/session.py", line 32, in create_session
session.commit()
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 1042, in commit
self.transaction.commit()
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 504, in commit
self._prepare_impl()
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 472, in _prepare_impl
self.session.dispatch.before_commit(self.session)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/event/attr.py", line 322, in __call__
fn(*args, **kw)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/utils/sqlalchemy.py", line 217, in _validate_commit
raise RuntimeError("UNEXPECTED COMMIT - THIS WILL BREAK HA LOCKS!")
RuntimeError: UNEXPECTED COMMIT - THIS WILL BREAK HA LOCKS!
[2021-01-26 12:46:09,853] {plugins_manager.py:232} ERROR - Failed to import plugin /Users/kaxilnaik/airflow/plugins/test.py
... followed by
[2021-01-26 13:07:54,518] {scheduler_job.py:1293} ERROR - Exception when executing SchedulerJob._run_scheduler_loop
Traceback (most recent call last):
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context
cursor, statement, parameters, context
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
cursor.execute(statement, parameters)
sqlite3.IntegrityError: UNIQUE constraint failed: dag_run.dag_id, dag_run.run_id
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1275, in _execute
self._run_scheduler_loop()
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1377, in _run_scheduler_loop
num_queued_tis = self._do_scheduling(session)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1474, in _do_scheduling
self._create_dag_runs(query.all(), session)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1582, in _create_dag_runs
creating_job_id=self.id,
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/utils/session.py", line 62, in wrapper
return func(*args, **kwargs)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/models/dag.py", line 1781, in create_dagrun
session.flush()
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2523, in flush
self._flush(objects)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2664, in _flush
transaction.rollback(_capture_exception=True)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/util/langhelpers.py", line 69, in __exit__
exc_value, with_traceback=exc_tb,
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2624, in _flush
flush_context.execute()
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/orm/unitofwork.py", line 422, in execute
rec.execute(self)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/orm/unitofwork.py", line 589, in execute
uow,
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/orm/persistence.py", line 245, in save_obj
insert,
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/orm/persistence.py", line 1136, in _emit_insert_statements
statement, params
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1011, in execute
return meth(self, multiparams, params)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1130, in _execute_clauseelement
distilled_params,
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1317, in _execute_context
e, statement, parameters, cursor, context
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1511, in _handle_dbapi_exception
sqlalchemy_exception, with_traceback=exc_info[2], from_=e
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context
cursor, statement, parameters, context
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.IntegrityError: (sqlite3.IntegrityError) UNIQUE constraint failed: dag_run.dag_id, dag_run.run_id
[SQL: INSERT INTO dag_run (dag_id, execution_date, start_date, end_date, state, run_id, creating_job_id, external_trigger, run_type, conf, last_scheduling_decision, dag_hash) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)]
[parameters: ('scenario1_case2_01', '2021-01-25 00:00:00.000000', '2021-01-26 13:07:54.516957', None, 'running', 'scheduled__2021-01-25T00:00:00+00:00', 1, 0, <DagRunType.SCHEDULED: 'scheduled'>, <memory at 0x7fc388665530>, None, '09f9507aca47e7751a40cc98e546ac0d')]
(Background on this error at: http://sqlalche.me/e/13/gkpj)
```
----------------------------------------------------------------
[GitHub] [airflow] JCoder01 commented on issue #13099: Unable to start scheduler after stopped
URL: https://github.com/apache/airflow/issues/13099#issuecomment-748263101
@ashb I think I narrowed this down to being an issue with one of my plugins (a FAB view) that relies on an Airflow Variable. I think this is the same one I pinged you about on Slack a week or two ago, and we thought it was likely due to me running the scheduler in a debugger, but maybe that wasn't (entirely) the problem. I found that if I commented out the lines that made the call to `Variables`, the pk violations stopped happening. If I put them back in, stopped the scheduler long enough for a dag to need to be triggered right away (with one scheduled to run every minute, that was one minute), and then started the scheduler again, it would trigger the pk violation.
In trying to recreate a minimal example, I added the following to a file in plugins/
```
from airflow.models.variable import Variable
v = Variable().get("VAR")
```
and upon starting the scheduler, it triggered this error:
`RuntimeError: UNEXPECTED COMMIT - THIS WILL BREAK HA LOCKS!`
which was the same error I pinged you about.
This made me go back and look at the code in the plugin, where I had:
```
try:
models = Variables().get("MODELS")
except:
models = []
```
If I took out the generic except, then rather than kicking out a pk violation, it raised the UNEXPECTED COMMIT error. So the pk violation was happening because of the overly broad except in the plugin code.
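That failure mode can be reproduced in isolation; in this sketch, `get_variable` is a hypothetical stand-in for a `Variable.get` call made at plugin import time, not the real Airflow API:

```python
def get_variable():
    # Stand-in for a Variable.get call made at plugin import time, which
    # Airflow 2.0's scheduler session guard rejects with this RuntimeError.
    raise RuntimeError("UNEXPECTED COMMIT - THIS WILL BREAK HA LOCKS!")

# The overly broad handler from the plugin: the guard error is swallowed,
# the plugin appears to load fine, and the damage surfaces later as a
# pk violation in the scheduler loop.
try:
    models = get_variable()
except:  # noqa: E722 - the overly generous except from the report
    models = []

print(models)  # [] -- no hint that anything went wrong

# Narrowing the clause (e.g. `except KeyError:`) would let the
# RuntimeError propagate, making the misuse visible at plugin import.
```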
So this raises the question: how does one use Variables in plugins without breaking the HA code?
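One pattern that avoids the import-time database hit is to defer the read until the view is actually used. A hedged, Airflow-free sketch, where `fetch_variable` is a hypothetical stand-in for `Variable.get` (the point is that the real call would live inside the method body, never at module level):

```python
def fetch_variable(key, default=None):
    # Hypothetical stand-in for airflow.models.Variable.get; in a real
    # plugin this would hit the metadata database.
    return {"MODELS": ["model_a", "model_b"]}.get(key, default)

class ModelView:
    """FAB-style view that reads the variable per request instead of once
    at import time, so loading the plugin opens no database session."""

    @property
    def models(self):
        # The database read happens here, at request time, not when the
        # scheduler imports the plugin.
        return fetch_variable("MODELS", default=[])

v_appbuilder_view = ModelView()   # instantiation alone reads nothing
print(v_appbuilder_view.models)   # the read happens only on access
```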
----------------------------------------------------------------
[GitHub] [airflow] ashb commented on issue #13099: Unable to start scheduler after stopped
URL: https://github.com/apache/airflow/issues/13099#issuecomment-746264176
@JCoder01 Could we have some more logs surrounding those times -- it would help us tell in which places this is getting updated.
----------------------------------------------------------------
[GitHub] [airflow] kaxil commented on issue #13099: Unable to start scheduler after stopped
URL: https://github.com/apache/airflow/issues/13099#issuecomment-767163680
> ```
> airflow scheduler
> ```
I was not able to reproduce it with Breeze and Postgres 9.6 and 12
----------------------------------------------------------------
[GitHub] [airflow] potiuk commented on issue #13099: Unable to start scheduler after stopped
URL: https://github.com/apache/airflow/issues/13099#issuecomment-747113218
Is it possible that you have different dags with the same DAG_ID (educated guess here)?
----------------------------------------------------------------
[GitHub] [airflow] kaxil edited a comment on issue #13099: Unable to start scheduler after stopped
URL: https://github.com/apache/airflow/issues/13099#issuecomment-767531551
I was able to reproduce this.
> mkdir airflow_scheduler_test
> cd airflow_scheduler_test
> export AIRFLOW_HOME=$(pwd)
> virtualenv pyenv
> source pyenv/bin/activate
> pip install apache-airflow==2.0.0
>
> airflow db init
> airflow variables set test TEST
> mkdir plugins
> cat << EOF > plugins/test.py
> from airflow.models.variable import Variable
>
> print(Variable.get('test'))
> EOF
>
> airflow dags unpause example_bash_operator
> airflow scheduler
Stacktrace:
<details><summary>CLICK ME</summary>
<p>
```
❯ airflow scheduler
____________ _____________
____ |__( )_________ __/__ /________ __
____ /| |_ /__ ___/_ /_ __ /_ __ \_ | /| / /
___ ___ | / _ / _ __/ _ / / /_/ /_ |/ |/ /
_/_/ |_/_/ /_/ /_/ /_/ \____/____/|__/
[2021-01-26 13:09:59,150] {scheduler_job.py:1242} INFO - Starting the scheduler
[2021-01-26 13:09:59,151] {scheduler_job.py:1247} INFO - Processing each file at most -1 times
[2021-01-26 13:09:59,154] {dag_processing.py:250} INFO - Launched DagFileProcessorManager with pid: 5278
[2021-01-26 13:09:59,156] {scheduler_job.py:1772} INFO - Resetting orphaned tasks for active dag runs
[2021-01-26 13:09:59,161] {settings.py:52} INFO - Configured default timezone Timezone('UTC')
[2021-01-26 13:09:59,194] {plugins_manager.py:231} ERROR - UNEXPECTED COMMIT - THIS WILL BREAK HA LOCKS!
Traceback (most recent call last):
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/plugins_manager.py", line 222, in load_plugins_from_plugin_directory
loader.exec_module(mod)
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/Users/kaxilnaik/airflow/plugins/test.py", line 3, in <module>
print(Variable.get('test'))
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/models/variable.py", line 127, in get
var_val = Variable.get_variable_from_secrets(key=key)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/models/variable.py", line 193, in get_variable_from_secrets
var_val = secrets_backend.get_variable(key=key)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/utils/session.py", line 65, in wrapper
return func(*args, session=session, **kwargs)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/contextlib.py", line 119, in __exit__
next(self.gen)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/utils/session.py", line 32, in create_session
session.commit()
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 1042, in commit
self.transaction.commit()
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 504, in commit
self._prepare_impl()
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 472, in _prepare_impl
self.session.dispatch.before_commit(self.session)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/event/attr.py", line 322, in __call__
fn(*args, **kw)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/utils/sqlalchemy.py", line 217, in _validate_commit
raise RuntimeError("UNEXPECTED COMMIT - THIS WILL BREAK HA LOCKS!")
RuntimeError: UNEXPECTED COMMIT - THIS WILL BREAK HA LOCKS!
[2021-01-26 13:09:59,201] {plugins_manager.py:232} ERROR - Failed to import plugin /Users/kaxilnaik/airflow/plugins/test.py
[2021-01-26 13:09:59,386] {scheduler_job.py:1623} INFO - DAG scenario1_case2_02 is at (or above) max_active_runs (1 of 1), not creating any more runs
[2021-01-26 13:09:59,387] {scheduler_job.py:1623} INFO - DAG scenario1_case2_01 is at (or above) max_active_runs (1 of 1), not creating any more runs
[2021-01-26 13:09:59,435] {scheduler_job.py:936} INFO - 3 tasks up for execution:
<TaskInstance: scenario1_case2_02.tasks__1_of_10 2021-01-25 00:00:00+00:00 [scheduled]>
<TaskInstance: scenario1_case2_01.tasks__1_of_10 2021-01-25 00:00:00+00:00 [scheduled]>
<TaskInstance: scenario4_case2_1_40.tasks__1_of_10 2021-01-25 00:00:00+00:00 [scheduled]>
[2021-01-26 13:09:59,436] {scheduler_job.py:970} INFO - Figuring out tasks to run in Pool(name=default_pool) with 128 open slots and 3 task instances ready to be queued
[2021-01-26 13:09:59,436] {scheduler_job.py:997} INFO - DAG scenario1_case2_02 has 0/1000000 running and queued tasks
[2021-01-26 13:09:59,436] {scheduler_job.py:997} INFO - DAG scenario1_case2_01 has 0/1000000 running and queued tasks
[2021-01-26 13:09:59,436] {scheduler_job.py:997} INFO - DAG scenario4_case2_1_40 has 0/16 running and queued tasks
[2021-01-26 13:09:59,437] {scheduler_job.py:1058} INFO - Setting the following tasks to queued state:
<TaskInstance: scenario1_case2_02.tasks__1_of_10 2021-01-25 00:00:00+00:00 [scheduled]>
<TaskInstance: scenario1_case2_01.tasks__1_of_10 2021-01-25 00:00:00+00:00 [scheduled]>
<TaskInstance: scenario4_case2_1_40.tasks__1_of_10 2021-01-25 00:00:00+00:00 [scheduled]>
[2021-01-26 13:09:59,438] {scheduler_job.py:1100} INFO - Sending TaskInstanceKey(dag_id='scenario1_case2_02', task_id='tasks__1_of_10', execution_date=datetime.datetime(2021, 1, 25, 0, 0, tzinfo=Timezone('UTC')), try_number=1) to executor with priority 10 and queue default
[2021-01-26 13:09:59,439] {base_executor.py:82} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'scenario1_case2_02', 'tasks__1_of_10', '2021-01-25T00:00:00+00:00', '--local', '--pool', 'default_pool', '--subdir', '/Users/kaxilnaik/airflow/dags/scenario1_case2_02.py']
[2021-01-26 13:09:59,439] {scheduler_job.py:1100} INFO - Sending TaskInstanceKey(dag_id='scenario1_case2_01', task_id='tasks__1_of_10', execution_date=datetime.datetime(2021, 1, 25, 0, 0, tzinfo=Timezone('UTC')), try_number=1) to executor with priority 10 and queue default
[2021-01-26 13:09:59,439] {base_executor.py:82} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'scenario1_case2_01', 'tasks__1_of_10', '2021-01-25T00:00:00+00:00', '--local', '--pool', 'default_pool', '--subdir', '/Users/kaxilnaik/airflow/dags/scenario1_case2_01.py']
[2021-01-26 13:09:59,439] {scheduler_job.py:1100} INFO - Sending TaskInstanceKey(dag_id='scenario4_case2_1_40', task_id='tasks__1_of_10', execution_date=datetime.datetime(2021, 1, 25, 0, 0, tzinfo=Timezone('UTC')), try_number=1) to executor with priority 10 and queue default
[2021-01-26 13:09:59,439] {base_executor.py:82} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'scenario4_case2_1_40', 'tasks__1_of_10', '2021-01-25T00:00:00+00:00', '--local', '--pool', 'default_pool', '--subdir', '/Users/kaxilnaik/airflow/dags/scenario4_case2_1.py']
[2021-01-26 13:09:59,440] {sequential_executor.py:59} INFO - Executing command: ['airflow', 'tasks', 'run', 'scenario1_case2_02', 'tasks__1_of_10', '2021-01-25T00:00:00+00:00', '--local', '--pool', 'default_pool', '--subdir', '/Users/kaxilnaik/airflow/dags/scenario1_case2_02.py']
[2021-01-26 13:10:00,926] {dagbag.py:440} INFO - Filling up the DagBag from /Users/kaxilnaik/airflow/dags/scenario1_case2_02.py
TEST
Running <TaskInstance: scenario1_case2_02.tasks__1_of_10 2021-01-25T00:00:00+00:00 [queued]> on host 1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa
[2021-01-26 13:10:01,570] {sequential_executor.py:59} INFO - Executing command: ['airflow', 'tasks', 'run', 'scenario1_case2_01', 'tasks__1_of_10', '2021-01-25T00:00:00+00:00', '--local', '--pool', 'default_pool', '--subdir', '/Users/kaxilnaik/airflow/dags/scenario1_case2_01.py']
[2021-01-26 13:10:02,992] {dagbag.py:440} INFO - Filling up the DagBag from /Users/kaxilnaik/airflow/dags/scenario1_case2_01.py
TEST
Running <TaskInstance: scenario1_case2_01.tasks__1_of_10 2021-01-25T00:00:00+00:00 [queued]> on host 1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa
[2021-01-26 13:10:03,628] {sequential_executor.py:59} INFO - Executing command: ['airflow', 'tasks', 'run', 'scenario4_case2_1_40', 'tasks__1_of_10', '2021-01-25T00:00:00+00:00', '--local', '--pool', 'default_pool', '--subdir', '/Users/kaxilnaik/airflow/dags/scenario4_case2_1.py']
[2021-01-26 13:10:05,054] {dagbag.py:440} INFO - Filling up the DagBag from /Users/kaxilnaik/airflow/dags/scenario4_case2_1.py
TEST
Running <TaskInstance: scenario4_case2_1_40.tasks__1_of_10 2021-01-25T00:00:00+00:00 [queued]> on host 1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa
[2021-01-26 13:10:05,750] {scheduler_job.py:1201} INFO - Executor reports execution of scenario1_case2_02.tasks__1_of_10 execution_date=2021-01-25 00:00:00+00:00 exited with status success for try_number 1
[2021-01-26 13:10:05,750] {scheduler_job.py:1201} INFO - Executor reports execution of scenario1_case2_01.tasks__1_of_10 execution_date=2021-01-25 00:00:00+00:00 exited with status success for try_number 1
[2021-01-26 13:10:05,750] {scheduler_job.py:1201} INFO - Executor reports execution of scenario4_case2_1_40.tasks__1_of_10 execution_date=2021-01-25 00:00:00+00:00 exited with status success for try_number 1
[2021-01-26 13:10:05,792] {scheduler_job.py:1293} ERROR - Exception when executing SchedulerJob._run_scheduler_loop
Traceback (most recent call last):
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context
cursor, statement, parameters, context
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
cursor.execute(statement, parameters)
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "dag_run_dag_id_run_id_key"
DETAIL: Key (dag_id, run_id)=(scenario1_case2_02, scheduled__2021-01-25T00:00:00+00:00) already exists.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1275, in _execute
self._run_scheduler_loop()
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1377, in _run_scheduler_loop
num_queued_tis = self._do_scheduling(session)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1474, in _do_scheduling
self._create_dag_runs(query.all(), session)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1582, in _create_dag_runs
creating_job_id=self.id,
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/utils/session.py", line 62, in wrapper
return func(*args, **kwargs)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/models/dag.py", line 1781, in create_dagrun
session.flush()
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2523, in flush
self._flush(objects)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2664, in _flush
transaction.rollback(_capture_exception=True)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/util/langhelpers.py", line 69, in __exit__
exc_value, with_traceback=exc_tb,
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2624, in _flush
flush_context.execute()
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/orm/unitofwork.py", line 422, in execute
rec.execute(self)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/orm/unitofwork.py", line 589, in execute
uow,
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/orm/persistence.py", line 245, in save_obj
insert,
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/orm/persistence.py", line 1136, in _emit_insert_statements
statement, params
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1011, in execute
return meth(self, multiparams, params)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1130, in _execute_clauseelement
distilled_params,
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1317, in _execute_context
e, statement, parameters, cursor, context
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1511, in _handle_dbapi_exception
sqlalchemy_exception, with_traceback=exc_info[2], from_=e
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context
cursor, statement, parameters, context
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.IntegrityError: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "dag_run_dag_id_run_id_key"
DETAIL: Key (dag_id, run_id)=(scenario1_case2_02, scheduled__2021-01-25T00:00:00+00:00) already exists.
[SQL: INSERT INTO dag_run (dag_id, execution_date, start_date, end_date, state, run_id, creating_job_id, external_trigger, run_type, conf, last_scheduling_decision, dag_hash) VALUES (%(dag_id)s, %(execution_date)s, %(start_date)s, %(end_date)s, %(state)s, %(run_id)s, %(creating_job_id)s, %(external_trigger)s, %(run_type)s, %(conf)s, %(last_scheduling_decision)s, %(dag_hash)s) RETURNING dag_run.id]
[parameters: {'dag_id': 'scenario1_case2_02', 'execution_date': datetime.datetime(2021, 1, 25, 0, 0, tzinfo=Timezone('UTC')), 'start_date': datetime.datetime(2021, 1, 26, 13, 10, 5, 790285, tzinfo=Timezone('UTC')), 'end_date': None, 'state': 'running', 'run_id': 'scheduled__2021-01-25T00:00:00+00:00', 'creating_job_id': 1, 'external_trigger': False, 'run_type': <DagRunType.SCHEDULED: 'scheduled'>, 'conf': <psycopg2.extensions.Binary object at 0x7f95c0b11db0>, 'last_scheduling_decision': None, 'dag_hash': '3bb8093d821e43ddfc6ab56cd9eccfd2'}]
(Background on this error at: http://sqlalche.me/e/13/gkpj)
[2021-01-26 13:10:06,814] {process_utils.py:100} INFO - Sending Signals.SIGTERM to GPID 5278
[2021-01-26 13:10:07,005] {process_utils.py:206} INFO - Waiting up to 5 seconds for processes to exit...
[2021-01-26 13:10:07,041] {process_utils.py:66} INFO - Process psutil.Process(pid=5388, status='terminated', started='13:10:05') (5388) terminated with exit code None
[2021-01-26 13:10:07,043] {process_utils.py:66} INFO - Process psutil.Process(pid=5385, status='terminated', started='13:10:05') (5385) terminated with exit code None
[2021-01-26 13:10:07,043] {process_utils.py:66} INFO - Process psutil.Process(pid=5278, status='terminated', exitcode=0, started='13:09:59') (5278) terminated with exit code 0
[2021-01-26 13:10:07,043] {scheduler_job.py:1296} INFO - Exited execute loop
```
</p>
</details>
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] JCoder01 commented on issue #13099: Unable to start scheduler after stopped
Posted by GitBox <gi...@apache.org>.
JCoder01 commented on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-747100104
Different virtual envs and databases; both virtual envs and databases are on the same host. The only difference is which dags and provider packages are in each virtual env. I do have a dag where I was testing overriding the `following_schedule` DAG method with a custom function to allow running at specific times. I have that dag in both envs, thinking it was the culprit, but even with the dag disabled I see the error in the original environment, for a different dag.
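The actual dag file isn't shown in the thread, so as a point of reference, here is a self-contained sketch (my own assumptions, not the reporter's code) of the scheduling idea being described, expressed as a plain function that picks the next run from a fixed list of daily times:

```python
from datetime import datetime, time, timedelta

def following_schedule(dttm, run_times):
    """Return the next scheduled datetime strictly after ``dttm``,
    chosen from a fixed list of daily run times.

    This is only an illustration of the "run at specific times"
    behaviour mentioned above, not Airflow's DAG.following_schedule.
    """
    for day_offset in (0, 1):
        day = (dttm + timedelta(days=day_offset)).date()
        for t in sorted(run_times):
            candidate = datetime.combine(day, t)
            if candidate > dttm:
                return candidate

# Example: next run after 03:00 with daily slots at 04:30 and 09:00
print(following_schedule(datetime(2020, 12, 14, 3, 0),
                         [time(4, 30), time(9, 0)]))
```

In Airflow 2.0 such logic would live in an override of `DAG.following_schedule`; the sketch keeps it standalone so the rule itself is easy to test.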
[GitHub] [airflow] kaxil commented on issue #13099: Unable to start scheduler after stopped
kaxil commented on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-767531551
I was able to reproduce this
[GitHub] [airflow] JCoder01 commented on issue #13099: Unable to start scheduler after stopped
JCoder01 commented on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-747402210
I'll keep trying to narrow it down. It seems an `airflow db init` does not reliably fix it.
It does seem to point to a race condition: after the error occurs, repeatedly restarting the scheduler will sometimes sort things out.
The only reliable way to restart the scheduler after the error occurs seems to be to manually update `next_dagrun_create_after` to a datetime in the future.
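As a sketch of that manual workaround (assuming a Postgres metadata DB and that the column is `next_dagrun_create_after` on the `dag` table, as in Airflow 2.0 — always back up the metadata DB before editing it by hand):

```sql
-- Push the affected dag's next-run-creation time into the future so the
-- scheduler stops trying to recreate an already-existing dag_run.
UPDATE dag
SET next_dagrun_create_after = now() + interval '1 hour'
WHERE dag_id = 'example_task_group';
```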
[GitHub] [airflow] ashb edited a comment on issue #13099: Unable to start scheduler after stopped
ashb edited a comment on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-748273956
Oh interesting. Just by putting that in a plugin, things break?
It should be possible to do it. When is this plugin code being run?
[GitHub] [airflow] potiuk commented on issue #13099: Unable to start scheduler after stopped
potiuk commented on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-746133738
And is it possible to get a database dump in the "failing" state? It would be super useful if you could send us a dump of the database so that we can import it and investigate further!
[GitHub] [airflow] kaxil edited a comment on issue #13099: Unable to start scheduler after stopped
Posted by GitBox <gi...@apache.org>.
kaxil edited a comment on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-767163680
> ```
> airflow scheduler
> ```
I was not able to reproduce it with Postgres 9.6 and 12
[GitHub] [airflow] kaxil commented on issue #13099: Unable to start scheduler after stopped
kaxil commented on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-768532309
Will be fixed by one of https://github.com/apache/airflow/pull/13931 and https://github.com/apache/airflow/pull/13932
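For context on the constraint at play, one generic Postgres pattern for tolerating an existing `(dag_id, run_id)` row is `ON CONFLICT`. This is purely an illustration of the unique constraint from the traceback, not necessarily the approach taken in the linked PRs:

```sql
-- Illustration only: an insert made idempotent against the
-- dag_run_dag_id_run_id_key unique constraint (column list abbreviated).
INSERT INTO dag_run (dag_id, run_id, execution_date, state, run_type)
VALUES ('example_task_group', 'scheduled__2020-12-14T04:32:00+00:00',
        '2020-12-14 04:32:00+00', 'running', 'scheduled')
ON CONFLICT (dag_id, run_id) DO NOTHING;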
[GitHub] [airflow] JCoder01 commented on issue #13099: Unable to start scheduler after stopped
Posted by GitBox <gi...@apache.org>.
JCoder01 commented on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-746173327
Here's the whole thing.
` ERROR - Exception when executing SchedulerJob._run_scheduler_loop
Traceback (most recent call last):
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context
cursor, statement, parameters, context
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
cursor.execute(statement, parameters)
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "dag_run_dag_id_run_id_key"
DETAIL: Key (dag_id, run_id)=(example_task_group, scheduled__2020-12-14T04:32:00+00:00) already exists.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1275, in _execute
self._run_scheduler_loop()
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1377, in _run_scheduler_loop
num_queued_tis = self._do_scheduling(session)
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1474, in _do_scheduling
self._create_dag_runs(query.all(), session)
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1567, in _create_dag_runs
creating_job_id=self.id,
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/airflow/utils/session.py", line 62, in wrapper
return func(*args, **kwargs)
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/airflow/models/dag.py", line 1776, in create_dagrun
session.flush()
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2536, in flush
self._flush(objects)
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2678, in _flush
transaction.rollback(_capture_exception=True)
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
with_traceback=exc_tb,
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2638, in _flush
flush_context.execute()
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/sqlalchemy/orm/unitofwork.py", line 422, in execute
rec.execute(self)
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/sqlalchemy/orm/unitofwork.py", line 589, in execute
uow,
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/sqlalchemy/orm/persistence.py", line 245, in save_obj
insert,
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/sqlalchemy/orm/persistence.py", line 1136, in _emit_insert_statements
statement, params
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1011, in execute
return meth(self, multiparams, params)
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1130, in _execute_clauseelement
distilled_params,
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1317, in _execute_context
e, statement, parameters, cursor, context
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1511, in _handle_dbapi_exception
sqlalchemy_exception, with_traceback=exc_info[2], from_=e
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context
cursor, statement, parameters, context
File "/home/jcoder/git/airflow_2.0/pyenv/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.IntegrityError: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "dag_run_dag_id_run_id_key"
DETAIL: Key (dag_id, run_id)=(example_task_group, scheduled__2020-12-14T04:32:00+00:00) already exists.
[SQL: INSERT INTO dag_run (dag_id, execution_date, start_date, end_date, state, run_id, creating_job_id, external_trigger, run_type, conf, last_scheduling_decision, dag_hash) VALUES (%(dag_id)s, %(execution_date)s, %(start_date)s, %(end_date)s, %(state)s, %(run_id)s, %(creating_job_id)s, %(external_trigger)s, %(run_type)s, %(conf)s, %(last_scheduling_decision)s, %(dag_hash)s) RETURNING dag_run.id]
[parameters: {'dag_id': 'example_task_group', 'execution_date': datetime.datetime(2020, 12, 14, 4, 32, tzinfo=Timezone('UTC')), 'start_date': datetime.datetime(2020, 12, 16, 11, 5, 22, 127201, tzinfo=Timezone('UTC')), 'end_date': None, 'state': 'running', 'run_id': 'scheduled__2020-12-14T04:32:00+00:00', 'creating_job_id': 701, 'external_trigger': False, 'run_type': <DagRunType.SCHEDULED: 'scheduled'>, 'conf': <psycopg2.extensions.Binary object at 0x7fc17b8ee4e0>, 'last_scheduling_decision': None, 'dag_hash': '1628bc132c765d9b68a06841eedfbc7d'}]
(Background on this error at: http://sqlalche.me/e/13/gkpj)`
[GitHub] [airflow] stelukutla edited a comment on issue #13099: Unable to start scheduler after stopped
stelukutla edited a comment on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-774539054
Issue fixed after running the command below
**_airflow dags pause <dag_id>_**
and starting the scheduler
[GitHub] [airflow] kaxil commented on issue #13099: Unable to start scheduler after stopped
kaxil commented on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-767892380
#13920 should fix the issue with the duplicate DagRun.
I still need to figure out a fix to allow retrieving Variables / Connections in a plugin.
[GitHub] [airflow] kaxil edited a comment on issue #13099: Unable to start scheduler after stopped
Posted by GitBox <gi...@apache.org>.
kaxil edited a comment on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-767163927
I was not able to reproduce it with Postgres 9.6 and 12, haven't tried it with 13.1
**EDIT**: I was able to reproduce it with 2.0.0 and Sqlite
```
❯ airflow scheduler
____________ _____________
____ |__( )_________ __/__ /________ __
____ /| |_ /__ ___/_ /_ __ /_ __ \_ | /| / /
___ ___ | / _ / _ __/ _ / / /_/ /_ |/ |/ /
_/_/ |_/_/ /_/ /_/ /_/ \____/____/|__/
[2021-01-26 12:46:09,634] {scheduler_job.py:1241} INFO - Starting the scheduler
[2021-01-26 12:46:09,634] {scheduler_job.py:1246} INFO - Processing each file at most -1 times
[2021-01-26 12:46:09,639] {dag_processing.py:250} INFO - Launched DagFileProcessorManager with pid: 4646
[2021-01-26 12:46:09,640] {scheduler_job.py:1751} INFO - Resetting orphaned tasks for active dag runs
[2021-01-26 12:46:09,645] {settings.py:52} INFO - Configured default timezone Timezone('UTC')
[2021-01-26 12:46:09,657] {scheduler_job.py:1805} INFO - Reset the following 3 orphaned TaskInstances:
<TaskInstance: scenario1_case2_01.tasks__1_of_10 2021-01-25 00:00:00+00:00 [queued]>
<TaskInstance: scenario4_case2_1_40.tasks__1_of_10 2021-01-25 00:00:00+00:00 [queued]>
<TaskInstance: scenario1_case2_02.tasks__2_of_10 2021-01-25 00:00:00+00:00 [scheduled]>
[2021-01-26 12:46:09,661] {dag_processing.py:518} WARNING - Because we cannot use more than 1 thread (parsing_processes = 2 ) when using sqlite. So we set parallelism to 1.
[2021-01-26 12:46:09,845] {plugins_manager.py:231} ERROR - UNEXPECTED COMMIT - THIS WILL BREAK HA LOCKS!
Traceback (most recent call last):
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/plugins_manager.py", line 222, in load_plugins_from_plugin_directory
loader.exec_module(mod)
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/Users/kaxilnaik/airflow/plugins/test.py", line 3, in <module>
print(Variable.get('test'))
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/models/variable.py", line 123, in get
var_val = Variable.get_variable_from_secrets(key=key)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/models/variable.py", line 181, in get_variable_from_secrets
var_val = secrets_backend.get_variable(key=key)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/utils/session.py", line 65, in wrapper
return func(*args, session=session, **kwargs)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/contextlib.py", line 119, in __exit__
next(self.gen)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/utils/session.py", line 32, in create_session
session.commit()
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 1042, in commit
self.transaction.commit()
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 504, in commit
self._prepare_impl()
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 472, in _prepare_impl
self.session.dispatch.before_commit(self.session)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/event/attr.py", line 322, in __call__
fn(*args, **kw)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/utils/sqlalchemy.py", line 217, in _validate_commit
raise RuntimeError("UNEXPECTED COMMIT - THIS WILL BREAK HA LOCKS!")
RuntimeError: UNEXPECTED COMMIT - THIS WILL BREAK HA LOCKS!
[2021-01-26 12:46:09,853] {plugins_manager.py:232} ERROR - Failed to import plugin /Users/kaxilnaik/airflow/plugins/test.py
```
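The failure above comes from running `Variable.get` at plugin import time. A common way around this class of problem is to defer the lookup until first use. Below is a minimal, self-contained sketch of that lazy pattern; `load_variable` is a stand-in for `airflow.models.Variable.get` (not importable here), so this is an illustration of the pattern, not a tested Airflow fix:

```python
def load_variable(key):
    """Stand-in for a metadata-DB lookup such as Variable.get."""
    return {"test": "TEST"}[key]

class LazyVariable:
    """Defers the DB lookup until .get() is first called, so that
    importing the plugin module touches no database session."""

    def __init__(self, key):
        self.key = key
        self._value = None
        self._loaded = False

    def get(self):
        if not self._loaded:        # nothing ran at import time
            self._value = load_variable(self.key)
            self._loaded = True
        return self._value

# Module level: safe, because no lookup has happened yet.
MY_SETTING = LazyVariable("test")
```

A plugin would then call `MY_SETTING.get()` inside operator/view code instead of evaluating the variable at module scope.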
[GitHub] [airflow] kaxil commented on issue #13099: Unable to start scheduler after stopped
kaxil commented on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-768315621
Looks like this is where the Scheduler runs Plugin code:
https://github.com/apache/airflow/blob/65e49fc56f32b3e815fdf4a17be6b4e1c1e43c11/airflow/jobs/scheduler_job.py#L1568
https://github.com/apache/airflow/blob/65e49fc56f32b3e815fdf4a17be6b4e1c1e43c11/airflow/models/dagbag.py#L171
https://github.com/apache/airflow/blob/65e49fc56f32b3e815fdf4a17be6b4e1c1e43c11/airflow/models/dagbag.py#L229
https://github.com/apache/airflow/blob/65e49fc56f32b3e815fdf4a17be6b4e1c1e43c11/airflow/models/serialized_dag.py#L167
https://github.com/apache/airflow/blob/65e49fc56f32b3e815fdf4a17be6b4e1c1e43c11/airflow/serialization/serialized_objects.py#L658
https://github.com/apache/airflow/blob/65e49fc56f32b3e815fdf4a17be6b4e1c1e43c11/airflow/serialization/serialized_objects.py#L412
https://github.com/apache/airflow/blob/65e49fc56f32b3e815fdf4a17be6b4e1c1e43c11/airflow/plugins_manager.py#L346
https://github.com/apache/airflow/blob/65e49fc56f32b3e815fdf4a17be6b4e1c1e43c11/airflow/plugins_manager.py#L276
https://github.com/apache/airflow/blob/65e49fc56f32b3e815fdf4a17be6b4e1c1e43c11/airflow/plugins_manager.py#L222
[GitHub] [airflow] potiuk commented on issue #13099: Unable to start scheduler after stopped
potiuk commented on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-747095140
What's the difference between the environments? Just the airflow installation? The database? The virtualenv? Anything else?
[GitHub] [airflow] JCoder01 commented on issue #13099: Unable to start scheduler after stopped
Posted by GitBox <gi...@apache.org>.
JCoder01 commented on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-746152929
I have created another virtualenv/database and am trying to recreate it. I don't know about dumping the whole database; I don't want to accidentally share sensitive info.
[GitHub] [airflow] kaxil edited a comment on issue #13099: Unable to start scheduler after stopped
kaxil edited a comment on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-767163927
I was not able to reproduce it with Postgres 9.6 and 12, haven't tried it with 13.1
**EDIT**: I was able to reproduce it with 2.0.0 and Sqlite
```
❯ airflow scheduler
____________ _____________
____ |__( )_________ __/__ /________ __
____ /| |_ /__ ___/_ /_ __ /_ __ \_ | /| / /
___ ___ | / _ / _ __/ _ / / /_/ /_ |/ |/ /
_/_/ |_/_/ /_/ /_/ /_/ \____/____/|__/
[2021-01-26 12:46:09,634] {scheduler_job.py:1241} INFO - Starting the scheduler
[2021-01-26 12:46:09,634] {scheduler_job.py:1246} INFO - Processing each file at most -1 times
[2021-01-26 12:46:09,639] {dag_processing.py:250} INFO - Launched DagFileProcessorManager with pid: 4646
[2021-01-26 12:46:09,640] {scheduler_job.py:1751} INFO - Resetting orphaned tasks for active dag runs
[2021-01-26 12:46:09,645] {settings.py:52} INFO - Configured default timezone Timezone('UTC')
[2021-01-26 12:46:09,657] {scheduler_job.py:1805} INFO - Reset the following 3 orphaned TaskInstances:
<TaskInstance: scenario1_case2_01.tasks__1_of_10 2021-01-25 00:00:00+00:00 [queued]>
<TaskInstance: scenario4_case2_1_40.tasks__1_of_10 2021-01-25 00:00:00+00:00 [queued]>
<TaskInstance: scenario1_case2_02.tasks__2_of_10 2021-01-25 00:00:00+00:00 [scheduled]>
[2021-01-26 12:46:09,661] {dag_processing.py:518} WARNING - Because we cannot use more than 1 thread (parsing_processes = 2 ) when using sqlite. So we set parallelism to 1.
[2021-01-26 12:46:09,845] {plugins_manager.py:231} ERROR - UNEXPECTED COMMIT - THIS WILL BREAK HA LOCKS!
Traceback (most recent call last):
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/plugins_manager.py", line 222, in load_plugins_from_plugin_directory
loader.exec_module(mod)
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/Users/kaxilnaik/airflow/plugins/test.py", line 3, in <module>
print(Variable.get('test'))
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/models/variable.py", line 123, in get
var_val = Variable.get_variable_from_secrets(key=key)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/models/variable.py", line 181, in get_variable_from_secrets
var_val = secrets_backend.get_variable(key=key)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/utils/session.py", line 65, in wrapper
return func(*args, session=session, **kwargs)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/contextlib.py", line 119, in __exit__
next(self.gen)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/utils/session.py", line 32, in create_session
session.commit()
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 1042, in commit
self.transaction.commit()
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 504, in commit
self._prepare_impl()
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 472, in _prepare_impl
self.session.dispatch.before_commit(self.session)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/event/attr.py", line 322, in __call__
fn(*args, **kw)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/utils/sqlalchemy.py", line 217, in _validate_commit
raise RuntimeError("UNEXPECTED COMMIT - THIS WILL BREAK HA LOCKS!")
RuntimeError: UNEXPECTED COMMIT - THIS WILL BREAK HA LOCKS!
[2021-01-26 12:46:09,853] {plugins_manager.py:232} ERROR - Failed to import plugin /Users/kaxilnaik/airflow/plugins/test.py
... followed by
[2021-01-26 13:07:54,518] {scheduler_job.py:1293} ERROR - Exception when executing SchedulerJob._run_scheduler_loop
Traceback (most recent call last):
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context
cursor, statement, parameters, context
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
cursor.execute(statement, parameters)
sqlite3.IntegrityError: UNIQUE constraint failed: dag_run.dag_id, dag_run.run_id
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1275, in _execute
self._run_scheduler_loop()
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1377, in _run_scheduler_loop
num_queued_tis = self._do_scheduling(session)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1474, in _do_scheduling
self._create_dag_runs(query.all(), session)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1582, in _create_dag_runs
creating_job_id=self.id,
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/utils/session.py", line 62, in wrapper
return func(*args, **kwargs)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/airflow/models/dag.py", line 1781, in create_dagrun
session.flush()
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2523, in flush
self._flush(objects)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2664, in _flush
transaction.rollback(_capture_exception=True)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/util/langhelpers.py", line 69, in __exit__
exc_value, with_traceback=exc_tb,
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2624, in _flush
flush_context.execute()
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/orm/unitofwork.py", line 422, in execute
rec.execute(self)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/orm/unitofwork.py", line 589, in execute
uow,
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/orm/persistence.py", line 245, in save_obj
insert,
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/orm/persistence.py", line 1136, in _emit_insert_statements
statement, params
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1011, in execute
return meth(self, multiparams, params)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1130, in _execute_clauseelement
distilled_params,
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1317, in _execute_context
e, statement, parameters, cursor, context
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1511, in _handle_dbapi_exception
sqlalchemy_exception, with_traceback=exc_info[2], from_=e
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context
cursor, statement, parameters, context
File "/Users/kaxilnaik/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.IntegrityError: (sqlite3.IntegrityError) UNIQUE constraint failed: dag_run.dag_id, dag_run.run_id
[SQL: INSERT INTO dag_run (dag_id, execution_date, start_date, end_date, state, run_id, creating_job_id, external_trigger, run_type, conf, last_scheduling_decision, dag_hash) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)]
[parameters: ('scenario1_case2_01', '2021-01-25 00:00:00.000000', '2021-01-26 13:07:54.516957', None, 'running', 'scheduled__2021-01-25T00:00:00+00:00', 1, 0, <DagRunType.SCHEDULED: 'scheduled'>, <memory at 0x7fc388665530>, None, '09f9507aca47e7751a40cc98e546ac0d')]
(Background on this error at: http://sqlalche.me/e/13/gkpj)
```
[GitHub] [airflow] ashb commented on issue #13099: Unable to start scheduler after stopped
Posted by GitBox <gi...@apache.org>.
ashb commented on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-746287259
And last question for now (you may not have this one anymore): Did you get any of the "Setting next_dagrun for example_task_group" messages in the main scheduler output?
----------------------------------------------------------------
[GitHub] [airflow] kaxil commented on issue #13099: Unable to start scheduler after stopped
Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-767163927
> Happy New Year @ashb
> There is a little bit of both. There is an outstanding issue where the use of Variables (or probably anything that interacts with the backend db at global scope) in a plugin raises the `UNEXPECTED COMMIT` RuntimeError.
> The problem I reported with being unable to restart the scheduler seems to have been caused by wrapping my `Variable.get` in an overly generous try/except, hiding the runtime error and causing things in the db to get out of sync.
> Running the below should recreate the issue; I did this on Python 3.7:
>
> ```
> mkdir airflow_scheduler_test
> cd airflow_scheduler_test
> export AIRFLOW_HOME=$(pwd)
> virtualenv pyenv
> source pyenv/bin/activate
> pip install apache-airflow==2.0.0
>
> airflow db init
> airflow variables set test TEST
> mkdir plugins
> cat << EOF > plugins/test.py
> from airflow.models.variable import Variable
>
> print(Variable.get('test'))
> EOF
>
> airflow dags unpause example_bash_operator
> airflow scheduler
> ```
I was not able to reproduce it with Postgres 9.6 and 12
----------------------------------------------------------------
[GitHub] [airflow] JCoder01 edited a comment on issue #13099: Unable to start scheduler after stopped
Posted by GitBox <gi...@apache.org>.
JCoder01 edited a comment on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-754589174
Happy New Year @ashb
There is a little bit of both. There is an outstanding issue where the use of Variables (or probably anything that interacts with the backend db at global scope) in a plugin raises the `UNEXPECTED COMMIT` RuntimeError.
The problem I reported with being unable to restart the scheduler seems to have been caused by wrapping my `Variable.get` in an overly generous try/except, hiding the runtime error and causing things in the db to get out of sync.
Running the below should recreate the issue; I did this on Python 3.7:
```
mkdir airflow_scheduler_test
cd airflow_scheduler_test
export AIRFLOW_HOME=$(pwd)
virtualenv pyenv
source pyenv/bin/activate
pip install apache-airflow==2.0.0
airflow db init
airflow variables set test TEST
mkdir plugins
cat << EOF > plugins/test.py
from airflow.models.variable import Variable
print(Variable.get('test'))
EOF
airflow dags unpause example_bash_operator
airflow scheduler
```
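For reference, here is a minimal hypothetical sketch of the "overly generous try/except" pattern described above. The `variable_get` stub stands in for `Variable.get`, which commits a session when called at import time and trips the scheduler's commit guard; the function and variable names are illustrative, not the reporter's actual plugin code:

```python
# Stand-in for airflow.models.variable.Variable.get: calling it at
# plugin import time commits a session and triggers the scheduler's
# "UNEXPECTED COMMIT" guard. Stubbed so the example is self-contained.
def variable_get(key):
    raise RuntimeError("UNEXPECTED COMMIT - THIS WILL BREAK HA LOCKS!")


def load_plugin_config_masked():
    # The overly generous except swallows the RuntimeError entirely,
    # so the scheduler never surfaces the problem.
    try:
        return variable_get("test")
    except Exception:
        return "DEFAULT"


def load_plugin_config_safe():
    # A narrower handler lets the unexpected RuntimeError propagate
    # instead of hiding it.
    try:
        return variable_get("test")
    except KeyError:
        return "DEFAULT"


print(load_plugin_config_masked())  # prints "DEFAULT"; the error is silently hidden
```

With the broad handler the scheduler keeps running on a session whose commit was masked, which matches the "things in the db get out of sync" behaviour described above.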
----------------------------------------------------------------
[GitHub] [airflow] JCoder01 commented on issue #13099: Unable to start scheduler after stopped
Posted by GitBox <gi...@apache.org>.
JCoder01 commented on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-746232980
@ashb as requested:
```
[2020-12-15 22:43:15,690] {logging_mixin.py:103} INFO - [2020-12-15 22:43:15,690] {dag.py:2266} INFO - Setting next_dagrun for example_task_group to 2020-12-14T04:21:00+00:00
[2020-12-15 22:43:16,501] {logging_mixin.py:103} INFO - [2020-12-15 22:43:16,501] {dag.py:2266} INFO - Setting next_dagrun for example_task_group to 2020-12-14T04:21:00+00:00
[2020-12-15 22:43:16,606] {logging_mixin.py:103} INFO - [2020-12-15 22:43:16,606] {dag.py:2266} INFO - Setting next_dagrun for example_task_group to 2020-12-14T04:21:00+00:00
```
----------------------------------------------------------------
[GitHub] [airflow] JCoder01 commented on issue #13099: Unable to start scheduler after stopped
Posted by GitBox <gi...@apache.org>.
JCoder01 commented on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-748333536
Log and stack trace:
```
(pyenv) jcoder@bos-rndab01 ~/airflow_scheduler_test $ airflow scheduler
____________ _____________
____ |__( )_________ __/__ /________ __
____ /| |_ /__ ___/_ /_ __ /_ __ \_ | /| / /
___ ___ | / _ / _ __/ _ / / /_/ /_ |/ |/ /
_/_/ |_/_/ /_/ /_/ /_/ \____/____/|__/
[2020-12-18 16:41:19,262] {scheduler_job.py:1241} INFO - Starting the scheduler
[2020-12-18 16:41:19,262] {scheduler_job.py:1246} INFO - Processing each file at most -1 times
[2020-12-18 16:41:19,323] {dag_processing.py:250} INFO - Launched DagFileProcessorManager with pid: 7925
[2020-12-18 16:41:19,324] {scheduler_job.py:1751} INFO - Resetting orphaned tasks for active dag runs
[2020-12-18 16:41:19,331] {settings.py:52} INFO - Configured default timezone Timezone('UTC')
[2020-12-18 16:41:19,349] {scheduler_job.py:1805} INFO - Reset the following 2 orphaned TaskInstances:
<TaskInstance: example_task_group2.section_2.inner_section_2.task_2 2020-12-17 21:47:00+00:00 [scheduled]>
<TaskInstance: example_task_group2.section_2.inner_section_2.task_2 2020-12-17 21:48:00+00:00 [scheduled]>
[2020-12-18 16:41:19,451] {plugins_manager.py:231} ERROR - UNEXPECTED COMMIT - THIS WILL BREAK HA LOCKS!
Traceback (most recent call last):
File "/home/jcoder/airflow_scheduler_test/pyenv/lib/python3.7/site-packages/airflow/plugins_manager.py", line 222, in load_plugins_from_plugin_directory
loader.exec_module(mod)
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/home/jcoder/airflow_scheduler_test/plugins/Plugin.py", line 3, in <module>
v = Variable().get("VAR")
File "/home/jcoder/airflow_scheduler_test/pyenv/lib/python3.7/site-packages/airflow/models/variable.py", line 123, in get
var_val = Variable.get_variable_from_secrets(key=key)
File "/home/jcoder/airflow_scheduler_test/pyenv/lib/python3.7/site-packages/airflow/models/variable.py", line 181, in get_variable_from_secrets
var_val = secrets_backend.get_variable(key=key)
File "/home/jcoder/airflow_scheduler_test/pyenv/lib/python3.7/site-packages/airflow/utils/session.py", line 65, in wrapper
return func(*args, session=session, **kwargs)
File "/opt/anaconda/2020/envs/py_37_pands0.25/lib/python3.7/contextlib.py", line 119, in __exit__
next(self.gen)
File "/home/jcoder/airflow_scheduler_test/pyenv/lib/python3.7/site-packages/airflow/utils/session.py", line 32, in create_session
session.commit()
File "/home/jcoder/airflow_scheduler_test/pyenv/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 1042, in commit
self.transaction.commit()
File "/home/jcoder/airflow_scheduler_test/pyenv/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 504, in commit
self._prepare_impl()
File "/home/jcoder/airflow_scheduler_test/pyenv/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 472, in _prepare_impl
self.session.dispatch.before_commit(self.session)
File "/home/jcoder/airflow_scheduler_test/pyenv/lib/python3.7/site-packages/sqlalchemy/event/attr.py", line 322, in __call__
fn(*args, **kw)
File "/home/jcoder/airflow_scheduler_test/pyenv/lib/python3.7/site-packages/airflow/utils/sqlalchemy.py", line 217, in _validate_commit
raise RuntimeError("UNEXPECTED COMMIT - THIS WILL BREAK HA LOCKS!")
RuntimeError: UNEXPECTED COMMIT - THIS WILL BREAK HA LOCKS!
[2020-12-18 16:41:19,460] {plugins_manager.py:232} ERROR - Failed to import plugin /home/jcoder/airflow_scheduler_test/plugins/Plugin.py
```
----------------------------------------------------------------
[GitHub] [airflow] JCoder01 commented on issue #13099: Unable to start scheduler after stopped
Posted by GitBox <gi...@apache.org>.
JCoder01 commented on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-746190232
If you run `airflow db init` on the database after it gets into this state, it fixes it and updates the dag table to the correct `next_dagrun`. So in the unlikely event that it gets into this state, it almost seems like the scheduler needs to do a round of updates on startup before it starts scheduling tasks.
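The out-of-sync state described above can be illustrated with a throwaway in-memory SQLite database. The `dag` and `dag_run` tables here are simplified hypothetical stand-ins for the real Airflow schema (which has many more columns): `dag.next_dagrun` still points at a run that already exists in `dag_run`, which appears to be what `airflow db init` repairs.

```python
import sqlite3

# Simplified stand-ins for Airflow's dag / dag_run tables, seeded with
# the stale state: next_dagrun names a run that was already created.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dag (dag_id TEXT PRIMARY KEY, next_dagrun TEXT);
CREATE TABLE dag_run (dag_id TEXT, run_id TEXT, UNIQUE (dag_id, run_id));
INSERT INTO dag VALUES ('example_task_group', '2020-12-14T04:31:00+00:00');
INSERT INTO dag_run VALUES
    ('example_task_group', 'scheduled__2020-12-14T04:31:00+00:00');
""")

# A dag is "stale" when next_dagrun corresponds to a run_id that already
# exists; on restart the scheduler would try to re-insert that run.
stale = conn.execute("""
    SELECT d.dag_id FROM dag d
    JOIN dag_run r
      ON r.dag_id = d.dag_id
     AND r.run_id = 'scheduled__' || d.next_dagrun
""").fetchall()
print(stale)  # [('example_task_group',)]
```

Against the real metadata database the same kind of join could be used to check whether any dag row is in this state before restarting the scheduler.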
----------------------------------------------------------------
[GitHub] [airflow] JCoder01 edited a comment on issue #13099: Unable to start scheduler after stopped
Posted by GitBox <gi...@apache.org>.
JCoder01 edited a comment on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-747100104
Different virtual envs and databases; both virtual envs and databases are on the same host. The only difference is which dags and provider packages are in each virtual env. I do have a dag where I was testing out overriding the DAG `following_schedule` method with a custom function to allow running at specific times. I have that dag in both envs, thinking it was the culprit, but even with the dag disabled I see the error in the original environment (for a different dag) and not in the second virtual env.
----------------------------------------------------------------
[GitHub] [airflow] ashb commented on issue #13099: Unable to start scheduler after stopped
Posted by GitBox <gi...@apache.org>.
ashb commented on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-747341221
Marking this as can't reproduce for now -- it may be a race condition, but... 🤷🏻
----------------------------------------------------------------
[GitHub] [airflow] kaxil edited a comment on issue #13099: Unable to start scheduler after stopped
Posted by GitBox <gi...@apache.org>.
kaxil edited a comment on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-767163927
> Happy New Year @ashb
> There is a little bit of both. There is an outstanding issue where the use of Variables (or probably anything that interacts with the backend db at global scope) in a plugin raises the `UNEXPECTED COMMIT` RuntimeError.
> The problem I reported with being unable to restart the scheduler seems to have been caused by wrapping my `Variable.get` in an overly generous try/except, hiding the runtime error and causing things in the db to get out of sync.
> Running the below should recreate the issue; I did this on Python 3.7:
>
> ```
> mkdir airflow_scheduler_test
> cd airflow_scheduler_test
> export AIRFLOW_HOME=$(pwd)
> virtualenv pyenv
> source pyenv/bin/activate
> pip install apache-airflow==2.0.0
>
> airflow db init
> airflow variables set test TEST
> mkdir plugins
> cat << EOF > plugins/test.py
> from airflow.models.variable import Variable
>
> print(Variable.get('test'))
> EOF
>
> airflow dags unpause example_bash_operator
> airflow scheduler
> ```
I was not able to reproduce it with Postgres 9.6 and 12; I haven't tried it with 13.1.
**EDIT**: I was able to reproduce it with 2.0.0 and SQLite
----------------------------------------------------------------
[GitHub] [airflow] potiuk commented on issue #13099: Unable to start scheduler after stopped
Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-745934655
Thanks for reporting @JCoder01 !
----------------------------------------------------------------
[GitHub] [airflow] kaxil edited a comment on issue #13099: Unable to start scheduler after stopped
Posted by GitBox <gi...@apache.org>.
kaxil edited a comment on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-747114005
I think it is most likely because you had overridden `following_schedule`
----------------------------------------------------------------
[GitHub] [airflow] JCoder01 commented on issue #13099: Unable to start scheduler after stopped
Posted by GitBox <gi...@apache.org>.
JCoder01 commented on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-747125761
@potiuk I generally believe anything is possible, but in this case I think it's rather unlikely. If that was the case, do you think it would matter if the dag was enabled or not? I only have two dags enabled.
@kaxil If it is this, do you have any ideas where to look to try and confirm? And if it is this, if the dag is in both environments, wouldn't this happen in both environments? Note that the dag with the custom `following_schedule` is not the one throwing the pk violation.
----------------------------------------------------------------
[GitHub] [airflow] stelukutla commented on issue #13099: Unable to start scheduler after stopped
Posted by GitBox <gi...@apache.org>.
stelukutla commented on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-774539054
The issue was fixed after running the command below and then starting the scheduler:
airflow dags pause <dag_id>
----------------------------------------------------------------
[GitHub] [airflow] ashb commented on issue #13099: Unable to start scheduler after stopped
Posted by GitBox <gi...@apache.org>.
ashb commented on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-746179364
Okay right - as expected, nothing really interesting there -- it's `self._create_dag_runs(query.all(), session)` which is failing, which is just creating the dag runs that the previous query gave it https://github.com/apache/airflow/blob/69801f5ef016bcf21af348f4fdeb67f09db8f394/airflow/models/dag.py#L2220-L2242
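The failure mode in that creation step can be sketched with a simplified stand-in (illustrative only, not the real Airflow code): inserting a `(dag_id, run_id)` pair that survived the previous scheduler run violates the unique constraint unless existing runs are checked first. The `create_dag_run` helper here is a hypothetical guarded variant.

```python
import sqlite3

# Minimal dag_run table with the same unique constraint that fires in
# the reported error.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dag_run (dag_id TEXT, run_id TEXT, UNIQUE (dag_id, run_id))"
)

# A run left over from before the scheduler was restarted.
row = ("example_task_group", "scheduled__2020-12-14T04:31:00+00:00")
conn.execute("INSERT INTO dag_run VALUES (?, ?)", row)


def create_dag_run(conn, dag_id, run_id):
    # Guarded variant: skip runs that already exist instead of letting
    # the raw INSERT raise an IntegrityError on restart.
    exists = conn.execute(
        "SELECT 1 FROM dag_run WHERE dag_id = ? AND run_id = ?",
        (dag_id, run_id),
    ).fetchone()
    if exists:
        return False
    conn.execute("INSERT INTO dag_run VALUES (?, ?)", (dag_id, run_id))
    return True


print(create_dag_run(conn, *row))  # False: duplicate skipped, no IntegrityError
```

An unguarded `INSERT` of the same pair would instead raise `sqlite3.IntegrityError: UNIQUE constraint failed: dag_run.dag_id, dag_run.run_id`, matching the traceback earlier in the thread.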
----------------------------------------------------------------
[GitHub] [airflow] ashb commented on issue #13099: Unable to start scheduler after stopped
Posted by GitBox <gi...@apache.org>.
ashb commented on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-746062483
I've certainly never seen this, and I've done heavy testing of killing and restarting schedulers.
----------------------------------------------------------------
[GitHub] [airflow] ashb commented on issue #13099: Unable to start scheduler after stopped
Posted by GitBox <gi...@apache.org>.
ashb commented on issue #13099:
URL: https://github.com/apache/airflow/issues/13099#issuecomment-754549659
@JCoder01 where did we get with this issue? Was it a problem with your plugin or is this issue still outstanding?
----------------------------------------------------------------