You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/01/14 18:15:49 UTC
[GitHub] [airflow] msardana94 opened a new issue #13679: SQL Syntax errors on startup
msardana94 opened a new issue #13679:
URL: https://github.com/apache/airflow/issues/13679
**Apache Airflow version**: 2.0.0
**What happened**:
While investigating issues relating to task getting stuck, I saw this sql error in postgres logs. I am not entirely sure of what it impacts but I thought of letting you know.
```
ERROR: column "connection.password" must appear in the GROUP BY clause or be used in an aggregate function at character 8
STATEMENT: SELECT connection.password AS connection_password, connection.extra AS connection_extra, connection.id AS connection_id, connection.conn_id AS connection_conn_id, connection.conn_type AS connection_conn_type, connection.description AS connection_description, connection.host AS connection_host, connection.schema AS connection_schema, connection.login AS connection_login, connection.port AS connection_port, connection.is_encrypted AS connection_is_encrypted, connection.is_extra_encrypted AS connection_is_extra_encrypted, count(connection.conn_id) AS count_1
FROM connection GROUP BY connection.conn_id
HAVING count(connection.conn_id) > 1
ERROR: current transaction is aborted, commands ignored until end of transaction block
STATEMENT: SELECT connection.password AS connection_password, connection.extra AS connection_extra, connection.id AS connection_id, connection.conn_id AS connection_conn_id, connection.conn_type AS connection_conn_type, connection.description AS connection_description, connection.host AS connection_host, connection.schema AS connection_schema, connection.login AS connection_login, connection.port AS connection_port, connection.is_encrypted AS connection_is_encrypted, connection.is_extra_encrypted AS connection_is_extra_encrypted
FROM connection
WHERE connection.conn_type IS NULL
```
**How to reproduce it**:
1. Run `docker-compose run initdb`
2. Run `docker-compose run upgradedb`
<details> <summary> Here's my docker-compose </summary>
```
version: "3.2"
networks:
airflow:
services:
postgres:
container_name: af_postgres
image: postgres:9.6
environment:
- POSTGRES_USER=airflow
- POSTGRES_DB=airflow
- POSTGRES_PASSWORD=airflow
volumes:
- ./postgresql/data:/var/lib/postgresql/data
command: >
postgres
-c listen_addresses=*
-c logging_collector=on
-c log_destination=stderr
networks:
- airflow
initdb:
container_name: af_initdb
image: docker.io/apache/airflow:2.0.0-python3.7
environment:
- AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@postgres:5432/airflow
depends_on:
- postgres
entrypoint: /bin/bash
command: -c "airflow db init"
networks:
- airflow
upgradedb:
container_name: af_initdb
image: docker.io/apache/airflow:2.0.0-python3.7
environment:
- AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@postgres:5432/airflow
depends_on:
- postgres
entrypoint: /bin/bash
command: -c "airflow db upgrade"
networks:
- airflow
webserver:
container_name: af_webserver
image: docker.io/apache/airflow:2.0.0-python3.7
environment:
- AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@postgres:5432/airflow
ports:
- 80:8080
depends_on:
- postgres
command: webserver
networks:
- airflow
```
</details>
**Anything else we need to know**:
Upon looking the code, I believe having `Connection.conn_id` [here](https://github.com/apache/airflow/blob/ab5f770bfcd8c690cbe4d0825896325aca0beeca/airflow/utils/db.py#L613) will resolve the sql syntax error.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] msardana94 commented on issue #13679: SQL Syntax errors on startup
Posted by GitBox <gi...@apache.org>.
msardana94 commented on issue #13679:
URL: https://github.com/apache/airflow/issues/13679#issuecomment-760519023
```
airflow=# select version();
version
--------------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 9.6.20 on x86_64-pc-linux-gnu (Debian 9.6.20-1.pgdg90+1), compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit
(1 row)
```
Using the `postgres:9.6` docker image.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] kaxil commented on issue #13679: SQL Syntax errors on startup
Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #13679:
URL: https://github.com/apache/airflow/issues/13679#issuecomment-763216757
I was not able to reproduce the issue. Your docker-compose file contains issues like duplicate contain name: `af_initdb` and it contains no scheduler, and does not create an admin account. And I am not sure why you had initdb and upgradedb both
```yaml
version: "3.2"
networks:
airflow:
driver: bridge
volumes:
postgres_data:
driver: local
services:
postgres:
container_name: af_postgres
image: postgres:9.6
environment:
- POSTGRES_USER=airflow
- POSTGRES_DB=airflow
- POSTGRES_PASSWORD=airflow
volumes:
- postgres_data:/var/lib/postgresql/data
command: >
postgres
-c listen_addresses=*
-c logging_collector=on
-c log_destination=stderr
networks:
- airflow
webserver:
container_name: af_webserver
image: docker.io/apache/airflow:2.0.0-python3.7
environment:
- AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@postgres:5432/airflow
ports:
- 80:8080
depends_on:
- postgres
command: bash -c 'airflow db upgrade && airflow users create --role Admin --username admin --email admin --firstname admin --lastname admin --password admin && airflow webserver'
networks:
- airflow
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] msardana94 commented on issue #13679: SQL Syntax errors on startup
Posted by GitBox <gi...@apache.org>.
msardana94 commented on issue #13679:
URL: https://github.com/apache/airflow/issues/13679#issuecomment-760938288
Sure I can give it a try. I will take a look at contribution quick start since this will be my first time and will reach out over slack with any questions.
> Please note that the behavior may be different depending on the migration state. see: https://github.com/apache/airflow/blob/master/airflow/migrations/versions/8d48763f6d53_add_unique_constraint_to_conn_id.py
I will take a look at the code to see what that means for this bug.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] msardana94 commented on issue #13679: SQL Syntax errors on startup
Posted by GitBox <gi...@apache.org>.
msardana94 commented on issue #13679:
URL: https://github.com/apache/airflow/issues/13679#issuecomment-762864931
> We do not write SQL queries, but use an ORM that generates queries based on the model stored in the code. The model is identical to the database schema in most cases, so everything works fine. The problem arises when the model is different from the database schema because you have not done all the migrations. Then ORM can generate SQL queries that conform to the new database schema, but does not work properly with the older versions. This is just my guess and I am not sure if this is the cause of these problems.
Yes I get that. I should have specified in the PR description that the issue is with the ORM code itself which generates an invalid SQL. I don't believe the error itself has anything to do with version of database schema.
> `airflow upgrade db` works fine if there are no entries in the database. If you would like to test it well, you should migrate the database version compatible with Airflow 1.10. Add a few entries, and then run database migrations to see if the problem was detected.
I will give that a try and see how that goes.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] msardana94 edited a comment on issue #13679: SQL Syntax errors on startup
Posted by GitBox <gi...@apache.org>.
msardana94 edited a comment on issue #13679:
URL: https://github.com/apache/airflow/issues/13679#issuecomment-762554925
> @msardana94 are you willing to submit a PR? Please note that the behavior may be different depending on the migration state. see: https://github.com/apache/airflow/blob/master/airflow/migrations/versions/8d48763f6d53_add_unique_constraint_to_conn_id.py
@mik-laj can you elaborate on what you meant here? Isn't it safe to say that as long as sql syntax is accurate, it will check for conn duplicates appropriately?
Also, any thoughts on how to test this issue after adding a bug fix? I can take a look at the db logs after running `airflow upgrade db` but didn't see any unit tests for this in `tests_db.py`. My understanding is [upgrade_db](https://github.com/apache/airflow/blob/ab5f770bfcd8c690cbe4d0825896325aca0beeca/airflow/utils/db.py#L673) will be called when you run the cli command.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] kaxil commented on issue #13679: SQL Syntax errors on startup
Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #13679:
URL: https://github.com/apache/airflow/issues/13679#issuecomment-760517032
What version of Postgres do you use?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] kaxil commented on issue #13679: SQL Syntax errors on startup
Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #13679:
URL: https://github.com/apache/airflow/issues/13679#issuecomment-763247598
You are right -- https://github.com/apache/airflow/pull/13783 should fix it
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] kaxil edited a comment on issue #13679: SQL Syntax errors on startup
Posted by GitBox <gi...@apache.org>.
kaxil edited a comment on issue #13679:
URL: https://github.com/apache/airflow/issues/13679#issuecomment-763216757
I was not able to reproduce the issue. Your docker-compose file contains issues like duplicate contain name: `af_initdb` and it contains no scheduler, and does not create an admin account. And I am not sure why you had initdb and upgradedb both.
The following `docker-compose.yml` emulates what you did in your docker-compose file
```yaml
version: "3.2"
networks:
airflow:
driver: bridge
volumes:
postgres_data:
driver: local
services:
postgres:
container_name: af_postgres
image: postgres:9.6
environment:
- POSTGRES_USER=airflow
- POSTGRES_DB=airflow
- POSTGRES_PASSWORD=airflow
volumes:
- postgres_data:/var/lib/postgresql/data
command: >
postgres
-c listen_addresses=*
-c logging_collector=on
-c log_destination=stderr
networks:
- airflow
webserver:
container_name: af_webserver
image: docker.io/apache/airflow:2.0.0-python3.7
environment:
- AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@postgres:5432/airflow
ports:
- 80:8080
depends_on:
- postgres
command: bash -c 'airflow db upgrade && airflow users create --role Admin --username admin --email admin --firstname admin --lastname admin --password admin && airflow webserver'
networks:
- airflow
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] XD-DENG closed issue #13679: SQL Syntax errors on startup
Posted by GitBox <gi...@apache.org>.
XD-DENG closed issue #13679:
URL: https://github.com/apache/airflow/issues/13679
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] msardana94 commented on issue #13679: SQL Syntax errors on startup
Posted by GitBox <gi...@apache.org>.
msardana94 commented on issue #13679:
URL: https://github.com/apache/airflow/issues/13679#issuecomment-762554925
> @msardana94 are you willing to submit a PR? Please note that the behavior may be different depending on the migration state. see: https://github.com/apache/airflow/blob/master/airflow/migrations/versions/8d48763f6d53_add_unique_constraint_to_conn_id.py
@mik-laj can you elaborate on what you meant here? Isn't it safe to say that as long as sql syntax is accurate, it will check for conn duplicates appropriately?
Also, any thoughts on how to test this issue after adding a bug fix? I can take a look at the logs after running `airflow upgrade db` but didn't see any unit tests for this in `tests_db.py`.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] boring-cyborg[bot] commented on issue #13679: SQL Syntax errors on startup
Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #13679:
URL: https://github.com/apache/airflow/issues/13679#issuecomment-760374738
Thanks for opening your first issue here! Be sure to follow the issue template!
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] msardana94 edited a comment on issue #13679: SQL Syntax errors on startup
Posted by GitBox <gi...@apache.org>.
msardana94 edited a comment on issue #13679:
URL: https://github.com/apache/airflow/issues/13679#issuecomment-763236068
Yes I just realized container names were duplicated (sorry an oversight on my part; fixed it). If you look at steps to reproduce mentioned in description, I don't use webserver and/or scheduler (just edited compose to remove it altogether). The error happens while running `airflow db upgrade`. I have also pin pointed specific line of code which I believe is incorrectly written resulting in this error.
After looking more into it, duplicate connection [check](https://github.com/apache/airflow/blob/ab5f770bfcd8c690cbe4d0825896325aca0beeca/airflow/utils/db.py#L618) will always raise an error whether or not there are duplicated connections because of invalid orm query (which gets resolved to sql specified in logs in description of issue).
> And I am not sure why you had initdb and upgradedb both.
Running `airflow db upgrade` when tables aren't setup yet appropriately raises an `ProgrammingError` exception as mentioned in comment [here](https://github.com/apache/airflow/blob/ab5f770bfcd8c690cbe4d0825896325aca0beeca/airflow/utils/db.py#L619). However, when tables are setup, it should _not_ raise any exception right? That's why I am running `db init` and then `db upgrade`
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] kodeine commented on issue #13679: SQL Syntax errors on startup
Posted by GitBox <gi...@apache.org>.
kodeine commented on issue #13679:
URL: https://github.com/apache/airflow/issues/13679#issuecomment-921432337
Hello @kaxil ,
I am still getting this error,
```
ostgresql_1 | 2021-09-17 03:45:45.666 GMT [252] LOG: incomplete startup packet
postgresql_1 | 2021-09-17 03:45:45.735 GMT [254] LOG: incomplete startup packet
airflow-worker_1 | airflow-worker 03:45:45.75 INFO ==> Restoring persisted Airflow installation
airflow-worker_1 | airflow-worker 03:45:45.99 INFO ==> Waiting for Airflow Webserser to be up
postgresql_1 | 2021-09-17 03:45:50.273 GMT [256] ERROR: column connection.description does not exist at character 210
postgresql_1 | 2021-09-17 03:45:50.273 GMT [256] STATEMENT: SELECT connection.password AS connection_password, connection.extra AS connection_extra, connection.id AS connection_id, connection.conn_id AS connection_conn_id, connection.conn_type AS connection_conn_type, connection.description AS connection_description, connection.host AS connection_host, connection.schema AS connection_schema, connection.login AS connection_login, connection.port AS connection_port, connection.is_encrypted AS connection_is_encrypted, connection.is_extra_encrypted AS connection_is_extra_encrypted
postgresql_1 | FROM connection
postgresql_1 | WHERE connection.conn_type IS NULL
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] msardana94 edited a comment on issue #13679: SQL Syntax errors on startup
Posted by GitBox <gi...@apache.org>.
msardana94 edited a comment on issue #13679:
URL: https://github.com/apache/airflow/issues/13679#issuecomment-763236068
Yes I just realized container names were duplicated (sorry an oversight on my part; fixed it). If you look at steps to reproduce mentioned in description, I don't use webserver and/or scheduler (just edited compose to remove it altogether). The error happens while running `airflow db upgrade`. I have also pin pointed specific line of code which I believe is incorrectly written resulting in this error.
After looking more into it, the impact of this bug is duplicate connection [check](https://github.com/apache/airflow/blob/ab5f770bfcd8c690cbe4d0825896325aca0beeca/airflow/utils/db.py#L612-L617) will not do what it's supposed to whether or not there are duplicated connections because of invalid orm query (which gets resolved to sql specified in logs in description of issue).
> And I am not sure why you had initdb and upgradedb both.
Running `airflow db upgrade` when tables aren't setup yet appropriately raises an `ProgrammingError` exception as mentioned in comment [here](https://github.com/apache/airflow/blob/ab5f770bfcd8c690cbe4d0825896325aca0beeca/airflow/utils/db.py#L619). However, when tables are setup, it should _not_ raise any exception right? That's why I am running `db init` and then `db upgrade`
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] mik-laj commented on issue #13679: SQL Syntax errors on startup
Posted by GitBox <gi...@apache.org>.
mik-laj commented on issue #13679:
URL: https://github.com/apache/airflow/issues/13679#issuecomment-760886611
@msardana94 are you willing to submit a PR? Please note that the behavior may be different depending on the migration state. see: https://github.com/apache/airflow/blob/master/airflow/migrations/versions/8d48763f6d53_add_unique_constraint_to_conn_id.py
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] msardana94 commented on issue #13679: SQL Syntax errors on startup
Posted by GitBox <gi...@apache.org>.
msardana94 commented on issue #13679:
URL: https://github.com/apache/airflow/issues/13679#issuecomment-763236068
Yes I just realized container names were duplicated (an oversight on my part). If you look at steps to reproduce mentioned in description, I don't use webserver and/or scheduler (just edited it to remove it altogether). The error happens while running `airflow db upgrade`. I have also pin pointed specific line of code which I believe is incorrectly written resulting in this error.
After looking more into it, duplicate connection [check](https://github.com/apache/airflow/blob/ab5f770bfcd8c690cbe4d0825896325aca0beeca/airflow/utils/db.py#L618) will always raise an error whether or not there are duplicated connections because of invalid orm query (which gets resolved to sql specified in logs in description of issue).
> And I am not sure why you had initdb and upgradedb both.
Running `airflow db upgrade` when tables aren't setup yet appropriately raises an `ProgrammingError` exception as mentioned in comment [here](https://github.com/apache/airflow/blob/ab5f770bfcd8c690cbe4d0825896325aca0beeca/airflow/utils/db.py#L619). However, when tables are setup, it should _not_ raise any exception right? That's why I am running `db init` and then `db upgrade`
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] msardana94 edited a comment on issue #13679: SQL Syntax errors on startup
Posted by GitBox <gi...@apache.org>.
msardana94 edited a comment on issue #13679:
URL: https://github.com/apache/airflow/issues/13679#issuecomment-763236068
Yes I just realized container names were duplicated (sorry an oversight on my part; fixed it). If you look at steps to reproduce mentioned in description, I don't use webserver and/or scheduler (just edited it to remove it altogether). The error happens while running `airflow db upgrade`. I have also pin pointed specific line of code which I believe is incorrectly written resulting in this error.
After looking more into it, duplicate connection [check](https://github.com/apache/airflow/blob/ab5f770bfcd8c690cbe4d0825896325aca0beeca/airflow/utils/db.py#L618) will always raise an error whether or not there are duplicated connections because of invalid orm query (which gets resolved to sql specified in logs in description of issue).
> And I am not sure why you had initdb and upgradedb both.
Running `airflow db upgrade` when tables aren't setup yet appropriately raises an `ProgrammingError` exception as mentioned in comment [here](https://github.com/apache/airflow/blob/ab5f770bfcd8c690cbe4d0825896325aca0beeca/airflow/utils/db.py#L619). However, when tables are setup, it should _not_ raise any exception right? That's why I am running `db init` and then `db upgrade`
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] mik-laj commented on issue #13679: SQL Syntax errors on startup
Posted by GitBox <gi...@apache.org>.
mik-laj commented on issue #13679:
URL: https://github.com/apache/airflow/issues/13679#issuecomment-762565884
> can you elaborate on what you meant here? Isn't it safe to say that as long as sql syntax is accurate, it will check for conn duplicates appropriately?
We do not write SQL queries, but use an ORM that generates queries based on the model stored in the code. The model is identical to the database schema in most cases, so everything works fine. The problem arises when the model is different from the database schema because you have not done all the migrations. Then ORM can generate SQL queries that conform to the new database schema, but does not work properly with the older versions. This is just my guess and I am not sure if this is the cause of these problems.
`airflow upgrade db` works fine if there are no entries in the database. If you would like to test it well, you should migrate the database version compatible with Airflow 1.10. Add a few entries, and then run database migrations to see if the problem was detected.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org