You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by "tanvn (via GitHub)" <gi...@apache.org> on 2023/03/09 16:04:45 UTC

[GitHub] [airflow] tanvn opened a new pull request, #29999: Fix db clean command for mysql db

tanvn opened a new pull request, #29999:
URL: https://github.com/apache/airflow/pull/29999

   closes:  https://github.com/apache/airflow/issues/28051
   
   Tested on my local environment (running with MySQL 8) and confirmed that it is working well.
   This is my very first PR, so if there is anything else I should do (like adding new tests?), please let me know.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #29999: Fix db clean command for mysql db

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on code in PR #29999:
URL: https://github.com/apache/airflow/pull/29999#discussion_r1133226593


##########
airflow/utils/db_cleanup.py:
##########
@@ -150,18 +150,26 @@ def _do_delete(*, query, orm_model, skip_archive, session):
     timestamp_str = re.sub(r"[^\d]", "", datetime.utcnow().isoformat())[:14]
     target_table_name = f"{ARCHIVE_TABLE_PREFIX}{orm_model.name}__{timestamp_str}"
     print(f"Moving data to table {target_table_name}")
-    stmt = CreateTableAs(target_table_name, query.selectable)
-    logger.debug("ctas query:\n%s", stmt.compile())
-    session.execute(stmt)
+    bind = session.get_bind()
+    dialect_name = bind.dialect.name
+    if dialect_name == "mysql":
+        session.execute(f"CREATE TABLE {target_table_name} LIKE {orm_model.name}")

Review Comment:
   ```suggestion
             # MySQL with replication needs this split in to two queries, so just do it for all MySQL
             # ERROR 1786 (HY000): Statement violates GTID consistency: CREATE TABLE ... SELECT.
             session.execute(f"CREATE TABLE {target_table_name} LIKE {orm_model.name}")
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] tanvn commented on a diff in pull request #29999: Fix db clean command for mysql db

Posted by "tanvn (via GitHub)" <gi...@apache.org>.
tanvn commented on code in PR #29999:
URL: https://github.com/apache/airflow/pull/29999#discussion_r1133487924


##########
airflow/utils/db_cleanup.py:
##########
@@ -150,18 +150,28 @@ def _do_delete(*, query, orm_model, skip_archive, session):
     timestamp_str = re.sub(r"[^\d]", "", datetime.utcnow().isoformat())[:14]
     target_table_name = f"{ARCHIVE_TABLE_PREFIX}{orm_model.name}__{timestamp_str}"
     print(f"Moving data to table {target_table_name}")
-    stmt = CreateTableAs(target_table_name, query.selectable)
-    logger.debug("ctas query:\n%s", stmt.compile())
-    session.execute(stmt)
+    bind = session.get_bind()
+    dialect_name = bind.dialect.name
+    if dialect_name == "mysql":
+        # MySQL with replication needs this split into two queries, so just do it for all MySQL
+        # ERROR 1786 (HY000): Statement violates GTID consistency: CREATE TABLE ... SELECT.
+        session.execute(f"CREATE TABLE {target_table_name} LIKE {orm_model.name}")

Review Comment:
   @uranusjr I would love to do so but I am new to sqlalchemy and at first I do not know how to let the compiler to execute 2 queries instead of one?
   (I just took a quick look at https://docs.sqlalchemy.org/en/20/core/compiler.html#dialect-specific-compilation-rules)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] tanvn commented on a diff in pull request #29999: Fix db clean command for mysql db

Posted by "tanvn (via GitHub)" <gi...@apache.org>.
tanvn commented on code in PR #29999:
URL: https://github.com/apache/airflow/pull/29999#discussion_r1133257524


##########
airflow/utils/db_cleanup.py:
##########
@@ -150,18 +150,26 @@ def _do_delete(*, query, orm_model, skip_archive, session):
     timestamp_str = re.sub(r"[^\d]", "", datetime.utcnow().isoformat())[:14]
     target_table_name = f"{ARCHIVE_TABLE_PREFIX}{orm_model.name}__{timestamp_str}"
     print(f"Moving data to table {target_table_name}")
-    stmt = CreateTableAs(target_table_name, query.selectable)
-    logger.debug("ctas query:\n%s", stmt.compile())
-    session.execute(stmt)
+    bind = session.get_bind()
+    dialect_name = bind.dialect.name
+    if dialect_name == "mysql":
+        session.execute(f"CREATE TABLE {target_table_name} LIKE {orm_model.name}")

Review Comment:
   Thanks! I added the comment!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk merged pull request #29999: Fix db clean command for mysql db

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk merged PR #29999:
URL: https://github.com/apache/airflow/pull/29999


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #29999: Fix db clean command for mysql db

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on code in PR #29999:
URL: https://github.com/apache/airflow/pull/29999#discussion_r1133226593


##########
airflow/utils/db_cleanup.py:
##########
@@ -150,18 +150,26 @@ def _do_delete(*, query, orm_model, skip_archive, session):
     timestamp_str = re.sub(r"[^\d]", "", datetime.utcnow().isoformat())[:14]
     target_table_name = f"{ARCHIVE_TABLE_PREFIX}{orm_model.name}__{timestamp_str}"
     print(f"Moving data to table {target_table_name}")
-    stmt = CreateTableAs(target_table_name, query.selectable)
-    logger.debug("ctas query:\n%s", stmt.compile())
-    session.execute(stmt)
+    bind = session.get_bind()
+    dialect_name = bind.dialect.name
+    if dialect_name == "mysql":
+        session.execute(f"CREATE TABLE {target_table_name} LIKE {orm_model.name}")

Review Comment:
   ```suggestion
           # MySQL with replication needs this split in to two queries, so just do it for all MySQL
           # ERROR 1786 (HY000): Statement violates GTID consistency: CREATE TABLE ... SELECT.
           session.execute(f"CREATE TABLE {target_table_name} LIKE {orm_model.name}")
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] boring-cyborg[bot] commented on pull request #29999: Fix db clean command for mysql db

Posted by "boring-cyborg[bot] (via GitHub)" <gi...@apache.org>.
boring-cyborg[bot] commented on PR #29999:
URL: https://github.com/apache/airflow/pull/29999#issuecomment-1466194261

   Awesome work, congrats on your first merged pull request!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #29999: Fix db clean command for mysql db

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on PR #29999:
URL: https://github.com/apache/airflow/pull/29999#issuecomment-1465151099

   One NIT I'd add @tanvn is to add similar comment  here as the one you referred in the issue:
   
   ```
           # MySQL with replication needs this split in to two queries, so just do it for all MySQL
           # ERROR 1786 (HY000): Statement violates GTID consistency: CREATE TABLE ... SELECT.
   ```
   
   so that we know where it came from


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on a diff in pull request #29999: Fix db clean command for mysql db

Posted by "uranusjr (via GitHub)" <gi...@apache.org>.
uranusjr commented on code in PR #29999:
URL: https://github.com/apache/airflow/pull/29999#discussion_r1133500604


##########
airflow/utils/db_cleanup.py:
##########
@@ -150,18 +150,28 @@ def _do_delete(*, query, orm_model, skip_archive, session):
     timestamp_str = re.sub(r"[^\d]", "", datetime.utcnow().isoformat())[:14]
     target_table_name = f"{ARCHIVE_TABLE_PREFIX}{orm_model.name}__{timestamp_str}"
     print(f"Moving data to table {target_table_name}")
-    stmt = CreateTableAs(target_table_name, query.selectable)
-    logger.debug("ctas query:\n%s", stmt.compile())
-    session.execute(stmt)
+    bind = session.get_bind()
+    dialect_name = bind.dialect.name
+    if dialect_name == "mysql":
+        # MySQL with replication needs this split into two queries, so just do it for all MySQL
+        # ERROR 1786 (HY000): Statement violates GTID consistency: CREATE TABLE ... SELECT.
+        session.execute(f"CREATE TABLE {target_table_name} LIKE {orm_model.name}")

Review Comment:
   Hmm good point. It’s probably possible with some SQL but likely not worthwhile.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] boring-cyborg[bot] commented on pull request #29999: Fix db clean command for mysql db

Posted by "boring-cyborg[bot] (via GitHub)" <gi...@apache.org>.
boring-cyborg[bot] commented on PR #29999:
URL: https://github.com/apache/airflow/pull/29999#issuecomment-1462325880

   Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
   Here are some useful points:
   - Pay attention to the quality of your code (ruff, mypy and type annotations). Our [pre-commits]( https://github.com/apache/airflow/blob/main/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks) will help you with that.
   - In case of a new feature add useful documentation (in docstrings or in `docs/` directory). Adding a new operator? Check this short [guide](https://github.com/apache/airflow/blob/main/docs/apache-airflow/howto/custom-operator.rst) Consider adding an example DAG that shows how users should use it.
   - Consider using [Breeze environment](https://github.com/apache/airflow/blob/main/BREEZE.rst) for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
   - Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
   - Please follow [ASF Code of Conduct](https://www.apache.org/foundation/policies/conduct) for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
   - Be sure to read the [Airflow Coding style]( https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#coding-style-and-best-practices).
   Apache Airflow is a community-driven project and together we are making it better 🚀.
   In case of doubts contact the developers at:
   Mailing List: dev@airflow.apache.org
   Slack: https://s.apache.org/airflow-slack
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #29999: Fix db clean command for mysql db

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on code in PR #29999:
URL: https://github.com/apache/airflow/pull/29999#discussion_r1133226593


##########
airflow/utils/db_cleanup.py:
##########
@@ -150,18 +150,26 @@ def _do_delete(*, query, orm_model, skip_archive, session):
     timestamp_str = re.sub(r"[^\d]", "", datetime.utcnow().isoformat())[:14]
     target_table_name = f"{ARCHIVE_TABLE_PREFIX}{orm_model.name}__{timestamp_str}"
     print(f"Moving data to table {target_table_name}")
-    stmt = CreateTableAs(target_table_name, query.selectable)
-    logger.debug("ctas query:\n%s", stmt.compile())
-    session.execute(stmt)
+    bind = session.get_bind()
+    dialect_name = bind.dialect.name
+    if dialect_name == "mysql":
+        session.execute(f"CREATE TABLE {target_table_name} LIKE {orm_model.name}")

Review Comment:
   ```suggestion
                           # MySQL with replication needs this split in to two queries, so just do it for all MySQL
                           # ERROR 1786 (HY000): Statement violates GTID consistency: CREATE TABLE ... SELECT.
                           session.execute(f"CREATE TABLE {target_table_name} LIKE {orm_model.name}")
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on a diff in pull request #29999: Fix db clean command for mysql db

Posted by "uranusjr (via GitHub)" <gi...@apache.org>.
uranusjr commented on code in PR #29999:
URL: https://github.com/apache/airflow/pull/29999#discussion_r1133477557


##########
airflow/utils/db_cleanup.py:
##########
@@ -150,18 +150,28 @@ def _do_delete(*, query, orm_model, skip_archive, session):
     timestamp_str = re.sub(r"[^\d]", "", datetime.utcnow().isoformat())[:14]
     target_table_name = f"{ARCHIVE_TABLE_PREFIX}{orm_model.name}__{timestamp_str}"
     print(f"Moving data to table {target_table_name}")
-    stmt = CreateTableAs(target_table_name, query.selectable)
-    logger.debug("ctas query:\n%s", stmt.compile())
-    session.execute(stmt)
+    bind = session.get_bind()
+    dialect_name = bind.dialect.name
+    if dialect_name == "mysql":
+        # MySQL with replication needs this split into two queries, so just do it for all MySQL
+        # ERROR 1786 (HY000): Statement violates GTID consistency: CREATE TABLE ... SELECT.
+        session.execute(f"CREATE TABLE {target_table_name} LIKE {orm_model.name}")

Review Comment:
   Instead of writing custom SQL here, would it be better to enhance `CreateTableAs` compilation to support the MySQL syntax instead?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] tanvn commented on a diff in pull request #29999: Fix db clean command for mysql db

Posted by "tanvn (via GitHub)" <gi...@apache.org>.
tanvn commented on code in PR #29999:
URL: https://github.com/apache/airflow/pull/29999#discussion_r1133487924


##########
airflow/utils/db_cleanup.py:
##########
@@ -150,18 +150,28 @@ def _do_delete(*, query, orm_model, skip_archive, session):
     timestamp_str = re.sub(r"[^\d]", "", datetime.utcnow().isoformat())[:14]
     target_table_name = f"{ARCHIVE_TABLE_PREFIX}{orm_model.name}__{timestamp_str}"
     print(f"Moving data to table {target_table_name}")
-    stmt = CreateTableAs(target_table_name, query.selectable)
-    logger.debug("ctas query:\n%s", stmt.compile())
-    session.execute(stmt)
+    bind = session.get_bind()
+    dialect_name = bind.dialect.name
+    if dialect_name == "mysql":
+        # MySQL with replication needs this split into two queries, so just do it for all MySQL
+        # ERROR 1786 (HY000): Statement violates GTID consistency: CREATE TABLE ... SELECT.
+        session.execute(f"CREATE TABLE {target_table_name} LIKE {orm_model.name}")

Review Comment:
   @uranusjr I would love to do so but I am new to sqlalchemy and at first I do not know how to let the compiler execute 2 queries instead of one?
   (I just took a quick look at https://docs.sqlalchemy.org/en/20/core/compiler.html#dialect-specific-compilation-rules)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org