Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/06/16 01:13:48 UTC

[GitHub] [airflow] dstandish opened a new pull request, #24488: Add indexes for CASCADE deletes

dstandish opened a new pull request, #24488:
URL: https://github.com/apache/airflow/pull/24488

   When we add foreign keys with ON DELETE CASCADE and then delete rows in the referenced table, the database has to look up the matching rows in the referencing table.  If there is no suitable index on the referencing columns, those deletes can be slow.
   
   cc @wolfier 
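
To illustrate the point above, here is a minimal sketch using Python's standard-library `sqlite3` module rather than Airflow's real models or database backends. The simplified `task_instance` / `task_fail` schema and the helper names `build_demo` and `lookup_plan` are assumptions made for this example; only the index name and its columns mirror the ones in this PR. It compares the query plan for the lookup on the referencing columns with and without the index, then checks that the cascade removes the child row.

```python
import sqlite3

# The kind of lookup needed on the referencing table for each deleted parent row.
CHILD_LOOKUP = (
    "SELECT id FROM task_fail "
    "WHERE dag_id = 'd' AND task_id = 't' AND run_id = 'r' AND map_index = -1"
)


def build_demo(with_index: bool) -> sqlite3.Connection:
    """Create a simplified parent/child pair (not Airflow's real schema)."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE task_instance (
            dag_id TEXT, task_id TEXT, run_id TEXT, map_index INTEGER,
            PRIMARY KEY (dag_id, task_id, run_id, map_index)
        );
        CREATE TABLE task_fail (
            id INTEGER PRIMARY KEY,
            dag_id TEXT, task_id TEXT, run_id TEXT, map_index INTEGER,
            FOREIGN KEY (dag_id, task_id, run_id, map_index)
                REFERENCES task_instance (dag_id, task_id, run_id, map_index)
                ON DELETE CASCADE
        );
        """
    )
    if with_index:
        # Same name and column order as the index added in this PR.
        conn.execute(
            "CREATE INDEX idx_task_fail_task_instance "
            "ON task_fail (dag_id, task_id, run_id, map_index)"
        )
    conn.execute("INSERT INTO task_instance VALUES ('d', 't', 'r', -1)")
    conn.execute(
        "INSERT INTO task_fail (dag_id, task_id, run_id, map_index) "
        "VALUES ('d', 't', 'r', -1)"
    )
    conn.commit()
    return conn


def lookup_plan(conn: sqlite3.Connection) -> str:
    """Return SQLite's query plan text for the child lookup."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + CHILD_LOOKUP).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan detail


if __name__ == "__main__":
    for with_index in (False, True):
        conn = build_demo(with_index)
        print("with_index =", with_index, "->", lookup_plan(conn))
        conn.execute("DELETE FROM task_instance")  # cascades to task_fail
        left = conn.execute("SELECT COUNT(*) FROM task_fail").fetchone()[0]
        print("task_fail rows left after cascade:", left)
```

The exact plan wording varies between SQLite versions, and MySQL or Postgres will report plans differently, so treat this only as a way to see the difference between a full scan and an index search.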


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #24488: Add indexes for CASCADE deletes for task_instance

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #24488:
URL: https://github.com/apache/airflow/pull/24488#issuecomment-1180321537

   Cool! 
   
   One question @calfzhou - maybe you would like to add a note about it, as a PR to https://airflow.apache.org/docs/apache-airflow/stable/howto/set-up-database.html?highlight=database+setup#setting-up-a-mysql-database ? We already have a few watch-outs there, and adding this one might be useful for others.
   
   It's super easy - just click "Suggest a change on this page" and you will get a PR where you can edit the page sources directly, using a similar approach to the other warnings there.




[GitHub] [airflow] dstandish merged pull request #24488: Add indexes for CASCADE deletes for task_instance

Posted by GitBox <gi...@apache.org>.
dstandish merged PR #24488:
URL: https://github.com/apache/airflow/pull/24488




[GitHub] [airflow] calfzhou commented on pull request #24488: Add indexes for CASCADE deletes for task_instance

Posted by GitBox <gi...@apache.org>.
calfzhou commented on PR #24488:
URL: https://github.com/apache/airflow/pull/24488#issuecomment-1180294724

   > @calfzhou - you might have your mysql configured with "NO ZERO DATE" and "STRICT MODE". https://dev.mysql.com/doc/refman/5.7/en/sql-mode.html#sqlmode_no_zero_date.
   > 
   > Can you please confirm that?
   > 
   > If so, you might want to disable it - https://stackoverflow.com/questions/9192027/invalid-default-value-for-create-date-timestamp-field.
   
   Thanks @potiuk , that works! I removed `NO_ZERO_DATE` from sql mode.
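
As a rough sketch of the change described above, the snippet below removes a mode from a comma-separated `sql_mode` string. The helper name `strip_sql_modes` and the sample mode string are assumptions for illustration; the thread does not show the actual server configuration. The resulting string is what one would then apply on the MySQL side (for example with `SET GLOBAL sql_mode = ...` or in the server config file).

```python
from typing import Iterable


def strip_sql_modes(sql_mode: str, unwanted: Iterable[str]) -> str:
    """Return sql_mode with the unwanted modes removed (hypothetical helper)."""
    drop = {mode.upper() for mode in unwanted}
    # sql_mode is a comma-separated list; ignore empty entries and stray spaces.
    modes = [mode.strip() for mode in sql_mode.split(",") if mode.strip()]
    return ",".join(mode for mode in modes if mode.upper() not in drop)


if __name__ == "__main__":
    # Illustrative value only, not taken from the reporter's server.
    current = "STRICT_TRANS_TABLES,NO_ZERO_DATE,NO_ENGINE_SUBSTITUTION"
    print(strip_sql_modes(current, ["NO_ZERO_DATE"]))
```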




[GitHub] [airflow] potiuk commented on pull request #24488: Add indexes for CASCADE deletes for task_instance

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #24488:
URL: https://github.com/apache/airflow/pull/24488#issuecomment-1180278302

   @calfzhou  - you might have your mysql configured with "NO ZERO DATE" and "STRICT MODE". 
   https://dev.mysql.com/doc/refman/5.7/en/sql-mode.html#sqlmode_no_zero_date. 
   
   Can you please confirm that? 
   
   If so, you might want to disable it - https://stackoverflow.com/questions/9192027/invalid-default-value-for-create-date-timestamp-field.




[GitHub] [airflow] calfzhou commented on pull request #24488: Add indexes for CASCADE deletes for task_instance

Posted by GitBox <gi...@apache.org>.
calfzhou commented on PR #24488:
URL: https://github.com/apache/airflow/pull/24488#issuecomment-1181246708

   PR posted #24983




[GitHub] [airflow] jedcunningham commented on a diff in pull request #24488: Add indexes for CASCADE deletes for task_instance

Posted by GitBox <gi...@apache.org>.
jedcunningham commented on code in PR #24488:
URL: https://github.com/apache/airflow/pull/24488#discussion_r898699315


##########
airflow/migrations/versions/0111_2_4_0_add_indexes_for_cascade_deletes.py:
##########
@@ -0,0 +1,68 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""Add indexes for CASCADE deletes
+
+Revision ID: f5fcbda3e651
+Revises: 3c94c427fdf6
+Create Date: 2022-06-15 18:04:54.081789
+
+"""
+
+from alembic import op
+
+# revision identifiers, used by Alembic.
+revision = 'f5fcbda3e651'
+down_revision = '3c94c427fdf6'
+branch_labels = None
+depends_on = None
+airflow_version = '2.4.0'
+
+
+def upgrade():
+    """Apply Add indexes for CASCADE deletes"""
+    # ### commands auto generated by Alembic - please adjust! ###

Review Comment:
   ```suggestion
   ```
   
   nit



##########
airflow/models/taskfail.py:
##########
@@ -39,6 +39,7 @@ class TaskFail(Base):
     duration = Column(Integer)
 
     __table_args__ = (
+        Index("idx_task_fail_task_instance", dag_id, task_id, run_id, map_index, unique=False),

Review Comment:
   ```suggestion
           Index("idx_task_fail_task_instance", dag_id, task_id, run_id, map_index),
   ```
   
   That's the default, no?
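
A quick way to check the question above, assuming SQLAlchemy is installed: build the same kind of `Index` with and without `unique=False` and compare the resulting `unique` attribute. The table and index names here are made up for the example and are not Airflow's.

```python
from sqlalchemy import Column, Index, Integer, MetaData, String, Table

metadata = MetaData()

# Throwaway table used only to have columns to index (not an Airflow model).
demo = Table(
    "task_fail_demo",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("dag_id", String(250)),
    Column("run_id", String(250)),
)

# One index relies on the default, the other passes unique=False explicitly.
implicit = Index("idx_demo_implicit", demo.c.dag_id, demo.c.run_id)
explicit = Index("idx_demo_explicit", demo.c.dag_id, demo.c.run_id, unique=False)

if __name__ == "__main__":
    print("implicit.unique:", implicit.unique)
    print("explicit.unique:", explicit.unique)
```

If both values match, dropping `unique=False` as suggested does not change the generated index.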



##########
airflow/migrations/versions/0111_2_4_0_add_indexes_for_cascade_deletes.py:
##########
@@ -0,0 +1,68 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""Add indexes for CASCADE deletes
+
+Revision ID: f5fcbda3e651
+Revises: 3c94c427fdf6
+Create Date: 2022-06-15 18:04:54.081789
+
+"""
+
+from alembic import op
+
+# revision identifiers, used by Alembic.
+revision = 'f5fcbda3e651'
+down_revision = '3c94c427fdf6'
+branch_labels = None
+depends_on = None
+airflow_version = '2.4.0'
+
+
+def upgrade():
+    """Apply Add indexes for CASCADE deletes"""
+    # ### commands auto generated by Alembic - please adjust! ###
+    with op.batch_alter_table('task_fail', schema=None) as batch_op:
+        batch_op.create_index(
+            'idx_task_fail_task_instance', ['dag_id', 'task_id', 'run_id', 'map_index'], unique=False
+        )
+
+    with op.batch_alter_table('task_reschedule', schema=None) as batch_op:
+        batch_op.create_index('idx_task_reschedule_dag_run', ['dag_id', 'run_id'], unique=False)
+
+    with op.batch_alter_table('xcom', schema=None) as batch_op:
+        batch_op.create_index(
+            'idx_xcom_task_instance', ['dag_id', 'task_id', 'run_id', 'map_index'], unique=False
+        )
+
+    # ### end Alembic commands ###

Review Comment:
   ```suggestion
   ```



##########
airflow/migrations/versions/0111_2_4_0_add_indexes_for_cascade_deletes.py:
##########
@@ -0,0 +1,68 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""Add indexes for CASCADE deletes
+
+Revision ID: f5fcbda3e651
+Revises: 3c94c427fdf6
+Create Date: 2022-06-15 18:04:54.081789
+
+"""
+
+from alembic import op
+
+# revision identifiers, used by Alembic.
+revision = 'f5fcbda3e651'
+down_revision = '3c94c427fdf6'
+branch_labels = None
+depends_on = None
+airflow_version = '2.4.0'
+
+
+def upgrade():
+    """Apply Add indexes for CASCADE deletes"""
+    # ### commands auto generated by Alembic - please adjust! ###
+    with op.batch_alter_table('task_fail', schema=None) as batch_op:
+        batch_op.create_index(
+            'idx_task_fail_task_instance', ['dag_id', 'task_id', 'run_id', 'map_index'], unique=False
+        )
+
+    with op.batch_alter_table('task_reschedule', schema=None) as batch_op:
+        batch_op.create_index('idx_task_reschedule_dag_run', ['dag_id', 'run_id'], unique=False)
+
+    with op.batch_alter_table('xcom', schema=None) as batch_op:
+        batch_op.create_index(
+            'idx_xcom_task_instance', ['dag_id', 'task_id', 'run_id', 'map_index'], unique=False
+        )
+
+    # ### end Alembic commands ###
+
+
+def downgrade():
+    """Unapply Add indexes for CASCADE deletes"""
+    # ### commands auto generated by Alembic - please adjust! ###

Review Comment:
   ```suggestion
   ```



##########
airflow/migrations/versions/0111_2_4_0_add_indexes_for_cascade_deletes.py:
##########
@@ -0,0 +1,68 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""Add indexes for CASCADE deletes
+
+Revision ID: f5fcbda3e651
+Revises: 3c94c427fdf6
+Create Date: 2022-06-15 18:04:54.081789
+
+"""
+
+from alembic import op
+
+# revision identifiers, used by Alembic.
+revision = 'f5fcbda3e651'
+down_revision = '3c94c427fdf6'
+branch_labels = None
+depends_on = None
+airflow_version = '2.4.0'

Review Comment:
   ```suggestion
   airflow_version = '2.3.3'
   ```
   
   We should pull this into the next release.



##########
airflow/migrations/versions/0111_2_4_0_add_indexes_for_cascade_deletes.py:
##########
@@ -0,0 +1,68 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""Add indexes for CASCADE deletes
+
+Revision ID: f5fcbda3e651
+Revises: 3c94c427fdf6
+Create Date: 2022-06-15 18:04:54.081789
+
+"""
+
+from alembic import op
+
+# revision identifiers, used by Alembic.
+revision = 'f5fcbda3e651'
+down_revision = '3c94c427fdf6'
+branch_labels = None
+depends_on = None
+airflow_version = '2.4.0'
+
+
+def upgrade():
+    """Apply Add indexes for CASCADE deletes"""
+    # ### commands auto generated by Alembic - please adjust! ###
+    with op.batch_alter_table('task_fail', schema=None) as batch_op:
+        batch_op.create_index(
+            'idx_task_fail_task_instance', ['dag_id', 'task_id', 'run_id', 'map_index'], unique=False
+        )
+
+    with op.batch_alter_table('task_reschedule', schema=None) as batch_op:
+        batch_op.create_index('idx_task_reschedule_dag_run', ['dag_id', 'run_id'], unique=False)
+
+    with op.batch_alter_table('xcom', schema=None) as batch_op:
+        batch_op.create_index(
+            'idx_xcom_task_instance', ['dag_id', 'task_id', 'run_id', 'map_index'], unique=False

Review Comment:
   ```suggestion
               'idx_xcom_task_instance', ['dag_id', 'task_id', 'run_id', 'map_index']
   ```



##########
airflow/models/taskreschedule.py:
##########
@@ -61,6 +61,7 @@ class TaskReschedule(Base):
             name="task_reschedule_ti_fkey",
             ondelete="CASCADE",
         ),
+        Index('idx_task_reschedule_dag_run', dag_id, run_id, unique=False),

Review Comment:
   ```suggestion
           Index('idx_task_reschedule_dag_run', dag_id, run_id),
   ```



##########
airflow/migrations/versions/0111_2_4_0_add_indexes_for_cascade_deletes.py:
##########
@@ -0,0 +1,68 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""Add indexes for CASCADE deletes
+
+Revision ID: f5fcbda3e651
+Revises: 3c94c427fdf6
+Create Date: 2022-06-15 18:04:54.081789
+
+"""
+
+from alembic import op
+
+# revision identifiers, used by Alembic.
+revision = 'f5fcbda3e651'
+down_revision = '3c94c427fdf6'
+branch_labels = None
+depends_on = None
+airflow_version = '2.4.0'
+
+
+def upgrade():
+    """Apply Add indexes for CASCADE deletes"""
+    # ### commands auto generated by Alembic - please adjust! ###
+    with op.batch_alter_table('task_fail', schema=None) as batch_op:
+        batch_op.create_index(
+            'idx_task_fail_task_instance', ['dag_id', 'task_id', 'run_id', 'map_index'], unique=False
+        )
+
+    with op.batch_alter_table('task_reschedule', schema=None) as batch_op:
+        batch_op.create_index('idx_task_reschedule_dag_run', ['dag_id', 'run_id'], unique=False)

Review Comment:
   ```suggestion
           batch_op.create_index('idx_task_reschedule_dag_run', ['dag_id', 'run_id'])
   ```



##########
airflow/migrations/versions/0111_2_4_0_add_indexes_for_cascade_deletes.py:
##########
@@ -0,0 +1,68 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""Add indexes for CASCADE deletes
+
+Revision ID: f5fcbda3e651
+Revises: 3c94c427fdf6
+Create Date: 2022-06-15 18:04:54.081789
+
+"""
+
+from alembic import op
+
+# revision identifiers, used by Alembic.
+revision = 'f5fcbda3e651'
+down_revision = '3c94c427fdf6'
+branch_labels = None
+depends_on = None
+airflow_version = '2.4.0'
+
+
+def upgrade():
+    """Apply Add indexes for CASCADE deletes"""
+    # ### commands auto generated by Alembic - please adjust! ###
+    with op.batch_alter_table('task_fail', schema=None) as batch_op:
+        batch_op.create_index(
+            'idx_task_fail_task_instance', ['dag_id', 'task_id', 'run_id', 'map_index'], unique=False

Review Comment:
   ```suggestion
               'idx_task_fail_task_instance', ['dag_id', 'task_id', 'run_id', 'map_index']
   ```
   
   Same here.



##########
airflow/migrations/versions/0111_2_4_0_add_indexes_for_cascade_deletes.py:
##########
@@ -0,0 +1,68 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""Add indexes for CASCADE deletes
+
+Revision ID: f5fcbda3e651
+Revises: 3c94c427fdf6
+Create Date: 2022-06-15 18:04:54.081789
+
+"""
+
+from alembic import op
+
+# revision identifiers, used by Alembic.
+revision = 'f5fcbda3e651'
+down_revision = '3c94c427fdf6'
+branch_labels = None
+depends_on = None
+airflow_version = '2.4.0'
+
+
+def upgrade():
+    """Apply Add indexes for CASCADE deletes"""
+    # ### commands auto generated by Alembic - please adjust! ###
+    with op.batch_alter_table('task_fail', schema=None) as batch_op:
+        batch_op.create_index(
+            'idx_task_fail_task_instance', ['dag_id', 'task_id', 'run_id', 'map_index'], unique=False
+        )
+
+    with op.batch_alter_table('task_reschedule', schema=None) as batch_op:
+        batch_op.create_index('idx_task_reschedule_dag_run', ['dag_id', 'run_id'], unique=False)
+
+    with op.batch_alter_table('xcom', schema=None) as batch_op:
+        batch_op.create_index(
+            'idx_xcom_task_instance', ['dag_id', 'task_id', 'run_id', 'map_index'], unique=False
+        )
+
+    # ### end Alembic commands ###
+
+
+def downgrade():
+    """Unapply Add indexes for CASCADE deletes"""
+    # ### commands auto generated by Alembic - please adjust! ###
+    with op.batch_alter_table('xcom', schema=None) as batch_op:
+        batch_op.drop_index('idx_xcom_task_instance')
+
+    with op.batch_alter_table('task_reschedule', schema=None) as batch_op:
+        batch_op.drop_index('idx_task_reschedule_dag_run')
+
+    with op.batch_alter_table('task_fail', schema=None) as batch_op:
+        batch_op.drop_index('idx_task_fail_task_instance')
+
+    # ### end Alembic commands ###

Review Comment:
   ```suggestion
   ```





[GitHub] [airflow] github-actions[bot] commented on pull request #24488: Add indexes for CASCADE deletes for task_instance

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #24488:
URL: https://github.com/apache/airflow/pull/24488#issuecomment-1157718258

   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.




[GitHub] [airflow] calfzhou commented on pull request #24488: Add indexes for CASCADE deletes for task_instance

Posted by GitBox <gi...@apache.org>.
calfzhou commented on PR #24488:
URL: https://github.com/apache/airflow/pull/24488#issuecomment-1180236380

   Upgrading airflow 2.3.2 to 2.3.3, db upgrade failed:
   
   Running upgrade 3c94c427fdf6 -> f5fcbda3e651, Add indexes for CASCADE deletes on task_instance
   ...
   MySQLdb.OperationalError: (1067, "Invalid default value for 'end_date'")
   
   (MySQL 8.0)
   

