Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/06/16 01:13:48 UTC
[GitHub] [airflow] dstandish opened a new pull request, #24488: Add indexes for CASCADE deletes
dstandish opened a new pull request, #24488:
URL: https://github.com/apache/airflow/pull/24488
When we add foreign keys with ON DELETE CASCADE and then delete rows from the referenced (parent) table, the database has to look up the matching rows in the referencing table. If there's no suitable index on the referencing columns, those deletes can be slow.
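A rough way to see the effect is to compare the query plan for the child-row lookup before and after adding the index. The sketch below uses SQLite from the Python standard library as a stand-in for the real backends, and the two-table schema is a simplified assumption that borrows only the table, column, and index names from this PR. Other databases word their plans differently, so treat this as an illustration rather than a benchmark.

```python
import sqlite3

# Simplified, hypothetical schema: only the columns needed for the lookup.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dag_run (
        dag_id TEXT NOT NULL,
        run_id TEXT NOT NULL,
        PRIMARY KEY (dag_id, run_id)
    );
    CREATE TABLE task_reschedule (
        id INTEGER PRIMARY KEY,
        dag_id TEXT NOT NULL,
        run_id TEXT NOT NULL,
        FOREIGN KEY (dag_id, run_id)
            REFERENCES dag_run (dag_id, run_id) ON DELETE CASCADE
    );
    """
)

# The kind of lookup a cascading delete has to do on the referencing table.
LOOKUP = "SELECT id FROM task_reschedule WHERE dag_id = ? AND run_id = ?"


def query_plan(connection: sqlite3.Connection) -> str:
    """Return SQLite's plan for the child-row lookup as one string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + LOOKUP, ("d", "r")).fetchall()
    # The last column of each row holds the human-readable plan detail.
    return "; ".join(row[-1] for row in rows)


plan_before = query_plan(conn)
conn.execute(
    "CREATE INDEX idx_task_reschedule_dag_run ON task_reschedule (dag_id, run_id)"
)
plan_after = query_plan(conn)

print("before:", plan_before)
print("after: ", plan_after)
```

Without the index the plan is a scan of the whole referencing table; with it, the plan names `idx_task_reschedule_dag_run`.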
cc @wolfier
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk commented on pull request #24488: Add indexes for CASCADE deletes for task_instance
Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #24488:
URL: https://github.com/apache/airflow/pull/24488#issuecomment-1180321537
Cool!
One question @calfzhou - maybe you would like to add a note about it as a PR to https://airflow.apache.org/docs/apache-airflow/stable/howto/set-up-database.html?highlight=database+setup#setting-up-a-mysql-database ? We already have a few watch-outs there, and adding this one might be useful for others.
It's super easy - just click "Suggest a change on this page" and you will get a PR where you can edit the page sources directly, using a similar approach to the other warnings there.
[GitHub] [airflow] dstandish merged pull request #24488: Add indexes for CASCADE deletes for task_instance
Posted by GitBox <gi...@apache.org>.
dstandish merged PR #24488:
URL: https://github.com/apache/airflow/pull/24488
[GitHub] [airflow] calfzhou commented on pull request #24488: Add indexes for CASCADE deletes for task_instance
Posted by GitBox <gi...@apache.org>.
calfzhou commented on PR #24488:
URL: https://github.com/apache/airflow/pull/24488#issuecomment-1180294724
> @calfzhou - you might have your mysql configured with "NO ZERO DATE" and "STRICT MODE". https://dev.mysql.com/doc/refman/5.7/en/sql-mode.html#sqlmode_no_zero_date.
>
> Can you please confirm that?
>
> If so, you might want to disable it - https://stackoverflow.com/questions/9192027/invalid-default-value-for-create-date-timestamp-field.
Thanks @potiuk, that works! I removed `NO_ZERO_DATE` from the SQL mode.
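For anyone hitting the same error, here is a minimal sketch of building the adjusted mode string. The helper name `without_sql_mode` and the example mode list are illustrative assumptions; read the real value from the server with `SELECT @@sql_mode;` first. Note that `SET SESSION` only affects the current connection, so a lasting change would have to be made globally or in the server configuration.

```python
def without_sql_mode(current: str, unwanted: str = "NO_ZERO_DATE") -> str:
    """Return a comma-separated sql_mode string with one mode removed."""
    kept = [
        mode.strip()
        for mode in current.split(",")
        if mode.strip() and mode.strip().upper() != unwanted.upper()
    ]
    return ",".join(kept)


# Example value only; the real one comes from: SELECT @@sql_mode;
example_mode = (
    "ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,"
    "NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION"
)
new_mode = without_sql_mode(example_mode)

# Statement to run against MySQL before retrying the upgrade.
statement = f"SET SESSION sql_mode = '{new_mode}';"
print(statement)
```

The comparison is done per mode name, so `NO_ZERO_IN_DATE` is left in place even though it contains similar text.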
[GitHub] [airflow] potiuk commented on pull request #24488: Add indexes for CASCADE deletes for task_instance
Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #24488:
URL: https://github.com/apache/airflow/pull/24488#issuecomment-1180278302
@calfzhou - you might have your mysql configured with "NO ZERO DATE" and "STRICT MODE".
https://dev.mysql.com/doc/refman/5.7/en/sql-mode.html#sqlmode_no_zero_date.
Can you please confirm that?
If so, you might want to disable it - https://stackoverflow.com/questions/9192027/invalid-default-value-for-create-date-timestamp-field.
[GitHub] [airflow] calfzhou commented on pull request #24488: Add indexes for CASCADE deletes for task_instance
Posted by GitBox <gi...@apache.org>.
calfzhou commented on PR #24488:
URL: https://github.com/apache/airflow/pull/24488#issuecomment-1181246708
PR posted: #24983
[GitHub] [airflow] jedcunningham commented on a diff in pull request #24488: Add indexes for CASCADE deletes for task_instance
Posted by GitBox <gi...@apache.org>.
jedcunningham commented on code in PR #24488:
URL: https://github.com/apache/airflow/pull/24488#discussion_r898699315
##########
airflow/migrations/versions/0111_2_4_0_add_indexes_for_cascade_deletes.py:
##########
@@ -0,0 +1,68 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""Add indexes for CASCADE deletes
+
+Revision ID: f5fcbda3e651
+Revises: 3c94c427fdf6
+Create Date: 2022-06-15 18:04:54.081789
+
+"""
+
+from alembic import op
+
+# revision identifiers, used by Alembic.
+revision = 'f5fcbda3e651'
+down_revision = '3c94c427fdf6'
+branch_labels = None
+depends_on = None
+airflow_version = '2.4.0'
+
+
+def upgrade():
+    """Apply Add indexes for CASCADE deletes"""
+    # ### commands auto generated by Alembic - please adjust! ###
Review Comment:
```suggestion
```
nit
##########
airflow/models/taskfail.py:
##########
@@ -39,6 +39,7 @@ class TaskFail(Base):
     duration = Column(Integer)
     __table_args__ = (
+        Index("idx_task_fail_task_instance", dag_id, task_id, run_id, map_index, unique=False),
Review Comment:
```suggestion
        Index("idx_task_fail_task_instance", dag_id, task_id, run_id, map_index),
```
That's the default, no?
##########
airflow/migrations/versions/0111_2_4_0_add_indexes_for_cascade_deletes.py:
##########
@@ -0,0 +1,68 @@
+
+"""Add indexes for CASCADE deletes
+
+Revision ID: f5fcbda3e651
+Revises: 3c94c427fdf6
+Create Date: 2022-06-15 18:04:54.081789
+
+"""
+
+from alembic import op
+
+# revision identifiers, used by Alembic.
+revision = 'f5fcbda3e651'
+down_revision = '3c94c427fdf6'
+branch_labels = None
+depends_on = None
+airflow_version = '2.4.0'
+
+
+def upgrade():
+    """Apply Add indexes for CASCADE deletes"""
+    # ### commands auto generated by Alembic - please adjust! ###
+    with op.batch_alter_table('task_fail', schema=None) as batch_op:
+        batch_op.create_index(
+            'idx_task_fail_task_instance', ['dag_id', 'task_id', 'run_id', 'map_index'], unique=False
+        )
+
+    with op.batch_alter_table('task_reschedule', schema=None) as batch_op:
+        batch_op.create_index('idx_task_reschedule_dag_run', ['dag_id', 'run_id'], unique=False)
+
+    with op.batch_alter_table('xcom', schema=None) as batch_op:
+        batch_op.create_index(
+            'idx_xcom_task_instance', ['dag_id', 'task_id', 'run_id', 'map_index'], unique=False
+        )
+
+    # ### end Alembic commands ###
Review Comment:
```suggestion
```
##########
airflow/migrations/versions/0111_2_4_0_add_indexes_for_cascade_deletes.py:
##########
@@ -0,0 +1,68 @@
+
+"""Add indexes for CASCADE deletes
+
+Revision ID: f5fcbda3e651
+Revises: 3c94c427fdf6
+Create Date: 2022-06-15 18:04:54.081789
+
+"""
+
+from alembic import op
+
+# revision identifiers, used by Alembic.
+revision = 'f5fcbda3e651'
+down_revision = '3c94c427fdf6'
+branch_labels = None
+depends_on = None
+airflow_version = '2.4.0'
+
+
+def upgrade():
+    """Apply Add indexes for CASCADE deletes"""
+    # ### commands auto generated by Alembic - please adjust! ###
+    with op.batch_alter_table('task_fail', schema=None) as batch_op:
+        batch_op.create_index(
+            'idx_task_fail_task_instance', ['dag_id', 'task_id', 'run_id', 'map_index'], unique=False
+        )
+
+    with op.batch_alter_table('task_reschedule', schema=None) as batch_op:
+        batch_op.create_index('idx_task_reschedule_dag_run', ['dag_id', 'run_id'], unique=False)
+
+    with op.batch_alter_table('xcom', schema=None) as batch_op:
+        batch_op.create_index(
+            'idx_xcom_task_instance', ['dag_id', 'task_id', 'run_id', 'map_index'], unique=False
+        )
+
+    # ### end Alembic commands ###
+
+
+def downgrade():
+    """Unapply Add indexes for CASCADE deletes"""
+    # ### commands auto generated by Alembic - please adjust! ###
Review Comment:
```suggestion
```
##########
airflow/migrations/versions/0111_2_4_0_add_indexes_for_cascade_deletes.py:
##########
@@ -0,0 +1,68 @@
+
+"""Add indexes for CASCADE deletes
+
+Revision ID: f5fcbda3e651
+Revises: 3c94c427fdf6
+Create Date: 2022-06-15 18:04:54.081789
+
+"""
+
+from alembic import op
+
+# revision identifiers, used by Alembic.
+revision = 'f5fcbda3e651'
+down_revision = '3c94c427fdf6'
+branch_labels = None
+depends_on = None
+airflow_version = '2.4.0'
Review Comment:
```suggestion
airflow_version = '2.3.3'
```
We should pull this into the next release.
##########
airflow/migrations/versions/0111_2_4_0_add_indexes_for_cascade_deletes.py:
##########
@@ -0,0 +1,68 @@
+
+"""Add indexes for CASCADE deletes
+
+Revision ID: f5fcbda3e651
+Revises: 3c94c427fdf6
+Create Date: 2022-06-15 18:04:54.081789
+
+"""
+
+from alembic import op
+
+# revision identifiers, used by Alembic.
+revision = 'f5fcbda3e651'
+down_revision = '3c94c427fdf6'
+branch_labels = None
+depends_on = None
+airflow_version = '2.4.0'
+
+
+def upgrade():
+    """Apply Add indexes for CASCADE deletes"""
+    # ### commands auto generated by Alembic - please adjust! ###
+    with op.batch_alter_table('task_fail', schema=None) as batch_op:
+        batch_op.create_index(
+            'idx_task_fail_task_instance', ['dag_id', 'task_id', 'run_id', 'map_index'], unique=False
+        )
+
+    with op.batch_alter_table('task_reschedule', schema=None) as batch_op:
+        batch_op.create_index('idx_task_reschedule_dag_run', ['dag_id', 'run_id'], unique=False)
+
+    with op.batch_alter_table('xcom', schema=None) as batch_op:
+        batch_op.create_index(
+            'idx_xcom_task_instance', ['dag_id', 'task_id', 'run_id', 'map_index'], unique=False
Review Comment:
```suggestion
            'idx_xcom_task_instance', ['dag_id', 'task_id', 'run_id', 'map_index']
```
##########
airflow/models/taskreschedule.py:
##########
@@ -61,6 +61,7 @@ class TaskReschedule(Base):
             name="task_reschedule_ti_fkey",
             ondelete="CASCADE",
         ),
+        Index('idx_task_reschedule_dag_run', dag_id, run_id, unique=False),
Review Comment:
```suggestion
        Index('idx_task_reschedule_dag_run', dag_id, run_id),
```
##########
airflow/migrations/versions/0111_2_4_0_add_indexes_for_cascade_deletes.py:
##########
@@ -0,0 +1,68 @@
+
+"""Add indexes for CASCADE deletes
+
+Revision ID: f5fcbda3e651
+Revises: 3c94c427fdf6
+Create Date: 2022-06-15 18:04:54.081789
+
+"""
+
+from alembic import op
+
+# revision identifiers, used by Alembic.
+revision = 'f5fcbda3e651'
+down_revision = '3c94c427fdf6'
+branch_labels = None
+depends_on = None
+airflow_version = '2.4.0'
+
+
+def upgrade():
+    """Apply Add indexes for CASCADE deletes"""
+    # ### commands auto generated by Alembic - please adjust! ###
+    with op.batch_alter_table('task_fail', schema=None) as batch_op:
+        batch_op.create_index(
+            'idx_task_fail_task_instance', ['dag_id', 'task_id', 'run_id', 'map_index'], unique=False
+        )
+
+    with op.batch_alter_table('task_reschedule', schema=None) as batch_op:
+        batch_op.create_index('idx_task_reschedule_dag_run', ['dag_id', 'run_id'], unique=False)
Review Comment:
```suggestion
        batch_op.create_index('idx_task_reschedule_dag_run', ['dag_id', 'run_id'])
```
##########
airflow/migrations/versions/0111_2_4_0_add_indexes_for_cascade_deletes.py:
##########
@@ -0,0 +1,68 @@
+
+"""Add indexes for CASCADE deletes
+
+Revision ID: f5fcbda3e651
+Revises: 3c94c427fdf6
+Create Date: 2022-06-15 18:04:54.081789
+
+"""
+
+from alembic import op
+
+# revision identifiers, used by Alembic.
+revision = 'f5fcbda3e651'
+down_revision = '3c94c427fdf6'
+branch_labels = None
+depends_on = None
+airflow_version = '2.4.0'
+
+
+def upgrade():
+    """Apply Add indexes for CASCADE deletes"""
+    # ### commands auto generated by Alembic - please adjust! ###
+    with op.batch_alter_table('task_fail', schema=None) as batch_op:
+        batch_op.create_index(
+            'idx_task_fail_task_instance', ['dag_id', 'task_id', 'run_id', 'map_index'], unique=False
Review Comment:
```suggestion
            'idx_task_fail_task_instance', ['dag_id', 'task_id', 'run_id', 'map_index']
```
Same here.
##########
airflow/migrations/versions/0111_2_4_0_add_indexes_for_cascade_deletes.py:
##########
@@ -0,0 +1,68 @@
+
+"""Add indexes for CASCADE deletes
+
+Revision ID: f5fcbda3e651
+Revises: 3c94c427fdf6
+Create Date: 2022-06-15 18:04:54.081789
+
+"""
+
+from alembic import op
+
+# revision identifiers, used by Alembic.
+revision = 'f5fcbda3e651'
+down_revision = '3c94c427fdf6'
+branch_labels = None
+depends_on = None
+airflow_version = '2.4.0'
+
+
+def upgrade():
+    """Apply Add indexes for CASCADE deletes"""
+    # ### commands auto generated by Alembic - please adjust! ###
+    with op.batch_alter_table('task_fail', schema=None) as batch_op:
+        batch_op.create_index(
+            'idx_task_fail_task_instance', ['dag_id', 'task_id', 'run_id', 'map_index'], unique=False
+        )
+
+    with op.batch_alter_table('task_reschedule', schema=None) as batch_op:
+        batch_op.create_index('idx_task_reschedule_dag_run', ['dag_id', 'run_id'], unique=False)
+
+    with op.batch_alter_table('xcom', schema=None) as batch_op:
+        batch_op.create_index(
+            'idx_xcom_task_instance', ['dag_id', 'task_id', 'run_id', 'map_index'], unique=False
+        )
+
+    # ### end Alembic commands ###
+
+
+def downgrade():
+    """Unapply Add indexes for CASCADE deletes"""
+    # ### commands auto generated by Alembic - please adjust! ###
+    with op.batch_alter_table('xcom', schema=None) as batch_op:
+        batch_op.drop_index('idx_xcom_task_instance')
+
+    with op.batch_alter_table('task_reschedule', schema=None) as batch_op:
+        batch_op.drop_index('idx_task_reschedule_dag_run')
+
+    with op.batch_alter_table('task_fail', schema=None) as batch_op:
+        batch_op.drop_index('idx_task_fail_task_instance')
+
+    # ### end Alembic commands ###
Review Comment:
```suggestion
```
[GitHub] [airflow] github-actions[bot] commented on pull request #24488: Add indexes for CASCADE deletes for task_instance
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #24488:
URL: https://github.com/apache/airflow/pull/24488#issuecomment-1157718258
The PR most likely needs to run the full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly, please rebase it onto the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.
[GitHub] [airflow] calfzhou commented on pull request #24488: Add indexes for CASCADE deletes for task_instance
Posted by GitBox <gi...@apache.org>.
calfzhou commented on PR #24488:
URL: https://github.com/apache/airflow/pull/24488#issuecomment-1180236380
Upgrading Airflow from 2.3.2 to 2.3.3, the db upgrade failed:
Running upgrade 3c94c427fdf6 -> f5fcbda3e651, Add indexes for CASCADE deletes on task_instance
...
MySQLdb.OperationalError: (1067, "Invalid default value for 'end_date'")
(MySQL 8.0)