Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/08/04 12:46:25 UTC
[GitHub] [airflow] ozw1z5rd commented on issue #10155: Airflow 1.10.10 + DAG SERIALIZATION = fails to start manually the DAG's operators
ozw1z5rd commented on issue #10155:
URL: https://github.com/apache/airflow/issues/10155#issuecomment-668574862
You must enable DAG serialisation to replicate my issue; without serialisation there is no issue on my company's system.
These are my settings (from the pilot installation):
```
min_serialized_dag_update_interval = 15
store_dag_code = True
max_num_rendered_ti_fields_per_task = 0 # this avoids the deadlock problem, which seems to affect only the MySQL engine
```
Any DAG is affected; my tests were on this specific one:
```
from builtins import range
from datetime import timedelta

from airflow.models import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.dummy_operator import DummyOperator
from airflow.utils.dates import days_ago

args = {
    'owner': 'Airflow',
    'start_date': days_ago(2),
}

dag = DAG(
    dag_id='example_sequence_restart',
    default_args=args,
    schedule_interval='0 0 * * *',
    dagrun_timeout=timedelta(minutes=60),
    tags=['example'],
)

run_this_last = DummyOperator(
    task_id='run_this_last',
    dag=dag,
)

# [START howto_operator_bash]
run_this = BashOperator(
    task_id='run_after_loop',
    bash_command='echo 1',
    dag=dag,
)
# [END howto_operator_bash]

run_this >> run_this_last

task = BashOperator(
    task_id='start',
    bash_command='echo "{{ task_instance_key_str }}" && sleep 1',
    dag=dag,
)

task >> run_this

# [START howto_operator_bash_template]
also_run_this = BashOperator(
    task_id='also_run_this',
    bash_command='echo "run_id={{ run_id }} | dag_run={{ dag_run }}"',
    dag=dag,
)
# [END howto_operator_bash_template]

also_run_this >> run_this_last
```
I should mention that after the database migration I changed the database a bit:
* dag_tag
  changed the constraint to:
  CONSTRAINT `dag_tag_ibfk_1` FOREIGN KEY (`dag_id`) REFERENCES `dag` (`dag_id`) ON DELETE CASCADE
* rendered_task_instance_fields
  changed execution_date from timestamp to timestamp(6):
  execution_date timestamp(6)
* task_fail
  changed execution_date to timestamp(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6)
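For reference, the tweaks above could be expressed roughly as the following MySQL statements. This is a sketch reconstructed from my description; the exact constraint name and column attributes may differ on your installation, so check your schema first:

```sql
-- dag_tag: recreate the FK with ON DELETE CASCADE
ALTER TABLE dag_tag DROP FOREIGN KEY dag_tag_ibfk_1;
ALTER TABLE dag_tag
  ADD CONSTRAINT dag_tag_ibfk_1 FOREIGN KEY (dag_id)
  REFERENCES dag (dag_id) ON DELETE CASCADE;

-- rendered_task_instance_fields: keep microseconds in execution_date
ALTER TABLE rendered_task_instance_fields
  MODIFY execution_date timestamp(6) NOT NULL;

-- task_fail: same precision change, with a default
ALTER TABLE task_fail
  MODIFY execution_date timestamp(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6);
```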
Before these changes (mostly the one on rendered_task_instance_fields) I was unable to manually trigger the same task twice and have both executions complete without errors. One completed; the other failed on the insert into rendered_task_instance_fields:
```
IntegrityError: (_mysql_exceptions.IntegrityError) (1062, "Duplicate entry 'PARTITIONADD-partition_add-2020-07-28 17:17:13' for key 'PRIMARY'")
[SQL: INSERT INTO rendered_task_instance_fields (dag_id, task_id, execution_date, rendered_fields) VALUES (%s, %s, %s, %s)]
[parameters: ('PARTITIONADD', 'partition_add', datetime.datetime(2020, 7, 28, 17, 17, 13, 315192), '{"hql": "\\n ALTER TABLE unifieddata_cat.transient_ww_eventsjson\\n ADD IF NOT EXISTS PARTITION( country = \'{country}\',year ... (158 characters truncated) ... e_url": "http://httpfs-preprod.hd.docomodigital.com:14000", "hdfs_path_pattern": "/Vault/Docomodigital/Preproduction/rawEvents/{country}/2020/07/28"}')]
(Background on this error at: http://sqlalche.me/e/gkpj)
```
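The duplicate-key error above is consistent with MySQL's plain TIMESTAMP type truncating fractional seconds: two manual runs triggered within the same wall-clock second collapse to the same (dag_id, task_id, execution_date) primary key. A minimal Python sketch of the effect (the timestamps here are illustrative, not taken from my logs):

```python
from datetime import datetime

# Two manual triggers a few hundred milliseconds apart, in the same second
# (microsecond values are illustrative).
t1 = datetime(2020, 7, 28, 17, 17, 13, 315192)
t2 = datetime(2020, 7, 28, 17, 17, 13, 901044)

def as_mysql_timestamp(dt):
    """Emulate a MySQL TIMESTAMP column with no fractional seconds."""
    return dt.replace(microsecond=0)

# Without timestamp(6) both rows get the same execution_date, so the
# second INSERT violates the primary key (MySQL error 1062).
assert as_mysql_timestamp(t1) == as_mysql_timestamp(t2)

# With timestamp(6) the microseconds survive and the keys stay distinct.
assert t1 != t2
```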
After the change on execution_date everything worked fine.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org