Posted to dev@airflow.apache.org by Olivier Girardot <o....@lateral-thoughts.com> on 2016/12/16 13:46:47 UTC

AIRFLOW-699 - dags can't be triggered at the same second

Hi everyone,
I wanted to talk about an issue I created, AIRFLOW-699: there is an integrity
constraint on dag_id and execution_date that makes it impossible to trigger a
specific dag twice, with different parameters, within the same second.
As we are using Airflow as the scheduler at the end of a complex notification
process, this does happen under high load.
Is there any hard reason for this uniqueness constraint?
Regards,
Olivier Girardot
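
A minimal sketch of the constraint described above, using an in-memory SQLite table as a stand-in. The table layout and the `trigger` helper are assumptions made for illustration, not Airflow's actual schema or API. It only shows that a unique key on (dag_id, execution_date), with the date stored at second precision, rejects a second trigger within the same second even when the parameters differ.

```python
import sqlite3
from datetime import datetime

# Hypothetical, simplified stand-in for the dag_run table (not Airflow's schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dag_run (
        id INTEGER PRIMARY KEY,
        dag_id TEXT NOT NULL,
        execution_date TEXT NOT NULL,
        conf TEXT,
        UNIQUE (dag_id, execution_date)
    )
    """
)


def trigger(dag_id, when, conf):
    """Insert a run; return True on success, False if the unique key is violated."""
    # Assumption: the date is stored at second precision, so microseconds are dropped.
    stamp = when.replace(microsecond=0).isoformat()
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO dag_run (dag_id, execution_date, conf) VALUES (?, ?, ?)",
                (dag_id, stamp, conf),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# Two triggers of the same dag, 0.8 s apart but within the same second.
first = trigger("notify", datetime(2016, 12, 16, 13, 46, 47, 100000), '{"user": 1}')
second = trigger("notify", datetime(2016, 12, 16, 13, 46, 47, 900000), '{"user": 2}')
print(first, second)
```

The second call fails only because both timestamps collapse to the same stored value; the differing `conf` plays no part in the key.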

Re: AIRFLOW-699 - dags can't be triggered at the same second

Posted by Bolke de Bruin <bd...@gmail.com>.
Hi,

If you are running on Postgres with 1.7.1.3, or on MySQL/MariaDB with master, triggering dag runs within the same second should work as long as they are not triggered at the exact same time (down to 6 fractional digits of a second, i.e. microsecond precision).
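
To illustrate the precision point with plain Python datetimes (a sketch only; whether a given database column actually keeps fractional seconds depends on the backend and schema version, as noted above, and the dag name is made up):

```python
from datetime import datetime

# Two trigger times in the same second, one microsecond apart.
a = datetime(2016, 12, 21, 10, 58, 0, 123456)
b = datetime(2016, 12, 21, 10, 58, 0, 123457)

# At second precision the two timestamps are indistinguishable...
same_second = a.replace(microsecond=0) == b.replace(microsecond=0)

# ...but at microsecond precision they are different values,
# so a (dag_id, execution_date) key built from them does not collide.
same_instant = a == b
keys = {("notify", a), ("notify", b)}

print(same_second, same_instant, len(keys))
```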

Robin makes the correct case: to make this work for triggers at the exact same time, it would require rewiring task instances to use the primary key of dag runs. I’ve done this in the past but received some pushback on it from Jeremiah, and thus removed it from the PR at that time.

Bolke.

> On 21 Dec 2016, at 10:58, Miller, Robin <Ro...@affiliate.oliverwyman.com> wrote:
> 
> Hi Olivier,
> 
> 
> Someone correct me if I'm wrong, but my understanding is as follows:
> 
> 
> At the moment the only way to identify a task_instance as belonging to a specific dag_run is by comparing the dag_id and execution_date. As such, having two dag_runs of the same dag with the same execution_date would make it impossible to correctly identify the task states for each run. So that constraint is important.
> 
> 
> In order to remove this constraint, I believe Airflow would need to assign a unique key to a dag_run (possibly per dag_id, but that might be harder to implement) and use that key to identify its task instances instead of the execution date.
> 
> 
> Regards,
> 
> Robin Miller


Re: AIRFLOW-699 - dags can't be triggered at the same second

Posted by "Miller, Robin" <Ro...@affiliate.oliverwyman.com>.
Hi Olivier,


Someone correct me if I'm wrong, but my understanding is as follows:


At the moment the only way to identify a task_instance as belonging to a specific dag_run is by comparing the dag_id and execution_date. As such, having two dag_runs of the same dag with the same execution_date would make it impossible to correctly identify the task states for each run. So that constraint is important.


In order to remove this constraint, I believe Airflow would need to assign a unique key to a dag_run (possibly per dag_id, but that might be harder to implement) and use that key to identify its task instances instead of the execution date.
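
A minimal sketch of that idea, assuming a simplified schema (the table and column names and the helper functions below are illustrative, not Airflow's actual models): each dag_run gets a surrogate key, and task instances reference that key instead of the execution date, so two runs with an identical execution_date keep separate task states.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: no unique constraint on (dag_id, execution_date);
# task instances are keyed by the run's surrogate id instead.
conn.executescript(
    """
    CREATE TABLE dag_run (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        dag_id TEXT NOT NULL,
        execution_date TEXT NOT NULL,
        conf TEXT
    );
    CREATE TABLE task_instance (
        dag_run_id INTEGER NOT NULL REFERENCES dag_run (id),
        task_id TEXT NOT NULL,
        state TEXT NOT NULL,
        PRIMARY KEY (dag_run_id, task_id)
    );
    """
)


def create_run(dag_id, execution_date, conf):
    """Create a run and return its surrogate key."""
    cur = conn.execute(
        "INSERT INTO dag_run (dag_id, execution_date, conf) VALUES (?, ?, ?)",
        (dag_id, execution_date, conf),
    )
    return cur.lastrowid


def set_state(run_id, task_id, state):
    """Record (or overwrite) the state of one task within one run."""
    conn.execute(
        "INSERT OR REPLACE INTO task_instance VALUES (?, ?, ?)",
        (run_id, task_id, state),
    )


def get_state(run_id, task_id):
    """Return the task's state for the given run, or None if unknown."""
    row = conn.execute(
        "SELECT state FROM task_instance WHERE dag_run_id = ? AND task_id = ?",
        (run_id, task_id),
    ).fetchone()
    return row[0] if row else None


# Two runs of the same dag with the very same execution_date.
same_date = "2016-12-16T13:46:47"
run_a = create_run("notify", same_date, '{"user": 1}')
run_b = create_run("notify", same_date, '{"user": 2}')

set_state(run_a, "send_email", "success")
set_state(run_b, "send_email", "failed")
print(get_state(run_a, "send_email"), get_state(run_b, "send_email"))
```

The cost of this design is that everything that currently looks up task instances by (dag_id, execution_date) would need to go through the run key instead.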


Regards,

Robin Miller
OLIVER WYMAN
robin.miller@affiliate.oliverwyman.com
www.oliverwyman.com

________________________________
From: Olivier Girardot <o....@lateral-thoughts.com>
Sent: 21 December 2016 07:02:05
To: dev@airflow.incubator.apache.org
Subject: Re: AIRFLOW-699 - dags can't be triggered at the same second

anyone ?







Re: AIRFLOW-699 - dags can't be triggered at the same second

Posted by Olivier Girardot <o....@lateral-thoughts.com>.
anyone ?
 





 

Olivier Girardot | Partner
o.girardot@lateral-thoughts.com
+33 6 24 09 17 94