Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/01/02 12:59:17 UTC

[GitHub] [airflow] salimeryigit opened a new issue #13434: Airflow 2.0.0 manual run causes scheduled run to skip

salimeryigit opened a new issue #13434:
URL: https://github.com/apache/airflow/issues/13434


   **Apache Airflow version**: 2.0.0
   
   
   **Kubernetes version (if you are using kubernetes)** (use `kubectl version`): N/A
   
   **Environment**: 
   
   - **Cloud provider or hardware configuration**: local/aws
   - **OS** (e.g. from /etc/os-release): Ubuntu 18.04.5 LTS
   - **Kernel** (e.g. `uname -a`): 5.4.0-1032-aws
   - **Install tools**: pip
   - **Others**:
   
   **What happened**:
   I did a fresh Airflow 2.0.0 install. With this version, when I manually trigger a DAG, Airflow skips the next scheduled run.
   <!-- (please include exact error messages if you can) -->
   
   **What you expected to happen**:
   Manual runs should not interfere with scheduled runs, as was the case prior to Airflow 2.
   <!-- What do you think went wrong? -->
   
   **How to reproduce it**:
   Create a simple hourly DAG. After enabling it and letting the initial run complete, trigger it manually. The next hourly scheduled run is then skipped. Below is an example in which a manual run with an execution time of 08:17 causes the scheduled run with an execution time of 08:00 to be skipped.
   ![image](https://user-images.githubusercontent.com/26160471/103457719-c7193480-4d12-11eb-82cb-42efaedc9ef4.png)
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on issue #13434: Airflow 2.0.0 manual run causes scheduled run to skip

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #13434:
URL: https://github.com/apache/airflow/issues/13434#issuecomment-769489612


   Will be fixed by https://github.com/apache/airflow/pull/13963





[GitHub] [airflow] LanDeQuHuXi commented on issue #13434: Airflow 2.0.0 manual run causes scheduled run to skip

Posted by GitBox <gi...@apache.org>.
LanDeQuHuXi commented on issue #13434:
URL: https://github.com/apache/airflow/issues/13434#issuecomment-765877675


   It's a quite big change of behavior, please fix or let it be configurable at least.





[GitHub] [airflow] kaxil closed issue #13434: Airflow 2.0.0 manual run causes scheduled run to skip

Posted by GitBox <gi...@apache.org>.
kaxil closed issue #13434:
URL: https://github.com/apache/airflow/issues/13434


   





[GitHub] [airflow] jedcunningham commented on issue #13434: Airflow 2.0.0 manual run causes scheduled run to skip

Posted by GitBox <gi...@apache.org>.
jedcunningham commented on issue #13434:
URL: https://github.com/apache/airflow/issues/13434#issuecomment-768560467


   This also happens in Postgres with psycopg2 2.8.6.





[GitHub] [airflow] kaxil edited a comment on issue #13434: Airflow 2.0.0 manual run causes scheduled run to skip

Posted by GitBox <gi...@apache.org>.
kaxil edited a comment on issue #13434:
URL: https://github.com/apache/airflow/issues/13434#issuecomment-769489612


   Will be fixed by https://github.com/apache/airflow/pull/13963 and released in 2.0.1 (around 2nd week of Feb)





[GitHub] [airflow] bobfang1992 commented on issue #13434: Airflow 2.0.0 manual run causes scheduled run to skip

Posted by GitBox <gi...@apache.org>.
bobfang1992 commented on issue #13434:
URL: https://github.com/apache/airflow/issues/13434#issuecomment-768909622


   Don't you think this is an unacceptable change? At least in its current form, we risk the next automated run being skipped entirely whenever we manually trigger a DAG run. This will certainly cause trouble. The manual run should not interfere with the scheduled runs IMHO.





[GitHub] [airflow] salimeryigit commented on issue #13434: Airflow 2.0.0 manual run causes scheduled run to skip

Posted by GitBox <gi...@apache.org>.
salimeryigit commented on issue #13434:
URL: https://github.com/apache/airflow/issues/13434#issuecomment-760016742


   I agree, and to be consistent with the older versions, the default behavior should be the old one IMHO (personally, I think it should be reversed).
   Because of the way Airflow handles execution time, the difference in execution time between manual and scheduled runs may cause problems. Consider a daily DAG scheduled at 00:00. If the DAG runs at the scheduled time on 13 Jan (the period close time), the execution date would be 12 Jan 00:00. If I manually trigger the DAG at, say, 00:30, the execution date would be 13 Jan 00:30, which would cause the scheduled run with execution date 13 Jan 00:00 to be skipped. Depending on the use case, this can cause problems.
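
The date arithmetic in the example above can be checked with a short sketch. It uses only the standard library, not Airflow itself. The year 2021 and the helper name `scheduled_execution_date` are assumptions made for illustration; the rule that a scheduled run is stamped with the start of the period that just closed is taken from the description above.

```python
from datetime import datetime, timedelta

# Daily schedule at 00:00, as in the example above.
INTERVAL = timedelta(days=1)


def scheduled_execution_date(run_time: datetime) -> datetime:
    """Hypothetical helper: a scheduled run is stamped with the start of
    the period that has just closed, i.e. one interval before it runs."""
    return run_time - INTERVAL


# Scheduled run that fires on 13 Jan 00:00 (year 2021 is an assumption).
run_on_13 = scheduled_execution_date(datetime(2021, 1, 13, 0, 0))

# Manual trigger at 00:30 on 13 Jan: stamped with the trigger time itself.
manual = datetime(2021, 1, 13, 0, 30)

# Scheduled run that should fire on 14 Jan 00:00.
run_on_14 = scheduled_execution_date(datetime(2021, 1, 14, 0, 0))

# The manual run's execution date is later than that of the next scheduled
# run, which is the condition under which the report says it gets skipped.
manual_is_later = manual > run_on_14

print(run_on_13, manual, run_on_14, manual_is_later)
```

The point is only that the manual run carries a later execution date than a scheduled run that has not happened yet, so any "latest run" lookup that mixes both kinds of run will move past the pending scheduled slot.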





[GitHub] [airflow] LanDeQuHuXi edited a comment on issue #13434: Airflow 2.0.0 manual run causes scheduled run to skip

Posted by GitBox <gi...@apache.org>.
LanDeQuHuXi edited a comment on issue #13434:
URL: https://github.com/apache/airflow/issues/13434#issuecomment-765877675


   It's a quite big change of behavior, please fix or let it be configurable at least.





[GitHub] [airflow] SamWheating commented on issue #13434: Airflow 2.0.0 manual run causes scheduled run to skip

Posted by GitBox <gi...@apache.org>.
SamWheating commented on issue #13434:
URL: https://github.com/apache/airflow/issues/13434#issuecomment-768484735


   Could you please confirm your database and driver version?
   
   (Assuming you are using mysql client `pip list | grep mysqlclient`)
   
   Based on your screenshot and the code snippet attached above, I suspect that this is related to https://github.com/apache/airflow/pull/11621. It looks like `"DagRunType.Scheduled"` is being written to the dag_run table instead of `"scheduled"`, which would then cause the `most_recent_dag_runs` function to not pick up the scheduled run. 
   
   I am currently looking into this and have an active environment which is affected so I should be able to provide more details shortly. 
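
The kind of mismatch suspected above can be illustrated without Airflow or a database. This is a minimal sketch, assuming a string-valued enum of similar shape; the `RunType` class is a hypothetical stand-in and not Airflow's `DagRunType`, and whether any driver actually stringifies the member this way is exactly what is still unconfirmed.

```python
import enum


class RunType(str, enum.Enum):
    """Hypothetical stand-in for a string-valued run type enum."""

    SCHEDULED = "scheduled"
    MANUAL = "manual"


# What would be stored if the enum member itself were stringified.
stored_via_str = str(RunType.SCHEDULED)

# What would be stored if the member's value were used.
stored_via_value = RunType.SCHEDULED.value

# A filter that compares the stored text against the plain value
# only matches the second form.
matches_when_stringified = stored_via_str == "scheduled"
matches_when_value = stored_via_value == "scheduled"

print(stored_via_str, stored_via_value)
```

If the stringified form ended up in the table, a query filtering on the plain value would silently miss those rows, which is the failure mode described above.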





[GitHub] [airflow] turbaszek commented on issue #13434: Airflow 2.0.0 manual run causes scheduled run to skip

Posted by GitBox <gi...@apache.org>.
turbaszek commented on issue #13434:
URL: https://github.com/apache/airflow/issues/13434#issuecomment-753483690


   Possibly related to #13407 





[GitHub] [airflow] SamWheating edited a comment on issue #13434: Airflow 2.0.0 manual run causes scheduled run to skip

Posted by GitBox <gi...@apache.org>.
SamWheating edited a comment on issue #13434:
URL: https://github.com/apache/airflow/issues/13434#issuecomment-768484735


   Could you please confirm your database and driver version?
   
   (Assuming you are using mysql client `pip list | grep mysqlclient`)
   
   Based on your screenshot and the code snippet attached above, I suspect that this is related to https://github.com/apache/airflow/pull/11621. It looks like `"DagRunType.Scheduled"` is being written to the dag_run table instead of `"scheduled"`, which would then cause the `most_recent_dag_runs` function to not pick up the scheduled run. 
   
   I am currently looking into this and have an active environment which is affected so I should be able to provide more details shortly. 





[GitHub] [airflow] gbonazzoli commented on issue #13434: Airflow 2.0.0 manual run causes scheduled run to skip

Posted by GitBox <gi...@apache.org>.
gbonazzoli commented on issue #13434:
URL: https://github.com/apache/airflow/issues/13434#issuecomment-755048513


   I don't know if my problem is related to this issue, but I want to underline a change of behavior in airflow 2.0.0 in how the **`max_active_runs`** directive works.
   
   I have the following DAG definition:
   
   ```
   with DAG('VMWARE_BACKUP',
             description = 'VMWARE_BACKUP',
             tags=['vmware'],
             schedule_interval = None,
             start_date = datetime(2019, 5, 29, tzinfo=local_tz),
             default_args = default_args,
             max_active_runs = 1,   # maximum number of active runs for this DAG
             concurrency = 1,       # Added with Airflow 2.0.0
             catchup = False
       ) as dag:
   ```
   Before Airflow 2.0.0, only one DAG run was allowed to be active at a time.
   
   With Airflow 2.0.0 (probably due to the scheduler rewrite, with its fantastic speed in switching between tasks), when a task in one DAG run instance finishes, a task from another DAG run instance is scheduled instead of the next task of the same run, as you can see from the attached screenshot.
   
   ![Screen Shot 2021-01-06 at 03 57 10](https://user-images.githubusercontent.com/17742862/103724430-90724100-4fd4-11eb-9dc3-2a5ac203aae2.png)
   
   I can mitigate the problem in Airflow 2.0.0 with the added **`concurrency = 1`**, so that at least the one very long task that I want to be the only thing running on the system runs alone.
   
   This works in my use case; in other circumstances it could be impracticable.
   
   Please tell me if it is better to open a new main issue.





[GitHub] [airflow] SamWheating removed a comment on issue #13434: Airflow 2.0.0 manual run causes scheduled run to skip

Posted by GitBox <gi...@apache.org>.
SamWheating removed a comment on issue #13434:
URL: https://github.com/apache/airflow/issues/13434#issuecomment-768484735


   Could you please confirm your database and driver version?
   
   (Assuming you are using mysql client `pip list | grep mysqlclient`)
   
   Based on your screenshot and the code snippet attached above, I suspect that this is related to https://github.com/apache/airflow/pull/11621. It looks like `"DagRunType.MANUAL"` is being written to the dag_run table instead of `"manual"`, which would then cause the `most_recent_dag_runs` function to incorrectly pick up the manual run. 
   
   I am currently looking into this and have an active environment which is affected so I should be able to provide more details shortly. 





[GitHub] [airflow] vxtals commented on issue #13434: Airflow 2.0.0 manual run causes scheduled run to skip

Posted by GitBox <gi...@apache.org>.
vxtals commented on issue #13434:
URL: https://github.com/apache/airflow/issues/13434#issuecomment-760003219


   I don't think this is really a bug but rather a change of behavior. IMO this should be reversed, or it should at least be possible to change it through config.
   
   The problem is in the method `bulk_write_to_db` of the class `DAG`: https://github.com/apache/airflow/blob/master/airflow/models/dag.py
   
           # Get the latest dag run for each existing dag as a single query (avoid n+1 query)
           most_recent_dag_runs = dict(
               session.query(DagRun.dag_id, func.max_(DagRun.execution_date))
               .filter(
                   DagRun.dag_id.in_(existing_dag_ids),
                   or_(
                       DagRun.run_type == DagRunType.BACKFILL_JOB,
                       DagRun.run_type == DagRunType.SCHEDULED,
                       DagRun.external_trigger.is_(True),
                   ),
               )
               .group_by(DagRun.dag_id)
               .all()
           )
   
   When `most_recent_dag_runs` is fetched from the database, the filter includes `DagRun.external_trigger.is_(True)`.
   `most_recent_dag_runs` is used later in the method to calculate the next execution, so if it finds a manually triggered run in the current schedule interval, the scheduled execution is not created. Removing that line restores the behavior of previous versions.
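
The effect of that filter condition can be sketched in plain Python, without SQLAlchemy or Airflow. Everything here is a simplified assumption for illustration: the in-memory `runs` list stands in for the `dag_run` table, the date is arbitrary, and `next_hourly_execution_date` is a hypothetical helper that picks the first top-of-the-hour slot after the latest run rather than Airflow's real scheduling code. The times match the hourly example from the issue description.

```python
from datetime import datetime, timedelta

# Stand-in rows: (run_type, external_trigger, execution_date).
runs = [
    ("scheduled", False, datetime(2021, 1, 2, 7, 0)),
    ("manual", True, datetime(2021, 1, 2, 8, 17)),
]


def most_recent(rows, include_external_trigger):
    """Latest execution date among backfill/scheduled runs, optionally
    also counting externally (manually) triggered runs."""
    dates = [
        execution_date
        for run_type, external_trigger, execution_date in rows
        if run_type in ("backfill", "scheduled")
        or (include_external_trigger and external_trigger)
    ]
    return max(dates)


def next_hourly_execution_date(latest):
    """Hypothetical helper: first top-of-the-hour slot strictly after `latest`."""
    floored = latest.replace(minute=0, second=0, microsecond=0)
    return floored + timedelta(hours=1)


# With the external_trigger condition: the manual 08:17 run is the latest,
# so the next slot is 09:00 and the 08:00 slot is never created.
next_with = next_hourly_execution_date(most_recent(runs, True))

# Without it: the latest is the scheduled 07:00 run, so 08:00 comes next.
next_without = next_hourly_execution_date(most_recent(runs, False))

print(next_with, next_without)
```

Under these assumptions, dropping the manual run from the lookup is enough to bring the 08:00 slot back, which is consistent with the suggestion to remove that line.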
   
   





[GitHub] [airflow] astleychen commented on issue #13434: Airflow 2.0.0 manual run causes scheduled run to skip

Posted by GitBox <gi...@apache.org>.
astleychen commented on issue #13434:
URL: https://github.com/apache/airflow/issues/13434#issuecomment-760667499


   @salimeryigit Agreed as well. It's too significant a consequence that a manual trigger may skip the next scheduled run. I was also surprised the first time I ran into this issue, since basic scheduling should be trustworthy and run as expected. In our scenario, we may trigger the DAG manually several times a day and also have it scheduled to run daily. This broke after the V2 upgrade. Can we elevate this issue explicitly so that users on V2 notice this behavior change?





[GitHub] [airflow] SamWheating edited a comment on issue #13434: Airflow 2.0.0 manual run causes scheduled run to skip

Posted by GitBox <gi...@apache.org>.
SamWheating edited a comment on issue #13434:
URL: https://github.com/apache/airflow/issues/13434#issuecomment-768484735


   Could you please confirm your database and driver version?
   
   (Assuming you are using mysql client `pip list | grep mysqlclient`)
   
   Based on your screenshot and the code snippet attached above, I suspect that this is related to https://github.com/apache/airflow/pull/11621. It looks like `"DagRunType.MANUAL"` is being written to the dag_run table instead of `"manual"`, which would then cause the `most_recent_dag_runs` function to incorrectly pick up the manual run. 
   
   I am currently looking into this and have an active environment which is affected so I should be able to provide more details shortly. 





[GitHub] [airflow] salimeryigit commented on issue #13434: Airflow 2.0.0 manual run causes scheduled run to skip

Posted by GitBox <gi...@apache.org>.
salimeryigit commented on issue #13434:
URL: https://github.com/apache/airflow/issues/13434#issuecomment-768825732


   Same DB configuration here, Postgres with psycopg2 2.8.6.





[GitHub] [airflow] boring-cyborg[bot] commented on issue #13434: Airflow 2.0.0 manual run causes scheduled run to skip

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #13434:
URL: https://github.com/apache/airflow/issues/13434#issuecomment-753470722


   Thanks for opening your first issue here! Be sure to follow the issue template!
   





[GitHub] [airflow] turbaszek commented on issue #13434: Airflow 2.0.0 manual run causes scheduled run to skip

Posted by GitBox <gi...@apache.org>.
turbaszek commented on issue #13434:
URL: https://github.com/apache/airflow/issues/13434#issuecomment-768949743


   > This will certainly cause trouble. The manual run should not interfere with the scheduled runs IMHO.
   
   This is already causing problems and confusion. Users are surprised by this change. 





[GitHub] [airflow] SamWheating edited a comment on issue #13434: Airflow 2.0.0 manual run causes scheduled run to skip

Posted by GitBox <gi...@apache.org>.
SamWheating edited a comment on issue #13434:
URL: https://github.com/apache/airflow/issues/13434#issuecomment-768484735


   Could you please confirm your database and driver version?
   
   (Assuming you are using mysql client `pip list | grep mysqlclient`)
   
   Based on your screenshot and the code snippet attached above, I suspect that this is related to https://github.com/apache/airflow/pull/11621. It looks like `"DagRunType.MANUAL"` is being written to the dag_run table instead of `"manual"`, which would then cause the `most_recent_dag_runs` function to incorrectly pick up the manual run. 
   
   I am currently looking into this and have an active environment which is affected so I should be able to provide more details shortly. 

