Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/05/10 02:18:09 UTC

[GitHub] [airflow] wahsmail opened a new issue #15752: Airflow UI tree view: mark (success|failed) (past|future) only marking selected task instance

wahsmail opened a new issue #15752:
URL: https://github.com/apache/airflow/issues/15752


   **Apache Airflow version**: 2.0.1
   
   **Environment**: 
   
   - **OS** (e.g. from /etc/os-release): CentOS Linux 7 (Core)
   - **Kernel** (e.g. `uname -a`): Linux 3.10.0-957.27.2.el7.x86_64
   - **Install tools**: conda install airflow airflow-with-ldap psycopg2 sqlalchemy=1.3
   
   **What happened**:
   
   When I want to backfill tasks using only the UI, I usually pick how far back I want to backfill, mark that task instance as failed with the "future" option selected, then clear it with the "future" option selected (along with various dependency options). After upgrading our production server to 2.x, the "Wait a minute" prompt only shows the selected task instance, even when there are multiple execution dates following it.
   
   One interesting thing to note is that this behavior works as expected when **clearing** tasks, just not when marking success/failure.
   
   **What you expected to happen**:
   
   I expected all the task instances on and after the selected execution date to be affected. Instead, I have to manually fail each task-date or find another workaround, and marking through the UI is how our non-power-users have been backfilling processes.
   
   **How to reproduce it**:
   
   We created a fresh conda environment with Python 3.8, ran `conda install airflow airflow-with-ldap psycopg2 sqlalchemy=1.3` and continued the setup for the scheduler and webserver. The Python package environment is airflow-centric, with not much else in it. We are using the default timezone "America/Chicago" and the cron expression schedule "0 0 * * *" to ensure our DAGs run every night at midnight local time, instead of 11pm/12am/1am depending on daylight saving time / start_date. For the DAG/task start_date I have tried passing a naive datetime.datetime, a datetime.datetime object with tzinfo=pendulum.timezone("America/Chicago"), a pendulum.datetime object with tz="America/Chicago", and an airflow.utils.timezone.datetime object. All suffer from the same issue. Here is an example DAG that shows it:
   
   ```python
   from airflow import DAG
   from airflow.operators.python_operator import PythonOperator
   from airflow.utils.timezone import datetime
   
   default_args = {
       'owner': 'wahsmail',
       'depends_on_past': False,
       'start_date': datetime(2021, 3, 1),
       'email_on_failure': False,
       'email_on_retry': False,
       'retries': 0,
   }
   
   dag = DAG('test_dag', default_args=default_args, schedule_interval='0 0 * * *', catchup=True)
   
   
   def print_stuff_func(**context):
       print('---- airflow macros ----')
       print(str(context).replace(',', ',\n'))
   
   
   print_stuff = PythonOperator(
       task_id='print_stuff',
       python_callable=print_stuff_func,
       dag=dag
   )
   ```
   
   **Anything else we need to know**:
   
   Step 1:
   ![image](https://user-images.githubusercontent.com/24307882/117597607-8b043f00-b10b-11eb-9a08-dc990f691e28.png)
   
   Step 2:
   ![image](https://user-images.githubusercontent.com/24307882/117597668-af601b80-b10b-11eb-8df6-310d2bbb4e7d.png)
   
   Step 3:
   ![image](https://user-images.githubusercontent.com/24307882/117597699-bc7d0a80-b10b-11eb-85ad-3b4b9926475f.png)
   
   Step 4:
   ![image](https://user-images.githubusercontent.com/24307882/117597745-dc143300-b10b-11eb-977b-d9b36045e641.png)
   
   Actually in this example, not even the selected date itself was marked as failed... not sure what's going on here.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal commented on issue #15752: Airflow UI tree view: mark (success|failed) (past|future) only marking selected task instance

Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #15752:
URL: https://github.com/apache/airflow/issues/15752#issuecomment-1069231162


   The tree view has been refactored significantly.
   Actually, it no longer exists. We now have the Grid view.
   Can you please check whether the bug is still reproducible on the latest main branch?





[GitHub] [airflow] boring-cyborg[bot] commented on issue #15752: Airflow UI tree view: mark (success|failed) (past|future) only marking selected task instance

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #15752:
URL: https://github.com/apache/airflow/issues/15752#issuecomment-836073362


   Thanks for opening your first issue here! Be sure to follow the issue template!
   





[GitHub] [airflow] wahsmail commented on issue #15752: Airflow UI tree view: mark (success|failed) (past|future) only marking selected task instance

Posted by GitBox <gi...@apache.org>.
wahsmail commented on issue #15752:
URL: https://github.com/apache/airflow/issues/15752#issuecomment-846146800


   Bump @jedcunningham 





[GitHub] [airflow] wahsmail edited a comment on issue #15752: Airflow UI tree view: mark (success|failed) (past|future) only marking selected task instance

Posted by GitBox <gi...@apache.org>.
wahsmail edited a comment on issue #15752:
URL: https://github.com/apache/airflow/issues/15752#issuecomment-844566053


   I found the issue. In [utils.dates.py#L109](https://github.com/apache/airflow/blob/master/airflow/utils/dates.py#L109), croniter returns `datetime.datetime(2021, 4, 24, 0, 0)` when the initial start_date (after being made naive) was `datetime.datetime(2021, 4, 23, **5**, 0)`. So when this date range is passed to the DagRun.find() method to query the database, it only finds a single execution date, the one with hour=5.
   
   My schedule_interval is `0 0 * * *` and the server's timezone is Chicago time (UTC-5).
   
   Tagging some contributors to utils.dates.py for visibility: @Rcharriol @bolkedebruin , sorry for the spam
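
The mismatch described above can be reproduced with the standard library alone. This is a minimal sketch, not Airflow's code: `next_midnight` is a hypothetical stand-in for what croniter yields for `0 0 * * *` on a naive datetime, and `stored` is an assumed set of execution dates as they would sit in the database (local midnight in UTC-5, i.e. 05:00 UTC).

```python
from datetime import datetime, timedelta


def next_midnight(after: datetime) -> datetime:
    """Hypothetical stand-in for croniter('0 0 * * *', after).get_next(datetime)."""
    shifted = after + timedelta(days=1)
    return shifted.replace(hour=0, minute=0, second=0, microsecond=0)


def naive_date_range(start: datetime, count: int) -> list:
    """Return `start` followed by the next `count - 1` midnight ticks."""
    dates = [start]
    while len(dates) < count:
        dates.append(next_midnight(dates[-1]))
    return dates


# Assumed database contents: daily runs at local midnight (UTC-5),
# stored as 05:00 UTC and shown here as naive UTC values.
stored = {datetime(2021, 4, 23, 5, 0) + timedelta(days=i) for i in range(3)}

generated = naive_date_range(datetime(2021, 4, 23, 5, 0), 3)
matched = [d for d in generated if d in stored]

print(generated)  # first entry keeps hour=5, later ones fall on hour=0
print(matched)    # only the first entry lines up with a stored run
```

Only the starting date keeps hour=5, so an exact-match lookup on the generated dates finds a single run, which is consistent with the "Wait a minute" prompt listing just the selected task instance.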





[GitHub] [airflow] wahsmail commented on issue #15752: Airflow UI tree view: mark (success|failed) (past|future) only marking selected task instance

Posted by GitBox <gi...@apache.org>.
wahsmail commented on issue #15752:
URL: https://github.com/apache/airflow/issues/15752#issuecomment-844517963


   Issue also persists in 2.0.2
   
   This seems related: https://github.com/apache/airflow/issues/10112
   
   Very annoying bug!





[GitHub] [airflow] wahsmail edited a comment on issue #15752: Airflow UI tree view: mark (success|failed) (past|future) only marking selected task instance

Posted by GitBox <gi...@apache.org>.
wahsmail edited a comment on issue #15752:
URL: https://github.com/apache/airflow/issues/15752#issuecomment-844566053


   I found the issue. In [utils.dates.py#L109](https://github.com/apache/airflow/blob/master/airflow/utils/dates.py#L109), croniter returns `datetime.datetime(2021, 4, 24, 0, 0)` when the initial start_date (after being made naive) was `datetime.datetime(2021, 4, 23, **5**, 0)`. So when this date range is passed to the DagRun.find() method to query the database, it only finds a single execution date, the one with hour=5.
   
   My schedule_interval is `0 0 * * *` and the server's timezone is Chicago time (UTC-5).





[GitHub] [airflow] wahsmail edited a comment on issue #15752: Airflow UI tree view: mark (success|failed) (past|future) only marking selected task instance

Posted by GitBox <gi...@apache.org>.
wahsmail edited a comment on issue #15752:
URL: https://github.com/apache/airflow/issues/15752#issuecomment-844566053


   I found the issue. In [utils.dates.py#L109](https://github.com/apache/airflow/blob/master/airflow/utils/dates.py#L109), croniter returns `datetime.datetime(2021, 4, 24, 0, 0)` when the initial start_date (after being made naive) was `datetime.datetime(2021, 4, 23, **5**, 0)`. So when this date range is passed to the DagRun.find() method to query the database, it only finds a single execution date, the initial one with hour=5.
   
   My schedule_interval is `0 0 * * *` and the server's timezone is Chicago time (UTC-5).
   
   Tagging some contributors to utils.dates.py for visibility: @Rcharriol @bolkedebruin , sorry for the spam





[GitHub] [airflow] wahsmail edited a comment on issue #15752: Airflow UI tree view: mark (success|failed) (past|future) only marking selected task instance

Posted by GitBox <gi...@apache.org>.
wahsmail edited a comment on issue #15752:
URL: https://github.com/apache/airflow/issues/15752#issuecomment-844566053


   I found the issue. In [utils.dates.py#L109](https://github.com/apache/airflow/blob/master/airflow/utils/dates.py#L109), croniter returns `datetime.datetime(2021, 4, 24, 0, 0)` when the initial start_date (after being made naive) was `datetime.datetime(2021, 4, 23, **5**, 0)`. So when this date range is passed to the DagRun.find() method to query the database, it only finds a single execution date, the one with hour=5.
   
   My schedule_interval is `0 0 * * *` and the server's timezone is Chicago time (UTC-5).
   
   Tagging the last few contributors to utils.dates.py for visibility: @Rcharriol @bolkedebruin, sorry for the spam





[GitHub] [airflow] wahsmail edited a comment on issue #15752: Airflow UI tree view: mark (success|failed) (past|future) only marking selected task instance

Posted by GitBox <gi...@apache.org>.
wahsmail edited a comment on issue #15752:
URL: https://github.com/apache/airflow/issues/15752#issuecomment-844517963


   Issue also persists in 2.0.2
   
   I think this is the same issue: https://github.com/apache/airflow/issues/10112





[GitHub] [airflow] wahsmail commented on issue #15752: Airflow UI tree view: mark (success|failed) (past|future) only marking selected task instance

Posted by GitBox <gi...@apache.org>.
wahsmail commented on issue #15752:
URL: https://github.com/apache/airflow/issues/15752#issuecomment-844577901


   So the choice is either to pass a localized datetime to croniter.get_next() and *then* convert the result to UTC, or to somehow rewrite the schedule interval string so that it produces the same result.
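
The first option (localize, step the schedule, then convert to UTC) can be sketched as follows. This is an illustration under stated assumptions rather than a patch: `LOCAL_TZ` is a fixed UTC-5 offset standing in for America/Chicago, so daylight saving transitions are ignored, and `next_local_midnight_utc` is a hypothetical helper that plays the role of croniter for `0 0 * * *`.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC-5 offset in place of a real America/Chicago zone.
LOCAL_TZ = timezone(timedelta(hours=-5))


def next_local_midnight_utc(after_utc: datetime) -> datetime:
    """Step to the next local midnight, then express the result in UTC."""
    local = after_utc.astimezone(LOCAL_TZ)
    next_local = (local + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return next_local.astimezone(timezone.utc)


start = datetime(2021, 4, 23, 5, 0, tzinfo=timezone.utc)
nxt = next_local_midnight_utc(start)
print(nxt.isoformat())
```

Because the schedule is evaluated in local time, the next tick lands on local midnight and keeps hour=5 once converted back to UTC, matching how the execution dates are stored.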





[GitHub] [airflow] wahsmail edited a comment on issue #15752: Airflow UI tree view: mark (success|failed) (past|future) only marking selected task instance

Posted by GitBox <gi...@apache.org>.
wahsmail edited a comment on issue #15752:
URL: https://github.com/apache/airflow/issues/15752#issuecomment-844615103


   Actually now I think I'd argue that [experimental.mark_tasks.get_execution_dates](https://github.com/apache/airflow/blob/e01b4e60d1bfbccce614ce8674c5d8f3580431ef/airflow/api/common/experimental/mark_tasks.py#L239) should convert start_date and end_date to the server timezone before passing to date_range. I made this fix for my installation and my dag is working as intended. I don't know what *other* edge cases this will break but please consider adding a fix for *this* edge case in the next patch, thanks!
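
A rough sketch of that idea, converting both bounds to the server timezone before building the range, is shown here. It is not the actual change made to `get_execution_dates`: `SERVER_TZ` is an assumed fixed UTC-5 offset, `local_daily_range` is a hypothetical helper for a daily midnight schedule, and `stored` is made-up sample data.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the server timezone is modelled as a fixed UTC-5 offset.
SERVER_TZ = timezone(timedelta(hours=-5))


def local_daily_range(start_utc: datetime, end_utc: datetime) -> list:
    """Build a daily range in server-local time and return it in UTC."""
    current = start_utc.astimezone(SERVER_TZ)
    end = end_utc.astimezone(SERVER_TZ)
    dates = []
    while current <= end:
        dates.append(current.astimezone(timezone.utc))
        # Advance to the next local midnight, the tick of '0 0 * * *'.
        current = (current + timedelta(days=1)).replace(
            hour=0, minute=0, second=0, microsecond=0
        )
    return dates


# Sample execution dates: four nightly runs stored as 05:00 UTC.
first = datetime(2021, 4, 23, 5, 0, tzinfo=timezone.utc)
stored = {first + timedelta(days=i) for i in range(4)}

dates = local_daily_range(first, first + timedelta(days=3))
print([d.isoformat() for d in dates if d in stored])
```

With the bounds converted first, every generated date coincides with a stored execution date, so marking "future" would pick up all four runs instead of one.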





[GitHub] [airflow] wahsmail commented on issue #15752: Airflow UI tree view: mark (success|failed) (past|future) only marking selected task instance

Posted by GitBox <gi...@apache.org>.
wahsmail commented on issue #15752:
URL: https://github.com/apache/airflow/issues/15752#issuecomment-844615103


   Actually now I think I'd argue that [experimental.mark_tasks.get_execution_dates](https://github.com/apache/airflow/blob/e01b4e60d1bfbccce614ce8674c5d8f3580431ef/airflow/api/common/experimental/mark_tasks.py#L239) should convert start_date and end_date to the server timezone before passing to date_range. I made this fix for my installation and my main stuff is working as intended. I don't know what *other* edge cases this will break but please consider adding a fix for *this* edge case in the next patch, thanks!





[GitHub] [airflow] wahsmail edited a comment on issue #15752: Airflow UI tree view: mark (success|failed) (past|future) only marking selected task instance

Posted by GitBox <gi...@apache.org>.
wahsmail edited a comment on issue #15752:
URL: https://github.com/apache/airflow/issues/15752#issuecomment-844517963


   Issue also persists in 2.0.2





[GitHub] [airflow] wahsmail edited a comment on issue #15752: Airflow UI tree view: mark (success|failed) (past|future) only marking selected task instance

Posted by GitBox <gi...@apache.org>.
wahsmail edited a comment on issue #15752:
URL: https://github.com/apache/airflow/issues/15752#issuecomment-844566053


   I found the issue. In [utils.dates.py#L109](https://github.com/apache/airflow/blob/master/airflow/utils/dates.py#L109), croniter returns `datetime.datetime(2021, 4, 24, 0, 0)` when the initial start_date (after being made naive) was `datetime.datetime(2021, 4, 23, **5**, 0)`. So when this date range is passed to the DagRun.find() method to query the database, it only finds a single execution date, the initial one with hour=5.
   
   My schedule_interval is `0 0 * * *` and the server's timezone is Chicago time (UTC-5), meaning I want to run at midnight local time every day.
   
   Tagging some contributors to utils.dates.py for visibility: @Rcharriol @bolkedebruin , sorry for the spam





[GitHub] [airflow] wahsmail edited a comment on issue #15752: Airflow UI tree view: mark (success|failed) (past|future) only marking selected task instance

Posted by GitBox <gi...@apache.org>.
wahsmail edited a comment on issue #15752:
URL: https://github.com/apache/airflow/issues/15752#issuecomment-844566053


   Think I found something. In [utils.dates.py#L109](https://github.com/apache/airflow/blob/master/airflow/utils/dates.py#L109), croniter returns `datetime.datetime(2021, 4, 24, 0, 0)` when the initial start_date (after being made naive) was `datetime.datetime(2021, 4, 23, **5**, 0)`. So when this date range is passed to the DagRun.find() method to query the database, it only finds a single execution date, the one with hour=5.
   
   My schedule_interval is `0 0 * * *` and the server's timezone is Chicago time (UTC-5).





[GitHub] [airflow] wahsmail commented on issue #15752: Airflow UI tree view: mark (success|failed) (past|future) only marking selected task instance

Posted by GitBox <gi...@apache.org>.
wahsmail commented on issue #15752:
URL: https://github.com/apache/airflow/issues/15752#issuecomment-844566053


   Think I found something. In [utils.dates.py#L109](https://github.com/apache/airflow/blob/master/airflow/utils/dates.py#L109), croniter returns `datetime.datetime(2021, 4, 24, 0, 0)` when the initial start_date (after being made naive) was `datetime.datetime(2021, 4, 23, 5, 0)`. So when this date range is passed to the DagRun.find() method to query the database, it only finds a single execution date, the one with hour=5.
   
   My schedule_interval is `0 0 * * *` and the server's timezone is Chicago time (UTC-5).

