Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/05/10 02:18:09 UTC
[GitHub] [airflow] wahsmail opened a new issue #15752: Airflow UI tree view: mark (success|failed) (past|future) only marking selected task instance
wahsmail opened a new issue #15752:
URL: https://github.com/apache/airflow/issues/15752
**Apache Airflow version**: 2.0.1
**Environment**:
- **OS** (e.g. from /etc/os-release): CentOS Linux 7 (Core)
- **Kernel** (e.g. `uname -a`): Linux 3.10.0-957.27.2.el7.x86_64
- **Install tools**: conda install airflow airflow-with-ldap psycopg2 sqlalchemy=1.3
**What happened**:
When I want to backfill tasks using only the UI, I usually pick how far back I want to backfill, mark that task instance as failed with the "future" option selected, then clear it with the "future" option selected (with various dependency options as well). After upgrading our production server to 2.x, the "Wait a minute" prompt only lists the selected task instance, even when there are multiple execution dates following it.
One interesting thing to note is that this behavior works as expected when **clearing** tasks, just not marking success/failure.
**What you expected to happen**:
I expected all task instances on and after the selected execution date to be affected. Instead I have to manually fail each task-date or find another workaround, and this is how our non-power-users have been backfilling processes.
**How to reproduce it**:
We created a fresh conda environment with Python 3.8, ran `conda install airflow airflow-with-ldap psycopg2 sqlalchemy=1.3` and continued the setup for the scheduler and webserver. The Python package environment is airflow-centric, with not much else in there. We are using the default timezone "America/Chicago" and the cron schedule "0 0 * * *" to ensure our DAGs run every night at midnight local time, instead of 11pm/12am/1am depending on daylight saving time / start_date. For the DAG/task start_date I have tried passing a naive datetime.datetime, a datetime.datetime object with tzinfo=pendulum.timezone("America/Chicago"), a pendulum.datetime object with tz="America/Chicago", and an airflow.utils.timezone.datetime object. All suffer from the same issue. Here is an example DAG suffering from this:
```python
from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from airflow.utils.timezone import datetime

default_args = {
    'owner': 'wahsmail',
    'depends_on_past': False,
    'start_date': datetime(2021, 3, 1),
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 0,
}

# Daily at midnight; catchup=True creates one run per day since start_date.
dag = DAG('test_dag', default_args=default_args, schedule_interval='0 0 * * *', catchup=True)


def print_stuff_func(**context):
    # Dump the task context, one entry per line.
    print('---- airflow macros ----')
    print(str(context).replace(',', ',\n'))


print_stuff = PythonOperator(
    task_id='print_stuff',
    python_callable=print_stuff_func,
    dag=dag
)
```
**Anything else we need to know**:
Step 1:
![image](https://user-images.githubusercontent.com/24307882/117597607-8b043f00-b10b-11eb-9a08-dc990f691e28.png)
Step 2:
![image](https://user-images.githubusercontent.com/24307882/117597668-af601b80-b10b-11eb-8df6-310d2bbb4e7d.png)
Step 3:
![image](https://user-images.githubusercontent.com/24307882/117597699-bc7d0a80-b10b-11eb-85ad-3b4b9926475f.png)
Step 4:
![image](https://user-images.githubusercontent.com/24307882/117597745-dc143300-b10b-11eb-977b-d9b36045e641.png)
Actually in this example, not even the selected date itself was marked as failed... not sure what's going on here.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] eladkal commented on issue #15752: Airflow UI tree view: mark (success|failed) (past|future) only marking selected task instance
Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #15752:
URL: https://github.com/apache/airflow/issues/15752#issuecomment-1069231162
The tree view has been refactored significantly.
Actually, it no longer exists; we now have the Grid view.
Can you please check whether the bug is still reproducible on the latest main branch?
[GitHub] [airflow] boring-cyborg[bot] commented on issue #15752: Airflow UI tree view: mark (success|failed) (past|future) only marking selected task instance
Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #15752:
URL: https://github.com/apache/airflow/issues/15752#issuecomment-836073362
Thanks for opening your first issue here! Be sure to follow the issue template!
[GitHub] [airflow] wahsmail commented on issue #15752: Airflow UI tree view: mark (success|failed) (past|future) only marking selected task instance
Posted by GitBox <gi...@apache.org>.
wahsmail commented on issue #15752:
URL: https://github.com/apache/airflow/issues/15752#issuecomment-846146800
Bump @jedcunningham
[GitHub] [airflow] wahsmail edited a comment on issue #15752: Airflow UI tree view: mark (success|failed) (past|future) only marking selected task instance
Posted by GitBox <gi...@apache.org>.
wahsmail edited a comment on issue #15752:
URL: https://github.com/apache/airflow/issues/15752#issuecomment-844566053
I found the issue. In [utils.dates.py#L109](https://github.com/apache/airflow/blob/master/airflow/utils/dates.py#L109), croniter returns `datetime.datetime(2021, 4, 24, 0, 0)` when the initial start_date (after being made naive) was `datetime.datetime(2021, 4, 23, **5**, 0)`. So when this date range is passed to the DagRun.find() method to query the database, it only finds a single execution date, the one with hour=5.
My schedule_interval is `0 0 * * *` and the server's timezone is Chicago time (UTC-5).
Tagging some contributors to utils.dates.py for visibility: @Rcharriol @bolkedebruin , sorry for the spam
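To make the mismatch described above concrete, here is a minimal standard-library sketch. It is not Airflow's code: `next_midnight` is a hypothetical stand-in for what croniter does with `0 0 * * *` on a naive datetime, `naive_date_range` is a simplified model of how the range is built (the start date is kept, every later date comes from the cron step), and `stored` is an assumed set of execution dates as they would sit in the database for midnight runs at UTC-5.

```python
from datetime import datetime, timedelta


def next_midnight(dt: datetime) -> datetime:
    """Hypothetical stand-in for stepping a '0 0 * * *' cron on a naive datetime."""
    return (dt + timedelta(days=1)).replace(hour=0, minute=0, second=0, microsecond=0)


def naive_date_range(start: datetime, end: datetime) -> list:
    """Simplified model: keep the start date, then step with the cron stand-in."""
    dates = []
    current = start
    while current <= end:
        dates.append(current)
        current = next_midnight(current)
    return dates


# Assumed execution dates as stored (naive UTC): local midnight at UTC-5 is 05:00 UTC.
stored = [datetime(2021, 4, day, 5, 0) for day in (23, 24, 25)]

generated = naive_date_range(datetime(2021, 4, 23, 5, 0), datetime(2021, 4, 25, 5, 0))
matched = [d for d in generated if d in stored]

print(generated)  # the start at 05:00, then dates at 00:00
print(matched)    # only the start date lines up with a stored execution date
```

Only the first generated date has hour=5, so a lookup by exact execution date would find one run, which is consistent with the single task instance shown in the prompt.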
[GitHub] [airflow] wahsmail commented on issue #15752: Airflow UI tree view: mark (success|failed) (past|future) only marking selected task instance
Posted by GitBox <gi...@apache.org>.
wahsmail commented on issue #15752:
URL: https://github.com/apache/airflow/issues/15752#issuecomment-844517963
Issue also persists in 2.0.2
This seems related: https://github.com/apache/airflow/issues/10112
Very annoying bug!
[GitHub] [airflow] wahsmail edited a comment on issue #15752: Airflow UI tree view: mark (success|failed) (past|future) only marking selected task instance
Posted by GitBox <gi...@apache.org>.
wahsmail edited a comment on issue #15752:
URL: https://github.com/apache/airflow/issues/15752#issuecomment-844566053
I found the issue. In [utils.dates.py#L109](https://github.com/apache/airflow/blob/master/airflow/utils/dates.py#L109), croniter returns `datetime.datetime(2021, 4, 24, 0, 0)` when the initial start_date (after being made naive) was `datetime.datetime(2021, 4, 23, **5**, 0)`. So when this date range is passed to the DagRun.find() method to query the database, it only finds a single execution date, the initial one with hour=5.
My schedule_interval is `0 0 * * *` and the server's timezone is Chicago time (UTC-5).
Tagging some contributors to utils.dates.py for visibility: @Rcharriol @bolkedebruin , sorry for the spam
[GitHub] [airflow] wahsmail edited a comment on issue #15752: Airflow UI tree view: mark (success|failed) (past|future) only marking selected task instance
Posted by GitBox <gi...@apache.org>.
wahsmail edited a comment on issue #15752:
URL: https://github.com/apache/airflow/issues/15752#issuecomment-844517963
Issue also persists in 2.0.2
I think this is the same issue: https://github.com/apache/airflow/issues/10112
[GitHub] [airflow] wahsmail commented on issue #15752: Airflow UI tree view: mark (success|failed) (past|future) only marking selected task instance
Posted by GitBox <gi...@apache.org>.
wahsmail commented on issue #15752:
URL: https://github.com/apache/airflow/issues/15752#issuecomment-844577901
So the choice is to either pass a localized datetime to croniter.get_next() and *then* convert to UTC, or somehow mutate the schedule interval string such that it gets the same result.
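The first option could be sketched roughly as follows. This is only an illustration under assumptions: a fixed UTC-5 offset stands in for "America/Chicago" (so daylight saving time is ignored), and `next_local_midnight_utc` is a hypothetical helper that replaces the croniter call with plain datetime arithmetic for the `0 0 * * *` case.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-5 offset stands in for "America/Chicago" (no DST handling).
SERVER_TZ = timezone(timedelta(hours=-5))


def next_local_midnight_utc(prev_utc: datetime) -> datetime:
    """Step to the next midnight in server-local time, then convert back to UTC."""
    local = prev_utc.astimezone(SERVER_TZ)  # localize first
    next_local = (local + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return next_local.astimezone(timezone.utc)  # only then convert to UTC


prev = datetime(2021, 4, 23, 5, 0, tzinfo=timezone.utc)
nxt = next_local_midnight_utc(prev)
print(nxt)
```

Because the step is taken on the local wall clock, the result keeps the 05:00 UTC hour of the previous execution date instead of snapping to 00:00 UTC.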
[GitHub] [airflow] wahsmail edited a comment on issue #15752: Airflow UI tree view: mark (success|failed) (past|future) only marking selected task instance
Posted by GitBox <gi...@apache.org>.
wahsmail edited a comment on issue #15752:
URL: https://github.com/apache/airflow/issues/15752#issuecomment-844615103
Actually now I think I'd argue that [experimental.mark_tasks.get_execution_dates](https://github.com/apache/airflow/blob/e01b4e60d1bfbccce614ce8674c5d8f3580431ef/airflow/api/common/experimental/mark_tasks.py#L239) should convert start_date and end_date to the server timezone before passing to date_range. I made this fix for my installation and my dag is working as intended. I don't know what *other* edge cases this will break but please consider adding a fix for *this* edge case in the next patch, thanks!
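A rough sketch of that idea, not the actual patch: `daily_range` is a hypothetical stand-in for date_range with a `0 0 * * *` schedule, `execution_dates` is a hypothetical wrapper that converts both endpoints to the server timezone first, and a fixed UTC-5 offset is assumed in place of "America/Chicago" (daylight saving time is ignored).

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC-5 offset instead of a real "America/Chicago" zone.
SERVER_TZ = timezone(timedelta(hours=-5))


def daily_range(start: datetime, end: datetime) -> list:
    """Hypothetical stand-in for date_range: start, then midnights in start's timezone."""
    dates = []
    current = start
    while current <= end:
        dates.append(current)
        current = (current + timedelta(days=1)).replace(
            hour=0, minute=0, second=0, microsecond=0
        )
    return dates


def execution_dates(start_utc: datetime, end_utc: datetime) -> list:
    """Convert the endpoints to server time before building the range, then back to UTC."""
    local_dates = daily_range(
        start_utc.astimezone(SERVER_TZ), end_utc.astimezone(SERVER_TZ)
    )
    return [d.astimezone(timezone.utc) for d in local_dates]


# Assumed stored execution dates: local midnight at UTC-5 is 05:00 UTC.
stored = [datetime(2021, 4, day, 5, 0, tzinfo=timezone.utc) for day in (23, 24, 25)]
dates = execution_dates(stored[0], stored[-1])
print(dates == stored)
```

With the endpoints converted first, every generated date corresponds to a stored execution date in this example, so all three runs would be selected rather than just the first.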
[GitHub] [airflow] wahsmail commented on issue #15752: Airflow UI tree view: mark (success|failed) (past|future) only marking selected task instance
Posted by GitBox <gi...@apache.org>.
wahsmail commented on issue #15752:
URL: https://github.com/apache/airflow/issues/15752#issuecomment-844615103
Actually now I think I'd argue that [experimental.mark_tasks.get_execution_dates](https://github.com/apache/airflow/blob/e01b4e60d1bfbccce614ce8674c5d8f3580431ef/airflow/api/common/experimental/mark_tasks.py#L239) should convert start_date and end_date to the server timezone before passing to date_range. I made this fix for my installation and my main stuff is working as intended. I don't know what *other* edge cases this will break but please consider adding a fix for *this* edge case in the next patch, thanks!
[GitHub] [airflow] wahsmail edited a comment on issue #15752: Airflow UI tree view: mark (success|failed) (past|future) only marking selected task instance
Posted by GitBox <gi...@apache.org>.
wahsmail edited a comment on issue #15752:
URL: https://github.com/apache/airflow/issues/15752#issuecomment-844517963
Issue also persists in 2.0.2
[GitHub] [airflow] wahsmail edited a comment on issue #15752: Airflow UI tree view: mark (success|failed) (past|future) only marking selected task instance
Posted by GitBox <gi...@apache.org>.
wahsmail edited a comment on issue #15752:
URL: https://github.com/apache/airflow/issues/15752#issuecomment-844566053
I found the issue. In [utils.dates.py#L109](https://github.com/apache/airflow/blob/master/airflow/utils/dates.py#L109), croniter returns `datetime.datetime(2021, 4, 24, 0, 0)` when the initial start_date (after being made naive) was `datetime.datetime(2021, 4, 23, **5**, 0)`. So when this date range is passed to the DagRun.find() method to query the database, it only finds a single execution date, the initial one with hour=5.
My schedule_interval is `0 0 * * *` and the server's timezone is Chicago time (UTC-5), meaning I want to run at midnight local time every day.
Tagging some contributors to utils.dates.py for visibility: @Rcharriol @bolkedebruin , sorry for the spam
[GitHub] [airflow] wahsmail edited a comment on issue #15752: Airflow UI tree view: mark (success|failed) (past|future) only marking selected task instance
Posted by GitBox <gi...@apache.org>.
wahsmail edited a comment on issue #15752:
URL: https://github.com/apache/airflow/issues/15752#issuecomment-844566053
Think I found something. In [utils.dates.py#L109](https://github.com/apache/airflow/blob/master/airflow/utils/dates.py#L109), croniter returns `datetime.datetime(2021, 4, 24, 0, 0)` when the initial start_date (after being made naive) was `datetime.datetime(2021, 4, 23, **5**, 0)`. So when this date range is passed to the DagRun.find() method to query the database, it only finds a single execution date, the one with hour=5.
My schedule_interval is `0 0 * * *` and the server's timezone is Chicago time (UTC-5).
[GitHub] [airflow] wahsmail commented on issue #15752: Airflow UI tree view: mark (success|failed) (past|future) only marking selected task instance
Posted by GitBox <gi...@apache.org>.
wahsmail commented on issue #15752:
URL: https://github.com/apache/airflow/issues/15752#issuecomment-844566053
Think I found something. In [utils.dates.py#L109](https://github.com/apache/airflow/blob/master/airflow/utils/dates.py#L109), croniter returns `datetime.datetime(2021, 4, 24, 0, 0)` when the initial start_date (after being made naive) was `datetime.datetime(2021, 4, 23, 5, 0)`. So when this date range is passed to the DagRun.find() method to query the database, it only finds a single execution date, the one with hour=5.
My schedule_interval is `0 0 * * *` and the server's timezone is Chicago time (UTC-5).