Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/03/26 18:25:35 UTC
[GitHub] [airflow] iameugenejo opened a new issue #15036: Scheduler is skipping a day sometimes
iameugenejo opened a new issue #15036:
URL: https://github.com/apache/airflow/issues/15036
I'm having exactly the same issue as this user - https://www.reddit.com/r/dataengineering/comments/lri9fv/airflow_dag_is_skipping_a_day/
The version I'm using is 2.0.1
![Screen Shot 2021-03-26 at 11 01 32 AM](https://user-images.githubusercontent.com/1054824/112676474-b4039580-8e25-11eb-8a3d-036881667f21.png)
![Screen Shot 2021-03-26 at 11 01 40 AM](https://user-images.githubusercontent.com/1054824/112676478-b534c280-8e25-11eb-8b5e-2e2d1dddccf3.png)
![Screen Shot 2021-03-26 at 11 04 10 AM](https://user-images.githubusercontent.com/1054824/112676499-be259400-8e25-11eb-911b-06108042acf8.png)
![Screen Shot 2021-03-26 at 11 01 54 AM](https://user-images.githubusercontent.com/1054824/112676480-b5cd5900-8e25-11eb-9038-aed147d01be9.png)
![Screen Shot 2021-03-26 at 11 01 59 AM](https://user-images.githubusercontent.com/1054824/112676482-b665ef80-8e25-11eb-9c34-c631f557b455.png)
![Screen Shot 2021-03-26 at 11 03 43 AM](https://user-images.githubusercontent.com/1054824/112676498-bd8cfd80-8e25-11eb-8eab-91136d1de51f.png)
The scheduler log exists for the missing date and shows no errors.
There were no manual runs at all for this DAG.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] boring-cyborg[bot] commented on issue #15036: Scheduler is skipping a day sometimes
Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #15036:
URL: https://github.com/apache/airflow/issues/15036#issuecomment-808429102
Thanks for opening your first issue here! Be sure to follow the issue template!
[GitHub] [airflow] eladkal commented on issue #15036: Scheduler is skipping a day sometimes
Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #15036:
URL: https://github.com/apache/airflow/issues/15036#issuecomment-830765361
@iameugenejo can you share the DAG code? We need more information here.
If we can't reproduce it, it's almost impossible to find a fix.
[GitHub] [airflow] iameugenejo commented on issue #15036: Scheduler is skipping a day sometimes
Posted by GitBox <gi...@apache.org>.
iameugenejo commented on issue #15036:
URL: https://github.com/apache/airflow/issues/15036#issuecomment-832294937
This hasn't happened for the past month or so.
The following is the dag with some values redacted.
```
from datetime import timedelta
from airflow.models import DAG
from airflow.operators.bash import BashOperator
import sys
import pendulum
from airflow.utils import timezone

DAG_ID = '{REDACTED}'
now = pendulum.now(timezone.utc)
schedule_interval = '0 16 * * *'
start_date = now - timedelta(days=1)
max_active_runs = 1
num_of_tasks = 6
email_ids = '{REDACTED}'

with DAG(
    dag_id=DAG_ID,
    start_date=start_date,
    max_active_runs=max_active_runs,
    default_args={
        'owner': 'airflow',
        'start_date': start_date,
        'max_active_runs': max_active_runs,
        'email': email_ids,
        'email_on_failure': True,
        'email_on_retry': True
    },
    schedule_interval=schedule_interval,
    dagrun_timeout=timedelta(seconds=43200),  # 43200 seconds = 12 hours
    catchup=False
) as dag:
    tasks = []
    for i in range(0, num_of_tasks):
        tasks.append(BashOperator(
            task_id='edms3_'+str(i+1),
            retries=10,
            retry_delay=timedelta(seconds=60),  # 1 minute retry delay
            retry_exponential_backoff=True,
            max_retry_delay=timedelta(seconds=900),  # 15 minutes max retry delay
            do_xcom_push=True,  # return the last line from the stdout
            bash_command="REDACTED.sh {} {} ".format(int(i), int(num_of_tasks)),
            dag=dag))
        # chain each task after the previous one
        if i != 0:
            tasks[i-1] >> tasks[i]
```
[GitHub] [airflow] jhtimmins closed issue #15036: Scheduler is skipping a day sometimes
Posted by GitBox <gi...@apache.org>.
jhtimmins closed issue #15036:
URL: https://github.com/apache/airflow/issues/15036
[GitHub] [airflow] iameugenejo edited a comment on issue #15036: Scheduler is skipping a day sometimes
Posted by GitBox <gi...@apache.org>.
iameugenejo edited a comment on issue #15036:
URL: https://github.com/apache/airflow/issues/15036#issuecomment-832294937
This hasn't happened for the past month or so.
The following is the dag with some values redacted.
```
from datetime import timedelta
from airflow.models import DAG
from airflow.operators.bash import BashOperator
import sys
import pendulum
from airflow.utils import timezone

DAG_ID = '{REDACTED}'
now = pendulum.now(timezone.utc)
schedule_interval = '0 16 * * *'
start_date = now - timedelta(days=1)
max_active_runs = 1
num_of_tasks = 6
email_ids = '{REDACTED}'

with DAG(
    dag_id=DAG_ID,
    start_date=start_date,
    max_active_runs=max_active_runs,
    default_args={
        'owner': 'airflow',
        'start_date': start_date,
        'max_active_runs': max_active_runs,
        'email': email_ids,
        'email_on_failure': True,
        'email_on_retry': True
    },
    schedule_interval=schedule_interval,
    dagrun_timeout=timedelta(seconds=43200),  # 43200 seconds = 12 hours
    catchup=False
) as dag:
    tasks = []
    for i in range(0, num_of_tasks):
        tasks.append(BashOperator(
            task_id='redacted_'+str(i+1),
            retries=10,
            retry_delay=timedelta(seconds=60),  # 1 minute retry delay
            retry_exponential_backoff=True,
            max_retry_delay=timedelta(seconds=900),  # 15 minutes max retry delay
            do_xcom_push=True,  # return the last line from the stdout
            bash_command="REDACTED.sh {} {} ".format(int(i), int(num_of_tasks)),
            dag=dag))
        # chain each task after the previous one
        if i != 0:
            tasks[i-1] >> tasks[i]
```
[GitHub] [airflow] jhtimmins commented on issue #15036: Scheduler is skipping a day sometimes
Posted by GitBox <gi...@apache.org>.
jhtimmins commented on issue #15036:
URL: https://github.com/apache/airflow/issues/15036#issuecomment-849316552
@iameugenejo Sounds good. I'll close for now
[GitHub] [airflow] eladkal commented on issue #15036: Scheduler is skipping a day sometimes
Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #15036:
URL: https://github.com/apache/airflow/issues/15036#issuecomment-848524638
I wasn't able to reproduce it, but I think it's related to the dynamic start_date used in the DAG, which is a bad practice and can lead to all kinds of undesired behavior.
`start_date = now - timedelta(days=1)`
I'm inclined to close this issue.
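To illustrate why a parse-time start_date can be fragile, here is a minimal standard-library sketch. It does not reproduce Airflow's scheduler logic and is not a confirmed explanation of this issue. The helper names and the two parse times are hypothetical, and the helper only mimics a daily `0 16 * * *` boundary in UTC.

```python
from datetime import datetime, timedelta, timezone


def dynamic_start_date(parse_time):
    """Mimic `start_date = now - timedelta(days=1)` evaluated at parse time."""
    return parse_time - timedelta(days=1)


def first_daily_slot_on_or_after(start, hour=16):
    """Return the first HH:00 boundary at or after `start` (hypothetical helper)."""
    slot = start.replace(hour=hour, minute=0, second=0, microsecond=0)
    if slot < start:
        slot += timedelta(days=1)
    return slot


# Two hypothetical parses of the same DAG file, two minutes apart.
parse_a = datetime(2021, 3, 25, 15, 59, tzinfo=timezone.utc)
parse_b = datetime(2021, 3, 25, 16, 1, tzinfo=timezone.utc)

slot_a = first_daily_slot_on_or_after(dynamic_start_date(parse_a))
slot_b = first_daily_slot_on_or_after(dynamic_start_date(parse_b))

# A fixed start_date gives the same answer no matter when the file is parsed.
STATIC_START = datetime(2021, 3, 1, tzinfo=timezone.utc)
static_slot = first_daily_slot_on_or_after(STATIC_START)

print(slot_a.isoformat())  # earliest boundary seen by the first parse
print(slot_b.isoformat())  # earliest boundary seen by the second parse
```

In this sketch the earliest schedule boundary at or after start_date moves forward by a full day between the two parses, while the fixed start_date always yields the same boundary.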
[GitHub] [airflow] jhtimmins commented on issue #15036: Scheduler is skipping a day sometimes
Posted by GitBox <gi...@apache.org>.
jhtimmins commented on issue #15036:
URL: https://github.com/apache/airflow/issues/15036#issuecomment-849259773
Thanks @eladkal.
@iameugenejo are you able to replicate this bug even if you remove the dynamic start_date? If not, I agree with @eladkal that we can probably chalk it up to the dynamic start date.
@kaxil Is it possible/desirable to add a check for dynamic start dates and to throw an error or warning?
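One way such a check might look, purely as a sketch: the functions below are hypothetical and are not part of Airflow. They assume a timezone-aware start_date and use a simple heuristic, flagging any start_date that falls within a small window of the current time. A newly created DAG with a legitimately recent fixed start_date would also be flagged, so this only suits a warning, not an error.

```python
import warnings
from datetime import datetime, timedelta, timezone


def start_date_looks_dynamic(start_date, now=None, window=timedelta(days=2)):
    """Heuristic: a start_date very close to 'now' was probably computed at parse time."""
    if now is None:
        now = datetime.now(timezone.utc)
    return abs(now - start_date) <= window


def warn_if_dynamic_start_date(dag_id, start_date, now=None):
    """Emit a UserWarning and return True when the heuristic fires (hypothetical helper)."""
    if start_date_looks_dynamic(start_date, now=now):
        warnings.warn(
            f"DAG {dag_id!r}: start_date {start_date.isoformat()} is close to the "
            "current time and may be computed dynamically; prefer a fixed date.",
            UserWarning,
        )
        return True
    return False


# Usage with a pinned clock so the result does not depend on when this runs.
fixed_now = datetime(2021, 5, 26, 12, 0, tzinfo=timezone.utc)
flagged = warn_if_dynamic_start_date(
    "example_dag", fixed_now - timedelta(days=1), now=fixed_now
)
```

The `now` parameter is there only so the check can be exercised with a pinned clock; a real check would have to decide where in DAG parsing it runs.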
[GitHub] [airflow] eladkal commented on issue #15036: Scheduler is skipping a day sometimes
Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #15036:
URL: https://github.com/apache/airflow/issues/15036#issuecomment-811785533
@iameugenejo can you share more details about the issue?
How often does it happen?
Does it affect a specific DAG or all DAGs in the system?
Without reproduction steps or more information, it might be hard to understand the root cause.
[GitHub] [airflow] iameugenejo commented on issue #15036: Scheduler is skipping a day sometimes
Posted by GitBox <gi...@apache.org>.
iameugenejo commented on issue #15036:
URL: https://github.com/apache/airflow/issues/15036#issuecomment-849307992
The dynamic start_date is still there and the issue hasn't happened for the past few months, so it might not be about the dynamic start_date.
But since I'm not seeing the issue anymore, I don't mind closing this issue and reopening it if it occurs again.
[GitHub] [airflow] jhtimmins commented on issue #15036: Scheduler is skipping a day sometimes
Posted by GitBox <gi...@apache.org>.
jhtimmins commented on issue #15036:
URL: https://github.com/apache/airflow/issues/15036#issuecomment-848448004
@eladkal were you able to validate this? Just trying to get an idea of what the status is.
[GitHub] [airflow] github-actions[bot] commented on issue #15036: Scheduler is skipping a day sometimes
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #15036:
URL: https://github.com/apache/airflow/issues/15036#issuecomment-830712782
This issue has been automatically marked as stale because it has been open for 30 days with no response from the author. It will be closed in next 7 days if no further activity occurs from the issue author.
[GitHub] [airflow] iameugenejo commented on issue #15036: Scheduler is skipping a day sometimes
Posted by GitBox <gi...@apache.org>.
iameugenejo commented on issue #15036:
URL: https://github.com/apache/airflow/issues/15036#issuecomment-812048076
@eladkal, it has happened 5 times since 2/20.
It's happening to 1 specific DAG.
The DAG itself is static, but the tasks it executes are generated dynamically.
The other DAGs that are not showing this symptom have their tasks statically coded.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org