Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/03/26 18:25:35 UTC

[GitHub] [airflow] iameugenejo opened a new issue #15036: Scheduler is skipping a day sometimes

iameugenejo opened a new issue #15036:
URL: https://github.com/apache/airflow/issues/15036


   I'm having exactly the same issue as this user: https://www.reddit.com/r/dataengineering/comments/lri9fv/airflow_dag_is_skipping_a_day/
   
   The version I'm using is 2.0.1
   
   ![Screen Shot 2021-03-26 at 11 01 32 AM](https://user-images.githubusercontent.com/1054824/112676474-b4039580-8e25-11eb-8a3d-036881667f21.png)
   ![Screen Shot 2021-03-26 at 11 01 40 AM](https://user-images.githubusercontent.com/1054824/112676478-b534c280-8e25-11eb-8b5e-2e2d1dddccf3.png)
   ![Screen Shot 2021-03-26 at 11 04 10 AM](https://user-images.githubusercontent.com/1054824/112676499-be259400-8e25-11eb-911b-06108042acf8.png)
   ![Screen Shot 2021-03-26 at 11 01 54 AM](https://user-images.githubusercontent.com/1054824/112676480-b5cd5900-8e25-11eb-9038-aed147d01be9.png)
   ![Screen Shot 2021-03-26 at 11 01 59 AM](https://user-images.githubusercontent.com/1054824/112676482-b665ef80-8e25-11eb-9c34-c631f557b455.png)
   ![Screen Shot 2021-03-26 at 11 03 43 AM](https://user-images.githubusercontent.com/1054824/112676498-bd8cfd80-8e25-11eb-8eab-91136d1de51f.png)
   
   
   The scheduler log for the missing date exists and shows no errors.
   
   There were no manual runs at all for this DAG.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] boring-cyborg[bot] commented on issue #15036: Scheduler is skipping a day sometimes

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #15036:
URL: https://github.com/apache/airflow/issues/15036#issuecomment-808429102


   Thanks for opening your first issue here! Be sure to follow the issue template!
   





[GitHub] [airflow] eladkal commented on issue #15036: Scheduler is skipping a day sometimes

Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #15036:
URL: https://github.com/apache/airflow/issues/15036#issuecomment-830765361


   @iameugenejo Can you share the DAG code? We need more information here.
   If we can't reproduce the problem, it's almost impossible to find a fix.





[GitHub] [airflow] iameugenejo commented on issue #15036: Scheduler is skipping a day sometimes

Posted by GitBox <gi...@apache.org>.
iameugenejo commented on issue #15036:
URL: https://github.com/apache/airflow/issues/15036#issuecomment-832294937


   This hasn't happened for the past month or so.
   
   The following is the DAG, with some values redacted.
   
   ```
   from datetime import timedelta
   from airflow.models import DAG
   from airflow.operators.bash import BashOperator
   import sys
   import pendulum
   from airflow.utils import timezone
   
   
   DAG_ID = '{REDACTED}'
   
   now = pendulum.now(timezone.utc)
   
   
   schedule_interval = '0 16 * * *'
   start_date = now - timedelta(days=1)
   
   max_active_runs = 1
   num_of_tasks = 6
   email_ids = '{REDACTED}'
   
   with DAG(
           dag_id=DAG_ID,
           start_date=start_date,
           max_active_runs=max_active_runs,
           default_args={
               'owner': 'airflow',
               'start_date': start_date,
               'max_active_runs': max_active_runs,
               'email': email_ids,
               'email_on_failure': True,
               'email_on_retry': True
           },
           schedule_interval=schedule_interval,
           dagrun_timeout=timedelta(seconds=43200),  # 12 hours (43200 seconds)
           catchup=False
   ) as dag:
       tasks = []
       for i in range(0, num_of_tasks):
           tasks.append(BashOperator(
               task_id='edms3_'+str(i+1),
               retries=10,
               retry_delay=timedelta(seconds=60),  # 1 minute retry delay
               retry_exponential_backoff=True,
               max_retry_delay=timedelta(seconds=900),  # 15 minutes max retry delay
               do_xcom_push=True,  # return the last line from the stdout
               bash_command="REDACTED.sh {} {} ".format(int(i), int(num_of_tasks)),
               dag=dag))
           if i != 0:
               tasks[i-1] >> tasks[i]
   
   ```





[GitHub] [airflow] jhtimmins closed issue #15036: Scheduler is skipping a day sometimes

Posted by GitBox <gi...@apache.org>.
jhtimmins closed issue #15036:
URL: https://github.com/apache/airflow/issues/15036


   





[GitHub] [airflow] iameugenejo edited a comment on issue #15036: Scheduler is skipping a day sometimes

Posted by GitBox <gi...@apache.org>.
iameugenejo edited a comment on issue #15036:
URL: https://github.com/apache/airflow/issues/15036#issuecomment-832294937


   This hasn't happened for the past month or so.
   
   The following is the DAG, with some values redacted.
   
   ```
   from datetime import timedelta
   from airflow.models import DAG
   from airflow.operators.bash import BashOperator
   import sys
   import pendulum
   from airflow.utils import timezone
   
   
   DAG_ID = '{REDACTED}'
   
   now = pendulum.now(timezone.utc)
   
   
   schedule_interval = '0 16 * * *'
   start_date = now - timedelta(days=1)
   
   max_active_runs = 1
   num_of_tasks = 6
   email_ids = '{REDACTED}'
   
   with DAG(
           dag_id=DAG_ID,
           start_date=start_date,
           max_active_runs=max_active_runs,
           default_args={
               'owner': 'airflow',
               'start_date': start_date,
               'max_active_runs': max_active_runs,
               'email': email_ids,
               'email_on_failure': True,
               'email_on_retry': True
           },
           schedule_interval=schedule_interval,
           dagrun_timeout=timedelta(seconds=43200),  # 12 hours (43200 seconds)
           catchup=False
   ) as dag:
       tasks = []
       for i in range(0, num_of_tasks):
           tasks.append(BashOperator(
               task_id='redacted_'+str(i+1),
               retries=10,
               retry_delay=timedelta(seconds=60),  # 1 minute retry delay
               retry_exponential_backoff=True,
               max_retry_delay=timedelta(seconds=900),  # 15 minutes max retry delay
               do_xcom_push=True,  # return the last line from the stdout
               bash_command="REDACTED.sh {} {} ".format(int(i), int(num_of_tasks)),
               dag=dag))
           if i != 0:
               tasks[i-1] >> tasks[i]
   
   ```





[GitHub] [airflow] jhtimmins commented on issue #15036: Scheduler is skipping a day sometimes

Posted by GitBox <gi...@apache.org>.
jhtimmins commented on issue #15036:
URL: https://github.com/apache/airflow/issues/15036#issuecomment-849316552


   @iameugenejo Sounds good. I'll close for now





[GitHub] [airflow] eladkal commented on issue #15036: Scheduler is skipping a day sometimes

Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #15036:
URL: https://github.com/apache/airflow/issues/15036#issuecomment-848524638


   I wasn't able to reproduce it, but I think it's related to the dynamic start_date used in the DAG, which is a bad practice and can lead to all kinds of undesired behavior:
   `start_date = now - timedelta(days=1)`
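
To illustrate why a moving start_date can be fragile, here is a minimal sketch using only the standard library. It is a simplified model and not Airflow's actual scheduler logic: the helper `first_daily_slot`, the fixed date `STATIC_START`, and the two parse times are hypothetical names and values chosen for the example. It assumes the daily 16:00 UTC schedule from the DAG above and shows that two evaluations of the DAG file a couple of minutes apart can disagree about the first eligible slot by a whole day.

```python
from datetime import datetime, timedelta, timezone


def first_daily_slot(start_date: datetime, hour: int = 16) -> datetime:
    """Return the first daily `hour`:00 slot at or after start_date (simplified model)."""
    slot = start_date.replace(hour=hour, minute=0, second=0, microsecond=0)
    if slot < start_date:
        slot += timedelta(days=1)
    return slot


def dynamic_start(parse_time: datetime) -> datetime:
    """Mirror `start_date = now - timedelta(days=1)` from the DAG file."""
    return parse_time - timedelta(days=1)


# Hypothetical fixed start date, used only for comparison.
STATIC_START = datetime(2021, 2, 1, tzinfo=timezone.utc)

# Two hypothetical moments at which the DAG file is evaluated, two minutes apart.
parse_a = datetime(2021, 3, 25, 15, 59, tzinfo=timezone.utc)
parse_b = datetime(2021, 3, 25, 16, 1, tzinfo=timezone.utc)

slot_a = first_daily_slot(dynamic_start(parse_a))
slot_b = first_daily_slot(dynamic_start(parse_b))

# The dynamic start_date yields different first slots depending on parse time,
# while the static one does not depend on parse time at all.
print("dynamic, parsed at 15:59:", slot_a.isoformat())
print("dynamic, parsed at 16:01:", slot_b.isoformat())
print("static:", first_daily_slot(STATIC_START).isoformat())
```

Whether this effect explains the skipped runs reported here is not established in this thread; the sketch only shows that the value the scheduler sees is not stable across evaluations.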
   
   I'm inclined to close this issue.
   





[GitHub] [airflow] jhtimmins commented on issue #15036: Scheduler is skipping a day sometimes

Posted by GitBox <gi...@apache.org>.
jhtimmins commented on issue #15036:
URL: https://github.com/apache/airflow/issues/15036#issuecomment-849259773


   Thanks @eladkal.
   
   @iameugenejo are you able to replicate this bug even if you remove the dynamic start_date? If not, I agree with @eladkal that we can probably chalk it up to the dynamic start date.
   
   @kaxil Is it possible/desirable to add a check for dynamic start dates and to throw an error or warning?
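
One way such a check could work, sketched here purely as an assumption and not as an existing Airflow feature: evaluate the expression that produces the start date twice, a short pause apart, and warn if the two values differ. The function names `is_dynamic_start_date` and `warn_if_dynamic` and the `pause_seconds` parameter are hypothetical; the sketch uses only the standard library.

```python
import time
import warnings
from datetime import datetime, timedelta, timezone
from typing import Callable


def is_dynamic_start_date(
    make_start_date: Callable[[], datetime], pause_seconds: float = 0.05
) -> bool:
    """Return True if two evaluations of the start date, taken a moment apart, differ."""
    first = make_start_date()
    time.sleep(pause_seconds)
    second = make_start_date()
    return first != second


def warn_if_dynamic(dag_id: str, make_start_date: Callable[[], datetime]) -> bool:
    """Emit a UserWarning when the start date looks dynamic; return the verdict."""
    dynamic = is_dynamic_start_date(make_start_date)
    if dynamic:
        warnings.warn(
            f"DAG {dag_id!r}: start_date changes between evaluations; "
            "use a fixed datetime instead",
            UserWarning,
            stacklevel=2,
        )
    return dynamic


if __name__ == "__main__":
    # Dynamic: depends on the current time, as in the DAG from this issue.
    print(warn_if_dynamic("dynamic_example", lambda: datetime.now(timezone.utc) - timedelta(days=1)))
    # Static: a fixed, hypothetical date.
    print(warn_if_dynamic("static_example", lambda: datetime(2021, 2, 1, tzinfo=timezone.utc)))
```

A check of this kind is only a heuristic: it needs the start date wrapped in a callable, and it would miss values that change more slowly than the pause, such as a date truncated to midnight.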





[GitHub] [airflow] eladkal commented on issue #15036: Scheduler is skipping a day sometimes

Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #15036:
URL: https://github.com/apache/airflow/issues/15036#issuecomment-811785533


   @iameugenejo Can you share more details about the issue?
   How often does it happen?
   Is it affecting a specific DAG or all DAGs in the system?
   
   Without reproduction steps or more information, it will be hard to understand the root cause.





[GitHub] [airflow] iameugenejo commented on issue #15036: Scheduler is skipping a day sometimes

Posted by GitBox <gi...@apache.org>.
iameugenejo commented on issue #15036:
URL: https://github.com/apache/airflow/issues/15036#issuecomment-849307992


   Dynamic start_date is still there and the issue hasn't happened for the past few months, so it might not be about the dynamic start_date.
   
   But since I'm not seeing the issue anymore, I don't mind closing this issue and reopening it if it occurs again.
   
   





[GitHub] [airflow] jhtimmins commented on issue #15036: Scheduler is skipping a day sometimes

Posted by GitBox <gi...@apache.org>.
jhtimmins commented on issue #15036:
URL: https://github.com/apache/airflow/issues/15036#issuecomment-848448004


   @eladkal were you able to validate this? Just trying to get an idea what the status is





[GitHub] [airflow] github-actions[bot] commented on issue #15036: Scheduler is skipping a day sometimes

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #15036:
URL: https://github.com/apache/airflow/issues/15036#issuecomment-830712782


   This issue has been automatically marked as stale because it has been open for 30 days with no response from the author. It will be closed in the next 7 days if no further activity occurs from the issue author.





[GitHub] [airflow] iameugenejo commented on issue #15036: Scheduler is skipping a day sometimes

Posted by GitBox <gi...@apache.org>.
iameugenejo commented on issue #15036:
URL: https://github.com/apache/airflow/issues/15036#issuecomment-812048076


   @eladkal, it has happened 5 times since 2/20.
   
   It's happening to one specific DAG.
   
   The DAG itself is static, but the tasks it executes are generated dynamically.
   
   The other DAGs, which are not showing this symptom, have their tasks statically coded.
   
   

