You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/01/20 15:33:14 UTC

[GitHub] [airflow] collinmcnulty opened a new issue #20991: Elasticsearch remote log will not fetch task logs from manual dagruns before 2.2 upgrade

collinmcnulty opened a new issue #20991:
URL: https://github.com/apache/airflow/issues/20991


   ### Apache Airflow Provider(s)
   
   elasticsearch
   
   ### Versions of Apache Airflow Providers
   
   ```
   apache-airflow-providers-amazon==1!2.5.0
   apache-airflow-providers-cncf-kubernetes==1!2.1.0
   apache-airflow-providers-datadog==1!2.0.1
   apache-airflow-providers-elasticsearch==1!2.1.0
   apache-airflow-providers-ftp==1!2.0.1
   apache-airflow-providers-google==1!6.1.0
   apache-airflow-providers-http==1!2.0.1
   apache-airflow-providers-imap==1!2.0.1
   apache-airflow-providers-microsoft-azure==1!3.3.0
   apache-airflow-providers-mysql==1!2.1.1
   apache-airflow-providers-postgres==1!2.3.0
   apache-airflow-providers-redis==1!2.0.1
   apache-airflow-providers-slack==1!4.1.0
   apache-airflow-providers-sqlite==1!2.0.1
   apache-airflow-providers-ssh==1!2.3.0
   ```
   
   ### Apache Airflow version
   
   2.2.3 (latest released)
   
   ### Operating System
   
   Debian Bullseye
   
   ### Deployment
   
   Astronomer
   
   ### Deployment details
   
   _No response_
   
   ### What happened
   
   After upgrading to 2.2, task logs from manual dagruns performed before the upgrade could no longer be retrieved, even though they can still be seen in Kibana. Scheduled dagruns' tasks and tasks for dagruns begun after the upgrade are retrieved without issue.
   
   The issue appears to be because these tasks with missing logs all belong to dagruns that do not have the attribute data_interval_start or data_interval_end set.
   
   ### What you expected to happen
   
   Task logs continue to be fetched after upgrade.
   
   ### How to reproduce
   
   Below is how I verified the log fetching process.
   
   I ran the code snippet in a python interpreter in the scheduler to test log fetching.
   
   ```py
   from airflow.models import TaskInstance, DagBag, DagRun
   from airflow.settings import Session, DAGS_FOLDER
   from airflow.configuration import conf
   import logging
   from dateutil import parser
   
   logger = logging.getLogger('airflow.task')
   task_log_reader = conf.get('logging', 'task_log_reader')
   handler = next((handler for handler in logger.handlers if handler.name == task_log_reader), None)
   
   dag_id = 'pipeline_nile_reconciliation'
   task_id = 'nile_overcount_spend_resolution_task'
   execution_date = parser.parse('2022-01-10T11:49:57.197933+00:00')
   try_number=1
   
   session = Session()
   ti = session.query(TaskInstance).filter(
       TaskInstance.dag_id == dag_id,
       TaskInstance.task_id == task_id,
       TaskInstance.execution_date == execution_date).first()
   dagrun = session.query(DagRun).filter(
       DagRun.dag_id == dag_id,
       DagRun.execution_date == execution_date).first()
   
   
   dagbag = DagBag(DAGS_FOLDER, read_dags_from_db=True)
   dag = dagbag.get_dag(dag_id)
   ti.task = dag.get_task(ti.task_id)
   ti.dagrun = dagrun
   
   handler.read(ti, try_number, {})
   ```
   
   The following error log indicates errors in the log reading.
   
   ```
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/usr/local/lib/python3.9/site-packages/airflow/utils/log/file_task_handler.py", line 239, in read
       log, metadata = self._read(task_instance, try_number_element, metadata)
     File "/usr/local/lib/python3.9/site-packages/airflow/providers/elasticsearch/log/es_task_handler.py", line 168, in _read
       log_id = self._render_log_id(ti, try_number)
     File "/usr/local/lib/python3.9/site-packages/airflow/providers/elasticsearch/log/es_task_handler.py", line 107, in _render_log_id
       data_interval_start = self._clean_date(dag_run.data_interval_start)
     File "/usr/local/lib/python3.9/site-packages/airflow/providers/elasticsearch/log/es_task_handler.py", line 134, in _clean_date
       return value.strftime("%Y_%m_%dT%H_%M_%S_%f")
   AttributeError: 'NoneType' object has no attribute 'strftime'
   ```
   
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr closed issue #20991: Elasticsearch remote log will not fetch task logs from manual dagruns before 2.2 upgrade

Posted by GitBox <gi...@apache.org>.
uranusjr closed issue #20991:
URL: https://github.com/apache/airflow/issues/20991


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on issue #20991: Elasticsearch remote log will not fetch task logs from manual dagruns before 2.2 upgrade

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #20991:
URL: https://github.com/apache/airflow/issues/20991#issuecomment-1036711517


   And how this plays with "providers" release too ? This change spans both "airflow" and elasticsearch provider - I am going to release the es provider early next week so I think we need to be sure back-compat work both ways - i.e. es provider works on Airflow 2.1+ as well (not 100% sure about it).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] collinmcnulty commented on issue #20991: Elasticsearch remote log will not fetch task logs from manual dagruns before 2.2 upgrade

Posted by GitBox <gi...@apache.org>.
collinmcnulty commented on issue #20991:
URL: https://github.com/apache/airflow/issues/20991#issuecomment-1017838837


   Seems like the problem in the code is that [this section](https://github.com/apache/airflow/blob/main/airflow/providers/elasticsearch/log/es_task_handler.py#L109-L114) requires resolving the `data_interval_start` and `data_interval_end` even if the format you specify does not use them, and they don't exist on manual dagruns from before 2.2


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on issue #20991: Elasticsearch remote log will not fetch task logs from manual dagruns before 2.2 upgrade

Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #20991:
URL: https://github.com/apache/airflow/issues/20991#issuecomment-1037092707


   The core “fix” is actually overriden when the ElasticSearch provider’s implementation is used, so just releasing the provider would be enough to resolve the issue in Airflow 2.2.3. No-one has complained about the core implementation yet, but this can still be in 2.2.4 to fix the issue pre-emptively.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on issue #20991: Elasticsearch remote log will not fetch task logs from manual dagruns before 2.2 upgrade

Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #20991:
URL: https://github.com/apache/airflow/issues/20991#issuecomment-1017851842


   It should use `dag.get_run_data_interval(dag_run)` instead. How to get the correct DAG object, however, is the problem; `dag_run.get_dag()` *might* work, but if it doesn’t, we need to do something else, perhaps `ti.task.dag` or something else.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on issue #20991: Elasticsearch remote log will not fetch task logs from manual dagruns before 2.2 upgrade

Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #20991:
URL: https://github.com/apache/airflow/issues/20991#issuecomment-1018233069


   Also whoever works on this should also fix the rendering function in `FileTaskHandler` to add those same data-interval-related context variables.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] collinmcnulty commented on issue #20991: Elasticsearch remote log will not fetch task logs from manual dagruns before 2.2 upgrade

Posted by GitBox <gi...@apache.org>.
collinmcnulty commented on issue #20991:
URL: https://github.com/apache/airflow/issues/20991#issuecomment-1017629365


   All credit to @wolfier for finding this bug.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr edited a comment on issue #20991: Elasticsearch remote log will not fetch task logs from manual dagruns before 2.2 upgrade

Posted by GitBox <gi...@apache.org>.
uranusjr edited a comment on issue #20991:
URL: https://github.com/apache/airflow/issues/20991#issuecomment-1037092707


   The core “fix” is actually overriden when the ElasticSearch provider’s implementation is used, so just releasing the provider would be enough to resolve the title issue in Airflow 2.2.3. No-one has complained about the core implementation yet, but this can still be in 2.2.4 to fix the issue pre-emptively.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on issue #20991: Elasticsearch remote log will not fetch task logs from manual dagruns before 2.2 upgrade

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #20991:
URL: https://github.com/apache/airflow/issues/20991#issuecomment-1036608563


   Should this be included in 2.2.4 @uranusjr ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org