Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/11/19 13:36:57 UTC

[GitHub] [airflow] armandleopold opened a new issue #12484: Add documentation about query execution_date format for Elasticsearch Task Handler

armandleopold opened a new issue #12484:
URL: https://github.com/apache/airflow/issues/12484


   **Description**
   
   The documentation does not describe the **execution_date** format that a custom logging pipeline must use when it builds the Elasticsearch **log_id**.
   
   > from [Airflow Configuration Reference](https://airflow.apache.org/docs/stable/configurations-ref.html#log-id-template)
   
       - name: AIRFLOW__ELASTICSEARCH__LOG_ID_TEMPLATE
         value: "{dag_id}-{task_id}-{execution_date}-{try_number}"
   
   The function [_clean_execution_date](https://github.com/apache/airflow/blob/master/airflow/providers/elasticsearch/log/es_task_handler.py#L123) defines the format that the **log_id** must match in order for Airflow to fetch the logs from Elasticsearch.
   
   **Use case / motivation**
   
   In a local Kubernetes environment with the **KubernetesExecutor** and **KubernetesPodOperator**, I would like to ship the logs from my pods to Elasticsearch and fetch them from there into the Airflow UI.
   
   Unfortunately, the date format displayed everywhere in the UI is not the format that is queried from Elasticsearch:
   * Displayed format : `2020-11-19T12:57:33.605561+00:00`
   * Queried format : `2020_11_19T12_57_33_605561`
   
   I searched for a long time and lost one week of work because this is not documented for my (I admit, very specific) case.
   
   I suggest adding the required format here: [configuration reference](https://airflow.apache.org/docs/stable/configurations-ref.html#log-id-template)
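
The two formats listed above can be reproduced with a short sketch. The helper name `clean_execution_date` and the `strftime` pattern are assumptions inferred from the displayed and queried values in this issue, not a copy of the handler's code, so treat this as an illustration of the expected shape rather than the provider's implementation.

```python
from datetime import datetime


def clean_execution_date(execution_date: datetime) -> str:
    """Hypothetical helper: render a datetime in the underscore-separated
    form that appears in the queried log_id (assumed pattern)."""
    # Separators become underscores, microseconds are always written out,
    # and the timezone offset is dropped.
    return execution_date.strftime("%Y_%m_%dT%H_%M_%S_%f")


# Value shown in the Airflow UI, as reported in this issue.
displayed = "2020-11-19T12:57:33.605561+00:00"

queried = clean_execution_date(datetime.fromisoformat(displayed))
print(queried)
```

A custom logging pipeline that writes its own `log_id` field would presumably need to emit the date in this second form for the lookup to match.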


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] boring-cyborg[bot] commented on issue #12484: Add documentation about query execution_date format for Elasticsearch Task Handler

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #12484:
URL: https://github.com/apache/airflow/issues/12484#issuecomment-730378882


   Thanks for opening your first issue here! Be sure to follow the issue template!
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] armandleopold closed issue #12484: Add documentation about query execution_date format for Elasticsearch Task Handler

Posted by GitBox <gi...@apache.org>.
armandleopold closed issue #12484:
URL: https://github.com/apache/airflow/issues/12484


   





[GitHub] [airflow] itayB commented on issue #12484: Add documentation about query execution_date format for Elasticsearch Task Handler

Posted by GitBox <gi...@apache.org>.
itayB commented on issue #12484:
URL: https://github.com/apache/airflow/issues/12484#issuecomment-1017771758


   I found the problem. I have separate pods for airflow-scheduler and airflow-webserver.
   
   I added `AIRFLOW__ELASTICSEARCH__JSON_FORMAT=True` only to the airflow-scheduler and the workers, but not to the airflow-webserver.
   
   I dug into the source code and found that the webserver also [checks][1] `AIRFLOW__ELASTICSEARCH__JSON_FORMAT` in order to [transform][2] the dates in the `log_id` into the right format:
   ```
   if self.json_format:
       data_interval_start = self._clean_date(dag_run.data_interval_start)
       data_interval_end = self._clean_date(dag_run.data_interval_end)
       execution_date = self._clean_date(dag_run.execution_date)
   else:
       data_interval_start = dag_run.data_interval_start.isoformat()
       data_interval_end = dag_run.data_interval_end.isoformat()
       execution_date = dag_run.execution_date.isoformat()
   ```
   
   I also posted the answer on [Stack Overflow](https://stackoverflow.com/questions/70787712/airflow-wrong-log-id-format/70791172#70791172).
   
   
     [1]: https://github.com/apache/airflow/blob/main/airflow/providers/elasticsearch/log/es_task_handler.py#L111
     [2]: https://github.com/apache/airflow/blob/main/airflow/providers/elasticsearch/log/es_task_handler.py#L108
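
The effect of the `json_format` branch quoted above can be sketched in isolation. The function `render_log_id`, the way the example `log_id` is split into `dag_id` and `task_id`, and the `strftime` pattern are assumptions made for illustration; only the template string and the example values come from this thread.

```python
from datetime import datetime, timezone

# Template from the configuration reference quoted in this issue.
LOG_ID_TEMPLATE = "{dag_id}-{task_id}-{execution_date}-{try_number}"


def render_log_id(dag_id, task_id, execution_date, try_number, json_format):
    """Hypothetical stand-in for the handler's log_id rendering."""
    if json_format:
        # Assumed cleaned form: underscores only, no timezone offset.
        date_str = execution_date.strftime("%Y_%m_%dT%H_%M_%S_%f")
    else:
        date_str = execution_date.isoformat()
    return LOG_ID_TEMPLATE.format(
        dag_id=dag_id,
        task_id=task_id,
        execution_date=date_str,
        try_number=try_number,
    )


run_date = datetime(2022, 1, 12, 7, 0, tzinfo=timezone.utc)

# Worker with JSON_FORMAT=True writes the cleaned form into Elasticsearch.
worker_log_id = render_log_id(
    "sparkjobs-8765280", "check_daily_output", run_date, 6, json_format=True
)
# Webserver without the setting queries with the ISO form, so nothing matches.
webserver_log_id = render_log_id(
    "sparkjobs-8765280", "check_daily_output", run_date, 6, json_format=False
)

print(worker_log_id)
print(webserver_log_id)
```

With the same setting on every component, both calls would take the same branch and produce the same `log_id`.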





[GitHub] [airflow] armandleopold commented on issue #12484: Add documentation about query execution_date format for Elasticsearch Task Handler

Posted by GitBox <gi...@apache.org>.
armandleopold commented on issue #12484:
URL: https://github.com/apache/airflow/issues/12484#issuecomment-1022531318


   Thanks @itayB 





[GitHub] [airflow] itayB commented on issue #12484: Add documentation about query execution_date format for Elasticsearch Task Handler

Posted by GitBox <gi...@apache.org>.
itayB commented on issue #12484:
URL: https://github.com/apache/airflow/issues/12484#issuecomment-1012205988


   @armandleopold hi! Did you solve this issue? I'm having the same problem in Airflow v2.2.3.
   I configured the logs to use JSON format, and the `log_id` is being created, but with the wrong `execution_date` format:
   ```
   "log_id": "sparkjobs-8765280-check_daily_output-2022_01_12T07_00_00_000000-6"
   ```
   while the Web UI sends the `execution_date` in a different format (`2022-01-12T07:00:00+00:00`).

