You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2019/12/05 18:26:44 UTC

[GitHub] [airflow] mik-laj edited a comment on issue #5177: [AIRFLOW-4084] Fix bug downloading incomplete logs from ElasticSearch

mik-laj edited a comment on issue #5177: [AIRFLOW-4084] Fix bug downloading incomplete logs from ElasticSearch
URL: https://github.com/apache/airflow/pull/5177#issuecomment-557788417
 
 
   I wonder why this change had to make changes to the ``views.py`` file.  Why were the logsHandler not updated?  In my opinion, we should check the existence of the ``download_logs`` key in the implementation of the handler logic and then disable the pagination mechanism. Now the abstraction of code is running away and it is possible that we are breaking other handlers because they expected different behavior.
   
   I am working on documentation for this class and if we withdraw this change we will be able to do it as follows.  In my opinion this is the original behavior of this code.
   ```python
   class TaskHandler(logging.Handler, ABC):
       """
       Handler that allows you to write and read information about a specific task.
       """
       @abstractmethod
       def read(
           self, task_instance: TaskInstance, try_number: Optional[int] = None, metadata: Optional[Dict] = None
       ) -> Tuple[List[str], List[Dict]]:
           """
           Read logs of given task instance.
   
           It supports log pagination. To do this, the first call to this function contains an empty metadata
           object. As a result, list of logs and list of metadata should be returned. The resulting
           metadata should contain the key ``end_of_logs``, which determines whether pagination should be
           continued. It is possible to return more metadata objects, but only the first is used, so
           you should always return a list with one item.
   
           The remaining keys in the dictionary are sent back to the method without changes, which means that
           if you add an additional key with a token or page number, you can expect that the key will be
           available in the next request for logs.
   
           If the metadata in the call contains the ``download_logs'' key, then full logs should be
           returned without pagination.
   
           :param task_instance: task instance object
           :param try_number: task instance try_number to read logs from. If None
              it returns all logs separated by try_number
           :param metadata: log metadata,
               can be used for steaming log reading and auto-tailing.
           :return: a list of logs and list of metadata objects.
           """
           ...
   
       @abstractmethod
       def set_context(self, task_instance: TaskInstance) -> None:
           """
           Provide task_instance context to airflow task handler.
   
           Different implementations provide different behavior. Examples of behavior are:
   
           * in the case of handlers writing to a file, it may start writing to another file;
           * for remote services, it can start adding labels to logs.
   
           This allows us to later search for logs for a single task
   
           :param task_instance: task instance object
           """
           ...
   ```
   
   I would like to point out that in this PR there is a duplicate mechanism of handling the case when try_numbers is empty, i.e. the log for all try numbers is downloaded.
   
   Previously it was part of the file_task_handler handler, and now it has been copied a second time to another place despite the es_task_handler extended from this handler.
   https://github.com/apache/airflow/blob/master/airflow/utils/log/file_task_handler.py#L155-L164
   https://github.com/apache/airflow/blob/master/airflow/utils/log/es_task_handler.py#L36
   https://github.com/apache/airflow/blob/master/airflow/www/views.py#L595-L599

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services