Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2019/11/25 10:29:49 UTC

[GitHub] [airflow] mik-laj commented on a change in pull request #6644: [AIRFLOW-6047] Simplify the logging configuration template

URL: https://github.com/apache/airflow/pull/6644#discussion_r350099913
 
 

 ##########
 File path: airflow/config_templates/airflow_local_settings.py
 ##########
 @@ -148,26 +128,61 @@
     }
 }
 
-REMOTE_HANDLERS = {
-    's3': {
+# Only update the handlers and loggers when CONFIG_PROCESSOR_MANAGER_LOGGER is set.
+# This is to avoid exceptions when initializing RotatingFileHandler multiple times
+# in multiple processes.
+if os.environ.get('CONFIG_PROCESSOR_MANAGER_LOGGER') == 'True':
+    DEFAULT_LOGGING_CONFIG['handlers'] \
+        .update(DEFAULT_DAG_PARSING_LOGGING_CONFIG['handlers'])
+    DEFAULT_LOGGING_CONFIG['loggers'] \
+        .update(DEFAULT_DAG_PARSING_LOGGING_CONFIG['loggers'])
+
+    # Manually create the log directory for the processor_manager handler, as
+    # RotatingFileHandler will only create the file, not the directory.
+    processor_manager_handler_config = DEFAULT_DAG_PARSING_LOGGING_CONFIG['handlers'][
+        'processor_manager']
+    directory = os.path.dirname(processor_manager_handler_config['filename'])
+    mkdirs(directory, 0o755)
+
+# Remote logging configuration
+
+# Storage bucket URL for remote logging
+# S3 buckets should start with "s3://"
+# GCS buckets should start with "gs://"
+# WASB buckets should start with "wasb"
+# just to help Airflow select the correct handler
+REMOTE_BASE_LOG_FOLDER = conf.get('core', 'REMOTE_BASE_LOG_FOLDER')
+
+ELASTICSEARCH_HOST = conf.get('elasticsearch', 'HOST')
+
+REMOTE_LOGGING = conf.getboolean('core', 'remote_logging')
+
+if REMOTE_LOGGING and REMOTE_BASE_LOG_FOLDER.startswith('s3://'):
+    S3_REMOTE_HANDLERS = {
         'task': {
             'class': 'airflow.utils.log.s3_task_handler.S3TaskHandler',
             'formatter': 'airflow',
 
 Review comment:
   This is not common to all handlers, so extracting it will be problematic. My Stackdriver handler, for example, contains the following configuration:
   https://github.com/PolideaInternal/airflow/blob/e2511a74bfdd3824845ae037e4a50de127c223d6/airflow/config_templates/airflow_local_settings.py
   ```python
       gcp_conn_id = conf.get('core', 'REMOTE_LOG_CONN_ID', fallback=None)
       # stackdriver:///airflow-tasks => airflow-tasks
       REMOTE_BASE_LOG_FOLDER = urlparse(REMOTE_BASE_LOG_FOLDER).path[1:]
       STACKDRIVER_REMOTE_HANDLERS = {
           'task': {
               'class': 'airflow.utils.log.stackdriver_task_handler.StackdriverTaskHandler',
               'formatter': 'airflow',
               'name': REMOTE_BASE_LOG_FOLDER,
               'gcp_conn_id': gcp_conn_id
           }
       }
   
       DEFAULT_LOGGING_CONFIG['handlers'].update(STACKDRIVER_REMOTE_HANDLERS)
   ```
   I'm also afraid that pulling only part of the configuration out into a separate variable will make it harder to understand. This is not classic code that must follow DRY rules to avoid problems; it is a configuration file in which each block has a different purpose. The blocks look similar, but each has its own separate role. Above all, this file should be easy to understand and easy to adapt to each user's specific case.
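   To illustrate, here is a minimal sketch (not from the PR; the placeholder values at the top stand in for the `conf.get(...)` lookups the real template performs) of two fully spelled-out backend blocks side by side:
   ```python
   import os

   # Placeholder values; in airflow_local_settings.py these come from conf.get(...).
   BASE_LOG_FOLDER = '~/airflow/logs'
   FILENAME_TEMPLATE = '{{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log'
   REMOTE_BASE_LOG_FOLDER = 's3://my-bucket/airflow-logs'
   gcp_conn_id = None

   # The two blocks share only 'class' and 'formatter'; every other key is
   # backend-specific, so a shared variable would have to carry keys that
   # some handlers do not accept.
   S3_REMOTE_HANDLERS = {
       'task': {
           'class': 'airflow.utils.log.s3_task_handler.S3TaskHandler',
           'formatter': 'airflow',
           'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
           's3_log_folder': REMOTE_BASE_LOG_FOLDER,   # S3-only key
           'filename_template': FILENAME_TEMPLATE,
       }
   }

   STACKDRIVER_REMOTE_HANDLERS = {
       'task': {
           'class': 'airflow.utils.log.stackdriver_task_handler.StackdriverTaskHandler',
           'formatter': 'airflow',
           'name': 'airflow-tasks',                   # Stackdriver-only key
           'gcp_conn_id': gcp_conn_id,                # Stackdriver-only key
       }
   }
   ```
   Written out this way, a user who cares about only one backend can read and adapt that block in isolation.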
