Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/02/22 01:55:41 UTC

[GitHub] [airflow] Rukeith opened a new issue #14352: Airflow2.0 cannot read remote log from GCP GCS

Rukeith opened a new issue #14352:
URL: https://github.com/apache/airflow/issues/14352


   
   **Apache Airflow version**: 
   v2.1.0.dev0
   
   **Kubernetes version (if you are using kubernetes)** (use `kubectl version`):
   
   **Environment**:
   
   - **Cloud provider or hardware configuration**: Docker on GKE
   - **OS** (e.g. from /etc/os-release): 
   - **Kernel** (e.g. `uname -a`):
   - **Install tools**:
   - **Others**:
   
   **What happened**:
   
   Here is my logging configuration in `airflow.cfg`:
   
   ```
   [logging]
   # The folder where airflow should store its log files
   # This path must be absolute
   base_log_folder = /opt/airflow/logs
   
   # Airflow can store logs remotely in AWS S3, Google Cloud Storage or Elastic Search.
   # Set this to True if you want to enable remote logging.
   remote_logging = True
   
   # Users must supply an Airflow connection id that provides access to the storage
   # location.
   remote_log_conn_id = AIRFLOW_LOG_BUCKET
   
   # Path to Google Credential JSON file. If omitted, authorization based on `the Application Default
   # Credentials
   # <https://cloud.google.com/docs/authentication/production#finding_credentials_automatically>`__ will
   # be used.
   google_key_path = /secrets/service_account.json
   
   # Storage bucket URL for remote logging
   # S3 buckets should start with "s3://"
   # Cloudwatch log groups should start with "cloudwatch://"
   # GCS buckets should start with "gs://"
   # WASB buckets should start with "wasb" just to help Airflow select correct handler
   # Stackdriver logs should start with "stackdriver://"
   remote_base_log_folder = gs://airflow/logs
   
   # Use server-side encryption for logs stored in S3
   encrypt_s3_logs = False
   
   # Logging level
   logging_level = INFO
   
   # Logging level for Flask-appbuilder UI
   fab_logging_level = WARN
   
   # Logging class
   # Specify the class that will specify the logging configuration
   # This class has to be on the python classpath
   # Example: logging_config_class = my.path.default_local_settings.LOGGING_CONFIG
   logging_config_class =
   
   # Flag to enable/disable Colored logs in Console
   # Colour the logs when the controlling terminal is a TTY.
   colored_console_log = True
   
   # Log format for when Colored logs is enabled
   colored_log_format = [%%(blue)s%%(asctime)s%%(reset)s] {%%(blue)s%%(filename)s:%%(reset)s%%(lineno)d} %%(log_color)s%%(levelname)s%%(reset)s - %%(log_color)s%%(message)s%%(reset)s
   colored_formatter_class = airflow.utils.log.colored_log.CustomTTYColoredFormatter
   
   # Format of Log line
   log_format = [%%(asctime)s] {%%(filename)s:%%(lineno)d} %%(levelname)s - %%(message)s
   simple_log_format = %%(asctime)s %%(levelname)s - %%(message)s
   
   # Specify prefix pattern like mentioned below with stream handler TaskHandlerWithCustomFormatter
   # Example: task_log_prefix_template = {ti.dag_id}-{ti.task_id}-{execution_date}-{try_number}
   task_log_prefix_template =
   
   # Formatting for how airflow generates file names/paths for each task run.
   log_filename_template = {{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log
   
   # Formatting for how airflow generates file names for log
   log_processor_filename_template = {{ filename }}.log
   
   # full path of dag_processor_manager logfile
   dag_processor_manager_log_location = /opt/airflow/logs/dag_processor_manager/dag_processor_manager.log
   
   # Name of handler to read task instance logs.
   # Defaults to use ``task`` handler.
   task_log_reader = task
   
   # A comma-separated list of third-party logger names that will be configured to print messages to
   # consoles.
   # Example: extra_loggers = connexion,sqlalchemy
   extra_loggers =
   
   ```
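
For anyone reproducing this setup, it may help to confirm which remote-logging values are actually in effect, since a container deployment can override `airflow.cfg` through environment variables. The sketch below is not part of the original report. It reads the `[logging]` section with Python's standard `configparser` and applies Airflow's `AIRFLOW__LOGGING__<KEY>` override convention. The name `effective_logging_settings` is hypothetical, and the function only approximates Airflow's own lookup (it ignores the `_CMD` and `_SECRET` variants and built-in defaults).

```python
import configparser
import os

# Keys from the [logging] section that matter for remote task logs.
REMOTE_LOGGING_KEYS = (
    "remote_logging",
    "remote_log_conn_id",
    "google_key_path",
    "remote_base_log_folder",
)


def effective_logging_settings(cfg_text, environ=None):
    """Return the remote-logging values, letting env vars override the file.

    Hypothetical helper: it mirrors only the AIRFLOW__LOGGING__<KEY>
    override convention, not Airflow's full configuration lookup.
    """
    environ = os.environ if environ is None else environ
    # Interpolation is disabled so values such as %%(asctime)s are read as-is.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    settings = {}
    for key in REMOTE_LOGGING_KEYS:
        env_name = "AIRFLOW__LOGGING__" + key.upper()
        if env_name in environ:
            settings[key] = environ[env_name]
        else:
            settings[key] = parser.get("logging", key, fallback=None)
    return settings


if __name__ == "__main__":
    sample = """
[logging]
remote_logging = True
remote_log_conn_id = AIRFLOW_LOG_BUCKET
google_key_path = /secrets/service_account.json
remote_base_log_folder = gs://airflow/logs
"""
    for name, value in effective_logging_settings(sample).items():
        print(f"{name} = {value}")
```

Running this inside the webserver and worker containers and comparing the output is one way to spot a component that resolves a different bucket path or key file than the others.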
   
   **What you expected to happen**:
   
   I expected to be able to read the task log in the UI.
   
   **How to reproduce it**:
   
   Whenever I run any DAG, the task log view always shows this:
   
   ```
   *** Unable to read remote log from gs://airflow/server/logs/dag_id/task_id/2021-02-21T00:40:00+00:00/4.log
   *** maximum recursion depth exceeded while calling a Python object
   
   *** Log file does not exist: /opt/airflow/logs/dag_id/task_id/2021-02-21T00:40:00+00:00/4.log
   *** Fetching from: http://airflow-worker-deploy-685868b855-fx7cr:8793/log/dag_id/task_id/2021-02-21T00:40:00+00:00/4.log
   *** Failed to fetch log file from worker. HTTPConnectionPool(host='airflow-worker-deploy-685868b855-fx7cr', port=8793): Max retries exceeded with url: /log/dag_id/task_id//2021-02-21T00:40:00+00:00/4.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fce4f00e7f0>: Failed to establish a new connection: [Errno -2] Name or service not known'))
   ```
   
   The log files already exist on the worker and are stored in GCS.
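
One check that may be worth making here: the configuration above sets `remote_base_log_folder = gs://airflow/logs`, while the error reports a path under `gs://airflow/server/logs`. Because the names in the report are placeholders, the difference may only be an artifact of redaction. The sketch below, which is not from the original report, rebuilds the path that the configured `log_filename_template` would yield so it can be compared with the reported one. `split_gs_uri` and `expected_remote_log_uri` are hypothetical helper names, and plain string formatting stands in for Airflow's Jinja rendering.

```python
def split_gs_uri(uri):
    """Split 'gs://bucket/prefix' into (bucket, prefix)."""
    if not uri.startswith("gs://"):
        raise ValueError(f"not a GCS URI: {uri!r}")
    bucket, _, prefix = uri[len("gs://"):].partition("/")
    return bucket, prefix.strip("/")


def expected_remote_log_uri(base_folder, dag_id, task_id, ts, try_number):
    """Build the URI implied by the dag_id/task_id/ts/try_number.log template."""
    bucket, prefix = split_gs_uri(base_folder)
    relative = f"{dag_id}/{task_id}/{ts}/{try_number}.log"
    # Skip the prefix when the base folder is just the bucket root.
    parts = [part for part in (prefix, relative) if part]
    return f"gs://{bucket}/" + "/".join(parts)


if __name__ == "__main__":
    expected = expected_remote_log_uri(
        "gs://airflow/logs", "dag_id", "task_id", "2021-02-21T00:40:00+00:00", 4
    )
    reported = "gs://airflow/server/logs/dag_id/task_id/2021-02-21T00:40:00+00:00/4.log"
    print("expected:", expected)
    print("reported:", reported)
    print("match:", expected == reported)
```

If the two paths differ in a real deployment, the webserver may be reading from a location other than the one the worker writes to.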
   
   **Anything else we need to know**:
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] LeXy0623 edited a comment on issue #14352: Airflow2.0 cannot read remote log from GCP GCS

Posted by GitBox <gi...@apache.org>.
LeXy0623 edited a comment on issue #14352:
URL: https://github.com/apache/airflow/issues/14352#issuecomment-970235177


   Hi, I see the same symptom in the task log (I haven't checked the worker log yet).
   Interestingly, it affects only 1-3 tasks out of about 500 per day, and when and where it happens keeps changing, but for me only BigQueryOperators are affected.
   This is on Composer composer-1.17.0-preview.3-airflow-2.0.1.
   





[GitHub] [airflow] YiiTing commented on issue #14352: Airflow2.0 cannot read remote log from GCP GCS

Posted by GitBox <gi...@apache.org>.
YiiTing commented on issue #14352:
URL: https://github.com/apache/airflow/issues/14352#issuecomment-831062244


   Hi all,
   I have some problems.
   
   Here is my UI log:
   `*** Reading remote log from gs://airflow-log/inspect_apii/main.inspect_p/2021-05-03T06:40:00+00:00/1.log.
   b'*** Previous log discarded: 404 GET https://storage.googleapis.com/download/storage/v1/b/airflow-log/o/inspect_apii%2Fmain.inspect_p%2F2021-05-03T06%3A40%3A00%2B00%3A00%2F1.log?alt=media: No such obj`
   
   But I can find my log in GCS.
   `*** Previous log discarded: 404 GET https://storage.googleapis.com/download/storage/v1/b/airflow-log/o/inspect_apii%2Fmain.inspect_p%2F2021-05-03T06%3A40%3A00%2B00%3A00%2F1.log?alt=media: No such object: airflow-log/inspect_apii/main.inspect_p/2021-05-03T06:40:00+00:00/1.log: ('Request failed with status code', 404, 'Expected one of', <HTTPStatus.OK: 200>, <HTTPStatus.PARTIAL_CONTENT: 206>)
   `
   
   What's going on?
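
To compare the object Airflow asked for with what is actually in the bucket, the URL-encoded name in the 404 message can be decoded. The following is a minimal sketch, not something posted in the thread, using only the standard library. `gcs_object_from_download_url` is a hypothetical helper name, and it assumes the `/b/<bucket>/o/<encoded object>` URL layout seen in the message above.

```python
from urllib.parse import unquote, urlsplit


def gcs_object_from_download_url(url):
    """Return (bucket, object_name) from a GCS JSON API download URL.

    Assumes the '/b/<bucket>/o/<url-encoded object>' layout seen in the
    404 message above.
    """
    path = urlsplit(url).path  # the query string (?alt=media) is dropped
    _, _, rest = path.partition("/b/")
    bucket, sep, encoded_object = rest.partition("/o/")
    if not bucket or not sep:
        raise ValueError(f"unexpected GCS download URL: {url!r}")
    # unquote (not unquote_plus) keeps a literal '+' in the timestamp intact.
    return bucket, unquote(encoded_object)


if __name__ == "__main__":
    url = (
        "https://storage.googleapis.com/download/storage/v1/b/airflow-log/o/"
        "inspect_apii%2Fmain.inspect_p%2F2021-05-03T06%3A40%3A00%2B00%3A00%2F1.log"
        "?alt=media"
    )
    print(gcs_object_from_download_url(url))
```

The decoded name can then be compared character by character with the object listed in the bucket, for example to catch a difference in the timestamp or the try number.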





[GitHub] [airflow] YiiTing removed a comment on issue #14352: Airflow2.0 cannot read remote log from GCP GCS

Posted by GitBox <gi...@apache.org>.
YiiTing removed a comment on issue #14352:
URL: https://github.com/apache/airflow/issues/14352#issuecomment-831062244





[GitHub] [airflow] wpromatt commented on issue #14352: Airflow2.0 cannot read remote log from GCP GCS

Posted by GitBox <gi...@apache.org>.
wpromatt commented on issue #14352:
URL: https://github.com/apache/airflow/issues/14352#issuecomment-949873147


   Any updates on this?





[GitHub] [airflow] potiuk commented on issue #14352: Airflow2.0 cannot read remote log from GCP GCS

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #14352:
URL: https://github.com/apache/airflow/issues/14352#issuecomment-978140574


   @stevenzhang-support I think every case of this might be different. If you get it intermittently (I am speculating, because you only said a generic "the same" without specifying whether it is persistent or intermittent), then most likely there is a deployment issue somewhere: if Airflow can read the logs in general and only sometimes cannot, then probably some proxy, firewall, rate limiting or something similar comes into play, and you need to investigate your deployment. If it consistently does not work, you likely have badly configured access.
   
   But it's impossible for the "team" to answer any differently, because we have no clearly reproducible case that could help us diagnose it. If you are on Composer, the best route is likely to reach out to Composer support.
   
   Since there is no clear reproducibility here, I will convert this to a discussion. If someone has a similar issue and can provide more diagnostics and a reproducible case, we can always convert it back to an issue.
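
The intermittent-versus-persistent distinction above can be checked empirically before filing a report. The following is a rough sketch, not something proposed in the thread: `probe` repeatedly calls a caller-supplied `read_once` function (hypothetical, for example a function that fetches one known log object) and `classify_read_failures` labels the outcome.

```python
def classify_read_failures(outcomes):
    """Classify a series of log-read attempts (True = success)."""
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("no attempts recorded")
    failures = outcomes.count(False)
    if failures == 0:
        return "healthy"
    if failures == len(outcomes):
        return "persistent"
    return "intermittent"


def probe(read_once, attempts=20):
    """Call read_once() repeatedly; any exception counts as a failed read."""
    outcomes = []
    for _ in range(attempts):
        try:
            read_once()
            outcomes.append(True)
        except Exception:
            outcomes.append(False)
    return classify_read_failures(outcomes)
```

A "persistent" result would point toward access configuration, while "intermittent" would point toward the deployment (proxy, firewall, rate limiting), in line with the reasoning above.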





[GitHub] [airflow] rishabh-cldcvr commented on issue #14352: Airflow2.0 cannot read remote log from GCP GCS

Posted by GitBox <gi...@apache.org>.
rishabh-cldcvr commented on issue #14352:
URL: https://github.com/apache/airflow/issues/14352#issuecomment-899991601


   Hey @Rukeith, did you find a solution to this?





[GitHub] [airflow] Rukeith commented on issue #14352: Airflow2.0 cannot read remote log from GCP GCS

Posted by GitBox <gi...@apache.org>.
Rukeith commented on issue #14352:
URL: https://github.com/apache/airflow/issues/14352#issuecomment-783790209


   @SamWheating `gs://airflow` is just a placeholder, not the actual bucket name in my `airflow.cfg`.





[GitHub] [airflow] Rukeith commented on issue #14352: Airflow2.0 cannot read remote log from GCP GCS

Posted by GitBox <gi...@apache.org>.
Rukeith commented on issue #14352:
URL: https://github.com/apache/airflow/issues/14352#issuecomment-900013592


   @rishabh-cldcvr Sorry, I haven't fixed this. I still have no idea.





[GitHub] [airflow] potiuk closed issue #14352: Airflow2.0 cannot read remote log from GCP GCS

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #14352:
URL: https://github.com/apache/airflow/issues/14352


   





[GitHub] [airflow] SamWheating commented on issue #14352: Airflow2.0 cannot read remote log from GCP GCS

Posted by GitBox <gi...@apache.org>.
SamWheating commented on issue #14352:
URL: https://github.com/apache/airflow/issues/14352#issuecomment-783429604


   Is `gs://airflow` the correct name for your GCS bucket? GCS buckets have globally-unique names so I would be surprised / impressed if you were able to create a bucket with this name. 





[GitHub] [airflow] potatochip commented on issue #14352: Airflow2.0 cannot read remote log from GCP GCS

Posted by GitBox <gi...@apache.org>.
potatochip commented on issue #14352:
URL: https://github.com/apache/airflow/issues/14352#issuecomment-959784317


   If Airflow gets a 404, then the service account probably doesn't have adequate permissions.





[GitHub] [airflow] Rukeith commented on issue #14352: Airflow2.0 cannot read remote log from GCP GCS

Posted by GitBox <gi...@apache.org>.
Rukeith commented on issue #14352:
URL: https://github.com/apache/airflow/issues/14352#issuecomment-962810131


   @potatochip Well, I am sure it has sufficient permissions; I made sure the service account has the highest permission level. And Airflow did save the log to my GCS bucket.





[GitHub] [airflow] boring-cyborg[bot] commented on issue #14352: Airflow2.0 cannot read remote log from GCP GCS

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #14352:
URL: https://github.com/apache/airflow/issues/14352#issuecomment-782985830


   Thanks for opening your first issue here! Be sure to follow the issue template!
   





[GitHub] [airflow] stevenzhang-support edited a comment on issue #14352: Airflow2.0 cannot read remote log from GCP GCS

Posted by GitBox <gi...@apache.org>.
stevenzhang-support edited a comment on issue #14352:
URL: https://github.com/apache/airflow/issues/14352#issuecomment-977550855


   I have a customer with the same symptom. Could the Airflow team share **any updates** on this?
   The actual message is below:
   ```
   Log file is not found: gs://<path_URL>/2021-11-16T01:30:00+00:00/1.log. The task might not have been executed or worker executing it might have finished abnormally (e.g. was evicted)
   ```

