Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/08/23 15:03:52 UTC

[GitHub] [airflow] dddevis opened a new issue, #25706: `conn_id` for S3 logging isn't defined using SecretsManagerBackend

dddevis opened a new issue, #25706:
URL: https://github.com/apache/airflow/issues/25706

   ### Official Helm Chart version
   
   1.6.0 (latest released)
   
   ### Apache Airflow version
   
   2.3.0
   
   ### Kubernetes Version
   
   1.22
   
   ### Helm Chart configuration
   
   I'm trying to set up remote logging on S3.  The `config.logging` stanza looks like this:
   
   ```yaml
     logging:
       remote_logging: "True"
       remote_base_log_folder: "s3://my-bucket/airflow_logs"
       remote_log_conn_id: "airflow-remote-logging-conn"
       encrypt_s3_logs: "False"
       colored_console_log: "False"
   ```
   
   I have an AWS Secrets Manager secret called `airflow/connections/airflow-remote-logging-conn`.  Via Helm I have configured Airflow to use Secrets Manager as the secrets backend.  Here is my `config.secrets`:
   
   ```yaml
     secrets:
       backend: "airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend"
       backend_kwargs: '{"connections_prefix": "airflow/connections", "variables_prefix": "airflow/variables", "config_prefix": "airflow/config"}'
   ```
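
For reference, a minimal sketch of how a prefix-based backend would be expected to derive the secret name from the settings above. `build_secret_name` is a hypothetical helper written for illustration (it is not the provider's API), and the `/` separator is assumed to be the default.

```python
import json


def build_secret_name(prefix: str, conn_id: str, sep: str = "/") -> str:
    """Join a prefix and a connection id into a secret name (illustrative only)."""
    return f"{prefix}{sep}{conn_id}"


# Same JSON string as in `config.secrets.backend_kwargs` above.
backend_kwargs = json.loads(
    '{"connections_prefix": "airflow/connections", '
    '"variables_prefix": "airflow/variables", '
    '"config_prefix": "airflow/config"}'
)

secret_name = build_secret_name(
    backend_kwargs["connections_prefix"], "airflow-remote-logging-conn"
)
print(secret_name)
```

Under these assumptions the derived name matches the secret that was created, so the naming itself should not be what makes the lookup fail.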
   
   ### Docker Image customisations
   
   A minimal extension to the official Docker image:
   
   ```dockerfile
   ARG AIRFLOW_VERSION=2.3.0
   ARG PYTHON_VERSION=3.9
   
   FROM apache/airflow:${AIRFLOW_VERSION}-python${PYTHON_VERSION}
   
   COPY requirements.txt requirements.txt
   COPY dags /opt/airflow/dags
   
   USER airflow
   
   RUN python -m pip install --user -r requirements.txt
   ```
   
   where `requirements.txt` is
   
   ```
   apache-airflow-providers-amazon==3.0.0
   apache-airflow-providers-cncf-kubernetes==3.0.2
   apache-airflow-providers-postgres==5.0.0
   boto3==1.21.7
   ```
   
   ### What happened
   
   Remote logging does not work.  All task logs in the UI report:
   
   ```
   *** Failed to verify remote log exists s3://my-bucket/airflow_logs/dag_id=my_dag/run_id=scheduled__2022-08-10T06:00:00+00:00/task_id=my_task/attempt=1.log.
   The conn_id `airflow-remote-logging-conn` isn't defined
   *** Falling back to local log
   *** Trying to get logs (last 100 lines) from worker pod mytask-26fd147458d74729a4d609638f0a03db ***
   
   
   [2022-08-11, 06:00:14 UTC] {dagbag.py:507} INFO - Filling up the DagBag from /opt/airflow/dags/repo/airflow/dags/my_dag.py
   [2022-08-11, 06:00:16 UTC] {task_command.py:369} INFO - Running <TaskInstance:my_dag.my_task scheduled__2022-08-10T06:00:00+00:00 [queued]> on host mytask-26fd147458d74729a4d609638f0a03db
   ```
   
   ### What you think should happen instead
   
   Remote logging to S3 should work and the Airflow deployment should recognize that the connection `airflow-remote-logging-conn` exists, since the Secrets Manager secret name is `airflow/connections/airflow-remote-logging-conn`, adhering to the specification in `config.secrets.backend_kwargs`.  
   
   I have essentially the same setup for a different Airflow instance I manage with Helm chart v1.1.0, and I have no problems there.  
   
   ### How to reproduce
   
   1. In `values.yaml` for the v1.6.0 Helm chart, change `config.logging` and `config.secrets` as above.
   2. Change `workers.serviceAccount` along these lines:
   ```yaml
     serviceAccount:
       create: true
       name: "airflow-worker"
       annotations:
         eks.amazonaws.com/role-arn: "arn:aws:iam::123456789:role/airflow-worker"
   ```
   where the IRSA `airflow-worker` has these policies attached:
   ```json
   {
       "Statement": [
           {
               "Action": [
                   "s3:ListBucket",
                   "s3:*Object*"
               ],
               "Effect": "Allow",
               "Resource": [
                   "arn:aws:s3:::my-bucket/*",
                   "arn:aws:s3:::my-bucket"
               ],
               "Sid": "MyBucketReadWritePolicyDocument"
           }
       ],
       "Version": "2012-10-17"
   }
   ```
   and
   ```json
   {
       "Statement": [
           {
               "Action": [
                   "secretsmanager:ListSecretVersionIds",
                   "secretsmanager:GetSecretValue",
                   "secretsmanager:GetResourcePolicy",
                   "secretsmanager:DescribeSecret"
               ],
               "Effect": "Allow",
               "Resource": "arn:aws:secretsmanager:us-east-1:123456789:secret:airflow/*",
               "Sid": "SecretsManagerAirflowReadOnlyPolicy"
           }
       ],
       "Version": "2012-10-17"
   }
   ```
   Similarly for IRSAs `airflow-scheduler` and `airflow-webserver`.  
   3. Create a Secrets Manager secret called `airflow/connections/airflow-remote-logging-conn` and populate it with `s3://my-bucket`.  
   4. Run the example DAG and see the error reporting that the connection `airflow-remote-logging-conn` isn't defined.
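
When reproducing, it may help to sanity-check the `Resource` ARNs in the policies before attaching them. The sketch below relies only on the general ARN layout `arn:partition:service:region:account-id:resource`; `parse_arn` and `Arn` are hypothetical names introduced here, and the digits-only account check is a simplification (real account ids have 12 digits, the placeholder in this report has fewer).

```python
from typing import NamedTuple


class Arn(NamedTuple):
    partition: str
    service: str
    region: str
    account_id: str
    resource: str


def parse_arn(arn: str) -> Arn:
    """Split an ARN into its five fields after the literal 'arn' prefix."""
    parts = arn.split(":", 5)  # the resource part may itself contain ':'
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not a well-formed ARN: {arn!r}")
    parsed = Arn(*parts[1:])
    # S3 ARNs leave region and account empty; otherwise expect digits only.
    if parsed.account_id and not parsed.account_id.isdigit():
        raise ValueError(f"unexpected account id in ARN: {arn!r}")
    return parsed


secret_arn = parse_arn("arn:aws:secretsmanager:us-east-1:123456789:secret:airflow/*")
bucket_arn = parse_arn("arn:aws:s3:::my-bucket/*")
print(secret_arn.region, secret_arn.account_id, secret_arn.resource)
print(bucket_arn.service, bucket_arn.resource)
```

A check like this only validates the shape of the string; it says nothing about whether the role is actually allowed to read the secret.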
   
   ### Anything else
   
   This happens every time for any task with chart version 1.6.0.  The same setup with 1.1.0 works.  
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on issue #25706: `conn_id` for S3 logging isn't defined using SecretsManagerBackend

Posted by GitBox <gi...@apache.org>.
Taragolis commented on issue #25706:
URL: https://github.com/apache/airflow/issues/25706#issuecomment-1218077353

   @dddevis Did you try setting [logging_level](https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#logging-level) to DEBUG?
   
   Most errors raised while fetching data from Secrets Manager are suppressed:
   https://github.com/apache/airflow/blob/e736c2d1545501d6998a02380778574ae159a203/airflow/providers/amazon/aws/secrets/secrets_manager.py#L446-L480
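
To illustrate why that matters, here is a minimal, self-contained sketch of the general pattern (not the provider's actual code): a lookup that swallows every exception and only logs it at DEBUG level looks, from the caller's side, exactly like a missing connection. All names here (`get_secret_or_none`, `failing_fetch`, `describe_lookup`) are hypothetical, and the permission error is simulated.

```python
import logging
from typing import Callable, Optional

log = logging.getLogger("secrets_sketch")


def get_secret_or_none(fetch: Callable[[str], str], secret_name: str) -> Optional[str]:
    """Return the secret value, or None if fetching fails for any reason."""
    try:
        return fetch(secret_name)
    except Exception as exc:
        # The real cause is only visible when DEBUG logging is enabled.
        log.debug("Could not fetch %s: %s", secret_name, exc)
        return None


def failing_fetch(secret_name: str) -> str:
    """Stand-in for a backend call that is rejected (simulated error)."""
    raise PermissionError("AccessDenied (simulated)")


def describe_lookup(conn_id: str, value: Optional[str]) -> str:
    """Build the message a caller would show after the lookup."""
    if value is None:
        return f"The conn_id `{conn_id}` isn't defined"
    return f"The conn_id `{conn_id}` resolved"


value = get_secret_or_none(
    failing_fetch, "airflow/connections/airflow-remote-logging-conn"
)
message = describe_lookup("airflow-remote-logging-conn", value)
print(message)
```

In this sketch a permission problem and a genuinely missing secret produce the same message, which is why raising the log level is a reasonable first diagnostic step.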




[GitHub] [airflow] potiuk commented on issue #25706: `conn_id` for S3 logging isn't defined using SecretsManagerBackend

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #25706:
URL: https://github.com/apache/airflow/issues/25706#issuecomment-1224200727

   `airflow-remote-logging-conn` -> you need to define it in connections.




[GitHub] [airflow] boring-cyborg[bot] commented on issue #25706: `conn_id` for S3 logging isn't defined using SecretsManagerBackend

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #25706:
URL: https://github.com/apache/airflow/issues/25706#issuecomment-1214181807

   Thanks for opening your first issue here! Be sure to follow the issue template!
   




[GitHub] [airflow] potiuk closed issue #25706: `conn_id` for S3 logging isn't defined using SecretsManagerBackend

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #25706: `conn_id` for S3 logging isn't defined using SecretsManagerBackend
URL: https://github.com/apache/airflow/issues/25706




[GitHub] [airflow] dddevis commented on issue #25706: `conn_id` for S3 logging isn't defined using SecretsManagerBackend

Posted by GitBox <gi...@apache.org>.
dddevis commented on issue #25706:
URL: https://github.com/apache/airflow/issues/25706#issuecomment-1220019248

   @Taragolis I've set the logging level to "DEBUG" and I see this instead:
   
   ```
   *** Failed to verify remote log exists s3://devis-us-east-1/airflow_logs/dag_id=my_dag/run_id=scheduled__2022-08-17T06:00:00+00:00/task_id=my_task/attempt=1.log.
   The conn_id `airflow-remote-logging-conn` isn't defined
   *** Falling back to local log
   *** Trying to get logs (last 100 lines) from worker pod my_task-f6b5e030da01477cb379a6a4f6ece722 ***
   
   *** Unable to fetch logs from worker pod my_task-f6b5e030da01477cb379a6a4f6ece722 ***
   (404)
   Reason: Not Found
   HTTP response headers: HTTPHeaderDict({'Audit-Id': 'c0e930ec-c774-4116-ba08-9bdd5999f90c', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Thu, 18 Aug 2022 22:05:20 GMT', 'Content-Length': '294'})
   HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \\"my_task-f6b5e030da01477cb379a6a4f6ece722\\" not found","reason":"NotFound","details":{"name":"my_task-f6b5e030da01477cb379a6a4f6ece722","kind":"pods"},"code":404}\n'
   ```

