Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/12/14 21:36:35 UTC

[GitHub] [airflow] sid-habu opened a new issue, #28365: S3 Remote Logging not working on Airflow 2.4.3

sid-habu opened a new issue, #28365:
URL: https://github.com/apache/airflow/issues/28365

   ### Apache Airflow version
   
   Other Airflow 2 version (please specify below)
   
   ### What happened
   
   Upgrading from Airflow 2.3.4 to 2.4.3 has broken S3 remote logging. We had a working `aws` connection type in 2.3.4 with the attributes below.
   
   ```
   AWS Access Key ID
   AWS Secret Access Key
   Extra: {"region_name": "us-east-2", "role_arn": "arn:aws:iam::xxxxxxxxx:role/service/xxxxxxxxx"}
   ```
   
   After the upgrade to 2.4.3, remote logging no longer works and fails with the error below:
   
   ```
   canarydagprintdate-938a1a1f10ab4c3c9db5650fdec731b7
   *** Failed to verify remote log exists s3://xxxxxx/dag_id=canary_dag/run_id=scheduled__2022-12-14T21:21:40.610340+00:00/task_id=print_date/attempt=1.log.
   An error occurred (403) when calling the HeadObject operation: Forbidden
   *** Falling back to local log
   *** Trying to get logs (last 100 lines) from worker pod canarydagprintdate-938a1a1f10ab4c3c9db5650fdec731b7 ***
   ```
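The failing `HeadObject` call targets an object key in the `dag_id=.../run_id=.../task_id=.../attempt=N.log` layout visible in the error above. As a rough illustration of how such a key is assembled (the helper name is hypothetical, not Airflow's internal code):

```python
# Hypothetical helper mirroring the key layout shown in the error message
# above (the Airflow 2.4-style log path); illustrative only.
def remote_log_key(dag_id: str, run_id: str, task_id: str, attempt: int) -> str:
    return f"dag_id={dag_id}/run_id={run_id}/task_id={task_id}/attempt={attempt}.log"

print(remote_log_key(
    "canary_dag",
    "scheduled__2022-12-14T21:21:40.610340+00:00",
    "print_date",
    1,
))
# → dag_id=canary_dag/run_id=scheduled__2022-12-14T21:21:40.610340+00:00/task_id=print_date/attempt=1.log
```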
   
   I manually verified with the AWS CLI that the IAM user is able to assume the role, access the S3 bucket, and list keys within it.
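That manual check can also be scripted. The sketch below (hypothetical function names; the second function requires boto3 and live AWS credentials, so it is not run here) mirrors the assume-role-then-`HeadObject` flow that the log handler performs:

```python
from urllib.parse import urlparse


def split_s3_url(url: str) -> tuple[str, str]:
    """Split an s3://bucket/key URL into (bucket, key)."""
    parsed = urlparse(url)
    return parsed.netloc, parsed.path.lstrip("/")


def check_log_access(role_arn: str, log_url: str) -> None:
    """Assume the role and issue the same HeadObject call Airflow makes.

    Requires live AWS credentials; boto3 is imported lazily so the pure
    helper above stays usable without it.
    """
    import boto3  # deferred: needs an environment with boto3 and AWS creds

    sts = boto3.client("sts")
    creds = sts.assume_role(
        RoleArn=role_arn, RoleSessionName="airflow-log-check"
    )["Credentials"]
    s3 = boto3.client(
        "s3",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
    bucket, key = split_s3_url(log_url)
    # Raises botocore.exceptions.ClientError (403/404) on failure,
    # just like the "Failed to verify remote log exists" path.
    s3.head_object(Bucket=bucket, Key=key)
```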
   
   Below is the S3 bucket policy:
   
   ```
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Sid": "",
               "Effect": "Allow",
               "Principal": {
                   "AWS": "arn:aws:iam::xxxx:role/service/xxxxx"
               },
               "Action": [
                   "s3:ListBucket",
                   "s3:GetBucketLocation"
               ],
               "Resource": "arn:aws:s3:::xxxxxx"
           },
           {
               "Sid": "",
               "Effect": "Allow",
               "Principal": {
                   "AWS": "arn:aws:iam::xxxxx:role/service/xxxxxxx"
               },
               "Action": [
                   "s3:PutObjectAcl",
                   "s3:PutObject",
                   "s3:GetObjectVersion",
                   "s3:GetObjectAcl",
                   "s3:GetObject",
                   "s3:DeleteObjectVersion",
                   "s3:DeleteObject"
               ],
               "Resource": "arn:aws:s3:::xxxxxxx/*"
           }
       ]
   }
   ```
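For reference, `HeadObject` requires `s3:GetObject` on the object, and without `s3:ListBucket` on the bucket a missing object is reported as 403 Forbidden rather than 404 Not Found. A small sketch (illustrative helper, not an IAM evaluator; it ignores Principal, Resource, and Deny statements) for sanity-checking that a policy grants both:

```python
# Collect the Allow actions a bucket policy grants, to sanity-check that
# s3:GetObject (needed by HeadObject) and s3:ListBucket are both present.
def allowed_actions(policy: dict) -> set:
    actions = set()
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        acts = stmt.get("Action", [])
        # "Action" may be a single string or a list of strings
        actions.update([acts] if isinstance(acts, str) else acts)
    return actions


example = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["s3:ListBucket"],
         "Resource": "arn:aws:s3:::bucket"},
        {"Effect": "Allow", "Action": ["s3:GetObject", "s3:PutObject"],
         "Resource": "arn:aws:s3:::bucket/*"},
    ],
}
missing = {"s3:GetObject", "s3:ListBucket"} - allowed_actions(example)
print(missing)  # empty set when both required actions are granted
```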
   
   ### What you think should happen instead
   
   Remote logging to S3 worked on Airflow 2.3.4 and should continue to work after upgrading to 2.4.3.
   
   ### How to reproduce
   
   _No response_
   
   ### Operating System
   
   Debian GNU/Linux 11 (bullseye)
   
   ### Versions of Apache Airflow Providers
   
   ```
   apache-airflow-providers-amazon==6.1.0
   apache-airflow-providers-celery==3.0.0
   apache-airflow-providers-cncf-kubernetes==4.4.0
   apache-airflow-providers-common-sql==1.3.1
   apache-airflow-providers-docker==3.2.0
   apache-airflow-providers-elasticsearch==4.2.1
   apache-airflow-providers-ftp==3.1.0
   apache-airflow-providers-google==8.4.0
   apache-airflow-providers-grpc==3.0.0
   apache-airflow-providers-hashicorp==3.1.0
   apache-airflow-providers-http==4.0.0
   apache-airflow-providers-imap==3.0.0
   apache-airflow-providers-microsoft-azure==4.3.0
   apache-airflow-providers-mysql==3.2.1
   apache-airflow-providers-odbc==3.1.2
   apache-airflow-providers-postgres==5.2.2
   apache-airflow-providers-redis==3.0.0
   apache-airflow-providers-sendgrid==3.0.0
   apache-airflow-providers-sftp==4.1.0
   apache-airflow-providers-slack==6.0.0
   apache-airflow-providers-snowflake==3.3.0
   apache-airflow-providers-sqlite==3.2.1
   apache-airflow-providers-ssh==3.2.0
   ```
   
   Note: I upgraded `apache-airflow-providers-amazon` from `6.0.0` to `6.1.0` after seeing a related issue in https://github.com/apache/airflow/pull/26946/files
   
   ### Deployment
   
   Other
   
   ### Deployment details
   
   Cloud provider or hardware configuration: AWS
   Custom Helm chart
   Kubernetes version (`kubectl version`): 1.21.14
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] sid-habu commented on issue #28365: S3 Remote Logging not working on Airflow 2.4.3

Posted by GitBox <gi...@apache.org>.
sid-habu commented on issue #28365:
URL: https://github.com/apache/airflow/issues/28365#issuecomment-1352399448

   > Just to confirm:
   > 
   > 1. Do you provide the AWS Access Key ID, AWS Secret Access Key, and role_arn in the extra field? You did not mask the access key and secret key with `xxxxx`, so I assume you might be retrieving the initial credentials from an IAM profile or somewhere else.
   > 2. Are log files created in S3?
   > 3. Does the issue persist for logs newly created after you upgraded to 6.1.0?
   > 4. Did you also upgrade the Amazon provider on the webserver? I ask because logs are retrieved by the webserver, not the scheduler.
   > 5. Could you check your credentials with this snippet: https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/connections/aws.html#snippet-to-create-connection-and-convert-to-uri and verify whether it prints the correct role?
   
   1. The Access Key and Secret are provided in the Connection UI
   2. Yes, I see the log files being generated in S3
   3. Yes
   4. This might be the issue! You are right: `pip freeze` on the webserver still shows `6.0.0`
   
   ```
   apache-airflow-providers-amazon==6.0.0
   apache-airflow-providers-celery==3.0.0
   apache-airflow-providers-cncf-kubernetes==4.4.0
   apache-airflow-providers-common-sql==1.2.0
   apache-airflow-providers-docker==3.2.0
   apache-airflow-providers-elasticsearch==4.2.1
   apache-airflow-providers-ftp==3.1.0
   apache-airflow-providers-google==8.4.0
   apache-airflow-providers-grpc==3.0.0
   apache-airflow-providers-hashicorp==3.1.0
   apache-airflow-providers-http==4.0.0
   apache-airflow-providers-imap==3.0.0
   apache-airflow-providers-microsoft-azure==4.3.0
   apache-airflow-providers-mysql==3.2.1
   apache-airflow-providers-odbc==3.1.2
   apache-airflow-providers-postgres==5.2.2
   apache-airflow-providers-redis==3.0.0
   apache-airflow-providers-sendgrid==3.0.0
   apache-airflow-providers-sftp==4.1.0
   apache-airflow-providers-slack==6.0.0
   apache-airflow-providers-snowflake==3.3.0
   apache-airflow-providers-sqlite==3.2.1
   apache-airflow-providers-ssh==3.2.0
   ```
   
   5. Yes, it prints the correct role

   Let me work on item 4 above. Thank you so much for the insights.




[GitHub] [airflow] sid-habu commented on issue #28365: S3 Remote Logging not working on Airflow 2.4.3

Posted by GitBox <gi...@apache.org>.
sid-habu commented on issue #28365:
URL: https://github.com/apache/airflow/issues/28365#issuecomment-1352402832

   Confirmed: it works after upgrading to `apache-airflow-providers-amazon==6.1.0` on both the webserver and the scheduler. It might be a good idea to update `https://raw.githubusercontent.com/apache/airflow/constraints-2.3.4/constraints-3.7.txt` to pin `apache-airflow-providers-amazon==6.1.0`, as `6.0.0` is broken for S3 remote logging with an assumed role.
   
   ```
   Reading remote log from s3://habu-stage-kedge-logs/dag_id=airflow-log-cleanup/run_id=scheduled__2022-12-14T00:00:00+00:00/task_id=log_cleanup_worker_num_1_dir_0/attempt=1.log.
   [2022-12-15, 00:01:24 UTC] {taskinstance.py:1165} INFO - Dependencies all met for <TaskInstance: airflow-log-cleanup.log_cleanup_worker_num_1_dir_0 scheduled__2022-12-14T00:00:00+00:00 [queued]>
   [2022-12-15, 00:01:24 UTC] {taskinstance.py:1165} INFO - Dependencies all met for <TaskInstance: airflow-log-cleanup.log_cleanup_worker_num_1_dir_0 scheduled__2022-12-14T00:00:00+00:00 [queued]>
   ```
   
   Thanks once again @Taragolis 




[GitHub] [airflow] sid-habu closed issue #28365: S3 Remote Logging not working on Airflow 2.4.3

Posted by GitBox <gi...@apache.org>.
sid-habu closed issue #28365: S3 Remote Logging not working on Airflow 2.4.3
URL: https://github.com/apache/airflow/issues/28365




[GitHub] [airflow] Taragolis commented on issue #28365: S3 Remote Logging not working on Airflow 2.4.3

Posted by GitBox <gi...@apache.org>.
Taragolis commented on issue #28365:
URL: https://github.com/apache/airflow/issues/28365#issuecomment-1352367459

   Just to confirm:
   
   1. Do you provide the AWS Access Key ID, AWS Secret Access Key, and role_arn in the extra field? You did not mask the access key and secret key with `xxxxx`, so I assume you might be retrieving the initial credentials from an IAM profile or somewhere else.
   
   2. Are log files created in S3?
   
   3. Does the issue persist for logs newly created after you upgraded to 6.1.0?
   
   4. Did you also upgrade the Amazon provider on the webserver? I ask because logs are retrieved by the webserver, not the scheduler.
   
   5. Could you check your credentials with this snippet: https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/connections/aws.html#snippet-to-create-connection-and-convert-to-uri and verify whether it prints the correct role?
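As a rough, pure-stdlib approximation of what the linked docs snippet produces (Airflow's own `Connection.get_uri()` is authoritative; the function name and the example credentials/role ARN below are illustrative placeholders):

```python
import json
from urllib.parse import quote, urlencode


def aws_conn_uri(access_key: str, secret_key: str, extra: dict) -> str:
    """Approximate the aws:// connection URI shape; not Airflow's exact encoding."""
    query = urlencode(
        {k: v if isinstance(v, str) else json.dumps(v) for k, v in extra.items()}
    )
    return f"aws://{quote(access_key, safe='')}:{quote(secret_key, safe='')}@/?{query}"


# AWS's documented example credentials and a placeholder role ARN:
uri = aws_conn_uri(
    "AKIAIOSFODNN7EXAMPLE",
    "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
    {"region_name": "us-east-2", "role_arn": "arn:aws:iam::123456789012:role/example"},
)
print(uri)  # the role_arn in the query string is what should match your role
```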




[GitHub] [airflow] boring-cyborg[bot] commented on issue #28365: S3 Remote Logging not working on Airflow 2.4.3

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #28365:
URL: https://github.com/apache/airflow/issues/28365#issuecomment-1352239263

   Thanks for opening your first issue here! Be sure to follow the issue template!
   

