You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/11/30 03:55:48 UTC

[GitHub] [airflow] josh-fell opened a new issue #19883: Credentials present in logs when using Shared Key auth to Azure Blob Storage

josh-fell opened a new issue #19883:
URL: https://github.com/apache/airflow/issues/19883


   ### Apache Airflow Provider(s)
   
   microsoft-azure
   
   ### Versions of Apache Airflow Providers
   
   Latest of all providers available on Airflow `main`.
   
   
   ### Apache Airflow version
   
   main (development)
   
   ### Operating System
   
   Debian GNU/Linux 10 (buster)
   
   ### Deployment
   
   Other
   
   ### Deployment details
   
   Using Breeze on `main` branch.
   
   ### What happened
   
   When authenticating to Azure Blob Storage using the ["Shared Key" method](https://docs.microsoft.com/en-us/rest/api/storageservices/authorize-with-shared-key) (which only includes the shared access key and storage account URL), both the key value and URL are printed in plain text to the task logs.
   
   Example log entry:
   ```
   [2021-11-29, 21:57:53 EST] {base.py:79} INFO - Using connection to: id: wasb_default. Host: https://myAccountBlahBlah.blob.core.windows.net/, Port: None, Schema: , Login: , Password: None, extra: {'extra__wasb__connection_string': '', 'extra__wasb__sas_token': '***', 'extra__wasb__shared_access_key': 'KEa1QvMjxMNLuzTgVZFvS6PpQv087Ls0Oq+7Ic/fa9Lu3RQwunHi61yZTCJSCIo1gZBNLuzTgVZFvS6PpQv087Ls0Oq+7Ic/fa9Lu3RQwunHi61yZTCJSCIo1gZBH1cc/KZb3EAKXrqWXXXXXXXXXXXXXXXXXXXXXX==', 'extra__wasb__tenant_id': ''}
   ```
   
   ### What you expected to happen
   
   The entirety of the credentials used to authenticate to Azure Blob Storage should not be visible in plain text within the task logs.
   
   ### How to reproduce
   
   - Create an Azure Blob Storage connection populating the "Shared Access Key" and "Account Name" fields in the UI or creating a connection using the `shared_access_key`/`extra__wasb__shared_access_key` extra and `host`.
   - Use a simplified version of the existing `example_local_to_wasb` DAG in the Azure provider:
   ```python
   import os
   from datetime import datetime
   
   from airflow.models import DAG
   from airflow.providers.microsoft.azure.operators.wasb_delete_blob import WasbDeleteBlobOperator
   from airflow.providers.microsoft.azure.transfers.local_to_wasb import LocalFilesystemToWasbOperator
   
   PATH_TO_UPLOAD_FILE = os.environ.get('AZURE_PATH_TO_UPLOAD_FILE', 'example-text.txt')
   
   with DAG(
       "example_local_to_wasb",
       schedule_interval="@once",
       start_date=datetime(2021, 1, 1),
       catchup=False,
       default_args={"container_name": "mycontainer", "blob_name": "myblob"},
   ) as dag:
       upload = LocalFilesystemToWasbOperator(
           task_id="upload_file", file_path=PATH_TO_UPLOAD_FILE, load_options={"overwrite": True}
       )
   ```
   
   ### Anything else
   
   While a fix is relatively straightforward, feedback on the best approach would be appreciated. There seem to be a few options:
   
   - Update the `WasbHook` to use the `password` connection field instead of the custom `extra__wasb__shared_access_key` field.
     - Suggested by @ashb in a discussion on #19497
     - Obviously some tradeoff between custom connection fields/extras making connection creation (presumably) more straightforward/targeted vs reusing core connection fields such that they take on multiple meanings and perhaps more "burdensome" to create said connection.
     - What does backwards compat look like in this case? Or is it valid to call this out as a "breaking change"?
   - Add "shared_access_key" to `DEFAULT_SENSITIVE_FIELDS` to ensure the shared key value is masked in the connection `extras` when logged
     - Perhaps a bit heavy-handed of a change
   - Update the [logging logic](https://github.com/apache/airflow/blob/387893ae1820ca1553d05146ad9d2baaa0c0a519/airflow/hooks/base.py#L69-L80) in `airflow.hooks.base.get_connection()` to only log what connection ID is being used rather than all of the connection details.
     - Unclear what the user impact would be if folks use these particular details in log analysis or simply find it handy to have in the logs.
     - (Slightly-related side question: why is the logging only written when `host` is provided in a connection?)
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on issue #19883: Credentials present in logs when using Shared Key auth to Azure Blob Storage

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #19883:
URL: https://github.com/apache/airflow/issues/19883#issuecomment-987995535


   Actually I'd do all three options together:
   * deprecate (not remove) shared_access_key and use password if it is not present
   * add shared_acess_key to sensitive fields
   AND 
   * change the logic of logging get_connection() to only use ID (but with the caveat that full connection details could be printed at DEBUG level).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on issue #19883: Credentials present in logs when using Shared Key auth to Azure Blob Storage

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #19883:
URL: https://github.com/apache/airflow/issues/19883#issuecomment-987995535


   Actually I'd do all three options:
   * deprecate (not remove) shared_access_key and use password if it is not present
   * add shared_acess_key to sensitive fiekds
   AND 
   * change the logicof logging get_connection() to only use ID (but with the caveat that full connection details could be printed at DEBUG level).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] josh-fell commented on issue #19883: Credentials present in logs when using Shared Key auth to Azure Blob Storage

Posted by GitBox <gi...@apache.org>.
josh-fell commented on issue #19883:
URL: https://github.com/apache/airflow/issues/19883#issuecomment-987971464


   > @josh-fell why don't we have the same problem with Security Token field in Salesforce connection?
   
   For two reasons:
   1. The string "token" is already included in `DEFAULT_SENSITIVE_FIELDS` in the [secrets masker](https://github.com/josh-fell/airflow/blob/26e3c7f374f6c2c4eaaedb5aa72a5d53b8843258/airflow/utils/log/secrets_masker.py#L34-L49) logic. Any extra with that string in its key would be masked.
   2. The [logging line](https://github.com/apache/airflow/blob/387893ae1820ca1553d05146ad9d2baaa0c0a519/airflow/hooks/base.py#L69-L80) that is creating this issue is _only_ written when `host` is provided in a connection. This brings me to the "intriguing" part: Why is this only logged when `host` is provided? And is it necessary to log all of the details of the connection rather than just what connection ID is used?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] josh-fell commented on issue #19883: Credentials present in logs when using Shared Key auth to Azure Blob Storage

Posted by GitBox <gi...@apache.org>.
josh-fell commented on issue #19883:
URL: https://github.com/apache/airflow/issues/19883#issuecomment-987571197


   @eladkal Any feedback on any of the fix option listed above? I'm inclined to go with Option 1 but this would be a breaking change. I'm not sure when those become justified.  Also Option 3 is intriguing but it does have some questions that come along with and is more widespread in its effect.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] josh-fell edited a comment on issue #19883: Credentials present in logs when using Shared Key auth to Azure Blob Storage

Posted by GitBox <gi...@apache.org>.
josh-fell edited a comment on issue #19883:
URL: https://github.com/apache/airflow/issues/19883#issuecomment-987571197


   @eladkal Any feedback on any of the fix options listed above? I'm inclined to go with Option 1 but this would be a breaking change. I'm not sure when those become justified.  Also Option 3 is intriguing but it does have some questions that come along with and is more widespread in its effect.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal commented on issue #19883: Credentials present in logs when using Shared Key auth to Azure Blob Storage

Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #19883:
URL: https://github.com/apache/airflow/issues/19883#issuecomment-987731192


   @josh-fell why don't we have the same problem with Security Token field in Salesforce connection?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on issue #19883: Credentials present in logs when using Shared Key auth to Azure Blob Storage

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #19883:
URL: https://github.com/apache/airflow/issues/19883#issuecomment-987995535


   Actually I'd do all three options:
   * deprecate (not remove) shared_access_key and use password if it is not present
   * add shared_acess_key to sensitive fields
   AND 
   * change the logicof logging get_connection() to only use ID (but with the caveat that full connection details could be printed at DEBUG level).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on issue #19883: Credentials present in logs when using Shared Key auth to Azure Blob Storage

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #19883:
URL: https://github.com/apache/airflow/issues/19883#issuecomment-987995535


   Actually I'd do all three options:
   * deprecate (not remove) shared_access_key and use password if it is not present
   * add shared_acess_key to sensitive fields
   AND 
   * change the logic of logging get_connection() to only use ID (but with the caveat that full connection details could be printed at DEBUG level).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org