Posted to commits@airflow.apache.org by "jurovee (via GitHub)" <gi...@apache.org> on 2023/03/11 13:23:45 UTC
[GitHub] [airflow] jurovee opened a new issue, #30039: Sensitive variable not masked in task logs when named with _ENCODED suffix
jurovee opened a new issue, #30039:
URL: https://github.com/apache/airflow/issues/30039
### Apache Airflow version
Other Airflow 2 version (please specify below)
### What happened
**Airflow 2.4.3**
Sensitive variables with a name like **ACCOUNT_PASSWORD_ENCODED** (for URL-encoded versions of passwords) are not being masked properly in task logs or rendered templates.
In our case, each of these variables has a counterpart named **ACCOUNT_PASSWORD**, and those are masked **without any issues**.
`AIRFLOW__CORE__HIDE_SENSITIVE_VAR_CONN_FIELDS` is set to **True**, and I also tried adding the custom names "encoded", "password_encoded" or "PASSWORD_ENCODED" to `AIRFLOW__CORE__SENSITIVE_VAR_CONN_NAMES`, e.g.:
`AIRFLOW__CORE__SENSITIVE_VAR_CONN_NAMES: "encoded,password_encoded"`
Unfortunately, this had no impact on masking.
I also tried running `airflow.utils.log.secrets_masker.should_hide_value_for_key('ACCOUNT_PASSWORD_ENCODED')` from the Airflow container, and it returns `True`, so I have no idea why the value is not being hidden.
Could it be related to `%` characters in the variable value or something?
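The name-based check behaves roughly like the sketch below: a case-insensitive substring match of the key against a set of sensitive words, extended by the comma-separated list in `AIRFLOW__CORE__SENSITIVE_VAR_CONN_NAMES`. This is a simplified, hypothetical re-implementation (the `should_hide` and `sensitive_words` helpers and the `DEFAULT_WORDS` subset are assumptions, not Airflow's code), but it shows why `ACCOUNT_PASSWORD_ENCODED` already qualifies without any custom entry.

```python
import os

# Hypothetical subset of built-in sensitive words (Airflow's real list is longer)
DEFAULT_WORDS = frozenset({"password", "passwd", "secret", "token", "api_key"})


def sensitive_words(env=None):
    """Combine the defaults with the comma-separated extra names from the environment."""
    env = os.environ if env is None else env
    extra = env.get("AIRFLOW__CORE__SENSITIVE_VAR_CONN_NAMES", "")
    return DEFAULT_WORDS | {w.strip().lower() for w in extra.split(",") if w.strip()}


def should_hide(key, env=None):
    """Case-insensitive substring match of the key against the sensitive words."""
    name = key.strip().lower()
    return any(word in name for word in sensitive_words(env))


print(should_hide("ACCOUNT_PASSWORD_ENCODED", env={}))  # True: contains "password"
print(should_hide("ACCOUNT_ENCODED", env={}))  # False with the defaults only
extra = {"AIRFLOW__CORE__SENSITIVE_VAR_CONN_NAMES": "encoded,password_encoded"}
print(should_hide("ACCOUNT_ENCODED", env=extra))  # True: "encoded" was added
```

Because the name already matches, adding `encoded` to the list cannot change the outcome for this key; it only widens which other names count as sensitive.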
### What you think should happen instead
Sensitive variables with a name like **ACCOUNT_PASSWORD_ENCODED** (for URL-encoded versions of passwords) should be masked in Airflow logs and rendered templates, because their names contain the "magic" substring **PASSWORD**.
### How to reproduce
Create a variable named **SOMETHING_PASSWORD_ENCODED** in your Airflow instance and use it in a task, e.g. a BashOperator with the command `echo {SOMETHING_PASSWORD_ENCODED}`. Then create a variable without the **_ENCODED** suffix and do the same. The first one is not masked; the second one is.
### Operating System
K8S Debian 10 Linux Container
### Versions of Apache Airflow Providers
_No response_
### Deployment
Other 3rd-party Helm chart
### Deployment details
_No response_
### Anything else
_No response_
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] jurovee commented on issue #30039: Sensitive variable not masked in task logs when containing URL encoded string
Posted by "jurovee (via GitHub)" <gi...@apache.org>.
jurovee commented on issue #30039:
URL: https://github.com/apache/airflow/issues/30039#issuecomment-1465285604
That works indeed. I am a bit confused, though: if a sensitive value is stored in a "sensitive" variable, shouldn't any use of that value (e.g. printing it) be masked either way?
[GitHub] [airflow] potiuk commented on issue #30039: Sensitive variable not masked in task logs when containing URL encoded string
Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on issue #30039:
URL: https://github.com/apache/airflow/issues/30039#issuecomment-1465289065
Closing it.
[GitHub] [airflow] jurovee commented on issue #30039: Sensitive variable not masked in task logs when containing URL encoded string
Posted by "jurovee (via GitHub)" <gi...@apache.org>.
jurovee commented on issue #30039:
URL: https://github.com/apache/airflow/issues/30039#issuecomment-1465291994
Got it. I was under the false (and noob) impression that the Airflow webserver somehow sees a string (in logs, for example) and, if it matches the value of a sensitive variable, automatically hides it. Well, it's a bit more complicated indeed ;) Thanks to you both for clarifying. I'm going to update our codebase accordingly.
[GitHub] [airflow] jurovee commented on issue #30039: Sensitive variable not masked in task logs when containing URL encoded string
Posted by "jurovee (via GitHub)" <gi...@apache.org>.
jurovee commented on issue #30039:
URL: https://github.com/apache/airflow/issues/30039#issuecomment-1465282119
@hussein-awala I just checked on 2.5.1:
```
>>> from airflow.models import Variable
>>> Variable.get('ABC_PASSWORD')
'w7.%40jp%295%24KCEvrR~'
```
BashOperator task:
command: `echo 'hello, password is w7.%40jp%295%24KCEvrR~'`
Airflow logs from the task:
```
[2023-03-12, 20:36:12 CET] {subprocess.py:75} INFO - Running command: ['/bin/bash', '-c', "echo 'hello, password is w7.%40jp%295%24KCEvrR~'"]
[2023-03-12, 20:36:12 CET] {subprocess.py:86} INFO - Output:
[2023-03-12, 20:36:12 CET] {subprocess.py:93} INFO - hello, password is w7.%40jp%295%24KCEvrR~
[2023-03-12, 20:36:12 CET] {subprocess.py:97} INFO - Command exited with return code 0
```
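The log above is what value-based redaction would predict: a masker can only hide strings that were registered in the task's own process, and here the password was fetched in a separate Python shell and pasted into the command as a literal. A minimal sketch with the standard `logging` module; the `RedactingFilter` class is hypothetical and only loosely modeled on the idea behind Airflow's secrets masker, not its actual implementation.

```python
import logging
import re


class RedactingFilter(logging.Filter):
    """Replace every registered secret value in a log record with ***."""

    def __init__(self):
        super().__init__()
        self._secrets = set()
        self._pattern = None

    def register(self, value):
        self._secrets.add(value)
        # Longest first, so a secret that contains another one is replaced whole
        alternatives = sorted(self._secrets, key=len, reverse=True)
        self._pattern = re.compile("|".join(re.escape(s) for s in alternatives))

    def filter(self, record):
        if self._pattern is not None:
            record.msg = self._pattern.sub("***", record.getMessage())
            record.args = ()
        return True  # never drop the record, only rewrite it


logging.basicConfig(format="%(message)s", level=logging.INFO)
logger = logging.getLogger("task")
masker = RedactingFilter()
logger.addFilter(masker)

literal = "w7.%40jp%295%24KCEvrR~"
logger.info("hello, password is %s", literal)  # logged as-is: nothing registered yet
masker.register(literal)
logger.info("hello, password is %s", literal)  # logs: hello, password is ***
```

The filter never looks at variable names; it only knows the values it was given, which is why a literal typed into a command is masked only if the same value was registered first.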
[GitHub] [airflow] jurovee commented on issue #30039: Sensitive variable not masked in task logs when named with _ENCODED suffix
Posted by "jurovee (via GitHub)" <gi...@apache.org>.
jurovee commented on issue #30039:
URL: https://github.com/apache/airflow/issues/30039#issuecomment-1464933892
@hussein-awala Sure, I'll try to do that on Monday, but digging into it more, it seems to me the problem is related to the specific value of the variable, not really its name. Can you please also check with a value that is a URL-encoded string? E.g. the URL-encoded string `w7.%40jp%295%24KCEvrR~`, produced from a random string I've just generated: `w7.@jp)5$KCEvrR~`. Thank you!
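The two strings in this comment are related by standard percent-encoding, which Python's `urllib.parse.quote` reproduces. Masking by value matches exact strings, so registering only the raw password would leave the encoded form visible (and vice versa). A minimal sketch, where the `redact` helper is a hypothetical stand-in for a masker:

```python
from urllib.parse import quote

raw = "w7.@jp)5$KCEvrR~"
encoded = quote(raw, safe="")
print(encoded)  # w7.%40jp%295%24KCEvrR~


def redact(text, secrets):
    """Hypothetical masker: replace exact occurrences of each registered secret."""
    # Longest first, so a longer secret is not partially replaced by a shorter one
    for secret in sorted(secrets, key=len, reverse=True):
        text = text.replace(secret, "***")
    return text


line = f"raw={raw} encoded={encoded}"
print(redact(line, {raw}))  # raw=*** encoded=w7.%40jp%295%24KCEvrR~
print(redact(line, {raw, encoded}))  # raw=*** encoded=***
```

Registering both variants is one way to cover a value that may be logged in more than one encoding.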
[GitHub] [airflow] potiuk commented on issue #30039: Sensitive variable not masked in task logs when containing URL encoded string
Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on issue #30039:
URL: https://github.com/apache/airflow/issues/30039#issuecomment-1465288940
No. How would you want to do it? You would have to not only return the value but also remember that it was retrieved from a sensitively named variable. Once you retrieve it, it loses the 'source association'.
You would have to always send the variable together with some metadata that would tell the provenance of the string, and that would have to be implemented at the level of your code to verify the metadata before printing.
There is no 'transparent' way to handle it. The best we can do is when it is code that we can check with Jinja before it gets interpreted.
Probably it could be done using some super arcane methods (with a lot of performance overhead), where you would store retrieved variables and metadata about them, but that would be terribly slow and complex, and it would likely still not be possible to catch all usages of such a retrieved value.
But if you would like to attempt to make such an exercise - feel free to open PR :)
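The loss of 'source association' can be seen with a plain `str` subclass that carries its origin as metadata. The `TaggedStr` class below is a hypothetical illustration, not something Airflow provides: almost any string operation (formatting, concatenation, case changes, slicing) returns an ordinary `str`, so the tag silently disappears before the value reaches a log line.

```python
class TaggedStr(str):
    """A str that remembers which variable it was read from (hypothetical)."""

    def __new__(cls, value, source):
        obj = super().__new__(cls, value)
        obj.source = source
        return obj


password = TaggedStr("w7.%40jp%295%24KCEvrR~", source="ABC_PASSWORD")
print(type(password).__name__, password.source)  # TaggedStr ABC_PASSWORD

# Every derived string is a plain str again: the provenance is gone.
derived = [
    f"echo '{password}'",
    "password is " + password,
    password.upper(),
    password[:5],
]
for value in derived:
    print(type(value).__name__, hasattr(value, "source"))  # str False
```

Keeping the tag alive would mean overriding every `str` method and intercepting every place strings are formatted, which is the overhead and incompleteness described above.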
[GitHub] [airflow] hussein-awala commented on issue #30039: Sensitive variable not masked in task logs when named with _ENCODED suffix
Posted by "hussein-awala (via GitHub)" <gi...@apache.org>.
hussein-awala commented on issue #30039:
URL: https://github.com/apache/airflow/issues/30039#issuecomment-1464935343
I just checked with these two values, and they are masked in the log.
Please test with Airflow 2.5.1, then either confirm that it works or provide some new values that reproduce the issue.
[GitHub] [airflow] hussein-awala commented on issue #30039: Sensitive variable not masked in task logs when containing URL encoded string
Posted by "hussein-awala (via GitHub)" <gi...@apache.org>.
hussein-awala commented on issue #30039:
URL: https://github.com/apache/airflow/issues/30039#issuecomment-1465283924
This is expected. When you load the variable via `Variable.get`, you get a plain Python string, and when you then use it in the operator, Airflow treats it as a normal string.
Could you try with Jinja templating?
```python
from airflow.operators.bash import BashOperator

BashOperator(
    task_id="bash",
    # Single quotes, not backticks: bash would execute backtick content as a command
    bash_command="echo '{{ var.value.get('ABC_PASSWORD') }}'",
)
```
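Templating helps because the lookup happens inside the framework: the renderer sees the variable name at the moment it substitutes the value, so it can register that value for masking before anything is logged. A minimal sketch of that idea using `re.sub` in place of Jinja; the `${NAME}` placeholder syntax, the `render` and `redact` helpers, and the in-memory `VARIABLES` store are all hypothetical.

```python
import re

# Hypothetical variable store and sensitive-word subset
VARIABLES = {"ABC_PASSWORD": "w7.%40jp%295%24KCEvrR~", "ABC_USER": "demo_user"}
SENSITIVE_WORDS = ("password", "secret", "token")


def render(template, secrets):
    """Substitute ${NAME} placeholders and register values of sensitive names."""

    def lookup(match):
        name = match.group(1)
        value = VARIABLES[name]
        if any(word in name.lower() for word in SENSITIVE_WORDS):
            secrets.add(value)  # the name is still known here, so the value can be flagged
        return value

    return re.sub(r"\$\{(\w+)\}", lookup, template)


def redact(text, secrets):
    """Replace every registered secret value with ***."""
    for secret in secrets:
        text = text.replace(secret, "***")
    return text


secrets = set()
command = render("echo '${ABC_USER}' '${ABC_PASSWORD}'", secrets)
print(command)  # echo 'demo_user' 'w7.%40jp%295%24KCEvrR~'
print(redact(command, secrets))  # echo 'demo_user' '***'
```

A command string assembled in user code never passes through `render`, so nothing gets registered and the value stays visible, matching the behavior reported earlier in the thread.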
[GitHub] [airflow] hussein-awala commented on issue #30039: Sensitive variable not masked in task logs when named with _ENCODED suffix
Posted by "hussein-awala (via GitHub)" <gi...@apache.org>.
hussein-awala commented on issue #30039:
URL: https://github.com/apache/airflow/issues/30039#issuecomment-1464929321
I cannot reproduce it with Airflow 2.5.1. Can you try upgrading to the latest version and check whether it works?
[GitHub] [airflow] potiuk closed issue #30039: Sensitive variable not masked in task logs when containing URL encoded string
Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk closed issue #30039: Sensitive variable not masked in task logs when containing URL encoded string
URL: https://github.com/apache/airflow/issues/30039