You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/07/29 16:36:59 UTC

[GitHub] [airflow] zkan opened a new pull request #17319: Suggest to use secrets backend for variable when it contains sensitive data

zkan opened a new pull request #17319:
URL: https://github.com/apache/airflow/pull/17319


   Closes: #14200 
   
   Changes proposed in this pull request:
   
   * Suggest to use secrets backend when variable contains sensitive data
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/main/UPDATING.md).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on a change in pull request #17319: Suggest to use secrets backend for variable when it contains sensitive data

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #17319:
URL: https://github.com/apache/airflow/pull/17319#discussion_r679636042



##########
File path: docs/apache-airflow/best-practices.rst
##########
@@ -109,8 +112,12 @@ or if you need to deserialize a json object from the variable :
 
     {{ var.json.<variable_name> }}
 
-An alternative option is to use environment variables in the top-level python code or use Environment Variables to create and manage Airflow variables. to manage Airflow Variables. This will avoid new connections to Airflow metadata DB every time Airflow parses the python file. For more information, see: :ref:`managing_variables`.
+For security purpose, you're recommended to use the :ref:`Secrets Backend<secrets_backend_configuration>`
+for any variable that contains sensitive data.
 
+An alternative option is to use environment variables in the top-level Python code or use environment variables to
+create and manage Airflow variables. This will avoid new connections to Airflow metadata DB every time
+Airflow parses the Python file. For more information, see: :ref:`managing_variables`.

Review comment:
       What does “use environment variables in the top-level Python code” mean here?

##########
File path: docs/apache-airflow/best-practices.rst
##########
@@ -91,13 +91,16 @@ Variables
 ---------
 
 You should avoid usage of Variables outside an operator's ``execute()`` method or Jinja templates if possible,
-as Variables create a connection to metadata DB of Airflow to fetch the value, which can slow down parsing and place extra load on the DB.
+as Variables create a connection to metadata DB of Airflow to fetch the value, which can slow down parsing and
+place extra load on the DB.
 
 Airflow parses all the DAGs in the background at a specific period.
-The default period is set using ``processor_poll_interval`` config, which is by default 1 second. During parsing, Airflow creates a new connection to the metadata DB for each DAG.
+The default period is set using ``processor_poll_interval`` config, which is by default 1 second.
+During parsing, Airflow creates a new connection to the metadata DB for each DAG.

Review comment:
       ```suggestion
   The default period is set using the ``processor_poll_interval`` config, which is 1 second by default.
   During parsing, Airflow creates a new connection to the metadata DB for each DAG.
   ```
   
   I think this reads better? It’s not related to this PR, but since we are modifying the line anyway, let’s improve it if we can.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] zkan commented on a change in pull request #17319: Suggest to use secrets backend for variable when it contains sensitive data

Posted by GitBox <gi...@apache.org>.
zkan commented on a change in pull request #17319:
URL: https://github.com/apache/airflow/pull/17319#discussion_r680357578



##########
File path: docs/apache-airflow/best-practices.rst
##########
@@ -91,13 +91,16 @@ Variables
 ---------
 
 You should avoid usage of Variables outside an operator's ``execute()`` method or Jinja templates if possible,
-as Variables create a connection to metadata DB of Airflow to fetch the value, which can slow down parsing and place extra load on the DB.
+as Variables create a connection to metadata DB of Airflow to fetch the value, which can slow down parsing and
+place extra load on the DB.
 
 Airflow parses all the DAGs in the background at a specific period.
-The default period is set using ``processor_poll_interval`` config, which is by default 1 second. During parsing, Airflow creates a new connection to the metadata DB for each DAG.
+The default period is set using ``processor_poll_interval`` config, which is by default 1 second.
+During parsing, Airflow creates a new connection to the metadata DB for each DAG.

Review comment:
       Yes, that does. Thanks!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on a change in pull request #17319: Suggest to use secrets backend for variable when it contains sensitive data

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #17319:
URL: https://github.com/apache/airflow/pull/17319#discussion_r680747249



##########
File path: docs/apache-airflow/best-practices.rst
##########
@@ -109,8 +112,12 @@ or if you need to deserialize a json object from the variable :
 
     {{ var.json.<variable_name> }}
 
-An alternative option is to use environment variables in the top-level python code or use Environment Variables to create and manage Airflow variables. to manage Airflow Variables. This will avoid new connections to Airflow metadata DB every time Airflow parses the python file. For more information, see: :ref:`managing_variables`.
+For security purpose, you're recommended to use the :ref:`Secrets Backend<secrets_backend_configuration>`
+for any variable that contains sensitive data.
 
+An alternative option is to use environment variables in the top-level Python code or use environment variables to
+create and manage Airflow variables. This will avoid new connections to Airflow metadata DB every time
+Airflow parses the Python file. For more information, see: :ref:`managing_variables`.

Review comment:
       Yeah that makes sense to me.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] zkan commented on a change in pull request #17319: Suggest to use secrets backend for variable when it contains sensitive data

Posted by GitBox <gi...@apache.org>.
zkan commented on a change in pull request #17319:
URL: https://github.com/apache/airflow/pull/17319#discussion_r680366565



##########
File path: docs/apache-airflow/best-practices.rst
##########
@@ -109,8 +112,12 @@ or if you need to deserialize a json object from the variable :
 
     {{ var.json.<variable_name> }}
 
-An alternative option is to use environment variables in the top-level python code or use Environment Variables to create and manage Airflow variables. to manage Airflow Variables. This will avoid new connections to Airflow metadata DB every time Airflow parses the python file. For more information, see: :ref:`managing_variables`.
+For security purpose, you're recommended to use the :ref:`Secrets Backend<secrets_backend_configuration>`
+for any variable that contains sensitive data.
 
+An alternative option is to use environment variables in the top-level Python code or use environment variables to
+create and manage Airflow variables. This will avoid new connections to Airflow metadata DB every time
+Airflow parses the Python file. For more information, see: :ref:`managing_variables`.

Review comment:
       I think it means accessing the environment variables outside a task, as referred to [this](https://airflow.apache.org/docs/apache-airflow/stable/best-practices.html#top-level-python-code).
   
   What do you think if we keep only this:
   ```suggestion
   An alternative option is to use environment variables to create and manage Airflow variables.
   This avoids new connections to Airflow metadata DB every time Airflow parses the
   Python file. For more information, see: :ref:`managing_variables`.
   ```
   since I think "environment variables in the top-level Python code" might be irrelevant to Airflow Variable?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #17319: Suggest to use secrets backend for variable when it contains sensitive data

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #17319:
URL: https://github.com/apache/airflow/pull/17319#discussion_r680763645



##########
File path: docs/apache-airflow/best-practices.rst
##########
@@ -109,8 +112,12 @@ or if you need to deserialize a json object from the variable :
 
     {{ var.json.<variable_name> }}
 
-An alternative option is to use environment variables in the top-level python code or use Environment Variables to create and manage Airflow variables. to manage Airflow Variables. This will avoid new connections to Airflow metadata DB every time Airflow parses the python file. For more information, see: :ref:`managing_variables`.
+For security purpose, you're recommended to use the :ref:`Secrets Backend<secrets_backend_configuration>`
+for any variable that contains sensitive data.
 
+An alternative option is to use environment variables in the top-level Python code or use environment variables to
+create and manage Airflow variables. This will avoid new connections to Airflow metadata DB every time
+Airflow parses the Python file. For more information, see: :ref:`managing_variables`.

Review comment:
       The idea here was that you could control the behaviour of top-level python code by setting "regular" environment variables and reading from those. I think we should generally discourage using Airflow Variables  at the top level of python code in Dag - (regardless if they are defined via ENV Vars or not). I think in some future version of Airflow, where we will convert access to variables into API calls, we will likely detect such access to Airflow Variables during parsing and either warn or block it. 
   
   There is very little sense to use Airflow Variables in top-level code of DAGs- even if you make "requirement" to define them via variables, it's much simpler, explicit and straightforward to simply ... use the variables via os.environment and then they should not need to follow the _VAR convention. 
   
    So I think the proposed change is not what we intended to say here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #17319: Suggest to use secrets backend for variable when it contains sensitive data

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #17319:
URL: https://github.com/apache/airflow/pull/17319#issuecomment-891310616


   The PR is likely ready to be merged. No tests are needed as no important environment files, nor python files were modified by it. However, committers might decide that full test matrix is needed and add the 'full tests needed' label. Then you should rebase it to the latest main or amend the last commit of the PR, and push it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] zkan commented on a change in pull request #17319: Suggest to use secrets backend for variable when it contains sensitive data

Posted by GitBox <gi...@apache.org>.
zkan commented on a change in pull request #17319:
URL: https://github.com/apache/airflow/pull/17319#discussion_r680971410



##########
File path: docs/apache-airflow/best-practices.rst
##########
@@ -109,8 +112,12 @@ or if you need to deserialize a json object from the variable :
 
     {{ var.json.<variable_name> }}
 
-An alternative option is to use environment variables in the top-level python code or use Environment Variables to create and manage Airflow variables. to manage Airflow Variables. This will avoid new connections to Airflow metadata DB every time Airflow parses the python file. For more information, see: :ref:`managing_variables`.
+For security purpose, you're recommended to use the :ref:`Secrets Backend<secrets_backend_configuration>`
+for any variable that contains sensitive data.
 
+An alternative option is to use environment variables in the top-level Python code or use environment variables to
+create and manage Airflow variables. This will avoid new connections to Airflow metadata DB every time
+Airflow parses the Python file. For more information, see: :ref:`managing_variables`.

Review comment:
       Yeah, sounds great to me. ;)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk merged pull request #17319: Suggest to use secrets backend for variable when it contains sensitive data

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #17319:
URL: https://github.com/apache/airflow/pull/17319


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org