You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by "KadeRyu (via GitHub)" <gi...@apache.org> on 2023/09/01 05:35:15 UTC

[GitHub] [airflow] KadeRyu opened a new issue, #33993: Confusing paragraph in Airflow Bestpractice documentaion (Airflow Variables)

KadeRyu opened a new issue, #33993:
URL: https://github.com/apache/airflow/issues/33993

   ### What do you see as an issue?
   
   In the [Airflow Best Practices article](https://airflow.apache.org/docs/apache-airflow/2.5.3/best-practices.html#top-level-python-code), I found a paragraph that was confusing to readers:
   
   ---
   Bad example:
   
   ```python
   from airflow.models import Variable
   
   foo_var = Variable.get("foo")  # DON'T DO THAT
   bash_use_variable_bad_1 = BashOperator(
       task_id="bash_use_variable_bad_1", bash_command="echo variable foo=${foo_env}", env={"foo_env": foo_var}
   )
   
   bash_use_variable_bad_2 = BashOperator(
       task_id="bash_use_variable_bad_2",
       bash_command=f"echo variable foo=${Variable.get('foo')}",  # DON'T DO THAT
   )
   
   bash_use_variable_bad_3 = BashOperator(
       task_id="bash_use_variable_bad_3",
       bash_command="echo variable foo=${foo_env}",
       env={"foo_env": Variable.get("foo")},  # DON'T DO THAT
   )
   ```
   
   Good example:
   ```python
   bash_use_variable_good = BashOperator(
       task_id="bash_use_variable_good",
       bash_command="echo variable foo=${foo_env}",
       env={"foo_env": "{{ var.value.get('foo') }}"},
   )
   ```
   ```python
   @task
   def my_task():
       var = Variable.get("foo")  # this is fine, because func my_task called only run task, not scan DAGs.
       print(var)
   ```
   ---
   
   In the "Bad Example" section, it states that `Variable.get("foo")` should not be specified in an Operator. Instead, it recommends calling it using a Jinja template, as described in the "Good Example" section.
   However, it explains that it is possible to call it in the `@task` operator. This might make it seem like users who want to call the `Variable.get()` method shouldn't do so in a traditional Operator.
   
   Secondly, to simplify the code, the Operators are not in the DAG and are not indented, so there is room for interpretation that the `BashOperators` in the bad example are outside the DAG (top-level code), and therefore the `Variable.get()` method is also top-level.
   
   
   ### Solving the problem
   
   I think some sentences should be added to explain why `Variable.get()` methods called from inside an Operator are a bad example.
   
   I don't know enough about the logic that the scheduler is parsing to make a precise suggestion, but I think there are cases like this
   
   - If there is no case where you need to use the `Variable.get()` method inside an Operator, and using a Jinja template is absolutely fine: add to the documentation why calling `Variable.get()` inside an Operator is recognized as top-level code, and calling `Variable.get()` inside a `@task` operator is fine.
   - When it's okay to use the `Variable.get()` method inside an Operator: add appropriate examples to the documentation
   
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] eladkal commented on issue #33993: Confusing paragraph in Airflow Bestpractice documentaion (Airflow Variables)

Posted by "eladkal (via GitHub)" <gi...@apache.org>.
eladkal commented on issue #33993:
URL: https://github.com/apache/airflow/issues/33993#issuecomment-1702197853

   Thank you for trying to improve the docs!
   However I am not sure what is missing in the explnation?
   
   >I think some sentences should be added to explain why Variable.get() methods called from inside an Operator are a bad example.
   
   But it's OK to use `Variable.get()` in operator executed code like python callable of PythonOperator.
   Users should avoid using it in parts that are parsed by the scheduler.
   
   Could you check https://stackoverflow.com/a/71758614/14624409 if that clarify the issue we can work on modifying the docs to reflect it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] KadeRyu commented on issue #33993: Confusing paragraph in Airflow Bestpractice documentaion (Airflow Variables)

Posted by "KadeRyu (via GitHub)" <gi...@apache.org>.
KadeRyu commented on issue #33993:
URL: https://github.com/apache/airflow/issues/33993#issuecomment-1702203244

   Hi @eladkal! thanks for your information.
   As you said it's okay to use `Variable.get()` method in any operator, I wonder why below examples are bad:
   
   ```python
   bash_use_variable_bad_2 = BashOperator(
       task_id="bash_use_variable_bad_2",
       bash_command=f"echo variable foo=${Variable.get('foo')}",  # DON'T DO THAT
   )
   
   bash_use_variable_bad_3 = BashOperator(
       task_id="bash_use_variable_bad_3",
       bash_command="echo variable foo=${foo_env}",
       env={"foo_env": Variable.get("foo")},  # DON'T DO THAT
   )
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] boring-cyborg[bot] commented on issue #33993: Confusing paragraph in Airflow Bestpractice documentaion (Airflow Variables)

Posted by "boring-cyborg[bot] (via GitHub)" <gi...@apache.org>.
boring-cyborg[bot] commented on issue #33993:
URL: https://github.com/apache/airflow/issues/33993#issuecomment-1702190560

   Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] eladkal closed issue #33993: Confusing paragraph in Airflow Bestpractice documentaion (Airflow Variables)

Posted by "eladkal (via GitHub)" <gi...@apache.org>.
eladkal closed issue #33993: Confusing paragraph in Airflow Bestpractice documentaion (Airflow Variables)
URL: https://github.com/apache/airflow/issues/33993


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org