You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/08/03 15:15:15 UTC
[GitHub] [airflow] francescomucio opened a new pull request, #25509: Possibility to document DAG with a separated markdown file
francescomucio opened a new pull request, #25509:
URL: https://github.com/apache/airflow/pull/25509
This change allows the possibility to pass the path of a markdown file to the `doc_md` argument of a DAG.
closes: https://github.com/apache/airflow/discussions/25396
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] francescomucio commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file
Posted by GitBox <gi...@apache.org>.
francescomucio commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r937306706
##########
airflow/models/dag.py:
##########
@@ -522,7 +522,7 @@ def __init__(
self.has_on_success_callback = self.on_success_callback is not None
self.has_on_failure_callback = self.on_failure_callback is not None
- self.doc_md = doc_md
+ self.doc_md = pathlib.Path(doc_md).read_text() if pathlib.Path(str(doc_md or '')).is_file() else doc_md
Review Comment:
I was not able to find the logic used for .sql files in the SQL operators like the [here](https://github.com/apache/airflow/blob/main/airflow/providers/common/sql/operators/sql.py#L802).
While testing I was not able to just pass "my_dag.py" but I had to pass a Path object, like you were mentioning, which is in my opinion more cumbersome and inconsistent (with what we do for .sql files)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file
Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r937740029
##########
airflow/models/dag.py:
##########
@@ -522,7 +522,7 @@ def __init__(
self.has_on_success_callback = self.on_success_callback is not None
self.has_on_failure_callback = self.on_failure_callback is not None
- self.doc_md = doc_md
+ self.doc_md = pathlib.Path(doc_md).read_text() if pathlib.Path(str(doc_md or '')).is_file() else doc_md
Review Comment:
Yes.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] francescomucio commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file
Posted by GitBox <gi...@apache.org>.
francescomucio commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r938483844
##########
airflow/models/dag.py:
##########
@@ -522,17 +522,35 @@ def __init__(
self.has_on_success_callback = self.on_success_callback is not None
self.has_on_failure_callback = self.on_failure_callback is not None
- self.doc_md = doc_md
-
self._access_control = DAG._upgrade_outdated_dag_access_control(access_control)
self.is_paused_upon_creation = is_paused_upon_creation
self.jinja_environment_kwargs = jinja_environment_kwargs
self.render_template_as_native_obj = render_template_as_native_obj
+
+ self.doc_md = self.get_doc_md(doc_md)
+
self.tags = tags or []
self._task_group = TaskGroup.create_root(self)
self.validate_schedule_and_params()
+ def get_doc_md(self, doc_md: str) -> str:
+ if doc_md is None:
+ return doc_md
+
+ env = self.get_template_env(force_sandboxed=False)
+
+ if not doc_md.endswith('.md'):
Review Comment:
I trying to trigger a similar error with the PG operator with a query ending with `.sql`
<img width="694" alt="image" src="https://user-images.githubusercontent.com/3058143/183014884-85793dd2-b734-4d57-9f96-7ab9f6558680.png">
Would it make sense to return an error message in the doc, something like:
```python
doc_md = f""""
# Templating Error!
Not able to find the template file: {doc_md}.
"""
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file
Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r938724381
##########
airflow/models/dag.py:
##########
@@ -543,8 +543,13 @@ def get_doc_md(self, doc_md: str) -> str:
if not doc_md.endswith('.md'):
template = jinja2.Template(doc_md)
else:
- # template = env.loader.get_source(env, doc_md)[0]
- template = env.get_template(doc_md)
+ try:
Review Comment:
Much better :)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] francescomucio commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file
Posted by GitBox <gi...@apache.org>.
francescomucio commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r937874631
##########
docs/apache-airflow/concepts/dags.rst:
##########
@@ -564,7 +564,8 @@ doc_md markdown
doc_rst reStructuredText
========== ================
-Please note that for DAGs, ``doc_md`` is the only attribute interpreted.
+Please note that for DAGs, ``doc_md`` is the only attribute interpreted. For DAGs it can contain a string or the reference to a template file. Template references are recognized by str ending in ‘.md’.
Review Comment:
Add in the documentation, mentioning also the exception
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file
Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r938724882
##########
airflow/models/dag.py:
##########
@@ -543,8 +543,13 @@ def get_doc_md(self, doc_md: str) -> str:
if not doc_md.endswith('.md'):
template = jinja2.Template(doc_md)
else:
- # template = env.loader.get_source(env, doc_md)[0]
- template = env.get_template(doc_md)
+ try:
Review Comment:
Yeah. I think failing DAG because of typo in doc was just a bit too hard on the user
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] francescomucio commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file
Posted by GitBox <gi...@apache.org>.
francescomucio commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r937690359
##########
airflow/models/dag.py:
##########
@@ -522,7 +522,7 @@ def __init__(
self.has_on_success_callback = self.on_success_callback is not None
self.has_on_failure_callback = self.on_failure_callback is not None
- self.doc_md = doc_md
+ self.doc_md = pathlib.Path(doc_md).read_text() if pathlib.Path(str(doc_md or '')).is_file() else doc_md
Review Comment:
I figured out how to render the `.md` file as a Jinja template.
Unfortunately the `doc_md` template at DAG level cannot access Airflow macros because there is no context (like for the templated fields for tasks). As mentioned [here](https://github.com/apache/airflow/discussions/25396#discussioncomment-3280373).
Of course a macro like `{{ ds }}` has no meaning at DAG level, but accessing Airflow Variables or Connection could be useful.
If anyone has an idea on how to add the context, I can try to do it :)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk commented on pull request #25509: Possibility to document DAG with a separated markdown file
Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #25509:
URL: https://github.com/apache/airflow/pull/25509#issuecomment-1206579174
Yep. Transient error:
ERROR: for trino Cannot start service trino: driver failed programming external connectivity on endpoint airflow-integration-sqlite_trino_1 (fa404eaf4e560f6a678c72d0c3b5c644a988378d3a20a71d488a1c9cca2ede24): Error starting userland proxy: listen tcp4 0.0.0.0:38080: bind: address already in use
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] francescomucio commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file
Posted by GitBox <gi...@apache.org>.
francescomucio commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r937342671
##########
airflow/models/dag.py:
##########
@@ -522,7 +522,7 @@ def __init__(
self.has_on_success_callback = self.on_success_callback is not None
self.has_on_failure_callback = self.on_failure_callback is not None
- self.doc_md = doc_md
+ self.doc_md = pathlib.Path(doc_md).read_text() if pathlib.Path(str(doc_md or '')).is_file() else doc_md
Review Comment:
it now works with a dag defined as:
```python
my_dag = DAG(dag_id="sample_dag",
start_date=datetime(2020, 4, 29),
schedule_interval="0 0 * * *",
doc_md='dag.md'
)
```
Because it is handled via Jinja, it should be able to handle templating too :)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] francescomucio commented on pull request #25509: Possibility to document DAG with a separated markdown file
Posted by GitBox <gi...@apache.org>.
francescomucio commented on PR #25509:
URL: https://github.com/apache/airflow/pull/25509#issuecomment-1206576943
Test fixed, but there is still something red. Do you think it is good to go?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] uranusjr commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file
Posted by GitBox <gi...@apache.org>.
uranusjr commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r937296010
##########
airflow/models/dag.py:
##########
@@ -522,7 +522,7 @@ def __init__(
self.has_on_success_callback = self.on_success_callback is not None
self.has_on_failure_callback = self.on_failure_callback is not None
- self.doc_md = doc_md
+ self.doc_md = pathlib.Path(doc_md).read_text() if pathlib.Path(str(doc_md or '')).is_file() else doc_md
Review Comment:
Instead of doing heuristics like this, I wonder if it’s easier to simply _require_ the user to pass in a `Path` object, and do this here:
```python
if isinstance(doc_md, pathlib.Path):
self.doc_md = doc_md.read_text("utf-8")
else:
self.doc_md = doc_md
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] francescomucio commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file
Posted by GitBox <gi...@apache.org>.
francescomucio commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r937338005
##########
airflow/models/dag.py:
##########
@@ -522,7 +522,7 @@ def __init__(
self.has_on_success_callback = self.on_success_callback is not None
self.has_on_failure_callback = self.on_failure_callback is not None
- self.doc_md = doc_md
+ self.doc_md = pathlib.Path(doc_md).read_text() if pathlib.Path(str(doc_md or '')).is_file() else doc_md
Review Comment:
found the problem :)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk merged pull request #25509: Possibility to document DAG with a separated markdown file
Posted by GitBox <gi...@apache.org>.
potiuk merged PR #25509:
URL: https://github.com/apache/airflow/pull/25509
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file
Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r938725489
##########
airflow/models/dag.py:
##########
@@ -543,8 +543,13 @@ def get_doc_md(self, doc_md: str) -> str:
if not doc_md.endswith('.md'):
template = jinja2.Template(doc_md)
else:
- # template = env.loader.get_source(env, doc_md)[0]
- template = env.get_template(doc_md)
+ try:
Review Comment:
And it's perfectly OK to simply display the error as the documentation :)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] francescomucio commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file
Posted by GitBox <gi...@apache.org>.
francescomucio commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r937726473
##########
airflow/models/dag.py:
##########
@@ -522,7 +522,7 @@ def __init__(
self.has_on_success_callback = self.on_success_callback is not None
self.has_on_failure_callback = self.on_failure_callback is not None
- self.doc_md = doc_md
+ self.doc_md = pathlib.Path(doc_md).read_text() if pathlib.Path(str(doc_md or '')).is_file() else doc_md
Review Comment:
Yes, I was thinking about something like that, but I have no idea how big would be the effort and the ROI of it :)
I would say, let's see if anyone else in the community is interested in such kind of feature
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file
Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r937998040
##########
airflow/models/dag.py:
##########
@@ -522,17 +522,35 @@ def __init__(
self.has_on_success_callback = self.on_success_callback is not None
self.has_on_failure_callback = self.on_failure_callback is not None
- self.doc_md = doc_md
-
self._access_control = DAG._upgrade_outdated_dag_access_control(access_control)
self.is_paused_upon_creation = is_paused_upon_creation
self.jinja_environment_kwargs = jinja_environment_kwargs
self.render_template_as_native_obj = render_template_as_native_obj
+
+ self.doc_md = self.get_doc_md(doc_md)
+
self.tags = tags or []
self._task_group = TaskGroup.create_root(self)
self.validate_schedule_and_params()
+ def get_doc_md(self, doc_md: str) -> str:
+ if doc_md is None:
+ return doc_md
+
+ env = self.get_template_env(force_sandboxed=False)
+
+ if not doc_md.endswith('.md'):
Review Comment:
Yep. Not nice if yoy really wanted your documentation end with ".md" :D. I think I'd prefer in this case to continue running but simply to use the "your documentation,md" as the content of the doc. I don't think failing DAG parsing in this case is justified.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] francescomucio commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file
Posted by GitBox <gi...@apache.org>.
francescomucio commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r937819305
##########
airflow/models/dag.py:
##########
@@ -522,17 +522,35 @@ def __init__(
self.has_on_success_callback = self.on_success_callback is not None
self.has_on_failure_callback = self.on_failure_callback is not None
- self.doc_md = doc_md
-
self._access_control = DAG._upgrade_outdated_dag_access_control(access_control)
self.is_paused_upon_creation = is_paused_upon_creation
self.jinja_environment_kwargs = jinja_environment_kwargs
self.render_template_as_native_obj = render_template_as_native_obj
+
+ self.doc_md = self.get_doc_md(doc_md)
+
self.tags = tags or []
self._task_group = TaskGroup.create_root(self)
self.validate_schedule_and_params()
+ def get_doc_md(self, doc_md: str) -> str:
+ if doc_md is None:
+ return doc_md
+
+ env = self.get_template_env(force_sandboxed=False)
Review Comment:
sure, changed
##########
airflow/models/dag.py:
##########
@@ -522,17 +522,35 @@ def __init__(
self.has_on_success_callback = self.on_success_callback is not None
self.has_on_failure_callback = self.on_failure_callback is not None
- self.doc_md = doc_md
-
self._access_control = DAG._upgrade_outdated_dag_access_control(access_control)
self.is_paused_upon_creation = is_paused_upon_creation
self.jinja_environment_kwargs = jinja_environment_kwargs
self.render_template_as_native_obj = render_template_as_native_obj
+
+ self.doc_md = self.get_doc_md(doc_md)
+
self.tags = tags or []
self._task_group = TaskGroup.create_root(self)
self.validate_schedule_and_params()
+ def get_doc_md(self, doc_md: str) -> str:
+ if doc_md is None:
+ return doc_md
+
+ env = self.get_template_env(force_sandboxed=False)
+
+ if not doc_md.endswith('.md'):
+ template = jinja2.Template(doc_md)
+ else:
+ # template = env.loader.get_source(env, doc_md)[0]
+ template = env.get_template(doc_md)
+
+ return template.render()
Review Comment:
removed
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] francescomucio commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file
Posted by GitBox <gi...@apache.org>.
francescomucio commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r937819635
##########
airflow/models/dag.py:
##########
@@ -522,17 +522,35 @@ def __init__(
self.has_on_success_callback = self.on_success_callback is not None
self.has_on_failure_callback = self.on_failure_callback is not None
- self.doc_md = doc_md
-
self._access_control = DAG._upgrade_outdated_dag_access_control(access_control)
self.is_paused_upon_creation = is_paused_upon_creation
self.jinja_environment_kwargs = jinja_environment_kwargs
self.render_template_as_native_obj = render_template_as_native_obj
+
+ self.doc_md = self.get_doc_md(doc_md)
+
self.tags = tags or []
self._task_group = TaskGroup.create_root(self)
self.validate_schedule_and_params()
+ def get_doc_md(self, doc_md: str) -> str:
+ if doc_md is None:
+ return doc_md
+
+ env = self.get_template_env(force_sandboxed=False)
+
+ if not doc_md.endswith('.md'):
+ template = jinja2.Template(doc_md)
+ else:
+ # template = env.loader.get_source(env, doc_md)[0]
+ template = env.get_template(doc_md)
+
+ return template.render()
+
+ return template.render()
+ # return pathlib.Path(doc_md).read_text() if pathlib.Path(str(doc_md or '')).is_file() else doc_md
Review Comment:
removed as well
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file
Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r937738270
##########
airflow/models/dag.py:
##########
@@ -522,17 +522,35 @@ def __init__(
self.has_on_success_callback = self.on_success_callback is not None
self.has_on_failure_callback = self.on_failure_callback is not None
- self.doc_md = doc_md
-
self._access_control = DAG._upgrade_outdated_dag_access_control(access_control)
self.is_paused_upon_creation = is_paused_upon_creation
self.jinja_environment_kwargs = jinja_environment_kwargs
self.render_template_as_native_obj = render_template_as_native_obj
+
+ self.doc_md = self.get_doc_md(doc_md)
+
self.tags = tags or []
self._task_group = TaskGroup.create_root(self)
self.validate_schedule_and_params()
+ def get_doc_md(self, doc_md: str) -> str:
+ if doc_md is None:
+ return doc_md
+
+ env = self.get_template_env(force_sandboxed=False)
+
+ if not doc_md.endswith('.md'):
+ template = jinja2.Template(doc_md)
+ else:
+ # template = env.loader.get_source(env, doc_md)[0]
+ template = env.get_template(doc_md)
+
+ return template.render()
Review Comment:
Double render.
##########
airflow/models/dag.py:
##########
@@ -522,17 +522,35 @@ def __init__(
self.has_on_success_callback = self.on_success_callback is not None
self.has_on_failure_callback = self.on_failure_callback is not None
- self.doc_md = doc_md
-
self._access_control = DAG._upgrade_outdated_dag_access_control(access_control)
self.is_paused_upon_creation = is_paused_upon_creation
self.jinja_environment_kwargs = jinja_environment_kwargs
self.render_template_as_native_obj = render_template_as_native_obj
+
+ self.doc_md = self.get_doc_md(doc_md)
+
self.tags = tags or []
self._task_group = TaskGroup.create_root(self)
self.validate_schedule_and_params()
+ def get_doc_md(self, doc_md: str) -> str:
+ if doc_md is None:
+ return doc_md
+
+ env = self.get_template_env(force_sandboxed=False)
+
+ if not doc_md.endswith('.md'):
+ template = jinja2.Template(doc_md)
+ else:
+ # template = env.loader.get_source(env, doc_md)[0]
+ template = env.get_template(doc_md)
+
+ return template.render()
+
+ return template.render()
+ # return pathlib.Path(doc_md).read_text() if pathlib.Path(str(doc_md or '')).is_file() else doc_md
Review Comment:
You can remove the comment :)
##########
docs/apache-airflow/concepts/dags.rst:
##########
@@ -564,7 +564,8 @@ doc_md markdown
doc_rst reStructuredText
========== ================
-Please note that for DAGs, ``doc_md`` is the only attribute interpreted.
+Please note that for DAGs, ``doc_md`` is the only attribute interpreted. For DAGs it can contain a string or the reference to a template file. Template references are recognized by str ending in ‘.md’.
Review Comment:
Also the file must exist.
##########
airflow/models/dag.py:
##########
@@ -522,17 +522,35 @@ def __init__(
self.has_on_success_callback = self.on_success_callback is not None
self.has_on_failure_callback = self.on_failure_callback is not None
- self.doc_md = doc_md
-
self._access_control = DAG._upgrade_outdated_dag_access_control(access_control)
self.is_paused_upon_creation = is_paused_upon_creation
self.jinja_environment_kwargs = jinja_environment_kwargs
self.render_template_as_native_obj = render_template_as_native_obj
+
+ self.doc_md = self.get_doc_md(doc_md)
+
self.tags = tags or []
self._task_group = TaskGroup.create_root(self)
self.validate_schedule_and_params()
+ def get_doc_md(self, doc_md: str) -> str:
+ if doc_md is None:
+ return doc_md
+
+ env = self.get_template_env(force_sandboxed=False)
+
+ if not doc_md.endswith('.md'):
Review Comment:
```suggestion
if not doc_md.endswith('.md'):
```
I think we also nee to handle the case (and add tests). where the file is missing (and treat it is document). I can easily imagine a case when somoene puts a description:
"And then you might also see more in this fancy document.md")
Which would trigger a not-nice exception.
##########
airflow/models/dag.py:
##########
@@ -522,17 +522,35 @@ def __init__(
self.has_on_success_callback = self.on_success_callback is not None
self.has_on_failure_callback = self.on_failure_callback is not None
- self.doc_md = doc_md
-
self._access_control = DAG._upgrade_outdated_dag_access_control(access_control)
self.is_paused_upon_creation = is_paused_upon_creation
self.jinja_environment_kwargs = jinja_environment_kwargs
self.render_template_as_native_obj = render_template_as_native_obj
+
+ self.doc_md = self.get_doc_md(doc_md)
+
self.tags = tags or []
self._task_group = TaskGroup.create_root(self)
self.validate_schedule_and_params()
+ def get_doc_md(self, doc_md: str) -> str:
+ if doc_md is None:
+ return doc_md
+
+ env = self.get_template_env(force_sandboxed=False)
Review Comment:
```suggestion
env = self.get_template_env(force_sandboxed=True)
```
I think we should be careful here and use sandboxed environment (unless there is a reason we should not).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] uranusjr commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file
Posted by GitBox <gi...@apache.org>.
uranusjr commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r937457970
##########
docs/apache-airflow/concepts/dags.rst:
##########
@@ -564,7 +564,7 @@ doc_md markdown
doc_rst reStructuredText
========== ================
-Please note that for DAGs, ``doc_md`` is the only attribute interpreted.
+Please note that for DAGs, ``doc_md`` is the only attribute interpreted and it can also contain the path of a markdown file containing the DAG's documentation.
Review Comment:
This should also document where the Markdown file would be found (i.e. if a relative path is supplied, where would the path be relative to?)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] francescomucio commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file
Posted by GitBox <gi...@apache.org>.
francescomucio commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r937686022
##########
docs/apache-airflow/concepts/dags.rst:
##########
@@ -564,7 +564,7 @@ doc_md markdown
doc_rst reStructuredText
========== ================
-Please note that for DAGs, ``doc_md`` is the only attribute interpreted.
+Please note that for DAGs, ``doc_md`` is the only attribute interpreted and it can also contain the path of a markdown file containing the DAG's documentation.
Review Comment:
Rewrote it, similarly to the `sql` param in the SQL operators, also referencing the relative path
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file
Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r937708928
##########
airflow/models/dag.py:
##########
@@ -522,7 +522,7 @@ def __init__(
self.has_on_success_callback = self.on_success_callback is not None
self.has_on_failure_callback = self.on_failure_callback is not None
- self.doc_md = doc_md
+ self.doc_md = pathlib.Path(doc_md).read_text() if pathlib.Path(str(doc_md or '')).is_file() else doc_md
Review Comment:
we can't have context because the moment we read it, the context does not exist yet.. We could potentially add a specially crafted context with only a few things that we do have access (like dag).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] francescomucio commented on pull request #25509: Possibility to document DAG with a separated markdown file
Posted by GitBox <gi...@apache.org>.
francescomucio commented on PR #25509:
URL: https://github.com/apache/airflow/pull/25509#issuecomment-1205198904
Added two tests, one for rendering `doc_md` with jinja and the second to read and render an external `.md` file. Hopefully I did it right :)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] francescomucio commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file
Posted by GitBox <gi...@apache.org>.
francescomucio commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r938483844
##########
airflow/models/dag.py:
##########
@@ -522,17 +522,35 @@ def __init__(
self.has_on_success_callback = self.on_success_callback is not None
self.has_on_failure_callback = self.on_failure_callback is not None
- self.doc_md = doc_md
-
self._access_control = DAG._upgrade_outdated_dag_access_control(access_control)
self.is_paused_upon_creation = is_paused_upon_creation
self.jinja_environment_kwargs = jinja_environment_kwargs
self.render_template_as_native_obj = render_template_as_native_obj
+
+ self.doc_md = self.get_doc_md(doc_md)
+
self.tags = tags or []
self._task_group = TaskGroup.create_root(self)
self.validate_schedule_and_params()
+ def get_doc_md(self, doc_md: str) -> str:
+ if doc_md is None:
+ return doc_md
+
+ env = self.get_template_env(force_sandboxed=False)
+
+ if not doc_md.endswith('.md'):
Review Comment:
I trying to trigger a similar error with the PG operator with a query ending with `.sql`
<img width="694" alt="image" src="https://user-images.githubusercontent.com/3058143/183014884-85793dd2-b734-4d57-9f96-7ab9f6558680.png">
Would it make sense to return an error message in the doc, something like:
doc_md = f""""
# Templating Error!
Not able to find the template file: {doc_md}.
"""
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk commented on pull request #25509: Possibility to document DAG with a separated markdown file
Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #25509:
URL: https://github.com/apache/airflow/pull/25509#issuecomment-1206406765
static checks to fix
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] francescomucio commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file
Posted by GitBox <gi...@apache.org>.
francescomucio commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r937822111
##########
airflow/models/dag.py:
##########
@@ -522,17 +522,35 @@ def __init__(
self.has_on_success_callback = self.on_success_callback is not None
self.has_on_failure_callback = self.on_failure_callback is not None
- self.doc_md = doc_md
-
self._access_control = DAG._upgrade_outdated_dag_access_control(access_control)
self.is_paused_upon_creation = is_paused_upon_creation
self.jinja_environment_kwargs = jinja_environment_kwargs
self.render_template_as_native_obj = render_template_as_native_obj
+
+ self.doc_md = self.get_doc_md(doc_md)
+
self.tags = tags or []
self._task_group = TaskGroup.create_root(self)
self.validate_schedule_and_params()
+ def get_doc_md(self, doc_md: str) -> str:
+ if doc_md is None:
+ return doc_md
+
+ env = self.get_template_env(force_sandboxed=False)
+
+ if not doc_md.endswith('.md'):
Review Comment:
it will trigger a `jinja2.exceptions.TemplateNotFound:` from Jinja followed by the name of the file :)
As you can see [here](https://github.com/apache/airflow/runs/7672321434?check_suite_focus=true#step:10:10794)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] francescomucio commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file
Posted by GitBox <gi...@apache.org>.
francescomucio commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r937334817
##########
airflow/models/dag.py:
##########
@@ -522,7 +522,7 @@ def __init__(
self.has_on_success_callback = self.on_success_callback is not None
self.has_on_failure_callback = self.on_failure_callback is not None
- self.doc_md = doc_md
+ self.doc_md = pathlib.Path(doc_md).read_text() if pathlib.Path(str(doc_md or '')).is_file() else doc_md
Review Comment:
Uhm... it looks like it is a jinja2 feature and trying to get_template_env with something like this:
```python
def get_doc_md(self, doc_md: str) -> str:
if doc_md is None or not doc_md.endswith('.md'):
return doc_md
env = self.get_template_env(force_sandboxed=False)
return env.loader.get_source(env, doc_md)
```
gives me a weird error
```
Broken DAG: [/files/dags/dag.py] Traceback (most recent call last):
File "/opt/airflow/airflow/serialization/serialized_objects.py", line 286, in validate_schema
cls._json_schema.validate(serialized_obj)
File "/usr/local/lib/python3.7/site-packages/jsonschema/validators.py", line 353, in validate
raise error
jsonschema.exceptions.ValidationError:
```
Followed by the content of my `.md` file
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org