You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/08/03 15:15:15 UTC

[GitHub] [airflow] francescomucio opened a new pull request, #25509: Possibility to document DAG with a separated markdown file

francescomucio opened a new pull request, #25509:
URL: https://github.com/apache/airflow/pull/25509

   This change allows the possibility to pass the path of a markdown file to the `doc_md` argument of a DAG.
   
   closes: https://github.com/apache/airflow/discussions/25396
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] francescomucio commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file

Posted by GitBox <gi...@apache.org>.
francescomucio commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r937306706


##########
airflow/models/dag.py:
##########
@@ -522,7 +522,7 @@ def __init__(
         self.has_on_success_callback = self.on_success_callback is not None
         self.has_on_failure_callback = self.on_failure_callback is not None
 
-        self.doc_md = doc_md
+        self.doc_md = pathlib.Path(doc_md).read_text() if pathlib.Path(str(doc_md or '')).is_file() else doc_md

Review Comment:
   I was not able to find the logic used for .sql files in the SQL operators like the [here](https://github.com/apache/airflow/blob/main/airflow/providers/common/sql/operators/sql.py#L802).
   
   While testing I was not able to just pass "my_dag.py" but I had to pass a Path object, like you were mentioning, which is in my opinion more cumbersome and inconsistent (with what we do for .sql files)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r937740029


##########
airflow/models/dag.py:
##########
@@ -522,7 +522,7 @@ def __init__(
         self.has_on_success_callback = self.on_success_callback is not None
         self.has_on_failure_callback = self.on_failure_callback is not None
 
-        self.doc_md = doc_md
+        self.doc_md = pathlib.Path(doc_md).read_text() if pathlib.Path(str(doc_md or '')).is_file() else doc_md

Review Comment:
   Yes.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] francescomucio commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file

Posted by GitBox <gi...@apache.org>.
francescomucio commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r938483844


##########
airflow/models/dag.py:
##########
@@ -522,17 +522,35 @@ def __init__(
         self.has_on_success_callback = self.on_success_callback is not None
         self.has_on_failure_callback = self.on_failure_callback is not None
 
-        self.doc_md = doc_md
-
         self._access_control = DAG._upgrade_outdated_dag_access_control(access_control)
         self.is_paused_upon_creation = is_paused_upon_creation
 
         self.jinja_environment_kwargs = jinja_environment_kwargs
         self.render_template_as_native_obj = render_template_as_native_obj
+
+        self.doc_md = self.get_doc_md(doc_md)
+
         self.tags = tags or []
         self._task_group = TaskGroup.create_root(self)
         self.validate_schedule_and_params()
 
+    def get_doc_md(self, doc_md: str) -> str:
+        if doc_md is None:
+            return doc_md
+
+        env = self.get_template_env(force_sandboxed=False)
+
+        if not doc_md.endswith('.md'):

Review Comment:
   I trying to trigger a similar error with the PG operator with a query ending with `.sql`
   <img width="694" alt="image" src="https://user-images.githubusercontent.com/3058143/183014884-85793dd2-b734-4d57-9f96-7ab9f6558680.png">
   
   Would it make sense to return an error message in the doc, something like:
   
   ```python
   doc_md = f""""
   # Templating Error!
   Not able to find the template file: {doc_md}.
   """
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r938724381


##########
airflow/models/dag.py:
##########
@@ -543,8 +543,13 @@ def get_doc_md(self, doc_md: str) -> str:
         if not doc_md.endswith('.md'):
             template = jinja2.Template(doc_md)
         else:
-            # template = env.loader.get_source(env, doc_md)[0]
-            template = env.get_template(doc_md)
+            try:

Review Comment:
   Much better :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] francescomucio commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file

Posted by GitBox <gi...@apache.org>.
francescomucio commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r937874631


##########
docs/apache-airflow/concepts/dags.rst:
##########
@@ -564,7 +564,8 @@ doc_md      markdown
 doc_rst     reStructuredText
 ==========  ================
 
-Please note that for DAGs, ``doc_md`` is the only attribute interpreted.
+Please note that for DAGs, ``doc_md`` is the only attribute interpreted. For DAGs it can contain a string or the reference to a template file. Template references are recognized by str ending in ‘.md’.

Review Comment:
   Add in the documentation, mentioning also the exception



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r938724882


##########
airflow/models/dag.py:
##########
@@ -543,8 +543,13 @@ def get_doc_md(self, doc_md: str) -> str:
         if not doc_md.endswith('.md'):
             template = jinja2.Template(doc_md)
         else:
-            # template = env.loader.get_source(env, doc_md)[0]
-            template = env.get_template(doc_md)
+            try:

Review Comment:
   Yeah. I think failing DAG because of typo in doc was just a bit too hard on the user



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] francescomucio commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file

Posted by GitBox <gi...@apache.org>.
francescomucio commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r937690359


##########
airflow/models/dag.py:
##########
@@ -522,7 +522,7 @@ def __init__(
         self.has_on_success_callback = self.on_success_callback is not None
         self.has_on_failure_callback = self.on_failure_callback is not None
 
-        self.doc_md = doc_md
+        self.doc_md = pathlib.Path(doc_md).read_text() if pathlib.Path(str(doc_md or '')).is_file() else doc_md

Review Comment:
   I figured out how to render the `.md` file as a Jinja template.
   
   Unfortunately the `doc_md` template at DAG level cannot access Airflow macros because there is no  context (like for the templated fields for tasks). As mentioned [here](https://github.com/apache/airflow/discussions/25396#discussioncomment-3280373).
   
   Of course a macro like `{{ ds }}` has no meaning at DAG level, but accessing Airflow Variables or Connection could be useful.
   
   If anyone has an idea on how to add the context, I can try to do it :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #25509: Possibility to document DAG with a separated markdown file

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #25509:
URL: https://github.com/apache/airflow/pull/25509#issuecomment-1206579174

   Yep. Transient error:
   
     ERROR: for trino  Cannot start service trino: driver failed programming external connectivity on endpoint airflow-integration-sqlite_trino_1 (fa404eaf4e560f6a678c72d0c3b5c644a988378d3a20a71d488a1c9cca2ede24): Error starting userland proxy: listen tcp4 0.0.0.0:38080: bind: address already in use
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] francescomucio commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file

Posted by GitBox <gi...@apache.org>.
francescomucio commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r937342671


##########
airflow/models/dag.py:
##########
@@ -522,7 +522,7 @@ def __init__(
         self.has_on_success_callback = self.on_success_callback is not None
         self.has_on_failure_callback = self.on_failure_callback is not None
 
-        self.doc_md = doc_md
+        self.doc_md = pathlib.Path(doc_md).read_text() if pathlib.Path(str(doc_md or '')).is_file() else doc_md

Review Comment:
   it now works with a dag defined as:
   ```python
   my_dag = DAG(dag_id="sample_dag",
                        start_date=datetime(2020, 4, 29),
                        schedule_interval="0 0 * * *",
                        doc_md='dag.md'
                )
   ```
   
   Because it is handled via Jinja, it should be able to handle templating too :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] francescomucio commented on pull request #25509: Possibility to document DAG with a separated markdown file

Posted by GitBox <gi...@apache.org>.
francescomucio commented on PR #25509:
URL: https://github.com/apache/airflow/pull/25509#issuecomment-1206576943

   Test fixed, but there is still something red. Do you think it is good to go?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file

Posted by GitBox <gi...@apache.org>.
uranusjr commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r937296010


##########
airflow/models/dag.py:
##########
@@ -522,7 +522,7 @@ def __init__(
         self.has_on_success_callback = self.on_success_callback is not None
         self.has_on_failure_callback = self.on_failure_callback is not None
 
-        self.doc_md = doc_md
+        self.doc_md = pathlib.Path(doc_md).read_text() if pathlib.Path(str(doc_md or '')).is_file() else doc_md

Review Comment:
   Instead of doing heuristics like this, I wonder if it’s easier to simply _require_ the user to pass in a `Path` object, and do this here:
   
   ```python
   if isinstance(doc_md, pathlib.Path):
       self.doc_md = doc_md.read_text("utf-8")
   else:
       self.doc_md = doc_md
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] francescomucio commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file

Posted by GitBox <gi...@apache.org>.
francescomucio commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r937338005


##########
airflow/models/dag.py:
##########
@@ -522,7 +522,7 @@ def __init__(
         self.has_on_success_callback = self.on_success_callback is not None
         self.has_on_failure_callback = self.on_failure_callback is not None
 
-        self.doc_md = doc_md
+        self.doc_md = pathlib.Path(doc_md).read_text() if pathlib.Path(str(doc_md or '')).is_file() else doc_md

Review Comment:
   found the problem :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk merged pull request #25509: Possibility to document DAG with a separated markdown file

Posted by GitBox <gi...@apache.org>.
potiuk merged PR #25509:
URL: https://github.com/apache/airflow/pull/25509


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r938725489


##########
airflow/models/dag.py:
##########
@@ -543,8 +543,13 @@ def get_doc_md(self, doc_md: str) -> str:
         if not doc_md.endswith('.md'):
             template = jinja2.Template(doc_md)
         else:
-            # template = env.loader.get_source(env, doc_md)[0]
-            template = env.get_template(doc_md)
+            try:

Review Comment:
   And it's perfectly OK to simply display the error as the documentation :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] francescomucio commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file

Posted by GitBox <gi...@apache.org>.
francescomucio commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r937726473


##########
airflow/models/dag.py:
##########
@@ -522,7 +522,7 @@ def __init__(
         self.has_on_success_callback = self.on_success_callback is not None
         self.has_on_failure_callback = self.on_failure_callback is not None
 
-        self.doc_md = doc_md
+        self.doc_md = pathlib.Path(doc_md).read_text() if pathlib.Path(str(doc_md or '')).is_file() else doc_md

Review Comment:
   Yes, I was thinking about something like that, but I have no idea how big would be the effort and the ROI of it :)
   
   I would say, let's see if anyone else in the community is interested in such kind of feature



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r937998040


##########
airflow/models/dag.py:
##########
@@ -522,17 +522,35 @@ def __init__(
         self.has_on_success_callback = self.on_success_callback is not None
         self.has_on_failure_callback = self.on_failure_callback is not None
 
-        self.doc_md = doc_md
-
         self._access_control = DAG._upgrade_outdated_dag_access_control(access_control)
         self.is_paused_upon_creation = is_paused_upon_creation
 
         self.jinja_environment_kwargs = jinja_environment_kwargs
         self.render_template_as_native_obj = render_template_as_native_obj
+
+        self.doc_md = self.get_doc_md(doc_md)
+
         self.tags = tags or []
         self._task_group = TaskGroup.create_root(self)
         self.validate_schedule_and_params()
 
+    def get_doc_md(self, doc_md: str) -> str:
+        if doc_md is None:
+            return doc_md
+
+        env = self.get_template_env(force_sandboxed=False)
+
+        if not doc_md.endswith('.md'):

Review Comment:
   Yep. Not nice if yoy really wanted your documentation end with ".md" :D. I think I'd prefer in this case to continue running but simply to use the "your documentation,md" as the content of the doc. I don't think failing DAG parsing in this case is justified.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] francescomucio commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file

Posted by GitBox <gi...@apache.org>.
francescomucio commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r937819305


##########
airflow/models/dag.py:
##########
@@ -522,17 +522,35 @@ def __init__(
         self.has_on_success_callback = self.on_success_callback is not None
         self.has_on_failure_callback = self.on_failure_callback is not None
 
-        self.doc_md = doc_md
-
         self._access_control = DAG._upgrade_outdated_dag_access_control(access_control)
         self.is_paused_upon_creation = is_paused_upon_creation
 
         self.jinja_environment_kwargs = jinja_environment_kwargs
         self.render_template_as_native_obj = render_template_as_native_obj
+
+        self.doc_md = self.get_doc_md(doc_md)
+
         self.tags = tags or []
         self._task_group = TaskGroup.create_root(self)
         self.validate_schedule_and_params()
 
+    def get_doc_md(self, doc_md: str) -> str:
+        if doc_md is None:
+            return doc_md
+
+        env = self.get_template_env(force_sandboxed=False)

Review Comment:
   sure, changed



##########
airflow/models/dag.py:
##########
@@ -522,17 +522,35 @@ def __init__(
         self.has_on_success_callback = self.on_success_callback is not None
         self.has_on_failure_callback = self.on_failure_callback is not None
 
-        self.doc_md = doc_md
-
         self._access_control = DAG._upgrade_outdated_dag_access_control(access_control)
         self.is_paused_upon_creation = is_paused_upon_creation
 
         self.jinja_environment_kwargs = jinja_environment_kwargs
         self.render_template_as_native_obj = render_template_as_native_obj
+
+        self.doc_md = self.get_doc_md(doc_md)
+
         self.tags = tags or []
         self._task_group = TaskGroup.create_root(self)
         self.validate_schedule_and_params()
 
+    def get_doc_md(self, doc_md: str) -> str:
+        if doc_md is None:
+            return doc_md
+
+        env = self.get_template_env(force_sandboxed=False)
+
+        if not doc_md.endswith('.md'):
+            template = jinja2.Template(doc_md)
+        else:
+            # template = env.loader.get_source(env, doc_md)[0]
+            template = env.get_template(doc_md)
+
+        return template.render()

Review Comment:
   removed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] francescomucio commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file

Posted by GitBox <gi...@apache.org>.
francescomucio commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r937819635


##########
airflow/models/dag.py:
##########
@@ -522,17 +522,35 @@ def __init__(
         self.has_on_success_callback = self.on_success_callback is not None
         self.has_on_failure_callback = self.on_failure_callback is not None
 
-        self.doc_md = doc_md
-
         self._access_control = DAG._upgrade_outdated_dag_access_control(access_control)
         self.is_paused_upon_creation = is_paused_upon_creation
 
         self.jinja_environment_kwargs = jinja_environment_kwargs
         self.render_template_as_native_obj = render_template_as_native_obj
+
+        self.doc_md = self.get_doc_md(doc_md)
+
         self.tags = tags or []
         self._task_group = TaskGroup.create_root(self)
         self.validate_schedule_and_params()
 
+    def get_doc_md(self, doc_md: str) -> str:
+        if doc_md is None:
+            return doc_md
+
+        env = self.get_template_env(force_sandboxed=False)
+
+        if not doc_md.endswith('.md'):
+            template = jinja2.Template(doc_md)
+        else:
+            # template = env.loader.get_source(env, doc_md)[0]
+            template = env.get_template(doc_md)
+
+        return template.render()
+
+        return template.render()
+        # return pathlib.Path(doc_md).read_text() if pathlib.Path(str(doc_md or '')).is_file() else doc_md

Review Comment:
   removed as well



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r937738270


##########
airflow/models/dag.py:
##########
@@ -522,17 +522,35 @@ def __init__(
         self.has_on_success_callback = self.on_success_callback is not None
         self.has_on_failure_callback = self.on_failure_callback is not None
 
-        self.doc_md = doc_md
-
         self._access_control = DAG._upgrade_outdated_dag_access_control(access_control)
         self.is_paused_upon_creation = is_paused_upon_creation
 
         self.jinja_environment_kwargs = jinja_environment_kwargs
         self.render_template_as_native_obj = render_template_as_native_obj
+
+        self.doc_md = self.get_doc_md(doc_md)
+
         self.tags = tags or []
         self._task_group = TaskGroup.create_root(self)
         self.validate_schedule_and_params()
 
+    def get_doc_md(self, doc_md: str) -> str:
+        if doc_md is None:
+            return doc_md
+
+        env = self.get_template_env(force_sandboxed=False)
+
+        if not doc_md.endswith('.md'):
+            template = jinja2.Template(doc_md)
+        else:
+            # template = env.loader.get_source(env, doc_md)[0]
+            template = env.get_template(doc_md)
+
+        return template.render()

Review Comment:
   Double render.



##########
airflow/models/dag.py:
##########
@@ -522,17 +522,35 @@ def __init__(
         self.has_on_success_callback = self.on_success_callback is not None
         self.has_on_failure_callback = self.on_failure_callback is not None
 
-        self.doc_md = doc_md
-
         self._access_control = DAG._upgrade_outdated_dag_access_control(access_control)
         self.is_paused_upon_creation = is_paused_upon_creation
 
         self.jinja_environment_kwargs = jinja_environment_kwargs
         self.render_template_as_native_obj = render_template_as_native_obj
+
+        self.doc_md = self.get_doc_md(doc_md)
+
         self.tags = tags or []
         self._task_group = TaskGroup.create_root(self)
         self.validate_schedule_and_params()
 
+    def get_doc_md(self, doc_md: str) -> str:
+        if doc_md is None:
+            return doc_md
+
+        env = self.get_template_env(force_sandboxed=False)
+
+        if not doc_md.endswith('.md'):
+            template = jinja2.Template(doc_md)
+        else:
+            # template = env.loader.get_source(env, doc_md)[0]
+            template = env.get_template(doc_md)
+
+        return template.render()
+
+        return template.render()
+        # return pathlib.Path(doc_md).read_text() if pathlib.Path(str(doc_md or '')).is_file() else doc_md

Review Comment:
   You can remove the comment :)



##########
docs/apache-airflow/concepts/dags.rst:
##########
@@ -564,7 +564,8 @@ doc_md      markdown
 doc_rst     reStructuredText
 ==========  ================
 
-Please note that for DAGs, ``doc_md`` is the only attribute interpreted.
+Please note that for DAGs, ``doc_md`` is the only attribute interpreted. For DAGs it can contain a string or the reference to a template file. Template references are recognized by str ending in ‘.md’.

Review Comment:
   Also the file must exist. 



##########
airflow/models/dag.py:
##########
@@ -522,17 +522,35 @@ def __init__(
         self.has_on_success_callback = self.on_success_callback is not None
         self.has_on_failure_callback = self.on_failure_callback is not None
 
-        self.doc_md = doc_md
-
         self._access_control = DAG._upgrade_outdated_dag_access_control(access_control)
         self.is_paused_upon_creation = is_paused_upon_creation
 
         self.jinja_environment_kwargs = jinja_environment_kwargs
         self.render_template_as_native_obj = render_template_as_native_obj
+
+        self.doc_md = self.get_doc_md(doc_md)
+
         self.tags = tags or []
         self._task_group = TaskGroup.create_root(self)
         self.validate_schedule_and_params()
 
+    def get_doc_md(self, doc_md: str) -> str:
+        if doc_md is None:
+            return doc_md
+
+        env = self.get_template_env(force_sandboxed=False)
+
+        if not doc_md.endswith('.md'):

Review Comment:
   ```suggestion
           if not doc_md.endswith('.md'):
   ```
   
   I think we also nee to handle the case (and add tests). where the file is missing (and treat it is document). I can easily imagine a case when somoene puts a description:
   
   "And then you might also see more in this fancy document.md") 
   
   Which would trigger a not-nice exception.



##########
airflow/models/dag.py:
##########
@@ -522,17 +522,35 @@ def __init__(
         self.has_on_success_callback = self.on_success_callback is not None
         self.has_on_failure_callback = self.on_failure_callback is not None
 
-        self.doc_md = doc_md
-
         self._access_control = DAG._upgrade_outdated_dag_access_control(access_control)
         self.is_paused_upon_creation = is_paused_upon_creation
 
         self.jinja_environment_kwargs = jinja_environment_kwargs
         self.render_template_as_native_obj = render_template_as_native_obj
+
+        self.doc_md = self.get_doc_md(doc_md)
+
         self.tags = tags or []
         self._task_group = TaskGroup.create_root(self)
         self.validate_schedule_and_params()
 
+    def get_doc_md(self, doc_md: str) -> str:
+        if doc_md is None:
+            return doc_md
+
+        env = self.get_template_env(force_sandboxed=False)

Review Comment:
   ```suggestion
           env = self.get_template_env(force_sandboxed=True)
   ```
   
   I think we should be careful here and use sandboxed environment (unless there is a reason we should not).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file

Posted by GitBox <gi...@apache.org>.
uranusjr commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r937457970


##########
docs/apache-airflow/concepts/dags.rst:
##########
@@ -564,7 +564,7 @@ doc_md      markdown
 doc_rst     reStructuredText
 ==========  ================
 
-Please note that for DAGs, ``doc_md`` is the only attribute interpreted.
+Please note that for DAGs, ``doc_md`` is the only attribute interpreted and it can also contain the path of a markdown file containing the DAG's documentation.

Review Comment:
   This should also document where the Markdown file would be found (i.e. if a relative path is supplied, where would the path be relative to?)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] francescomucio commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file

Posted by GitBox <gi...@apache.org>.
francescomucio commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r937686022


##########
docs/apache-airflow/concepts/dags.rst:
##########
@@ -564,7 +564,7 @@ doc_md      markdown
 doc_rst     reStructuredText
 ==========  ================
 
-Please note that for DAGs, ``doc_md`` is the only attribute interpreted.
+Please note that for DAGs, ``doc_md`` is the only attribute interpreted and it can also contain the path of a markdown file containing the DAG's documentation.

Review Comment:
   Rewrote it, similarly to the `sql` param in the SQL operators, also referencing the relative path



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r937708928


##########
airflow/models/dag.py:
##########
@@ -522,7 +522,7 @@ def __init__(
         self.has_on_success_callback = self.on_success_callback is not None
         self.has_on_failure_callback = self.on_failure_callback is not None
 
-        self.doc_md = doc_md
+        self.doc_md = pathlib.Path(doc_md).read_text() if pathlib.Path(str(doc_md or '')).is_file() else doc_md

Review Comment:
   we can't have context because the moment we read it, the context does not exist yet.. We could potentially add a specially crafted context with only a few things that we do have access (like dag).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] francescomucio commented on pull request #25509: Possibility to document DAG with a separated markdown file

Posted by GitBox <gi...@apache.org>.
francescomucio commented on PR #25509:
URL: https://github.com/apache/airflow/pull/25509#issuecomment-1205198904

   Added two tests, one for rendering `doc_md` with jinja and the second to read and render an external `.md` file. Hopefully I did it right :)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] francescomucio commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file

Posted by GitBox <gi...@apache.org>.
francescomucio commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r938483844


##########
airflow/models/dag.py:
##########
@@ -522,17 +522,35 @@ def __init__(
         self.has_on_success_callback = self.on_success_callback is not None
         self.has_on_failure_callback = self.on_failure_callback is not None
 
-        self.doc_md = doc_md
-
         self._access_control = DAG._upgrade_outdated_dag_access_control(access_control)
         self.is_paused_upon_creation = is_paused_upon_creation
 
         self.jinja_environment_kwargs = jinja_environment_kwargs
         self.render_template_as_native_obj = render_template_as_native_obj
+
+        self.doc_md = self.get_doc_md(doc_md)
+
         self.tags = tags or []
         self._task_group = TaskGroup.create_root(self)
         self.validate_schedule_and_params()
 
+    def get_doc_md(self, doc_md: str) -> str:
+        if doc_md is None:
+            return doc_md
+
+        env = self.get_template_env(force_sandboxed=False)
+
+        if not doc_md.endswith('.md'):

Review Comment:
   I trying to trigger a similar error with the PG operator with a query ending with `.sql`
   <img width="694" alt="image" src="https://user-images.githubusercontent.com/3058143/183014884-85793dd2-b734-4d57-9f96-7ab9f6558680.png">
   
   Would it make sense to return an error message in the doc, something like:
   
   doc_md = f""""
   # Templating Error!
   Not able to find the template file: {doc_md}.
   """



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #25509: Possibility to document DAG with a separated markdown file

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #25509:
URL: https://github.com/apache/airflow/pull/25509#issuecomment-1206406765

   static checks to fix 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] francescomucio commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file

Posted by GitBox <gi...@apache.org>.
francescomucio commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r937822111


##########
airflow/models/dag.py:
##########
@@ -522,17 +522,35 @@ def __init__(
         self.has_on_success_callback = self.on_success_callback is not None
         self.has_on_failure_callback = self.on_failure_callback is not None
 
-        self.doc_md = doc_md
-
         self._access_control = DAG._upgrade_outdated_dag_access_control(access_control)
         self.is_paused_upon_creation = is_paused_upon_creation
 
         self.jinja_environment_kwargs = jinja_environment_kwargs
         self.render_template_as_native_obj = render_template_as_native_obj
+
+        self.doc_md = self.get_doc_md(doc_md)
+
         self.tags = tags or []
         self._task_group = TaskGroup.create_root(self)
         self.validate_schedule_and_params()
 
+    def get_doc_md(self, doc_md: str) -> str:
+        if doc_md is None:
+            return doc_md
+
+        env = self.get_template_env(force_sandboxed=False)
+
+        if not doc_md.endswith('.md'):

Review Comment:
   it will trigger a `jinja2.exceptions.TemplateNotFound:` from Jinja followed by the name of the file :)
   
   As you can see [here](https://github.com/apache/airflow/runs/7672321434?check_suite_focus=true#step:10:10794)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] francescomucio commented on a diff in pull request #25509: Possibility to document DAG with a separated markdown file

Posted by GitBox <gi...@apache.org>.
francescomucio commented on code in PR #25509:
URL: https://github.com/apache/airflow/pull/25509#discussion_r937334817


##########
airflow/models/dag.py:
##########
@@ -522,7 +522,7 @@ def __init__(
         self.has_on_success_callback = self.on_success_callback is not None
         self.has_on_failure_callback = self.on_failure_callback is not None
 
-        self.doc_md = doc_md
+        self.doc_md = pathlib.Path(doc_md).read_text() if pathlib.Path(str(doc_md or '')).is_file() else doc_md

Review Comment:
   Uhm... it looks like it is a jinja2 feature and trying to get_template_env with something like this:
   ```python
       def get_doc_md(self, doc_md: str) -> str:
           if doc_md is None or not doc_md.endswith('.md'):
               return doc_md
   
           env = self.get_template_env(force_sandboxed=False)
           
           return env.loader.get_source(env, doc_md)
   ```
    gives me a weird error
    ```
    Broken DAG: [/files/dags/dag.py] Traceback (most recent call last):
     File "/opt/airflow/airflow/serialization/serialized_objects.py", line 286, in validate_schema
       cls._json_schema.validate(serialized_obj)
     File "/usr/local/lib/python3.7/site-packages/jsonschema/validators.py", line 353, in validate
       raise error
   jsonschema.exceptions.ValidationError: 
   ```
   Followed by the content of my `.md` file



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org