You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/08/18 16:26:24 UTC

[GitHub] [airflow] frankbreetz opened a new issue #17686: Xcomms with Taskflow API cause unwanted dependencies in UI

frankbreetz opened a new issue #17686:
URL: https://github.com/apache/airflow/issues/17686


   <!--
   Welcome to Apache Airflow!
   
   Please complete the next sections or the issue will be closed.
   -->
   
   **Apache Airflow version**:2.1.1
   
   <!-- AIRFLOW VERSION IS MANDATORY -->
   
   **OS**:Debian GNU/Linux 10
   
   <!-- MANDATORY! You can get it via `cat /etc/oss-release` for example -->
   
   **Apache Airflow Provider versions**: 
   
   <!-- You can use `pip freeze | grep apache-airflow-providers` (you can leave only relevant ones)-->
   
   **Deployment**:Astronomer
   
   <!-- e.g. Virtualenv / VM / Docker-compose / K8S / Helm Chart / Managed Airflow Service -->
   
   <!-- Please include your deployment tools and versions: docker-compose, k8s, helm, etc -->
   
   **What happened**:
   When Using the TaskFlow API To implement XComms, you can get a unwanted dependency created in the UI
   <details><summary> XCOMM DAG  using TaskFlow API</summary>
   
   ```
   from datetime import datetime, timedelta
   from airflow.decorators import dag
   from airflow.decorators import task
   
   def get_run_date():
       return '01-01-2021'
   
   default_args = {
       'owner': 'airflow',
       'depends_on_past': False,
       'email_on_failure': False,
       'email_on_retry': False,
       'retries': 1,
       'retry_delay': timedelta(minutes=5)
   }
   @dag (default_args=default_args,
         schedule_interval=None,
           start_date=datetime(2021, 8, 1),
          dagrun_timeout=timedelta(hours=4),
          max_active_runs=1)
   def xcomm_taskflow_dag():
       @task()
       def set_date():
           date ="1-2-21"
           return date
       @task()
       def xcomm_task2(date):
           print(f"xcomm_task2:{date}")
       @task()
       def xcomm_task3(date):
           print(f"xcomm_task3:{date}")
   
       set_date=set_date()
       task2=xcomm_task2(set_date)
       task3=xcomm_task3(set_date)
       set_date>>task2>>task3
   
   xcomm_taskflow_dag=xcomm_taskflow_dag()
   ```
   
   </details>
   
   Because both downstream tasks use the varible created by the first task there is a direct connection from each of the task to the first task. 
   
   ![Screen Shot 2021-08-18 at 10 06 11 AM](https://user-images.githubusercontent.com/24467723/129924392-ed364261-6bd9-4597-b9b6-5be53d62ab6c.png)
   
   This also duplicates tasks in the Tree View (xcomm_task3)
   ![Screen Shot 2021-08-18 at 10 08 42 AM](https://user-images.githubusercontent.com/24467723/129924405-496e4688-81b3-43df-98ef-3210733374e1.png)
   
   This is a simple example, but one can imagine it could become quite unwieldy with complex DAGs
   
   <!-- Please include exact error messages if you can -->
   
   **What you expected to happen**:
   If we create an XCOMM without using the TaskFlow API, there is no unwanted dependency created, just the one explicitly stated using the bit shift operators 
   
   <details><summary> XCOMM DAG without  Using TaskFlow API</summary>
   
   
   ```
   from airflow.operators.python_operator import PythonOperator
   from datetime import datetime, timedelta
   from airflow.decorators import dag
   
   def get_run_date(ti):
       date= '01-01-2021'
       ti.xcom_push(key="date",value=date)
   def var_fun2(ti):
       date=ti.xcom_pull(key="date",task_ids="get_date")
       print(date)
   
   def var_fun3(ti):
       date=ti.xcom_pull(key="date",task_ids="get_date")
       print(date)
   
   default_args = {
       'owner': 'airflow',
       'depends_on_past': False,
       'email_on_failure': False,
       'email_on_retry': False,
       'retries': 1,
       'retry_delay': timedelta(minutes=5)
   }
   
   @dag (default_args=default_args,
         schedule_interval=None,
           start_date=datetime(2021, 8, 1),
          dagrun_timeout=timedelta(hours=4),
          max_active_runs=1)
   def xcomm_dag():
       set_date=PythonOperator(
           task_id="get_date",
           python_callable=get_run_date,
           provide_context=True
       )
   
       var_task2= PythonOperator(
           task_id="xcomm_task2",
           python_callable = var_fun2,
           provide_context=True
       )
   
       var_task3= PythonOperator(
           task_id="xcomm_task3",
           python_callable = var_fun3,
           provide_context=True
       )
   
       set_date >> var_task2 >> var_task3
   
   xcomm_dag=xcomm_dag()
   ```
   
   </details>
   No there is no drect connection to downstream tasks in the Graph View
   
   ![Screen Shot 2021-08-18 at 10 14 13 AM](https://user-images.githubusercontent.com/24467723/129924659-fd7e9172-e5ad-40d4-bfe9-f5e8bcab21a5.png)
   
   There are no duplicated tasks in the Tree View
   
   ![Screen Shot 2021-08-18 at 10 14 31 AM](https://user-images.githubusercontent.com/24467723/129924667-f52bf04c-5271-4444-aeb2-9ed6fcdd941e.png)
   
   <!-- What do you think went wrong? -->
   
   **How to reproduce it**:
   Upload the Included DAGs and View in the UI
   <!--
   As minimally and precisely as possible. Keep in mind we do not have access to your cluster or dags.
   If this is a UI bug, please provide a screenshot of the bug or a link to a youtube video of the bug in action
   You can include images/screen-casts etc. by drag-dropping the image here.
   -->
   
   **Anything else we need to know**: I understand how this could be designed this way as there is not only a downstream dependency, but direct dependency between the first DAG and downstream DAG. The old version did not create this dependency and the new TaskFlow API does. This may be feature request along the lines of Config setting to disable downstream dependencies or not create a dependency based of the TaskFlow API, or at least not have them shown in the UI. It would be nice to have clearer code like in the TaskFlow API example and also clearer UI like in the non-TaskFlow API example.
   
   <!--
   How often does this problem occur? Once? Every time etc?
   Any relevant logs to include? Put them here inside fenced
   ``` ``` blocks or inside a foldable details tag if it's long:
   <details><summary>x.log</summary> lots of stuff </details>
   -->
   
   **Are you willing to submit a PR?**
   
   
   <!---
   This is absolutely not required, but we are happy to guide you in contribution process
   especially if you already have a good understanding of how to implement the fix.
   Airflow is a community-managed project and we love to bring new contributors in.
   Find us in #airflow-how-to-pr on Slack!
    -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ephraimbuddy commented on issue #17686: Xcomms with Taskflow API cause unwanted dependencies in UI

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on issue #17686:
URL: https://github.com/apache/airflow/issues/17686#issuecomment-931663628


   We need to improve the documentation though


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] alex-astronomer commented on issue #17686: Xcomms with Taskflow API cause unwanted dependencies in UI

Posted by GitBox <gi...@apache.org>.
alex-astronomer commented on issue #17686:
URL: https://github.com/apache/airflow/issues/17686#issuecomment-931829369


   I believe that the culprit for this bug is the `set_xcomargs_dependencies` fn call in the `apply_defaults` function of the `BaseOperator` metaclass.  Experimenting with this now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] alex-astronomer edited a comment on issue #17686: Xcomms with Taskflow API cause unwanted dependencies in UI

Posted by GitBox <gi...@apache.org>.
alex-astronomer edited a comment on issue #17686:
URL: https://github.com/apache/airflow/issues/17686#issuecomment-931800470


   I'm still inclined to call this a bug @ephraimbuddy.  Here's why.  I believe that all of the methods to define this should have the same behavior.
   
   ```
   date = set_date()
   task2 = xcomm_task2(date)
   task3 = xcomm_task3(date)
   date >> task2 >> task3
   ```
   
   should output the same DAG as 
   
   ```
   @dag(default_args=default_args,
         schedule_interval=None,
           start_date=datetime(2021, 8, 1),
          dagrun_timeout=timedelta(hours=4),
          max_active_runs=1)
   def xcomm_taskflow_dag():
       @task()
       def set_date():
           date ="1-2-21"
           return date
       @task()
       def xcomm_task2(**kwargs):
           date=kwargs['ti'].xcom_pull(task_ids="set_date")
           print(f"xcomm_task2:{date}")
   
       @task()
       def xcomm_task3(**kwargs):
           date=kwargs['ti'].xcom_pull(task_ids="set_date")
           print(f"xcomm_task3:{date}")
       
       set_date() >> xcomm_task2() >> xcomm_task3()
   ```
   which should output the same DAG as 
   ```
   @dag(default_args=default_args,
         schedule_interval=None,
           start_date=datetime(2021, 8, 1),
          dagrun_timeout=timedelta(hours=4),
          max_active_runs=1)
   def xcomm_taskflow_dag():
       @task()
       def set_date():
           date ="1-2-21"
           return date
       @task()
       def xcomm_task2(date):
           print(f"xcomm_task2:{date}")
   
       @task()
       def xcomm_task3(**kwargs):
           date=kwargs['ti'].xcom_pull(task_ids="set_date")
           print(f"xcomm_task3:{date}")
       
       xcomm_task2(set_date()) >> xcomm_task3()
   ```
   
   I'm going to keep digging into this and see what differences there are between the 3 methods.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ephraimbuddy commented on issue #17686: Xcomms with Taskflow API cause unwanted dependencies in UI

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on issue #17686:
URL: https://github.com/apache/airflow/issues/17686#issuecomment-937599461


   We need to document the difference between passing `Xcoms` and creating dependencies between tasks in Taskflow API


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on issue #17686: Xcomms with Taskflow API cause unwanted dependencies in UI

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #17686:
URL: https://github.com/apache/airflow/issues/17686#issuecomment-941195338


   > We need to document the difference between passing `Xcoms` and creating dependencies between tasks in Taskflow API
   
   Quite agree  - there are a number of things in the taskflow that are "implicit" or not really documented (I just fixe one of those seeming obvious things about context https://github.com/apache/airflow/pull/18868 which was not obvious at all for new users (or even seasoned committers :D ) . So more docs on how TaskFlow works is really needed. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ephraimbuddy removed a comment on issue #17686: Xcomms with Taskflow API cause unwanted dependencies in UI

Posted by GitBox <gi...@apache.org>.
ephraimbuddy removed a comment on issue #17686:
URL: https://github.com/apache/airflow/issues/17686#issuecomment-931654295






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] alex-astronomer edited a comment on issue #17686: Xcomms with Taskflow API cause unwanted dependencies in UI

Posted by GitBox <gi...@apache.org>.
alex-astronomer edited a comment on issue #17686:
URL: https://github.com/apache/airflow/issues/17686#issuecomment-931835280


   Sorry about the comment spam, I just keep figuring out more and more about this issue.  Here's what I've come up with from the research that I've done today.
   
   By doing `xcom_task2(date)` (if we're looking at the previous examples which define the tasks using the taskflow API), there is an implicit dependency being set for this case between `set_date >> xcom_task2`.  This was originally created as a response to this [issue](https://github.com/apache/airflow/issues/8054).  The reason that I don't like this can be traced back to PEP-20, [The Zen of Python](https://www.python.org/dev/peps/pep-0020/).
   
   I believe that it would make more sense within this context to not automatically set this dependency, even if it makes more sense logically to do so.  We know logically that in order to pull an XCOM from a task, that previous task which pushes the XCOM *must* have been run first.  So why not set that dependency automatically?
   
   The main diagram of the DAG and use case in the original post for this ticket gives the answer to that.  By implicitly setting the dependencies with `set_xcomarg_dependencies`, we get strange DAG behavior for more complicated DAGs, and more unnecessary dependencies.
   
   If we change to explicitly setting the dependencies, that would require the user to logically understand the relationship between tasks and XCOMs but also falls more in line with Zen of Python guidelines about explicit vs. implicit.
   
   Thoughts?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on issue #17686: Xcomms with Taskflow API cause unwanted dependencies in UI

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #17686:
URL: https://github.com/apache/airflow/issues/17686#issuecomment-941195338


   > We need to document the difference between passing `Xcoms` and creating dependencies between tasks in Taskflow API
   
   Quite agree  - there are a number of things in the taskflow that are "implicit" or not really documented (I just fixe one of those seeming obvious things about context https://github.com/apache/airflow/pull/18868 which was not obvious at all for new users (or even seasoned committers :D ) . So more docs on how TaskFlow works is really needed. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ephraimbuddy commented on issue #17686: Xcomms with Taskflow API cause unwanted dependencies in UI

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on issue #17686:
URL: https://github.com/apache/airflow/issues/17686#issuecomment-931729032


   Another way that will also work:
   
   ```python
   
   @dag(default_args=default_args,
         schedule_interval=None,
           start_date=datetime(2021, 8, 1),
          dagrun_timeout=timedelta(hours=4),
          max_active_runs=1)
   def xcomm_taskflow_dag():
       @task()
       def set_date():
           date ="1-2-21"
           return date
       @task()
       def xcomm_task2(date):
           print(f"xcomm_task2:{date}")
   
       @task()
       def xcomm_task3(**kwargs):
           date=kwargs['ti'].xcom_pull(task_ids="set_date")
           print(f"xcomm_task3:{date}")
       
       xcomm_task2(set_date()) >> xcomm_task3()
   
   xcomm_taskflow_dag=xcomm_taskflow_dag()
   ```
   I think the documentation need to be improved


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] alex-astronomer commented on issue #17686: Xcomms with Taskflow API cause unwanted dependencies in UI

Posted by GitBox <gi...@apache.org>.
alex-astronomer commented on issue #17686:
URL: https://github.com/apache/airflow/issues/17686#issuecomment-1014955770


   Yes I am, should have more time this coming week to get in a PR.  Thanks for your patience.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] alex-astronomer commented on issue #17686: Xcomms with Taskflow API cause unwanted dependencies in UI

Posted by GitBox <gi...@apache.org>.
alex-astronomer commented on issue #17686:
URL: https://github.com/apache/airflow/issues/17686#issuecomment-939178504


   Can someone assign me to this?  TY!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] alex-astronomer commented on issue #17686: Xcomms with Taskflow API cause unwanted dependencies in UI

Posted by GitBox <gi...@apache.org>.
alex-astronomer commented on issue #17686:
URL: https://github.com/apache/airflow/issues/17686#issuecomment-941196000


   In progress right now!  Actively writing as we speak @potiuk 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] alex-astronomer edited a comment on issue #17686: Xcomms with Taskflow API cause unwanted dependencies in UI

Posted by GitBox <gi...@apache.org>.
alex-astronomer edited a comment on issue #17686:
URL: https://github.com/apache/airflow/issues/17686#issuecomment-931809241


   I believe that no matter how we define the DAG, and as long as the code is valid, it should deterministically create the same result.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ephraimbuddy commented on issue #17686: Xcomms with Taskflow API cause unwanted dependencies in UI

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on issue #17686:
URL: https://github.com/apache/airflow/issues/17686#issuecomment-931654295


   The issue is the way you are defining the relationship.
   This should work: 
   ```python 
       date = set_date()
       xcomm_task2(date)>>xcomm_task3(date)
   ```
   Because xcomm_task2 needs data from set_date same as xcomm_task3
   
   Anytime you call a task decorated function, you create a task. 
   The code below will create two different tasks from set_date:
   ```python 
       xcomm_task2(set_date())>>xcomm_task3(set_date())
   ```
   That's why you have to assign the call to `set_date()` to a variable and pass the data around so as to create the dependency that you want.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal commented on issue #17686: Xcomms with Taskflow API cause unwanted dependencies in UI

Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #17686:
URL: https://github.com/apache/airflow/issues/17686#issuecomment-1014920662


   @alex-astronomer are you still working on this issue?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ephraimbuddy commented on issue #17686: Xcomms with Taskflow API cause unwanted dependencies in UI

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on issue #17686:
URL: https://github.com/apache/airflow/issues/17686#issuecomment-931725335


   I made a bunch of mistakes trying to make this work until I looked at the non-taskflow example.
   This should work for taskflow:
   
   ```python
   
   from datetime import datetime, timedelta
   from airflow.decorators import dag
   from airflow.decorators import task
   
   def get_run_date():
       return '01-01-2021'
   
   default_args = {
       'owner': 'airflow',
       'depends_on_past': False,
       'email_on_failure': False,
       'email_on_retry': False,
       'retries': 1,
       'retry_delay': timedelta(minutes=5)
   }
   @dag(default_args=default_args,
         schedule_interval=None,
           start_date=datetime(2021, 8, 1),
          dagrun_timeout=timedelta(hours=4),
          max_active_runs=1)
   def xcomm_taskflow_dag():
       @task()
       def set_date():
           date ="1-2-21"
           return date
       @task()
       def xcomm_task2(**kwargs):
           date=kwargs['ti'].xcom_pull(task_ids="set_date")
           print(f"xcomm_task2:{date}")
   
       @task()
       def xcomm_task3(**kwargs):
           date=kwargs['ti'].xcom_pull(task_ids="set_date")
           print(f"xcomm_task3:{date}")
       
       set_date() >> xcomm_task2() >> xcomm_task3()
   
   xcomm_taskflow_dag=xcomm_taskflow_dag()
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ephraimbuddy commented on issue #17686: Xcomms with Taskflow API cause unwanted dependencies in UI

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on issue #17686:
URL: https://github.com/apache/airflow/issues/17686#issuecomment-932034793


   I have thought about this and I think it's ok as it is just that we currently don't know how to use it.
   
   If we understand that calling a task decorated function in airflow produces an XComArg then it's not implicit that the relationship is set as is.
   
   From the PythonOperator code, you could see that xcoms were passed explicitly, but in the TaskFlow example, because we didn't know we could pass xcom explicitly as we did in the PythonOperator, we created extra tasks while setting up the dependencies.
   
   Anytime we call a task decorated function gives us an XComArg. It needs documentation especially in passing xcom as users think they can call the task decorated function many times and it remains the same task.
   
   Where possible, we should use `xcom_task2(xcom_task())` which creates the implicit relationship. However, we can pass **kwargs to a task function that has contexts that we can refer to and pull xcoms.
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] boring-cyborg[bot] commented on issue #17686: Xcomms with Taskflow API cause unwanted dependencies in UI

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #17686:
URL: https://github.com/apache/airflow/issues/17686#issuecomment-901253973


   Thanks for opening your first issue here! Be sure to follow the issue template!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ephraimbuddy commented on issue #17686: Xcomms with Taskflow API cause unwanted dependencies in UI

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on issue #17686:
URL: https://github.com/apache/airflow/issues/17686#issuecomment-937599461


   We need to document the difference between passing `Xcoms` and creating dependencies between tasks in Taskflow API


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] boring-cyborg[bot] commented on issue #17686: Xcomms with Taskflow API cause unwanted dependencies in UI

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #17686:
URL: https://github.com/apache/airflow/issues/17686#issuecomment-901253973


   Thanks for opening your first issue here! Be sure to follow the issue template!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] alex-astronomer commented on issue #17686: Xcomms with Taskflow API cause unwanted dependencies in UI

Posted by GitBox <gi...@apache.org>.
alex-astronomer commented on issue #17686:
URL: https://github.com/apache/airflow/issues/17686#issuecomment-941196000


   In progress right now!  Actively writing as we speak @potiuk 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] alex-astronomer commented on issue #17686: Xcomms with Taskflow API cause unwanted dependencies in UI

Posted by GitBox <gi...@apache.org>.
alex-astronomer commented on issue #17686:
URL: https://github.com/apache/airflow/issues/17686#issuecomment-931800470


   I'm still inclined to call this a bug @ephraimbuddy.  Here's why.  I believe that all of the methods to define this should have the same behavior.
   
   ```
   date = set_date()
   task2 = xcomm_task2(date)
   task3 = xcomm_task3(date)
   date >> task2 >> task3
   ```
   
   should output the same DAG as 
   
   ```
   @dag(default_args=default_args,
         schedule_interval=None,
           start_date=datetime(2021, 8, 1),
          dagrun_timeout=timedelta(hours=4),
          max_active_runs=1)
   def xcomm_taskflow_dag():
       @task()
       def set_date():
           date ="1-2-21"
           return date
       @task()
       def xcomm_task2(**kwargs):
           date=kwargs['ti'].xcom_pull(task_ids="set_date")
           print(f"xcomm_task2:{date}")
   
       @task()
       def xcomm_task3(**kwargs):
           date=kwargs['ti'].xcom_pull(task_ids="set_date")
           print(f"xcomm_task3:{date}")
       
       set_date() >> xcomm_task2() >> xcomm_task3()
   ```
   which should output the same DAG as 
   ```
   @dag(default_args=default_args,
         schedule_interval=None,
           start_date=datetime(2021, 8, 1),
          dagrun_timeout=timedelta(hours=4),
          max_active_runs=1)
   def xcomm_taskflow_dag():
       @task()
       def set_date():
           date ="1-2-21"
           return date
       @task()
       def xcomm_task2(date):
           print(f"xcomm_task2:{date}")
   
       @task()
       def xcomm_task3(**kwargs):
           date=kwargs['ti'].xcom_pull(task_ids="set_date")
           print(f"xcomm_task3:{date}")
       
       xcomm_task2(set_date()) >> xcomm_task3()
   ```
   and also 
   ```
   date = set_date()
   xcomm_task2(date) >> xcomm_task3(date)
   ```
   
   I'm going to keep digging into this and see what differences there are between the 3 methods.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] alex-astronomer commented on issue #17686: Xcomms with Taskflow API cause unwanted dependencies in UI

Posted by GitBox <gi...@apache.org>.
alex-astronomer commented on issue #17686:
URL: https://github.com/apache/airflow/issues/17686#issuecomment-931809241


   I believe that no matter how we define the DAG, it should deterministically create the same result.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] alex-astronomer commented on issue #17686: Xcomms with Taskflow API cause unwanted dependencies in UI

Posted by GitBox <gi...@apache.org>.
alex-astronomer commented on issue #17686:
URL: https://github.com/apache/airflow/issues/17686#issuecomment-931424494


   I'm going to take a look at this one.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] alex-astronomer commented on issue #17686: Xcomms with Taskflow API cause unwanted dependencies in UI

Posted by GitBox <gi...@apache.org>.
alex-astronomer commented on issue #17686:
URL: https://github.com/apache/airflow/issues/17686#issuecomment-931835280


   Sorry about the comment spam, I just keep figuring out more and more about this issue.  Here's what I've come up with from the research that I've done today.
   
   By doing `xcom_task2(date)` (if we're looking at the previous examples which define the tasks using the taskflow API), there is an implicit dependency being set for this case between `set_date >> xcom_task2`.  This was originally created as a response to this [issue](https://github.com/apache/airflow/issues/8054).  The reason that I don't like this can be traced back to PEP-20, [The Zen of Python](https://www.python.org/dev/peps/pep-0020/).
   
   I believe that it would make more sense within this context to not automatically set this dependency, even if it makes more sense logically to do so.  We know logically that in order to pull an XCOM from a task, that previous task which pushes the XCOM *must* have been run first.  So why not set that dependency automatically?
   
   The main diagram of the DAG and use case in the original post for this ticket gives the answer to that.  By implicitly setting the dependencies with `set_xcomarg_dependencies`, we get strange DAG behavior for more complicated DAGs, and more unnecessary dependencies.
   
   If we stick to explicitly setting the dependencies, that would require the user to logically understand the relationship between tasks and XCOMs but also falls more in line with Zen of Python guidelines about explicit vs. implicit.
   
   Thoughts?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org