Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/12/03 00:11:00 UTC

[GitHub] [airflow] turbaszek opened a new pull request #12768: Add extended information about XCom backends in docs

turbaszek opened a new pull request #12768:
URL: https://github.com/apache/airflow/pull/12768


   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #12768: Add extended information about XCom backends in docs

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12768:
URL: https://github.com/apache/airflow/pull/12768#issuecomment-809824793


   This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.





[GitHub] [airflow] mik-laj commented on pull request #12768: Add extended information about XCom backends in docs

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #12768:
URL: https://github.com/apache/airflow/pull/12768#issuecomment-737718409


   Have you thought about moving some documentation to a new document? I think this description goes beyond the general concepts of Airflow.





[GitHub] [airflow] turbaszek commented on pull request #12768: Add extended information about XCom backends in docs

Posted by GitBox <gi...@apache.org>.
turbaszek commented on pull request #12768:
URL: https://github.com/apache/airflow/pull/12768#issuecomment-737790619


   > Have you thought about moving some documentation to a new document? I think this description goes beyond the general concepts of Airflow.
   
   Any suggestions?





[GitHub] [airflow] casassg commented on a change in pull request #12768: Add extended information about XCom backends in docs

Posted by GitBox <gi...@apache.org>.
casassg commented on a change in pull request #12768:
URL: https://github.com/apache/airflow/pull/12768#discussion_r534617029



##########
File path: docs/apache-airflow/concepts.rst
##########
@@ -770,35 +770,131 @@ passed, then a corresponding list of XCom values is returned.
     def pull_function(task_instance):
         value = task_instance.xcom_pull(task_ids='pushing_task')
 
-When specifying arguments that are part of the context, they will be
-automatically passed to the function.
-
 It is also possible to pull XCom directly in a template, here's an example
 of what this may look like:
 
-.. code-block:: jinja
+.. code-block:: python
+
+    "SELECT * FROM {{ task_instance.xcom_pull(task_ids='foo_task', key='table_name') }}"
+
+This can also be done using the ``output`` attribute of an operator. The above example can be then
+simplified:
+
+.. code-block:: python
+
+    f"SELECT * FROM { foo_task.output['table_name'] }"
+
+
+XCom backend
+------------
+
+For XCom to work, it needs storage where data can be persisted between task executions. The mechanism for
+persisting and retrieving XCom data is called the XCom backend.
+
+Base XCom backend
+~~~~~~~~~~~~~~~~~
+
+Airflow by default uses :class:`~airflow.models.xcom.BaseXCom`. This backend uses the Airflow metadatabase
+as storage for XCom values. This is the most common XCom backend and it works well. Its only limitations are:
+
+- the stored data has to be JSON-serializable (strings, lists, dictionaries, numbers)
+- the size of the data to persist is limited to 48kB
+
+When using this backend you have to keep those two limitations in mind. The base backend is designed to store
+**metadata**. So if you want to persist larger or more complex objects between tasks, you have to store them
+in some other place (for example GCS or S3 buckets) and keep a reference to those objects in XCom.
+
+This, however, means that if you often pass complex data between tasks in your DAGs, you can end up with
+a lot of boilerplate code for retrieving and persisting the data. That's where you may consider using a custom
+XCom backend.
 
-    SELECT * FROM {{ task_instance.xcom_pull(task_ids='foo', key='table_name') }}
+Custom XCom backends
+~~~~~~~~~~~~~~~~~~~~
 
-Note that XComs are similar to `Variables`_, but are specifically designed
-for inter-task communication rather than global settings.
+A custom XCom backend is an additional layer over the base XCom. It still uses the Airflow metadatabase under the
+hood, but it allows users to perform additional actions before saving and after retrieving the data.
 
-Custom XCom backend
--------------------
+To use a custom XCom backend, users should configure the ``xcom_backend`` parameter in the Airflow config. The provided value
+should point to a class that is a subclass of :class:`~airflow.models.xcom.BaseXCom` (see :doc:`modules_management` for
+details on how Python and Airflow manage modules).
 
-It is possible to change ``XCom`` behaviour of serialization and deserialization of tasks' result.
-To do this one have to change ``xcom_backend`` parameter in Airflow config. Provided value should point
-to a class that is subclass of :class:`~airflow.models.xcom.BaseXCom`. To alter the serialization /
-deserialization mechanism the custom class should override ``serialize_value`` and ``deserialize_value``
-methods.
+A custom XCom class has to implement the tow following methods:

Review comment:
       ```suggestion
   A custom XCom class has to implement the two following methods:
   ```
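
For readers following the thread, here is a rough sketch of the pattern the proposed docs describe: a backend that offloads large values to external storage and keeps only a reference in XCom. To stay runnable without Airflow it uses a stand-in base class rather than the real `airflow.models.xcom.BaseXCom`. `BaseXComStandIn`, `BucketXComBackend`, `FAKE_BUCKET`, `round_trip`, and the `bucket://` prefix are all hypothetical names, and the JSON-bytes encoding is an assumption. Method signatures in a given Airflow release may differ.

```python
import json
import uuid
from types import SimpleNamespace

# Hypothetical in-memory stand-in for an object store such as a GCS or S3 bucket.
FAKE_BUCKET = {}


class BaseXComStandIn:
    """Stand-in for BaseXCom (assumption: values are stored as JSON bytes)."""

    @staticmethod
    def serialize_value(value):
        return json.dumps(value).encode("utf-8")

    @staticmethod
    def deserialize_value(result):
        # `result` mimics an XCom row whose `value` attribute holds the bytes.
        return json.loads(result.value.decode("utf-8"))


class BucketXComBackend(BaseXComStandIn):
    """Keeps small values inline and offloads large ones to the fake bucket."""

    PREFIX = "bucket://"
    MAX_INLINE_BYTES = 48 * 1024  # the 48kB limit quoted in the diff above

    @staticmethod
    def serialize_value(value):
        payload = json.dumps(value).encode("utf-8")
        if len(payload) > BucketXComBackend.MAX_INLINE_BYTES:
            # Store the payload externally and persist only a reference.
            key = BucketXComBackend.PREFIX + uuid.uuid4().hex
            FAKE_BUCKET[key] = payload
            value = key
        return BaseXComStandIn.serialize_value(value)

    @staticmethod
    def deserialize_value(result):
        value = BaseXComStandIn.deserialize_value(result)
        if isinstance(value, str) and value.startswith(BucketXComBackend.PREFIX):
            # Resolve the reference back into the original object.
            value = json.loads(FAKE_BUCKET[value].decode("utf-8"))
        return value


def round_trip(value):
    """Serialize a value and read it back the way a downstream task would."""
    stored = BucketXComBackend.serialize_value(value)
    return BucketXComBackend.deserialize_value(SimpleNamespace(value=stored))
```

In a real deployment the class would subclass `BaseXCom` and be referenced from the `xcom_backend` config option. The dictionary here only illustrates where a bucket client would go.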

##########
File path: docs/apache-airflow/concepts.rst
##########
@@ -770,35 +770,131 @@ passed, then a corresponding list of XCom values is returned.
     def pull_function(task_instance):
         value = task_instance.xcom_pull(task_ids='pushing_task')
 
-When specifying arguments that are part of the context, they will be
-automatically passed to the function.
-
 It is also possible to pull XCom directly in a template, here's an example
 of what this may look like:
 
-.. code-block:: jinja
+.. code-block:: python
+
+    "SELECT * FROM {{ task_instance.xcom_pull(task_ids='foo_task', key='table_name') }}"
+
+This can also be done using the ``output`` attribute of an operator. The above example can be then
+simplified:
+
+.. code-block:: python
+
+    f"SELECT * FROM { foo_task.output['table_name'] }"

Review comment:
       Should we document this ``output`` better somewhere?







[GitHub] [airflow] turbaszek commented on a change in pull request #12768: Add extended information about XCom backends in docs

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #12768:
URL: https://github.com/apache/airflow/pull/12768#discussion_r535011924



##########
File path: docs/apache-airflow/concepts.rst
##########
@@ -770,35 +770,131 @@ passed, then a corresponding list of XCom values is returned.
     def pull_function(task_instance):
         value = task_instance.xcom_pull(task_ids='pushing_task')
 
-When specifying arguments that are part of the context, they will be
-automatically passed to the function.
-
 It is also possible to pull XCom directly in a template, here's an example
 of what this may look like:
 
-.. code-block:: jinja
+.. code-block:: python
+
+    "SELECT * FROM {{ task_instance.xcom_pull(task_ids='foo_task', key='table_name') }}"
+
+This can also be done using the ``output`` attribute of an operator. The above example can be then
+simplified:
+
+.. code-block:: python
+
+    f"SELECT * FROM { foo_task.output['table_name'] }"

Review comment:
       Yes, we should. I was too tired to write more about XComArgs.
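
As a possible starting point for that documentation, the sketch below illustrates why the f-string in the diff can stand in for the explicit template. It assumes that `output` returns an object whose string form is the equivalent `xcom_pull` Jinja expression. `OperatorStandIn` and `XComArgStandIn` are hypothetical stand-ins, not Airflow classes, and the real rendered string may carry extra arguments such as the DAG id.

```python
class XComArgStandIn:
    """Hypothetical stand-in for the object returned by an operator's `output`."""

    def __init__(self, operator, key):
        self.operator = operator
        self.key = key

    def __getitem__(self, key):
        # Indexing selects a different XCom key on the same task.
        return XComArgStandIn(self.operator, key)

    def __str__(self):
        # Render as a Jinja expression that is resolved later, at template time.
        return (
            "{{ task_instance.xcom_pull("
            f"task_ids='{self.operator.task_id}', key='{self.key}') }}}}"
        )


class OperatorStandIn:
    """Hypothetical stand-in for an operator that exposes `output`."""

    def __init__(self, task_id):
        self.task_id = task_id

    @property
    def output(self):
        # Assumption: the default key is the task's return value.
        return XComArgStandIn(self, key="return_value")


foo_task = OperatorStandIn("foo_task")
query = f"SELECT * FROM {foo_task.output['table_name']}"
```

Under these assumptions the f-string only builds the template text when the DAG file is parsed. The XCom value itself is still pulled when the task's templated field is rendered.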







[GitHub] [airflow] github-actions[bot] closed pull request #12768: Add extended information about XCom backends in docs

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #12768:
URL: https://github.com/apache/airflow/pull/12768


   

