Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2019/10/09 13:54:37 UTC

[GitHub] [airflow] mik-laj commented on a change in pull request #6247: [AIRFLOW-5588] Add Celery's architecture diagram

URL: https://github.com/apache/airflow/pull/6247#discussion_r333026452
 
 

 ##########
 File path: docs/executor/celery.rst
 ##########
 @@ -72,3 +72,74 @@ Some caveats:
 - Make sure to set a visibility timeout in [celery_broker_transport_options] that exceeds the ETA of your longest running task
 - Tasks can consume resources. Make sure your worker has enough resources to run ``worker_concurrency`` tasks
 - Queue names are limited to 256 characters, but each broker backend might have its own restrictions
+
+Architecture
+------------
+
+.. graphviz::
+
+    digraph A{
+        rankdir="TB"
+        node[shape="rectangle", style="rounded"]
+
+
+        subgraph cluster {
+            label="Cluster";
+            {rank = same; dag; database}
+            {rank = same; workers; scheduler; web}
+
+            workers[label="Workers"]
+            scheduler[label="Scheduler"]
+            web[label="Web server"]
+            database[label="Database"]
+            dag[label="DAG files"]
+
+            subgraph cluster_queue {
+                label="Queue";
+                {rank = same; queue_broker; queue_result_backend}
+                queue_broker[label="Queue broker"]
+                queue_result_backend[label="Result backend"]
+            }
+
+            scheduler->workers[label="1"]
+            web->database[label="2"]
+            web->dag[label="3"]
+
+            workers->database[label="4"]
+            workers->dag[label="5"]
+            workers->queue_result_backend[label="6"]
+            workers->queue_broker[label="7"]
+
+            scheduler->database[label="8"]
+            scheduler->dag[label="9"]
+            scheduler->queue_result_backend[label="10"]
+            scheduler->queue_broker[label="11"]
+        }
+    }
+
+Airflow consists of several components:
+
+* **Workers** - Execute the assigned tasks
+* **Scheduler** - Responsible for adding the necessary tasks to the queue
+* **Web server** - HTTP server that provides access to DAG/task status information
+* **Database** - Contains information about the status of tasks, DAGs, Variables, connections, etc.
+* **Queue** - Queue mechanism provided by Celery
+
+Please note that the Celery queue consists of two components:
+
+* **Broker** - Stores commands for execution
+* **Result backend** - Stores the status of completed commands
+
+The components communicate with each other in many places:
+
+* [1] **Scheduler** --> **Workers** - Fetches task execution logs
 
 Review comment:
   I checked, and this is true: the webserver fetches the logs.
   https://github.com/apache/airflow/blob/d719e1f/airflow/www/views.py#L554-L572
   https://github.com/apache/airflow/blob/d719e1fd6705a93a0dfefef4b46478ade5e006ea/airflow/utils/log/file_task_handler.py#L110-L132 
   I updated this PR.
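
For illustration, here is a minimal sketch of the idea behind the linked `file_task_handler.py` code: when a task log is not available locally, the webserver asks the worker that ran the task for it over HTTP. The function name `build_worker_log_url`, the `/log/` path prefix, and the default port 8793 are assumptions made for this sketch; it is not a copy of Airflow's implementation.

```python
# Hypothetical helper, not part of Airflow: it only shows how a URL for
# fetching a task log from a worker could be assembled.

# Assumption: the worker serves its log files over HTTP on this port.
# In a real deployment the value would come from the Airflow configuration.
ASSUMED_WORKER_LOG_PORT = 8793


def build_worker_log_url(hostname, log_relative_path, port=ASSUMED_WORKER_LOG_PORT):
    """Return the URL a webserver could request to read a task log from a worker."""
    # Drop a leading slash so the path is always appended below /log/.
    path = log_relative_path.lstrip("/")
    return "http://{}:{}/log/{}".format(hostname, port, path)
```

Calling `build_worker_log_url("worker-1", "example_dag/example_task/1.log")` yields a URL on the host `worker-1`, which the webserver could then request with any HTTP client. This is why the arrow for log fetching should start at the web server and not at the scheduler.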
   
