Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/08/02 03:53:32 UTC

[GitHub] [airflow] YingboWang commented on a change in pull request #5499: [AIRFLOW-3964][AIP-17] Build smart sensor

YingboWang commented on a change in pull request #5499:
URL: https://github.com/apache/airflow/pull/5499#discussion_r464027558



##########
File path: docs/smart-sensor.rst
##########
@@ -0,0 +1,86 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+
+
+
+Smart Sensor
+============
+
+The smart sensor is a service that greatly reduces Airflow’s infrastructure cost by consolidating
+some of Airflow’s long-running, lightweight tasks. DAG Serialization and DB Persistence.
+
+.. image:: img/smart_sensor_architecture.png
+
+To improve the efficiency of these long-running tasks, the smart sensor service does not use one process
+per task; instead, it uses centralized processes to execute those tasks in batches.
+
+To do that, we need to run a task in two steps: the first step is to serialize the task information
+into the database, and the second step is to use a few centralized processes to execute the serialized
+tasks in batches.
+
+In this way, we only need a handful of running processes.
+
+.. image:: img/smart_sensor_single_task_execute_flow.png
+
+The smart sensor service is supported in a new mode called “smart sensor mode”. In smart sensor mode,
+instead of holding a long-running process for each sensor and poking periodically, a sensor only
+stores its poke context in the sensor_instance table and then exits with a ‘sensing’ state.
+
+When the smart sensor mode is enabled, a special set of builtin smart sensor DAGs
+(named smart_sensor_group_shard_xxx) is created by the system. These DAGs contain SmartSensorOperator
+tasks and manage the smart sensor jobs for the airflow cluster. A SmartSensorOperator task can fetch
+hundreds of ‘sensing’ instances from the sensor_instance table and poke on behalf of them in batches.
+Users don’t need to change their existing DAGs.
+
+Enable/Disable Smart Sensor
+---------------------------
+
+Updating from an older version might need a schema change. If there is no ``sensor_instance`` table
+in the DB, please make sure to run ``airflow db upgrade``.
+
+Add the following settings in the ``airflow.cfg``:
+
+.. code-block::
+
+    [smart_sensor]
+    use_smart_sensor = true
+    shard_code_upper_limit = 10000
+
+    # Users can change the following config based on their requirements
+    shards = 5
+    sensor_enabled = NamedHivePartitionSensor, MetastorePartitionSensor
+
+*   ``use_smart_sensor``: This config indicates if the smart sensor is enabled.
+*   ``shards``: This config indicates the number of concurrently running smart sensor jobs for
+    the airflow cluster.
+*   ``sensor_enabled``: This config is a list of sensor class names that will use the smart sensor.
+    Users use the same class names (e.g. HivePartitionSensor) in their DAGs and cannot control
+    whether smart sensors are used, unless they explicitly exclude their tasks.
+
+Enabling/disabling the smart sensor service is a system-level configuration change.
+It is transparent to individual users. Existing DAGs don't need to be changed to
+enable or disable the smart sensor. Rotating centralized smart sensor tasks will not
+cause any user’s sensor task to fail.
+
+Support new operators in the smart sensor service
+-------------------------------------------------
+
+*   Define ``poke_context_fields`` as class attribute in the operator. ``poke_context_fields``

Review comment:
       Fixed.



