You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/10/27 16:46:16 UTC

[GitHub] [airflow] mjpieters opened a new issue #11893: Smart sensor contexts appear confusing, redundant and incomplete

mjpieters opened a new issue #11893:
URL: https://github.com/apache/airflow/issues/11893


   I was looking into smart sensors, to see if our sensors could be adapted to their use.
   
   Unfortunately, the current implementation has some issues, specifically around the execution and poke contexts:
   
   - The term _execution context_ is used differently from what I expected. It's a number of sensor attributes that are needed to do proper housekeeping around the sensor lifetime, transferred from the original sensor operator to the `SensorInstance` entry so you don't actually have to load the original sensor class each time. But the `BaseOperator.execute()` method is also passed a context, created from `TaskInstance.get_template_context()`, which is passed on verbatim to the `BaseSensorOperator.poke()` method. The term used implies that the smart sensor execution context would be the same context, but it is not.
   
   - The term _poke context_ is confusing because it is used for two purposes: to create an instance of the sensor class inside of the smart sensor, **and** to pass to the `.poke()` method when executing the wrapped sensor.
   
   The latter makes it quite hard to repurpose an existing sensor, one that uses elements from the standard execution context: the object produced by `TaskInstance.get_template_context()`. If I were to add things like, say, `execution_date` to `poke_context_fields`, I also have to update my sensor class `__init__` method to ignore those extra parameters.
   
   Moreover, it makes _no logical sense_ to pass in the same mapping to both construct the instance, and to the poke method? The initialiser can just set the right values on `self`, and poke(...) can just access anything pertinent directly from there.
   
   To me, the execution context is really the _smart sensor context_, and there really needs to be a separate _operator  context_, that differs from the _poke context_. The operator context is what you use to create the sensor task in a DAG, and the poke context captures the dag-run specific information. The _smart sensor_ and _operator_ contexts could be sourced together (it doesn't really matter to the _operator_ that the smart sensor needs to keep some of information separate for its own bookkeeping; just pick up the `poke_interval`, `retries`, etc. entries once you collected the context).
   
   And the poke context, as passed to the `poke` method, should include some standard entries. It's fine if what is 'standard' differs from the template context, as long as it is easy enough for the `poke()` implementation to alter how it gets its information based on what is present. E.g. pass in SensorWork's `dag_id`, `execution_date` and `try_number` fields when running under smart sensor management, and explicitly set a `smart_sensor` indicator in that context or as a separate argument to the `poke()` call.
   
   To give some context: several of my sensors need access to the current dag_run `execution_date`, two access XCOM variables via the `context['ti']` entry.
   
   
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mjpieters edited a comment on issue #11893: Smart sensor contexts appear confusing, redundant and incomplete

Posted by GitBox <gi...@apache.org>.
mjpieters edited a comment on issue #11893:
URL: https://github.com/apache/airflow/issues/11893#issuecomment-719108383


   I am using the TI primarily to access the `execution_date` and `dag_id` values. 
   
   I was also using it as a gateway to fetch XCom values, but I'm pretty sure that in my sensors this can and should actually be done in `pre_execute` instead (fetching the value just once then storing the information in the context instead of retrieving the value on the each poke).
   
   Note that I am perfectly prepared to accept that to be smart-sensor compatible the implementation may have to put up with a limited subset of functionality. As long as the common scalar values such as `execution_date` are present and for the common cases we don't have to differentiate between regular and smart-sensor modes when implementing the `poke` method I think a shim TI object would be fine.
   
   Has anyone done a survey yet of the sensors that ship with Airflow or other airflow-operators to see what they generally need? Such a survey would be really helpful in determining what the least common denominator could look like.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] YingboWang commented on issue #11893: Smart sensor contexts appear confusing, redundant and incomplete

Posted by GitBox <gi...@apache.org>.
YingboWang commented on issue #11893:
URL: https://github.com/apache/airflow/issues/11893#issuecomment-718152888


   Thanks @mjpieters again for raising the problem. I agree on your analysis of the context and would like to fix based on this. As `ti` object is more difficult to be serialized, in the smart sensor we tried to use the context dictionary as the simplified version of `ti.__dict__` and only persist the minimum set of information. Would you like to provide more details on the `ti` use case in a sensor? Is an orm `ti` able to satisfy your case? 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil closed issue #11893: Smart sensor contexts appear confusing, redundant and incomplete

Posted by GitBox <gi...@apache.org>.
kaxil closed issue #11893:
URL: https://github.com/apache/airflow/issues/11893


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on issue #11893: Smart sensor contexts appear confusing, redundant and incomplete

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #11893:
URL: https://github.com/apache/airflow/issues/11893#issuecomment-827542680


   Yes we agree, we will tackle it as part of https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=177050929 -- where we redesign "Smart sensors" and more.
   
   cc @andrewgodwin 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mjpieters commented on issue #11893: Smart sensor contexts appear confusing, redundant and incomplete

Posted by GitBox <gi...@apache.org>.
mjpieters commented on issue #11893:
URL: https://github.com/apache/airflow/issues/11893#issuecomment-719108383


   I am using the TI primarily to access the `execution_date` and `dag_id` values. 
   
   I was also using it as a gateway to fetch XCom values, but I'm pretty sure that in my sensors this can and should actually be done in `pre_execute` instead (fetching the value just once then storing the information in the context instead of retrieving the value on the each poke).
   
   Note that I am perfectly prepared to accept that to be smart-sensor compatible the implementation may have to put up with a limited subset of functionality. As long as the common scalar values such as `execution_date` are present and for the common cases we don't have to differentiate between regular and smart-sensor modes when implementing the `poke` method I think a shim TI object would be fine.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org