Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/07/23 15:34:54 UTC

[GitHub] [airflow] potiuk commented on issue #25254: Add Task Instance Lifecycle History table

potiuk commented on issue #25254:
URL: https://github.com/apache/airflow/issues/25254#issuecomment-1193143649

   I am not sure that keeping it in the Airflow metadata DB makes sense. This will put ENORMOUS pressure on the database. We are not going to use it in any other part of the metadata DB for anything else; it would be there just to "dump" the information.
   
   Since we are going to make most Airflow components DB-less (see https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API), this will also put additional pressure on those components, which would need more communication overhead to write such a database entry.
   
   I think (but I will not close this one yet) that this one, similarly to #25252, has much better potential when implemented as part of our OpenTelemetry effort (which has already been approved and voted on: https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-49+OpenTelemetry+Support+for+Apache+Airflow). We are talking about gathering much more information from Airflow via standard telemetry interfaces, so keeping it in external systems that are designed to manage system telemetry and can track various kinds of telemetry (including traces, which are the best-matching part of the OpenTelemetry proposal) is a much better choice, IMHO, than storing it in the metadata DB.
   
   I'd say any "database" entries that we need here should rather keep track of changes to the DAG structure (which should be part of another AIP: https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-36+DAG+Versioning), and this is the type of information that should be stored in the Airflow metadata DB. This is because such versioning can be used by Airflow itself to make decisions (back-filling).
   
   Following this analogy, I personally think such a table of state changes would only make sense if we are going to use it for something else. For example, IF such a table (or a similar one) were a side effect of implementing the SLA feature "properly", then yeah, we could consider it as part of the Airflow metadata DB. But if the only reason is to "track the history of changes by a human", then we are simply trying to implement in Airflow what telemetry systems do way better than any of our implementations could, and we should rather focus on making sure our OpenTelemetry integration allows for it than on trying to replicate it in Airflow.
   
   This is what I think, but I am curious what others think about it.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org