Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/07/01 23:01:10 UTC

[GitHub] [airflow] potiuk commented on issue #16706: Improve operational logging and telemetry reporting in hooks

potiuk commented on issue #16706:
URL: https://github.com/apache/airflow/issues/16706#issuecomment-872599076


   How would you approach it, @malthe? You wrote that you are willing to make a PR, but I am not sure you fully realise the scope of it (or maybe I do not understand something :).
   
   Are you thinking about proposing/enforcing some standard of telemetry and implementing it separately for each hook? All at once? Gradually? Or implementing some automated telemetry logging via BaseHook/Python magic/automation?
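
   To make the "automated" option concrete, here is a minimal sketch of one possible shape: a base class that wraps every public method at subclass-definition time. All names (`InstrumentedHookBase`, `instrumented`, `ExampleHook`, the `hook.telemetry` logger) are hypothetical and not part of Airflow's BaseHook; it only logs duration and outcome, which may well be less than a real telemetry standard would require.

```python
import functools
import inspect
import logging
import time

# Hypothetical logger name, chosen for this sketch only.
log = logging.getLogger("hook.telemetry")


def instrumented(func):
    """Wrap a method so that each call logs its duration and outcome."""

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        start = time.monotonic()
        outcome = "success"
        try:
            return func(self, *args, **kwargs)
        except Exception:
            outcome = "failure"
            raise
        finally:
            elapsed = time.monotonic() - start
            log.info(
                "%s.%s %s in %.3fs",
                type(self).__name__, func.__name__, outcome, elapsed,
            )

    wrapper._instrumented = True  # marker so the wrapping is easy to detect
    return wrapper


class InstrumentedHookBase:
    """Hypothetical base class (NOT Airflow's BaseHook)."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        for name, attr in list(vars(cls).items()):
            # Wrap only plain public functions defined on this subclass;
            # underscore-prefixed names, properties, staticmethods and
            # classmethods are left untouched.
            if name.startswith("_") or not inspect.isfunction(attr):
                continue
            setattr(cls, name, instrumented(attr))


class ExampleHook(InstrumentedHookBase):
    def get_records(self, query):
        return [query]

    def _internal(self):
        return None
```

   The catch is that such a wrapper knows nothing about what each method actually does, so anything beyond generic timing and success/failure would still need per-hook work.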
   
   I think it is a good idea to standardise, but I am simply afraid that this might be rather difficult, taking into account that each hook's implementation uses different libraries to communicate and each has a different set of methods, which are not (and cannot be) standardised. And then there is the sheer number of them.
   
   Just a few numbers/back-of-the-envelope calculation: 
   
   We currently have 162 hooks; many of them have 5-10 methods that communicate with the outside, and some have ~20-30.
   
   I did a very rough grep: `find . -name '*.py' | grep '/hooks/' | grep -v _init_ |grep -v test| xargs grep -e " def.*(" | grep -v "def _" | wc -l`
   Result: 2961
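
   The grep above is text-based, so it will also pick up things like nested functions. A slightly stricter count could be done with Python's `ast` module. The sketch below is only an illustration: `count_public_methods` and the `sample` source are made up for this example, and it has not been run against the Airflow tree.

```python
import ast


def count_public_methods(source: str) -> int:
    """Count methods defined directly in a class body whose names do not
    start with an underscore (so constructors and internal helpers are
    excluded)."""
    tree = ast.parse(source)
    count = 0
    for node in ast.walk(tree):
        if not isinstance(node, ast.ClassDef):
            continue
        for item in node.body:
            is_def = isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
            if is_def and not item.name.startswith("_"):
                count += 1
    return count


# Made-up sample source, just to exercise the function.
sample = '''
class DemoHook:
    def __init__(self):
        pass

    def get_conn(self):
        pass

    def _helper(self):
        pass

    def run(self, sql):
        pass


def module_level():
    pass
'''

public_in_sample = count_public_methods(sample)
```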
   
   You can remove the final `wc` and see that the vast majority of those are 'legitimate' hook methods that should be instrumented. Roughly speaking, we have ~3000 methods (excluding constructors and internal methods) that we would need to instrument if we do it manually, one by one. If it takes 20 minutes per method (which is unrealistically optimistic once you include testing/PR approval/review), we are talking about 1000 hours of work (about 6 man-months).
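
   The estimate spelled out, as a rough sketch. The 160 working hours per month is an assumption added here to convert hours into months; the other figures are the ones quoted above.

```python
methods = 3000            # rounded up from the grep result of 2961
minutes_per_method = 20   # optimistic per-method effort

total_hours = methods * minutes_per_method / 60

# Assumption: roughly 160 working hours per person-month.
hours_per_month = 160
person_months = total_hours / hours_per_month

print(f"{total_hours:.0f} hours, about {person_months:.1f} person-months")
```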
   
   I am not against it; I just wanted to think about how realistic it is to standardise something like that across the whole codebase (if we cannot automate it), and whether it's worth it if it takes that much effort. Of course the work might be distributed among contributors, but then it has to be organised and tracked, people need to be found to own parts of the code and be convinced, testing has to be done, etc. This is all doable, but it's quite an effort that will take literally a year or so to complete. We could of course just define the standard and hope that people will implement it over time, but then it would take much longer to complete.
   
   I'd really love to hear what your thinking is here.

