Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/08/21 23:26:06 UTC

[GitHub] [airflow] potiuk commented on issue #17771: Investigating expensive query when a dag has a large amount of tasks.

potiuk commented on issue #17771:
URL: https://github.com/apache/airflow/issues/17771#issuecomment-903188055


   > The motivation behind wanting to investigate this is that if it is resolved, we can reduce the hardware requirements. The expense of this query also gives me concerns about scalability in the future.
   
   As I see it, this is the wrong diagnosis. You can't optimise your hardware this way. Please correct me if I am wrong, but if you reduce the hardware by 50% now, the query will take (at most) 4 seconds. That will cause a longer, smaller spike, but that's about it. Jumping to the conclusion that it might cause scalability issues is extremely premature; I'd love it if you explained how this might impact scalability. It's very easy to say "this query takes the most time", but you should analyse it and judge whether it makes sense to optimise it and what impact that will have.
   
   There is a very good quote, `premature optimisation is the root of all evil`, which I very much sympathise with, and for any optimisation I do, I at least try to find out what I am actually optimising and how much I can gain.
   
   I am not saying that it's not worth it, but I am saying that from the data/analysis/graphs you provided I see no reason why anyone should invest in optimising something that brings at most a 0.03% improvement. This is a very basic engineering rule: optimise something that you know makes sense to optimise.
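
To make that arithmetic concrete, here is a minimal sketch of the usual Amdahl-style estimate. It assumes the 0.03% figure is the share of total time spent in the query, and the function name `overall_gain` is purely illustrative, not something from Airflow:

```python
def overall_gain(fraction: float, speedup: float) -> float:
    """Share of total runtime saved when a part that takes `fraction`
    of the total time is made `speedup` times faster (Amdahl-style estimate)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return fraction * (1.0 - 1.0 / speedup)


# Assumption: the query accounts for 0.03% of total time.
query_share = 0.0003

# Even an infinitely fast query cannot save more than its own share.
best_case = overall_gain(query_share, float("inf"))
print(f"best-case saving: {best_case:.4%}")

# A more realistic 2x speedup of the query saves half of that share.
print(f"2x speedup saving: {overall_gain(query_share, 2.0):.4%}")
```

The point of the sketch is only that the saving is bounded by the share of time the query takes, so the share is what needs to be measured first.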
   
   If you have some strongly founded, data-based justification that optimising this query is needed, I am happy to take a look at it, but as @uranusjr said: if you feel that you might improve that query, feel free; we will judge whether the potential complexity (which is inherent in optimisation) is worth it.
   
   Shall I assign you the issue?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org