Posted to dev@lens.apache.org by Yash Sharma <ya...@gmail.com> on 2015/06/25 09:23:41 UTC

[DISCUSS] Query cost computation

Hi All,
I just need a little clarification on the query cost.

How do we compute the query cost currently? Do we calculate the cost of the
overall query?
Is this cost used only to stop users from running very expensive queries?

One commonly used approach is to build a DAG/tree of all the operations in
the query (in our case, the Hive AST) and assign each node/operator its own
cost.

With this, we can calculate the cumulative cost of the query as the
summation of the individual operator costs. This would provide very
granular control over the query cost.
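
To make the idea concrete, here is a minimal sketch of summing per-operator
costs over a tree. The Operator class, the operator names, and the cost
numbers are all made up for illustration; this is not how Lens or Hive
computes cost today.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Operator:
    """One node in a hypothetical query operator tree."""
    name: str
    own_cost: float  # cost of this operator alone, in made-up units
    children: List["Operator"] = field(default_factory=list)

    def cumulative_cost(self) -> float:
        """Own cost plus the cumulative cost of every child subtree."""
        return self.own_cost + sum(c.cumulative_cost() for c in self.children)


# Illustrative plan: filter on top of a join of two table scans.
scan_fact = Operator("TableScan(fact)", 100.0)
scan_dim = Operator("TableScan(dim)", 10.0)
join = Operator("Join", 50.0, [scan_fact, scan_dim])
plan = Operator("Filter", 5.0, [join])

total = plan.cumulative_cost()
```

Because every node carries its own cost, a limit could in principle be
applied to the whole query or to any subtree (for example, only the join).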

This approach could also help us later with query optimization, where certain
operators can be removed or rearranged. Drill, Hive, and Phoenix use a similar
approach via Calcite, though the implementation styles vary. Kylin
supposedly follows a similar approach as well.
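
As a rough illustration of how per-operator costs could feed optimization,
the sketch below compares two equivalent plans and picks the cheaper one.
The tuple layout, the plan_cost helper, and the cost numbers are
assumptions made up for this example; it does not use the Calcite API.

```python
def plan_cost(node):
    """node is (name, own_cost, children); returns the summed subtree cost."""
    name, own_cost, children = node
    return own_cost + sum(plan_cost(child) for child in children)


# Plan A: join first, then filter the joined rows.
plan_a = ("Filter", 20, [
    ("Join", 200, [("Scan(fact)", 100, []), ("Scan(dim)", 10, [])]),
])

# Plan B: filter moved below the join, so the join is assumed to see
# fewer rows and is given a lower made-up cost.
plan_b = ("Join", 40, [
    ("Filter", 10, [("Scan(fact)", 100, [])]),
    ("Scan(dim)", 10, []),
])

# Choose the candidate plan with the lowest cumulative cost.
cheapest = min([plan_a, plan_b], key=plan_cost)
```

With these numbers the rearranged plan (Plan B) is chosen, since its
cumulative cost is lower.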

Should we explore this possibility?

P.S. I am asking this question without assuming any technical
infeasibilities or coupling to the current design. It is just an open thought.