You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "László Bodor (Jira)" <ji...@apache.org> on 2023/03/26 07:39:00 UTC

[jira] [Created] (HIVE-27176) EXPLAIN SKEW

László Bodor created HIVE-27176:
-----------------------------------

             Summary: EXPLAIN SKEW <query>
                 Key: HIVE-27176
                 URL: https://issues.apache.org/jira/browse/HIVE-27176
             Project: Hive
          Issue Type: Improvement
            Reporter: László Bodor


Thinking about a new explain feature, which is actually not an explain, instead a set of analytical queries: considering a very complicated and large SQL statement (this below is a simple one, just for example's sake):
{code}
SELECT a FROM (SELECT b ... JOIN c on b.x = c.y) d JOIN e ON d.v = e.w
{code}

EXPLAIN skew should run a query like:
{code}
SELECT "b", "x", x, count distinct(b.x) as count order by count desc limit 50
UNION ALL
SELECT "c", "y", y, count distinct(c.y) as count order by count desc limit 50
UNION ALL
SELECT "d", "v", v count distinct(d.v) as count order by count desc limit 50
UNION ALL
SELECT "e", "w", w, count distinct(e.w) as count order by count desc limit 50
{code}

collecting some cardinality info about all the join columns found in the query, so result might be like:

{code}
table_name column_name column_value count
b "x" x_skew_value1 100431234
b "x" x_skew_value2 234
c "y" y_skew_value1 350000
c "y" x_skew_value2 459999
c "y" x_skew_value3 42
...
{code}
this doesn't solve the problem, instead shows data skew immediately for further analysis

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)