You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "okumin (Jira)" <ji...@apache.org> on 2021/01/08 13:51:00 UTC

[jira] [Assigned] (HIVE-24606) Multi-stage materialized CTEs can lost intermediate data

     [ https://issues.apache.org/jira/browse/HIVE-24606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

okumin reassigned HIVE-24606:
-----------------------------


> Multi-stage materialized CTEs can lost intermediate data
> --------------------------------------------------------
>
>                 Key: HIVE-24606
>                 URL: https://issues.apache.org/jira/browse/HIVE-24606
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>    Affects Versions: 3.1.2, 2.3.7, 4.0.0
>            Reporter: okumin
>            Assignee: okumin
>            Priority: Major
>
> With complex multi-stage CTEs, Hive can start a latter stage before its previous stage finishes.
>  That's because `SemanticAnalyzer#toRealRootTasks` can fail to resolve dependency between multistage materialized CTEs when a non-materialized CTE cuts in.
>  [https://github.com/apache/hive/blob/425e1ff7c054f87c4db87e77d004282d529599ae/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L1414]
>  
> For example, when submitting this query,
> {code:sql}
> SET hive.optimize.cte.materialize.threshold=2;
> SET hive.optimize.cte.materialize.full.aggregate.only=false;
> WITH x AS ( SELECT 'x' AS id ), -- not materialized
> a1 AS ( SELECT 'a1' AS id ), -- materialized by a2 and the root
> a2 AS ( SELECT 'a2 <- ' || id AS id FROM a1) -- materialized by the root
> SELECT * FROM a1
> UNION ALL
> SELECT * FROM x
> UNION ALL
> SELECT * FROM a2
> UNION ALL
> SELECT * FROM a2;
> {code}
> `toRealRootTask` will traverse the CTEs in order of `a1`, `x`, and `a2`. It means the dependency between `a1` and `a2` will be ignored and `a2` can start without waiting for `a1`. As a result, the above query returns the following result.
> {code:java}
> +-----+
> | id  |
> +-----+
> | a1  |
> | x   |
> +-----+
> {code}
> For your information, I ran this test with revision = 425e1ff7c054f87c4db87e77d004282d529599ae.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)