You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "okumin (Jira)" <ji...@apache.org> on 2021/01/14 06:47:00 UTC
[jira] [Updated] (HIVE-24606) Multi-stage materialized CTEs can
lose intermediate data
[ https://issues.apache.org/jira/browse/HIVE-24606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
okumin updated HIVE-24606:
--------------------------
Summary: Multi-stage materialized CTEs can lose intermediate data (was: Multi-stage materialized CTEs can lost intermediate data)
> Multi-stage materialized CTEs can lose intermediate data
> --------------------------------------------------------
>
> Key: HIVE-24606
> URL: https://issues.apache.org/jira/browse/HIVE-24606
> Project: Hive
> Issue Type: Bug
> Components: Query Planning
> Affects Versions: 2.3.7, 3.1.2, 4.0.0
> Reporter: okumin
> Assignee: okumin
> Priority: Major
>
> With complex multi-stage CTEs, Hive can start a latter stage before its previous stage finishes.
> That's because `SemanticAnalyzer#toRealRootTasks` can fail to resolve dependency between multistage materialized CTEs when a non-materialized CTE cuts in.
> [https://github.com/apache/hive/blob/425e1ff7c054f87c4db87e77d004282d529599ae/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L1414]
>
> For example, when submitting this query,
> {code:sql}
> SET hive.optimize.cte.materialize.threshold=2;
> SET hive.optimize.cte.materialize.full.aggregate.only=false;
> WITH x AS ( SELECT 'x' AS id ), -- not materialized
> a1 AS ( SELECT 'a1' AS id ), -- materialized by a2 and the root
> a2 AS ( SELECT 'a2 <- ' || id AS id FROM a1) -- materialized by the root
> SELECT * FROM a1
> UNION ALL
> SELECT * FROM x
> UNION ALL
> SELECT * FROM a2
> UNION ALL
> SELECT * FROM a2;
> {code}
> `toRealRootTask` will traverse the CTEs in order of `a1`, `x`, and `a2`. It means the dependency between `a1` and `a2` will be ignored and `a2` can start without waiting for `a1`. As a result, the above query returns the following result.
> {code:java}
> +-----+
> | id |
> +-----+
> | a1 |
> | x |
> +-----+
> {code}
> For your information, I ran this test with revision = 425e1ff7c054f87c4db87e77d004282d529599ae.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)