Posted to issues@spark.apache.org by "Stu (Jira)" <ji...@apache.org> on 2022/03/24 22:14:00 UTC

[jira] [Comment Edited] (SPARK-26639) The reuse subquery function maybe does not work in SPARK SQL

    [ https://issues.apache.org/jira/browse/SPARK-26639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17511358#comment-17511358 ] 

Stu edited comment on SPARK-26639 at 3/24/22, 10:13 PM:
--------------------------------------------------------

Here's another example of this happening, in Spark 3.1.2. I'm running the following code:
{code:java}
WITH t AS (
  SELECT random() as a
) 
  SELECT * FROM t
  UNION
  SELECT * FROM t {code}
The CTE contains a non-deterministic function. If the CTE were computed once and reused, the same random value would be chosen for `a` in both unioned queries, and the output would be deduplicated into a single record.

That is not what happens: the output is two records with different random values, showing the CTE was evaluated twice.

On our platform, some folks like to write complex CTEs and reference them multiple times. Recalculating these for every reference is computationally expensive, so we recommend creating separate tables in these cases, but we have no way to enforce this. Fixing this bug would save a good number of compute hours!
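As a sketch of the workaround described above (table name is hypothetical), the expensive CTE can be materialized once and the table referenced in both branches:
{code:sql}
-- Hypothetical workaround: materialize the non-deterministic/expensive
-- query once, then reference the table instead of repeating the CTE.
CREATE TABLE t_materialized AS
SELECT random() AS a;

SELECT * FROM t_materialized
UNION
SELECT * FROM t_materialized;
-- Both branches now read the same stored value of `a`,
-- so UNION deduplicates the result to a single row.
{code}
This trades storage and an extra write for a single evaluation of the CTE body, which is what subquery reuse would otherwise provide automatically.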



> The reuse subquery function maybe does not work in SPARK SQL
> ------------------------------------------------------------
>
>                 Key: SPARK-26639
>                 URL: https://issues.apache.org/jira/browse/SPARK-26639
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Ke Jia
>            Priority: Major
>
> The subquery reuse feature has done in [https://github.com/apache/spark/pull/14548]
> In my test, I found that the visualized plan does show the subquery being executed only once, but the stage for that same subquery may actually run more than once.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
