Posted to issues@spark.apache.org by "L. C. Hsieh (Jira)" <ji...@apache.org> on 2020/06/23 19:12:00 UTC

[jira] [Commented] (SPARK-32063) Spark native temporary table

    [ https://issues.apache.org/jira/browse/SPARK-32063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17143233#comment-17143233 ] 

L. C. Hsieh commented on SPARK-32063:
-------------------------------------

For 1 and 2, both seem to be about performance. Spark has a caching mechanism that can materialize the result of a complex query. I think it can make up for that shortcoming of temporary views.

For 3, I'm not sure I understand this point. Can you elaborate on it?
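
To make the caching suggestion for 1 and 2 concrete, here is a rough sketch in Spark SQL. It is only an illustration, not something from the issue or the design doc: the table sales.orders, its columns, and the view name order_totals are all hypothetical.

```sql
-- Hypothetical source table and view names, for illustration only.
CREATE OR REPLACE TEMPORARY VIEW order_totals AS
SELECT customer_id, SUM(amount) AS total_amount
FROM sales.orders
GROUP BY customer_id;

-- Materialize the view's result in Spark's cache.
-- CACHE TABLE is eager by default; CACHE LAZY TABLE defers it to first use.
CACHE TABLE order_totals;

-- Later queries in this session can read the cached data
-- instead of recomputing the aggregation.
SELECT * FROM order_totals WHERE total_amount > 1000;

-- Release the cached data when it is no longer needed.
UNCACHE TABLE order_totals;
```

Note that the cache lives in executor memory or local disk and may be evicted and recomputed, so it is not the same as writing a table to storage.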

> Spark native temporary table
> ----------------------------
>
>                 Key: SPARK-32063
>                 URL: https://issues.apache.org/jira/browse/SPARK-32063
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Lantao Jin
>            Priority: Major
>
> Many databases and data warehouse SQL engines support temporary tables. A temporary table, as its name implies, is a short-lived table that exists only for the current session.
> Spark has no temporary tables. The DDL “CREATE TEMPORARY TABLE AS SELECT” creates a temporary view instead. A temporary view is quite different from a temporary table.
> A temporary view is just a VIEW. It doesn’t materialize data in storage, so it has the following shortcomings:
>  # A view does not improve performance. Materializing intermediate data in temporary tables for a complex query accelerates queries, especially in an ETL pipeline.
>  # A view that calls other views can cause severe performance issues. Executing a very complex view may even fail in Spark.
>  # A temporary view has no database namespace. In some complex ETL pipelines or data warehouse applications, working without a database prefix is inconvenient. Such workloads need tables that are used only in the current session.
>  
> More details are described in [Design Docs|https://docs.google.com/document/d/1RS4Q3VbxlZ_Yy0fdWgTJ-k0QxFd1dToCqpLAYvIJ34U/edit?usp=sharing]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)