Posted to issues@spark.apache.org by "gabrywu (Jira)" <ji...@apache.org> on 2022/02/20 09:55:00 UTC

[jira] [Updated] (SPARK-38258) [proposal] collect & update statistics automatically when Spark SQL is running

     [ https://issues.apache.org/jira/browse/SPARK-38258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

gabrywu updated SPARK-38258:
----------------------------
    Description: 
As we all know, table & column statistics are very important to the Spark SQL optimizer; however, today we have to collect & update them manually using
{code:sql}
ANALYZE TABLE tableName COMPUTE STATISTICS{code}
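For context, here is a minimal sketch of this manual workflow driven from Scala via spark.sql; the table name db.events and the column names are hypothetical placeholders:

{code:scala}
import org.apache.spark.sql.SparkSession

// Assumes an existing table `db.events` (a placeholder name for this sketch).
val spark = SparkSession.builder().enableHiveSupport().getOrCreate()

// Table-level statistics (size in bytes, row count); requires a scan of the table.
spark.sql("ANALYZE TABLE db.events COMPUTE STATISTICS")

// Column-level statistics (min/max, null count, distinct count) used by the CBO.
spark.sql("ANALYZE TABLE db.events COMPUTE STATISTICS FOR COLUMNS user_id, event_ts")

// Inspect the statistics currently stored in the catalog.
spark.sql("DESCRIBE EXTENDED db.events").show(truncate = false)
{code}

Each of these statements is a separate job the user has to remember to schedule after every load, which is the inconvenience described next.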
 

It's a little inconvenient, so why can't we collect & update statistics automatically when a Spark stage runs and finishes?

For example, when an INSERT OVERWRITE TABLE statement finishes, we can update the corresponding table statistics from its SQL metrics. The Spark SQL optimizer can then use these statistics in the following queries.
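As a rough illustration of where such a hook could live, Spark already exposes the QueryExecutionListener API, which is notified when a query or command finishes. The sketch below is only one possible shape under stated assumptions, not a proposed implementation: StatsRefreshListener and the injected table name are hypothetical, the funcName check is an assumption that may vary by Spark version, and a real version would compute the new statistics directly from the SQL metrics on qe.executedPlan instead of re-running ANALYZE TABLE.

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

// Hypothetical sketch: refresh catalog statistics after a successful command.
// `StatsRefreshListener` and the injected table name are placeholders.
class StatsRefreshListener(spark: SparkSession, table: String)
    extends QueryExecutionListener {

  // Guards against the ANALYZE TABLE below re-triggering this listener,
  // since ANALYZE itself runs as a command.
  @volatile private var refreshing = false

  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit = {
    // Assumption: eagerly executed SQL commands such as INSERT OVERWRITE are
    // reported with funcName "command"; the exact value depends on the Spark
    // version and on whether the SQL or DataFrameWriter API was used.
    if (funcName == "command" && !refreshing) {
      refreshing = true
      try {
        // In the proposal, the row/byte counts would come straight from the
        // SQL metrics of qe.executedPlan; ANALYZE TABLE here is a stand-in
        // that produces the same catalog update, at the cost of a rescan.
        spark.sql(s"ANALYZE TABLE $table COMPUTE STATISTICS")
      } finally {
        refreshing = false
      }
    }
  }

  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = ()
}

// Registration, e.g. right after the session is created:
// spark.listenerManager.register(new StatsRefreshListener(spark, "db.events"))
{code}

Note that Spark already does a narrow version of this: with spark.sql.statistics.size.autoUpdate.enabled set to true, the table size statistic (though not row counts or column statistics) is refreshed after data-changing commands, so this proposal can be read as extending that mechanism.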

So what do you think of it, [~yumwang]? Is it reasonable?


> [proposal] collect & update statistics automatically when Spark SQL is running
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-38258
>                 URL: https://issues.apache.org/jira/browse/SPARK-38258
>             Project: Spark
>          Issue Type: Wish
>          Components: Spark Core, SQL
>    Affects Versions: 3.0.0, 3.1.0, 3.2.0
>            Reporter: gabrywu
>            Priority: Minor


