Posted to issues@spark.apache.org by "Bogdan Raducanu (JIRA)" <ji...@apache.org> on 2017/09/10 19:50:00 UTC

[jira] [Created] (SPARK-21969) CommandUtils.updateTableStats should call refreshTable

Bogdan Raducanu created SPARK-21969:
---------------------------------------

             Summary: CommandUtils.updateTableStats should call refreshTable
                 Key: SPARK-21969
                 URL: https://issues.apache.org/jira/browse/SPARK-21969
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.3.0
            Reporter: Bogdan Raducanu


The table relation is cached, so even though the table statistics are removed after a write, existing sessions keep using the stale cached statistics.
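The mechanism can be illustrated with a toy model in plain Scala, with no Spark dependency. ToyCatalog, Stats, appendFiles, analyze, lookupRelation and this updateTableStats are hypothetical stand-ins, not Spark's SessionCatalog or CommandUtils. Only the idea that the cached relation must be invalidated with refreshTable comes from this issue. The byte counts are taken from the repro output below.

```scala
// Toy model of the stale-statistics problem. All names here are HYPOTHETICAL
// stand-ins, not Spark's SessionCatalog / CommandUtils implementation.
import scala.collection.mutable

final case class Stats(sizeInBytes: Long, rowCount: Option[Long])

final class ToyCatalog {
  // Persisted statistics per table (plays the role of the metastore).
  private val metastore = mutable.Map.empty[String, Stats]
  // Per-session cache of resolved relations, holding the stats seen at lookup time.
  private val relationCache = mutable.Map.empty[String, Stats]
  // Total size of the table's files, used when no stats are stored.
  private val fileBytes = mutable.Map.empty[String, Long].withDefaultValue(0L)

  def appendFiles(table: String, bytes: Long): Unit =
    fileBytes(table) = fileBytes(table) + bytes

  // In this toy model, ANALYZE stores full stats and invalidates the cache.
  def analyze(table: String, rows: Long): Unit = {
    metastore(table) = Stats(fileBytes(table), Some(rows))
    refreshTable(table)
  }

  // Resolving a table reuses the cached relation if one exists.
  def lookupRelation(table: String): Stats =
    relationCache.getOrElseUpdate(
      table,
      metastore.getOrElse(table, Stats(fileBytes(table), None)))

  def refreshTable(table: String): Unit = relationCache.remove(table)

  // After a write: drop stored stats, and optionally invalidate the cached relation.
  def updateTableStats(table: String, refresh: Boolean): Unit = {
    metastore.remove(table)
    if (refresh) refreshTable(table)
  }
}

object StaleStatsDemo {
  // Mirrors the repro: write, ANALYZE, query, append, query again.
  // Returns the stats seen by the query before and after the append.
  def runScenario(refreshOnUpdate: Boolean): (Stats, Stats) = {
    val cat = new ToyCatalog
    cat.appendFiles("tab1", 784L)
    cat.analyze("tab1", rows = 100L)
    val before = cat.lookupRelation("tab1")
    cat.appendFiles("tab1", 784L)
    cat.updateTableStats("tab1", refreshOnUpdate)
    (before, cat.lookupRelation("tab1"))
  }
}
```

With refreshOnUpdate = false the second lookup still returns Stats(784,Some(100)), like the stale plan after the append. With refreshOnUpdate = true it returns Stats(1568,None), like the plan after a manual refresh.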


{{code}}
spark.range(100).write.saveAsTable("tab1")
sql("analyze table tab1 compute statistics")
sql("explain cost select distinct * from tab1").show(false)
{{code}}

Produces:
{{code}}
Relation[id#103L] parquet, Statistics(sizeInBytes=784.0 B, rowCount=100, hints=none)
{{code}}


{{code}}
spark.range(100).write.mode("append").saveAsTable("tab1")
sql("explain cost select distinct * from tab1").show(false)
{{code}}

After appending more data, the same (now stale) statistics are still used:
{{code}}
Relation[id#135L] parquet, Statistics(sizeInBytes=784.0 B, rowCount=100, hints=none)
{{code}}

Manually refreshing the table drops the stale statistics: the size is re-estimated and rowCount is no longer reported.
{{code}}
import org.apache.spark.sql.catalyst.TableIdentifier

spark.sessionState.catalog.refreshTable(TableIdentifier("tab1"))
sql("explain cost select distinct * from tab1").show(false)
{{code}}

{{code}}
Relation[id#155L] parquet, Statistics(sizeInBytes=1568.0 B, hints=none)
{{code}}


