Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2020/12/10 01:24:00 UTC

[jira] [Assigned] (SPARK-33729) When refreshing cache, Spark should not use cached plan when recaching data

     [ https://issues.apache.org/jira/browse/SPARK-33729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33729:
------------------------------------

    Assignee: Apache Spark

> When refreshing cache, Spark should not use cached plan when recaching data
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-33729
>                 URL: https://issues.apache.org/jira/browse/SPARK-33729
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Chao Sun
>            Assignee: Apache Spark
>            Priority: Major
>
> Currently when cache is refreshed, e.g., via "REFRESH TABLE" command, Spark will call {{refreshTable}} method within {{CatalogImpl}}.
> {code}
>   override def refreshTable(tableName: String): Unit = {
>     val tableIdent = sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
>     val tableMetadata = sessionCatalog.getTempViewOrPermanentTableMetadata(tableIdent)
>     val table = sparkSession.table(tableIdent)
>     if (tableMetadata.tableType == CatalogTableType.VIEW) {
>       // Temp or persistent views: refresh (or invalidate) any metadata/data cached
>       // in the plan recursively.
>       table.queryExecution.analyzed.refresh()
>     } else {
>       // Non-temp tables: refresh the metadata cache.
>       sessionCatalog.refreshTable(tableIdent)
>     }
>     // If this table is cached as an InMemoryRelation, drop the original
>     // cached version and make the new version cached lazily.
>     val cache = sparkSession.sharedState.cacheManager.lookupCachedData(table)
>     // uncache the logical plan.
>     // note this is a no-op for the table itself if it's not cached, but will invalidate all
>     // caches referencing this table.
>     sparkSession.sharedState.cacheManager.uncacheQuery(table, cascade = true)
>     if (cache.nonEmpty) {
>       // save the cache name and cache level for recreation
>       val cacheName = cache.get.cachedRepresentation.cacheBuilder.tableName
>       val cacheLevel = cache.get.cachedRepresentation.cacheBuilder.storageLevel
>       // recache with the same name and cache level.
>       sparkSession.sharedState.cacheManager.cacheQuery(table, cacheName, cacheLevel)
>     }
>   }
> {code}
> Note that {{table}} is created before the table relation cache is cleared, and is later passed to {{cacheQuery}}. This is incorrect: the plan still refers to the cached table relation, which could be stale, so the recached data may reflect an outdated version of the table.
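The ordering bug above can be shown without Spark. The following is a minimal, self-contained Scala sketch of the same pattern, with a toy `catalog` and `relationCache` standing in for Spark's catalog and table relation cache; all names here are hypothetical, not Spark APIs.

```scala
// Toy model: catalog maps a table name to its current relation "version",
// relationCache memoizes resolution (like Spark's table relation cache).
var catalog = Map("t" -> "v1")
var relationCache = Map.empty[String, String]

// Resolve a table, preferring the relation cache (populating it on a miss).
def resolve(name: String): String =
  relationCache.getOrElse(name, {
    val rel = catalog(name)
    relationCache += name -> rel
    rel
  })

resolve("t")                 // warms the relation cache with "v1"
catalog += ("t" -> "v2")     // the underlying table changes

// Buggy order (mirrors the issue): resolve BEFORE clearing the cache,
// then reuse that resolved plan when recaching.
val staleRel = resolve("t")  // hits the stale cache: "v1"
relationCache = Map.empty    // cache cleared too late

// Correct order: clear the cache first, then re-resolve.
val freshRel = resolve("t")  // re-resolves from the catalog: "v2"

println(s"stale: $staleRel, fresh: $freshRel")
```

Recaching with `staleRel` would pin the outdated relation; resolving only after the cache is cleared, as in `freshRel`, picks up the current table state.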



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org