You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "William Wong (JIRA)" <ji...@apache.org> on 2019/04/15 16:25:00 UTC
[jira] [Resolved] (SPARK-27062) CatalogImpl.refreshTable should
register query in cache with received tableName
[ https://issues.apache.org/jira/browse/SPARK-27062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
William Wong resolved SPARK-27062.
----------------------------------
Resolution: Duplicate
> CatalogImpl.refreshTable should register query in cache with received tableName
> -------------------------------------------------------------------------------
>
> Key: SPARK-27062
> URL: https://issues.apache.org/jira/browse/SPARK-27062
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.3.2
> Reporter: William Wong
> Priority: Minor
> Labels: easyfix, pull-request-available
> Original Estimate: 2h
> Remaining Estimate: 2h
>
> If _CatalogImpl.refreshTable()_ method is invoked against a cached table, this method would first uncache corresponding query in the shared state cache manager, and then cache it back to refresh the cache copy.
> However, the table was recached with only 'table name'. The database name will be missed. Therefore, if cached table is not on the default database, the recreated cache may refer to a different table. For example, we may see the cached table name in driver's storage page will be changed after table refreshing.
> Here is related code on github for your reference.
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala]
>
> {code:java}
> override def refreshTable(tableName: String): Unit = {
> val tableIdent = sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
> val tableMetadata = sessionCatalog.getTempViewOrPermanentTableMetadata(tableIdent)
> val table = sparkSession.table(tableIdent)
> if (tableMetadata.tableType == CatalogTableType.VIEW) {
> // Temp or persistent views: refresh (or invalidate) any metadata/data cached
> // in the plan recursively.
> table.queryExecution.analyzed.refresh()
> } else {
> // Non-temp tables: refresh the metadata cache.
> sessionCatalog.refreshTable(tableIdent)
> }
> // If this table is cached as an InMemoryRelation, drop the original
> // cached version and make the new version cached lazily.
> if (isCached(table)) {
> // Uncache the logicalPlan.
> sparkSession.sharedState.cacheManager.uncacheQuery(table, cascade = true, blocking = true)
> // Cache it again.
> sparkSession.sharedState.cacheManager.cacheQuery(table, Some(tableIdent.table))
> }
> }
> {code}
>
> CatalogImpl cache table with received _tableName_, instead of _tableIdent.table_
> {code:java}
> override def cacheTable(tableName: String): Unit = { sparkSession.sharedState.cacheManager.cacheQuery(sparkSession.table(tableName), Some(tableName)) }
> {code}
>
> Therefore, I would like to propose aligning the behavior. RefreshTable method should reuse the received _tableName_. Here is the proposed line of changes.
>
> {code:java}
> sparkSession.sharedState.cacheManager.cacheQuery(table, Some(tableIdent.table))
> {code}
> to
> {code:java}
> sparkSession.sharedState.cacheManager.cacheQuery(table, Some(tableName)){code}
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org