Posted to commits@spark.apache.org by we...@apache.org on 2021/02/22 04:32:37 UTC

[spark] branch master updated: [SPARK-34401][SQL][DOCS] Update docs about altering cached tables/views

This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 6ea4b5f  [SPARK-34401][SQL][DOCS] Update docs about altering cached tables/views
6ea4b5f is described below

commit 6ea4b5fda7fd32f78e204e3de466fdc07e47ee89
Author: Max Gekk <ma...@gmail.com>
AuthorDate: Mon Feb 22 04:32:09 2021 +0000

    [SPARK-34401][SQL][DOCS] Update docs about altering cached tables/views
    
    ### What changes were proposed in this pull request?
    Update public docs of SQL commands about altering cached tables/views. For instance:
    <img width="869" alt="Screenshot 2021-02-08 at 15 11 48" src="https://user-images.githubusercontent.com/1580697/107217940-fd3b8980-6a1f-11eb-98b9-9b2e3fe7f4ef.png">
    
    ### Why are the changes needed?
    To inform users about how these commands behave when altering cached tables or views.
    
    ### Does this PR introduce _any_ user-facing change?
    No
    
    ### How was this patch tested?
    By running the command below and manually checking the docs:
    ```
    $ SKIP_API=1 SKIP_SCALADOC=1 SKIP_PYTHONDOC=1 SKIP_RDOC=1 jekyll serve --watch
    ```
    
    Closes #31524 from MaxGekk/doc-cmd-caching.
    
    Authored-by: Max Gekk <ma...@gmail.com>
    Signed-off-by: Wenchen Fan <we...@databricks.com>
---
 docs/sql-ref-syntax-ddl-alter-table.md                         | 10 ++++++++++
 docs/sql-ref-syntax-ddl-alter-view.md                          |  2 ++
 docs/sql-ref-syntax-ddl-drop-table.md                          |  2 ++
 docs/sql-ref-syntax-ddl-repair-table.md                        |  2 ++
 docs/sql-ref-syntax-ddl-truncate-table.md                      |  2 ++
 docs/sql-ref-syntax-dml-load.md                                |  2 ++
 .../main/scala/org/apache/spark/sql/internal/CatalogImpl.scala |  2 +-
 7 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/docs/sql-ref-syntax-ddl-alter-table.md b/docs/sql-ref-syntax-ddl-alter-table.md
index e4d73f3..6fe1405 100644
--- a/docs/sql-ref-syntax-ddl-alter-table.md
+++ b/docs/sql-ref-syntax-ddl-alter-table.md
@@ -27,6 +27,10 @@ license: |
 
 `ALTER TABLE RENAME TO` statement changes the table name of an existing table in the database. The table rename command cannot be used to move a table between databases, only to rename a table within the same database.
 
+If the table is cached, these commands clear the cached data of the table. The cache will be lazily filled the next time the table is accessed. Additionally:
+  * the table rename command uncaches all of the table's dependents, such as views that refer to the table. The dependents must be cached again explicitly.
+  * the partition rename command clears the caches of all of the table's dependents while keeping them marked as cached, so their caches will be lazily filled the next time they are accessed.
+
 #### Syntax
 
 ```sql
@@ -103,6 +107,8 @@ ALTER TABLE table_identifier { ALTER | CHANGE } [ COLUMN ] col_spec alterColumnA
 
 `ALTER TABLE ADD` statement adds partition to the partitioned table.
 
+If the table is cached, the command clears the cached data of the table and all its dependents that refer to it. The cache will be lazily filled the next time the table or its dependents are accessed.
+
 ##### Syntax
 
 ```sql
@@ -128,6 +134,8 @@ ALTER TABLE table_identifier ADD [IF NOT EXISTS]
 
 `ALTER TABLE DROP` statement drops the partition of the table.
 
+If the table is cached, the command clears the cached data of the table and all its dependents that refer to it. The cache will be lazily filled the next time the table or its dependents are accessed.
+
 ##### Syntax
 
 ```sql
@@ -187,6 +195,8 @@ ALTER TABLE table_identifier [ partition_spec ] SET SERDE serde_class_name
 `ALTER TABLE SET` command can also be used for changing the file location and file format for 
 existing tables. 
 
+If the table is cached, the `ALTER TABLE .. SET LOCATION` command clears the cached data of the table and all its dependents that refer to it. The cache will be lazily filled the next time the table or its dependents are accessed.
+
 ##### Syntax
 
 ```sql
diff --git a/docs/sql-ref-syntax-ddl-alter-view.md b/docs/sql-ref-syntax-ddl-alter-view.md
index a34e77d..d69f246 100644
--- a/docs/sql-ref-syntax-ddl-alter-view.md
+++ b/docs/sql-ref-syntax-ddl-alter-view.md
@@ -28,6 +28,8 @@ the name of a view to a different name, set and unset the metadata of the view b
 Renames the existing view. If the new view name already exists in the source database, a `TableAlreadyExistsException` is thrown. This operation
 does not support moving the views across databases.
 
+If the view is cached, the command clears the cached data of the view and all its dependents that refer to it. The view's cache will be lazily filled the next time the view is accessed. The command leaves the view's dependents uncached.
+
 #### Syntax
 ```sql
 ALTER VIEW view_identifier RENAME TO view_identifier
diff --git a/docs/sql-ref-syntax-ddl-drop-table.md b/docs/sql-ref-syntax-ddl-drop-table.md
index a15a992..6c115fd 100644
--- a/docs/sql-ref-syntax-ddl-drop-table.md
+++ b/docs/sql-ref-syntax-ddl-drop-table.md
@@ -26,6 +26,8 @@ if the table is not `EXTERNAL` table. If the table is not present it throws an e
 
 In case of an external table, only the associated metadata information is removed from the metastore database.
 
+If the table is cached, the command uncaches the table and all its dependents.
+
 ### Syntax
 
 ```sql
diff --git a/docs/sql-ref-syntax-ddl-repair-table.md b/docs/sql-ref-syntax-ddl-repair-table.md
index c2ef0a7..3614512 100644
--- a/docs/sql-ref-syntax-ddl-repair-table.md
+++ b/docs/sql-ref-syntax-ddl-repair-table.md
@@ -23,6 +23,8 @@ license: |
 
 `MSCK REPAIR TABLE` recovers all the partitions in the directory of a table and updates the Hive metastore. When creating a table using `PARTITIONED BY` clause, partitions are generated and registered in the Hive metastore. However, if the partitioned table is created from existing data, partitions are not registered automatically in the Hive metastore. User needs to run `MSCK REPAIR TABLE` to register the partitions. `MSCK REPAIR TABLE` on a non-existent table or a table without partiti [...]
 
+If the table is cached, the command clears the cached data of the table and all its dependents that refer to it. The cache will be lazily filled the next time the table or its dependents are accessed.
+
 ### Syntax
 
 ```sql
diff --git a/docs/sql-ref-syntax-ddl-truncate-table.md b/docs/sql-ref-syntax-ddl-truncate-table.md
index 6139814..3bc4d7a 100644
--- a/docs/sql-ref-syntax-ddl-truncate-table.md
+++ b/docs/sql-ref-syntax-ddl-truncate-table.md
@@ -25,6 +25,8 @@ The `TRUNCATE TABLE` statement removes all the rows from a table or partition(s)
 or an external/temporary table. In order to truncate multiple partitions at once, the user can specify the partitions 
 in `partition_spec`. If no `partition_spec` is specified it will remove all partitions in the table.
 
+If the table is cached, the command clears the cached data of the table and all its dependents that refer to it. The cache will be lazily filled the next time the table or its dependents are accessed.
+
 ### Syntax
 
 ```sql
diff --git a/docs/sql-ref-syntax-dml-load.md b/docs/sql-ref-syntax-dml-load.md
index 9381b42..08922b8 100644
--- a/docs/sql-ref-syntax-dml-load.md
+++ b/docs/sql-ref-syntax-dml-load.md
@@ -23,6 +23,8 @@ license: |
 
 `LOAD DATA` statement loads the data into a Hive serde table from the user specified directory or file. If a directory is specified then all the files from the directory are loaded. If a file is specified then only the single file is loaded. Additionally the `LOAD DATA` statement takes an optional partition specification. When a partition is specified, the data files (when input source is a directory) or the single file (when input source is a file) are loaded into the partition of the t [...]
 
+If the table is cached, the command clears the cached data of the table and all its dependents that refer to it. The cache will be lazily filled the next time the table or its dependents are accessed.
+
 ### Syntax
 
 ```sql
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala b/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala
index 145daaf..884a389 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala
@@ -552,7 +552,7 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
     // Re-caches the logical plan of the relation.
     // Note this is a no-op for the relation itself if it's not cached, but will clear all
     // caches referencing this relation. If this relation is cached as an InMemoryRelation,
-    // this will clear the relation cache and caches of all its dependants.
+    // this will clear the relation cache and caches of all its dependents.
     relation match {
       case SubqueryAlias(_, relationPlan) =>
         sparkSession.sharedState.cacheManager.recacheByPlan(sparkSession, relationPlan)

