Posted to reviews@spark.apache.org by wangyum <gi...@git.apache.org> on 2018/01/30 03:50:47 UTC

[GitHub] spark pull request #20430: [SPARK-23263][SQL] Create table stored as parquet...

GitHub user wangyum opened a pull request:

    https://github.com/apache/spark/pull/20430

    [SPARK-23263][SQL] Create table stored as parquet should update table size if automatic update table size is enabled

    
    ## What changes were proposed in this pull request?
    How to reproduce:
    ```sql
    bin/spark-sql --conf spark.sql.statistics.size.autoUpdate.enabled=true
    
    spark-sql> create table test_create_parquet stored as parquet as select 1;
    spark-sql> desc extended test_create_parquet;
    ```
    The table statistics will not exist. This PR fixes this issue.
    
    ## How was this patch tested?
    
    unit tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark SPARK-23263

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20430.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20430
    
----
commit 08d31c0823e5f6c257b0917362c8e07b04702af2
Author: Yuming Wang <yu...@...>
Date:   2018-01-30T03:45:20Z

    create table stored as parquet should update table size if automatic update table size is enabled

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20430: [SPARK-23263][SQL] Create table stored as parquet...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20430#discussion_r165349231
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala ---
    @@ -34,16 +34,12 @@ object CommandUtils extends Logging {
     
       /** Change statistics after changing data by commands. */
       def updateTableStats(sparkSession: SparkSession, table: CatalogTable): Unit = {
    -    if (table.stats.nonEmpty) {
    +    if (sparkSession.sessionState.conf.autoSizeUpdateEnabled) {
           val catalog = sparkSession.sessionState.catalog
    -      if (sparkSession.sessionState.conf.autoSizeUpdateEnabled) {
    -        val newTable = catalog.getTableMetadata(table.identifier)
    -        val newSize = CommandUtils.calculateTotalSize(sparkSession.sessionState, newTable)
    -        val newStats = CatalogStatistics(sizeInBytes = newSize)
    -        catalog.alterTableStats(table.identifier, Some(newStats))
    -      } else {
    -        catalog.alterTableStats(table.identifier, None)
    --- End diff --
    
    @felixcheung if the data of a table has been changed and auto size update is disabled, the stats become inaccurate, so we should remove them.


---



[GitHub] spark issue #20430: [SPARK-23263][SQL] Create table stored as parquet should...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20430
  
    **[Test build #86790 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86790/testReport)** for PR 20430 at commit [`08d31c0`](https://github.com/apache/spark/commit/08d31c0823e5f6c257b0917362c8e07b04702af2).


---



[GitHub] spark issue #20430: [SPARK-23263][SQL] Create table stored as parquet should...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20430
  
    Sure, please close this and go ahead in whatever way works best for you.


---



[GitHub] spark issue #20430: [SPARK-23263][SQL] Create table stored as parquet should...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20430
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/356/
    Test PASSed.


---



[GitHub] spark issue #20430: [SPARK-23263][SQL] Create table stored as parquet should...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20430
  
    CC @wzhfy 


---



[GitHub] spark pull request #20430: [SPARK-23263][SQL] Create table stored as parquet...

Posted by wangyum <gi...@git.apache.org>.
Github user wangyum closed the pull request at:

    https://github.com/apache/spark/pull/20430


---



[GitHub] spark issue #20430: [SPARK-23263][SQL] Create table stored as parquet should...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20430
  
    ping @wangyum for @wzhfy's comment.


---



[GitHub] spark issue #20430: [SPARK-23263][SQL] Create table stored as parquet should...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on the issue:

    https://github.com/apache/spark/pull/20430
  
    Can we special-case CTAS? For data-changing commands like INSERT, I think we should remove the stats if auto update is disabled, because the previous stats are inaccurate after the insertion.
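
The behaviour discussed in this comment can be sketched as a small decision function. This is a minimal, Spark-free illustration under stated assumptions, not the code in the PR: `StatsPolicy`, `StatsAction`, `SetSize`, `ClearStats`, `LeaveUnchanged`, and `decide` are hypothetical names, `autoSizeUpdate` stands in for `spark.sql.statistics.size.autoUpdate.enabled`, and `computeSize` stands in for a size calculation such as `CommandUtils.calculateTotalSize`.

```scala
// Hypothetical, Spark-free model of the stats policy discussed above.
// None of these names exist in Spark; they are for illustration only.
sealed trait StatsAction
final case class SetSize(sizeInBytes: BigInt) extends StatsAction
case object ClearStats extends StatsAction
case object LeaveUnchanged extends StatsAction

object StatsPolicy {
  /** Decide what to do with a table's stats after a data-changing command. */
  def decide(
      hasStats: Boolean,
      autoSizeUpdate: Boolean,
      computeSize: () => BigInt): StatsAction = {
    if (autoSizeUpdate) {
      // Recompute even when no stats existed yet, which covers the CTAS case.
      SetSize(computeSize())
    } else if (hasStats) {
      // The recorded stats are stale after the data change, so drop them.
      ClearStats
    } else {
      // Nothing recorded and auto update is off: leave the catalog alone.
      LeaveUnchanged
    }
  }
}
```

Under these assumptions, `StatsPolicy.decide(hasStats = false, autoSizeUpdate = true, computeSize = () => BigInt(42))` yields `SetSize(42)` for a freshly created table, while `StatsPolicy.decide(hasStats = true, autoSizeUpdate = false, computeSize = () => BigInt(42))` yields `ClearStats`, so the clearing branch is kept for commands like INSERT.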


---



[GitHub] spark issue #20430: [SPARK-23263][SQL] Create table stored as parquet should...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20430
  
    **[Test build #86790 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86790/testReport)** for PR 20430 at commit [`08d31c0`](https://github.com/apache/spark/commit/08d31c0823e5f6c257b0917362c8e07b04702af2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #20430: [SPARK-23263][SQL] Create table stored as parquet...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20430#discussion_r164662154
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala ---
    @@ -34,16 +34,12 @@ object CommandUtils extends Logging {
     
       /** Change statistics after changing data by commands. */
       def updateTableStats(sparkSession: SparkSession, table: CatalogTable): Unit = {
    -    if (table.stats.nonEmpty) {
    +    if (sparkSession.sessionState.conf.autoSizeUpdateEnabled) {
           val catalog = sparkSession.sessionState.catalog
    -      if (sparkSession.sessionState.conf.autoSizeUpdateEnabled) {
    -        val newTable = catalog.getTableMetadata(table.identifier)
    -        val newSize = CommandUtils.calculateTotalSize(sparkSession.sessionState, newTable)
    -        val newStats = CatalogStatistics(sizeInBytes = newSize)
    -        catalog.alterTableStats(table.identifier, Some(newStats))
    -      } else {
    -        catalog.alterTableStats(table.identifier, None)
    --- End diff --
    
    This seems to have been the way to clear out the table stats previously. Don't we still need that?


---



[GitHub] spark issue #20430: [SPARK-23263][SQL] Create table stored as parquet should...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20430
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86790/
    Test PASSed.


---



[GitHub] spark issue #20430: [SPARK-23263][SQL] Create table stored as parquet should...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20430
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20430: [SPARK-23263][SQL] Create table stored as parquet should...

Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on the issue:

    https://github.com/apache/spark/pull/20430
  
    Thanks @HyukjinKwon. How about closing this? `CTAS` has other issues, as mentioned in [SPARK-24766](https://issues.apache.org/jira/browse/SPARK-24766). I will try to fix it if there is a chance.


---



[GitHub] spark issue #20430: [SPARK-23263][SQL] Create table stored as parquet should...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20430
  
    Merged build finished. Test PASSed.


---
