You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by marmbrus <gi...@git.apache.org> on 2014/07/30 07:27:39 UTC

[GitHub] spark pull request: [SPARK-2734][SQL] Remove tables from cache whe...

GitHub user marmbrus opened a pull request:

    https://github.com/apache/spark/pull/1650

    [SPARK-2734][SQL] Remove tables from cache when DROP TABLE is run.

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/marmbrus/spark dropCached

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1650.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1650
    
----
commit c3f535d751c177d2f608e2299b8543d3c72dae5f
Author: Michael Armbrust <mi...@databricks.com>
Date:   2014-07-30T05:25:19Z

    Remove tables from cache when DROP TABLE is run.t p

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2734][SQL] Remove tables from cache whe...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/1650


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2734][SQL] Remove tables from cache whe...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1650#issuecomment-50579517
  
    QA results for PR 1650:<br>- This patch FAILED unit tests.<br>- This patch merges cleanly<br>- This patch adds the following public classes (experimental):<br>case class DropTable(tableName: String) extends LeafNode with Command {<br><br>For more information see test ouptut:<br>https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17420/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2734][SQL] Remove tables from cache whe...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1650#discussion_r15596099
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/DropTable.scala ---
    @@ -0,0 +1,47 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.hive.execution
    +
    +import org.apache.spark.annotation.DeveloperApi
    +import org.apache.spark.rdd.RDD
    +import org.apache.spark.sql.catalyst.expressions.Row
    +import org.apache.spark.sql.execution.{Command, LeafNode}
    +import org.apache.spark.sql.hive.HiveContext
    +
    +/**
    + * :: DeveloperApi ::
    + * Drops a table from the metastore and removes it if it is cached.
    + */
    +@DeveloperApi
    +case class DropTable(tableName: String) extends LeafNode with Command {
    +
    +  def hiveContext = sqlContext.asInstanceOf[HiveContext]
    +
    +  def output = Seq.empty
    +
    +  override protected[sql] lazy val sideEffectResult: Seq[Any] = {
    +    hiveContext.runSqlHive(s"DROP TABLE $tableName")
    +    hiveContext.catalog.unregisterTable(None, tableName)
    --- End diff --
    
    Probably not. `runSqlHive` directly passes the statement to Hive. So, only Hive metastore records will be deleted.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2734][SQL] Remove tables from cache whe...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1650#discussion_r15572198
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/DropTable.scala ---
    @@ -0,0 +1,47 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.hive.execution
    +
    +import org.apache.spark.annotation.DeveloperApi
    +import org.apache.spark.rdd.RDD
    +import org.apache.spark.sql.catalyst.expressions.Row
    +import org.apache.spark.sql.execution.{Command, LeafNode}
    +import org.apache.spark.sql.hive.HiveContext
    +
    +/**
    + * :: DeveloperApi ::
    + * Drops a table from the metastore and removes it if it is cached.
    + */
    +@DeveloperApi
    +case class DropTable(tableName: String) extends LeafNode with Command {
    +
    +  def hiveContext = sqlContext.asInstanceOf[HiveContext]
    +
    +  def output = Seq.empty
    +
    +  override protected[sql] lazy val sideEffectResult: Seq[Any] = {
    +    hiveContext.runSqlHive(s"DROP TABLE $tableName")
    +    hiveContext.catalog.unregisterTable(None, tableName)
    --- End diff --
    
    Can we remove `hiveContext.catalog.unregisterTable(None, tableName)`? I think `hiveContext.runSqlHive(s"DROP TABLE $tableName")` will update the hiveContext.catalog in this case.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2734][SQL] Remove tables from cache whe...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1650#issuecomment-50575367
  
    QA tests have started for PR 1650. This patch merges cleanly. <br>View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17420/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2734][SQL] Remove tables from cache whe...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1650#issuecomment-50680290
  
    QA tests have started for PR 1650. This patch merges cleanly. <br>View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17487/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2734][SQL] Remove tables from cache whe...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/1650#issuecomment-50679646
  
    test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2734][SQL] Remove tables from cache whe...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1650#issuecomment-50669135
  
    QA tests have started for PR 1650. This patch merges cleanly. <br>View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17470/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2734][SQL] Remove tables from cache whe...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1650#discussion_r15568205
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
    @@ -377,6 +378,10 @@ private[hive] object HiveQl {
       }
     
       protected def nodeToPlan(node: Node): LogicalPlan = node match {
    +    // Special drop table that also uncaches.
    +    case Token("TOK_DROPTABLE",
    +           Token("TOK_TABNAME",
    +              Token(tableName, Nil) :: Nil) :: Nil) => DropTable(tableName)
    --- End diff --
    
    Seems we also need to support refer to a table with the format of `dbName.tableName` and `IF EXISTS`.
    
    An example AST:
    The AST tree of `drop table if exists default.src` is 
    ```
    TOK_DROPTABLE
      TOK_TABNAME
        default
        src
      TOK_IFEXISTS
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2734][SQL] Remove tables from cache whe...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/1650#issuecomment-50699438
  
    Merged into master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2734][SQL] Remove tables from cache whe...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1650#issuecomment-50678662
  
    QA results for PR 1650:<br>- This patch FAILED unit tests.<br>- This patch merges cleanly<br>- This patch adds the following public classes (experimental):<br>case class DropTable(tableName: String, ifExists: Boolean) extends LeafNode with Command {<br><br>For more information see test ouptut:<br>https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17470/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2734][SQL] Remove tables from cache whe...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1650#issuecomment-50688613
  
    QA results for PR 1650:<br>- This patch PASSES unit tests.<br>- This patch merges cleanly<br>- This patch adds the following public classes (experimental):<br>case class DropTable(tableName: String, ifExists: Boolean) extends LeafNode with Command {<br><br>For more information see test ouptut:<br>https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17487/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---