You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2017/01/17 17:55:52 UTC

[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/16621

    [SPARK-19265][SQL] make table relation cache general and does not depend on hive

    ## What changes were proposed in this pull request?
    
    We have a table relation plan cache in `HiveMetastoreCatalog`, which caches a lot of things: file status, resolved data source, inferred schema, etc.
    
    However, it doesn't make sense to limit this cache with hive support, we should move it to SQL core module so that users can use this cache without hive support.
    
    It can also reduce the size of `HiveMetastoreCatalog`, so that it's easier to remove it eventually.
    
    main changes:
    1. move the table relation cache to `SessionCatalog`
    2. `SessionCatalog.lookupRelation` will return `SimpleCatalogRelation` and the analyzer will convert it to `LogicalRelation` or `MetastoreRelation` later, then `HiveSessionCatalog` doesn't need to override `lookupRelation` anymore
    3. `FindDataSourceTable` will read/write the table relation cache.
    
    ## How was this patch tested?
    
    existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark plan-cache

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16621.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16621
    
----
commit a7845cc54b95c9935801225a292bf87a08158bbe
Author: Wenchen Fan <we...@databricks.com>
Date:   2017-01-17T15:03:33Z

    make table relation cache general and does not depend on hive

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16621#discussion_r96594811
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---
    @@ -215,37 +215,43 @@ case class DataSourceAnalysis(conf: CatalystConf) extends Rule[LogicalPlan] {
     
     
     /**
    - * Replaces [[SimpleCatalogRelation]] with data source table if its table property contains data
    - * source information.
    + * Replaces [[SimpleCatalogRelation]] with data source table if its table provider is not hive.
      */
     class FindDataSourceTable(sparkSession: SparkSession) extends Rule[LogicalPlan] {
    -  private def readDataSourceTable(
    -      sparkSession: SparkSession,
    -      simpleCatalogRelation: SimpleCatalogRelation): LogicalPlan = {
    -    val table = simpleCatalogRelation.catalogTable
    -    val pathOption = table.storage.locationUri.map("path" -> _)
    -    val dataSource =
    -      DataSource(
    -        sparkSession,
    -        userSpecifiedSchema = Some(table.schema),
    -        partitionColumns = table.partitionColumnNames,
    -        bucketSpec = table.bucketSpec,
    -        className = table.provider.get,
    -        options = table.storage.properties ++ pathOption)
    -
    -    LogicalRelation(
    -      dataSource.resolveRelation(),
    -      expectedOutputAttributes = Some(simpleCatalogRelation.output),
    -      catalogTable = Some(table))
    +  private def readDataSourceTable(table: CatalogTable): LogicalPlan = {
    +    val qualifiedTableName = QualifiedTableName(table.database, table.identifier.table)
    +    val cache = sparkSession.sessionState.catalog.tableRelationCache
    +    val withHiveSupport =
    +      sparkSession.sparkContext.conf.get(StaticSQLConf.CATALOG_IMPLEMENTATION) == "hive"
    +
    +    cache.get(qualifiedTableName, new Callable[LogicalPlan]() {
    +      override def call(): LogicalPlan = {
    +        val pathOption = table.storage.locationUri.map("path" -> _)
    +        val dataSource =
    +          DataSource(
    +            sparkSession,
    +            // In older version(prior to 2.1) of Spark, the table schema can be empty and should be
    +            // inferred at runtime. We should still support it.
    +            userSpecifiedSchema = if (table.schema.isEmpty) None else Some(table.schema),
    +            partitionColumns = table.partitionColumnNames,
    +            bucketSpec = table.bucketSpec,
    +            className = table.provider.get,
    +            options = table.storage.properties ++ pathOption,
    +            // TODO: improve `InMemoryCatalog` and remove this limitation.
    +            catalogTable = if (withHiveSupport) Some(table) else None)
    +
    +        LogicalRelation(dataSource.resolveRelation(), catalogTable = Some(table))
    --- End diff --
    
    ok, then we can revert it without changing the analyze part.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16621
  
    **[Test build #71567 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71567/testReport)** for PR 16621 at commit [`919aaa2`](https://github.com/apache/spark/commit/919aaa2fbdf21fb4760a855c538e4bc9efa25d4b).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class QualifiedTableName(database: String, name: String)`
      * `class FindHiveSerdeTable(session: SparkSession) extends Rule[LogicalPlan] `


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16621#discussion_r96564374
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
    @@ -650,14 +659,21 @@ class SessionCatalog(
        * Refresh the cache entry for a metastore table, if any.
        */
       def refreshTable(name: TableIdentifier): Unit = synchronized {
    +    val dbName = formatDatabaseName(name.database.getOrElse(currentDb))
    +    val tableName = formatTableName(name.table)
    +
         // Go through temporary tables and invalidate them.
    -    // If the database is defined, this is definitely not a temp table.
    +    // If the database is defined, this may be a global temporary view.
         // If the database is not defined, there is a good chance this is a temp table.
         if (name.database.isEmpty) {
    -      tempTables.get(formatTableName(name.table)).foreach(_.refresh())
    -    } else if (formatDatabaseName(name.database.get) == globalTempViewManager.database) {
    -      globalTempViewManager.get(formatTableName(name.table)).foreach(_.refresh())
    +      tempTables.get(tableName).foreach(_.refresh())
    +    } else if (dbName == globalTempViewManager.database) {
    +      globalTempViewManager.get(tableName).foreach(_.refresh())
         }
    +
    +    // Also invalidate the table relation cache.
    --- End diff --
    
    is it still valid?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16621
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16621#discussion_r96570432
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---
    @@ -215,37 +215,44 @@ case class DataSourceAnalysis(conf: CatalystConf) extends Rule[LogicalPlan] {
     
     
     /**
    - * Replaces [[SimpleCatalogRelation]] with data source table if its table property contains data
    - * source information.
    + * Replaces [[SimpleCatalogRelation]] with data source table if its table provider is not hive.
      */
     class FindDataSourceTable(sparkSession: SparkSession) extends Rule[LogicalPlan] {
    -  private def readDataSourceTable(
    -      sparkSession: SparkSession,
    -      simpleCatalogRelation: SimpleCatalogRelation): LogicalPlan = {
    -    val table = simpleCatalogRelation.catalogTable
    -    val pathOption = table.storage.locationUri.map("path" -> _)
    -    val dataSource =
    -      DataSource(
    -        sparkSession,
    -        userSpecifiedSchema = Some(table.schema),
    -        partitionColumns = table.partitionColumnNames,
    -        bucketSpec = table.bucketSpec,
    -        className = table.provider.get,
    -        options = table.storage.properties ++ pathOption)
    -
    -    LogicalRelation(
    -      dataSource.resolveRelation(),
    -      expectedOutputAttributes = Some(simpleCatalogRelation.output),
    -      catalogTable = Some(table))
    +  private def readDataSourceTable(relation: SimpleCatalogRelation): LogicalPlan = {
    +    val table = relation.catalogTable
    +    val cache = sparkSession.sessionState.catalog.tableRelationCache
    +    val withHiveSupport =
    +      sparkSession.sparkContext.conf.get(StaticSQLConf.CATALOG_IMPLEMENTATION) == "hive"
    +
    +    cache.get(table.qualifiedIdentifier, new Callable[LogicalPlan]() {
    +      override def call(): LogicalPlan = {
    +        val pathOption = table.storage.locationUri.map("path" -> _)
    +        val dataSource =
    +          DataSource(
    +            sparkSession,
    +            userSpecifiedSchema = Some(table.schema),
    --- End diff --
    
    ```
                // In older version(prior to 2.1) of Spark, the table schema can be empty and should be
                // inferred at runtime. We should still support it.
    ```
    
    Is it still valid?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16621#discussion_r96577427
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---
    @@ -215,37 +215,43 @@ case class DataSourceAnalysis(conf: CatalystConf) extends Rule[LogicalPlan] {
     
     
     /**
    - * Replaces [[SimpleCatalogRelation]] with data source table if its table property contains data
    - * source information.
    + * Replaces [[SimpleCatalogRelation]] with data source table if its table provider is not hive.
      */
     class FindDataSourceTable(sparkSession: SparkSession) extends Rule[LogicalPlan] {
    -  private def readDataSourceTable(
    -      sparkSession: SparkSession,
    -      simpleCatalogRelation: SimpleCatalogRelation): LogicalPlan = {
    -    val table = simpleCatalogRelation.catalogTable
    -    val pathOption = table.storage.locationUri.map("path" -> _)
    -    val dataSource =
    -      DataSource(
    -        sparkSession,
    -        userSpecifiedSchema = Some(table.schema),
    -        partitionColumns = table.partitionColumnNames,
    -        bucketSpec = table.bucketSpec,
    -        className = table.provider.get,
    -        options = table.storage.properties ++ pathOption)
    -
    -    LogicalRelation(
    -      dataSource.resolveRelation(),
    -      expectedOutputAttributes = Some(simpleCatalogRelation.output),
    -      catalogTable = Some(table))
    +  private def readDataSourceTable(table: CatalogTable): LogicalPlan = {
    +    val qualifiedTableName = QualifiedTableName(table.database, table.identifier.table)
    +    val cache = sparkSession.sessionState.catalog.tableRelationCache
    +    val withHiveSupport =
    +      sparkSession.sparkContext.conf.get(StaticSQLConf.CATALOG_IMPLEMENTATION) == "hive"
    +
    +    cache.get(qualifiedTableName, new Callable[LogicalPlan]() {
    +      override def call(): LogicalPlan = {
    +        val pathOption = table.storage.locationUri.map("path" -> _)
    +        val dataSource =
    +          DataSource(
    +            sparkSession,
    +            // In older version(prior to 2.1) of Spark, the table schema can be empty and should be
    +            // inferred at runtime. We should still support it.
    +            userSpecifiedSchema = if (table.schema.isEmpty) None else Some(table.schema),
    +            partitionColumns = table.partitionColumnNames,
    +            bucketSpec = table.bucketSpec,
    +            className = table.provider.get,
    +            options = table.storage.properties ++ pathOption,
    +            // TODO: improve `InMemoryCatalog` and remove this limitation.
    +            catalogTable = if (withHiveSupport) Some(table) else None)
    +
    +        LogicalRelation(dataSource.resolveRelation(), catalogTable = Some(table))
    --- End diff --
    
    Note that, previously we will set `expectedOutputAttributes` here, which was added by https://github.com/apache/spark/pull/15182
    
    However, this doesn't work when the table schema needs to be inferred at runtime, and it turns out that we don't need to do it at all. `AnalyzeColumnCommand` now gets attributes from the [resolved table relation plan](https://github.com/apache/spark/pull/16621/files#diff-027d6bd7c8cf4f64f99acc058389d859R44) , so it's fine for rule `FindDataSourceTable` to change outputs during analysis.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16621#discussion_r96564541
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/identifiers.scala ---
    @@ -60,9 +60,11 @@ case class TableIdentifier(table: String, database: Option[String])
       override val identifier: String = table
     
       def this(table: String) = this(table, None)
    -
     }
     
    +/** A fully qualified identifier for a table (i.e., database.tableName) */
    +case class QualifiedTableName(database: String, name: String)
    --- End diff --
    
    If users don't change the case sensitive config at runtime, it will be ok.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16621
  
    **[Test build #71549 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71549/testReport)** for PR 16621 at commit [`98a5483`](https://github.com/apache/spark/commit/98a54839994c5a990839f2b7a9116e6a79114862).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class QualifiedTableName(database: String, name: String)`
      * `class FindHiveSerdeTable(session: SparkSession) extends Rule[LogicalPlan] `


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16621
  
    After merging https://github.com/apache/spark/pull/16517, it introduces a few conflicts.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16621
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16621
  
    **[Test build #71583 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71583/testReport)** for PR 16621 at commit [`096fcc8`](https://github.com/apache/spark/commit/096fcc82e7f77e8eec6c6caa04226e6ea09cb1b8).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16621
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16621
  
    can we do it later? We are going to merge `CatalogRelation` implementations and unify the table relation representations soon.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16621#discussion_r96562575
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/identifiers.scala ---
    @@ -60,9 +60,11 @@ case class TableIdentifier(table: String, database: Option[String])
       override val identifier: String = table
     
       def this(table: String) = this(table, None)
    -
     }
     
    +/** A fully qualified identifier for a table (i.e., database.tableName) */
    +case class QualifiedTableName(database: String, name: String)
    --- End diff --
    
    We are having a case sensitivity issue here, right? [Previously, we always make both database and table to lower cases](https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala#L50-L54). Database and table names are not case sensitive. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16621
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71522/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16621#discussion_r96807499
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
    @@ -586,12 +594,12 @@ class SessionCatalog(
                 desc = metadata,
                 output = metadata.schema.toAttributes,
                 child = parser.parsePlan(viewText))
    -          SubqueryAlias(relationAlias, child, Option(name))
    +          SubqueryAlias(relationAlias, child, Some(name.copy(table = table, database = Some(db))))
             } else {
               SubqueryAlias(relationAlias, SimpleCatalogRelation(metadata), None)
             }
           } else {
    -        SubqueryAlias(relationAlias, tempTables(table), Option(name))
    +        SubqueryAlias(relationAlias, tempTables(table), None)
    --- End diff --
    
    Thank you!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16621
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71583/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16621#discussion_r96575899
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
    @@ -1799,6 +1799,7 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
               .getTableMetadata(TableIdentifier("tbl")).storage.locationUri.get
     
             sql(s"ALTER TABLE tbl SET LOCATION '${dir.getCanonicalPath}'")
    +        spark.catalog.refreshTable("tbl")
    --- End diff --
    
    +1 :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16621#discussion_r96785980
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
    @@ -118,6 +118,14 @@ class SessionCatalog(
       }
     
       /**
    +   * A cache of qualified table name to table relation plan.
    +   */
    +  val tableRelationCache: Cache[QualifiedTableName, LogicalPlan] = {
    +    // TODO: create a config instead of hardcode 1000 here.
    --- End diff --
    
    Yep. Sure~


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16621
  
    **[Test build #71522 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71522/testReport)** for PR 16621 at commit [`a7845cc`](https://github.com/apache/spark/commit/a7845cc54b95c9935801225a292bf87a08158bbe).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16621#discussion_r96561166
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
    @@ -650,14 +659,21 @@ class SessionCatalog(
        * Refresh the cache entry for a metastore table, if any.
        */
       def refreshTable(name: TableIdentifier): Unit = synchronized {
    +    val dbName = formatDatabaseName(name.database.getOrElse(currentDb))
    +    val tableName = formatTableName(name.table)
    +
         // Go through temporary tables and invalidate them.
    -    // If the database is defined, this is definitely not a temp table.
    +    // If the database is defined, this may be a global temporary view.
         // If the database is not defined, there is a good chance this is a temp table.
         if (name.database.isEmpty) {
    -      tempTables.get(formatTableName(name.table)).foreach(_.refresh())
    -    } else if (formatDatabaseName(name.database.get) == globalTempViewManager.database) {
    -      globalTempViewManager.get(formatTableName(name.table)).foreach(_.refresh())
    +      tempTables.get(tableName).foreach(_.refresh())
    +    } else if (dbName == globalTempViewManager.database) {
    +      globalTempViewManager.get(tableName).foreach(_.refresh())
         }
    +
    +    // Also invalidate the table relation cache.
    --- End diff --
    
    How about keep the previous comment here? 
    ```
        // refreshTable does not eagerly reload the cache. It just invalidate the cache.
        // Next time when we use the table, it will be populated in the cache.
        // Since we also cache ParquetRelations converted from Hive Parquet tables and
        // adding converted ParquetRelations into the cache is not defined in the load function
        // of the cache (instead, we add the cache entry in convertToParquetRelation),
        // it is better at here to invalidate the cache to avoid confusing waring logs from the
        // cache loader (e.g. cannot find data source provider, which is only defined for
        // data source table.).
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16621
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16621#discussion_r96572359
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---
    @@ -215,37 +215,44 @@ case class DataSourceAnalysis(conf: CatalystConf) extends Rule[LogicalPlan] {
     
     
     /**
    - * Replaces [[SimpleCatalogRelation]] with data source table if its table property contains data
    - * source information.
    + * Replaces [[SimpleCatalogRelation]] with data source table if its table provider is not hive.
      */
     class FindDataSourceTable(sparkSession: SparkSession) extends Rule[LogicalPlan] {
    -  private def readDataSourceTable(
    -      sparkSession: SparkSession,
    -      simpleCatalogRelation: SimpleCatalogRelation): LogicalPlan = {
    -    val table = simpleCatalogRelation.catalogTable
    -    val pathOption = table.storage.locationUri.map("path" -> _)
    -    val dataSource =
    -      DataSource(
    -        sparkSession,
    -        userSpecifiedSchema = Some(table.schema),
    -        partitionColumns = table.partitionColumnNames,
    -        bucketSpec = table.bucketSpec,
    -        className = table.provider.get,
    -        options = table.storage.properties ++ pathOption)
    -
    -    LogicalRelation(
    -      dataSource.resolveRelation(),
    -      expectedOutputAttributes = Some(simpleCatalogRelation.output),
    -      catalogTable = Some(table))
    +  private def readDataSourceTable(relation: SimpleCatalogRelation): LogicalPlan = {
    +    val table = relation.catalogTable
    +    val cache = sparkSession.sessionState.catalog.tableRelationCache
    +    val withHiveSupport =
    +      sparkSession.sparkContext.conf.get(StaticSQLConf.CATALOG_IMPLEMENTATION) == "hive"
    +
    +    cache.get(table.qualifiedIdentifier, new Callable[LogicalPlan]() {
    +      override def call(): LogicalPlan = {
    +        val pathOption = table.storage.locationUri.map("path" -> _)
    +        val dataSource =
    +          DataSource(
    +            sparkSession,
    +            userSpecifiedSchema = Some(table.schema),
    --- End diff --
    
    good catch!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16621#discussion_r96800976
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
    @@ -586,12 +594,12 @@ class SessionCatalog(
                 desc = metadata,
                 output = metadata.schema.toAttributes,
                 child = parser.parsePlan(viewText))
    -          SubqueryAlias(relationAlias, child, Option(name))
    +          SubqueryAlias(relationAlias, child, Some(name.copy(table = table, database = Some(db))))
             } else {
               SubqueryAlias(relationAlias, SimpleCatalogRelation(metadata), None)
             }
           } else {
    -        SubqueryAlias(relationAlias, tempTables(table), Option(name))
    +        SubqueryAlias(relationAlias, tempTables(table), None)
    --- End diff --
    
    Sorry, I have been living under a rock in the past month or so.
    
    This is not really needed anymore. Lets remove it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16621#discussion_r96576872
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
    @@ -586,12 +594,12 @@ class SessionCatalog(
                 desc = metadata,
                 output = metadata.schema.toAttributes,
                 child = parser.parsePlan(viewText))
    -          SubqueryAlias(relationAlias, child, Option(name))
    +          SubqueryAlias(relationAlias, child, Some(name.copy(table = table, database = Some(db))))
             } else {
               SubqueryAlias(relationAlias, SimpleCatalogRelation(metadata), None)
             }
           } else {
    -        SubqueryAlias(relationAlias, tempTables(table), Option(name))
    +        SubqueryAlias(relationAlias, tempTables(table), None)
    --- End diff --
    
    the existing way is to set `None`, see https://github.com/apache/spark/pull/16621/files#diff-ca4533edbf148c89cc0c564ab6b0aeaaL75
    
    This shows the evil of duplicated code, we have inconsistent behaviors without and without hive support. I think we should only set table identifier for persisted view, @hvanhovell is that true?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16621#discussion_r96785426
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
    @@ -118,6 +118,14 @@ class SessionCatalog(
       }
     
       /**
    +   * A cache of qualified table name to table relation plan.
    +   */
    +  val tableRelationCache: Cache[QualifiedTableName, LogicalPlan] = {
    +    // TODO: create a config instead of hardcode 1000 here.
    --- End diff --
    
    yea it's easy, but I wanna minimal the code changes so it's easier to review.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16621
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71549/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16621
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16621#discussion_r96577471
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -1626,17 +1626,6 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
         assert(d.size == d.distinct.size)
       }
     
    -  test("SPARK-17625: data source table in InMemoryCatalog should guarantee output consistency") {
    --- End diff --
    
    we don't need this test anymore, see https://github.com/apache/spark/pull/16621/files#r96577427


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16621#discussion_r96577543
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/MetastoreDataSourcesSuite.scala ---
    @@ -1322,4 +1322,26 @@ class MetastoreDataSourcesSuite extends QueryTest with SQLTestUtils with TestHiv
           sparkSession.sparkContext.conf.set(DEBUG_MODE, previousValue)
         }
       }
    +
    +  test("SPARK-18464: support old table which doesn't store schema in table properties") {
    --- End diff --
    
    this test was removed in https://github.com/apache/spark/pull/16003, but I find it's still useful and is not covered by other tests, so I add it back.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16621#discussion_r96800456
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
    @@ -586,12 +594,12 @@ class SessionCatalog(
                 desc = metadata,
                 output = metadata.schema.toAttributes,
                 child = parser.parsePlan(viewText))
    -          SubqueryAlias(relationAlias, child, Option(name))
    +          SubqueryAlias(relationAlias, child, Some(name.copy(table = table, database = Some(db))))
             } else {
               SubqueryAlias(relationAlias, SimpleCatalogRelation(metadata), None)
             }
           } else {
    -        SubqueryAlias(relationAlias, tempTables(table), Option(name))
    +        SubqueryAlias(relationAlias, tempTables(table), None)
    --- End diff --
    
    ping @hvanhovell again. : )


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16621
  
    **[Test build #71522 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71522/testReport)** for PR 16621 at commit [`a7845cc`](https://github.com/apache/spark/commit/a7845cc54b95c9935801225a292bf87a08158bbe).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class QualifiedTableName(database: String, name: String)`
      * `class FindHiveSerdeTable(session: SparkSession) extends Rule[LogicalPlan] `


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/16621


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16621#discussion_r96565032
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -386,7 +386,9 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
               case relation: CatalogRelation if DDLUtils.isHiveTable(relation.catalogTable) =>
                 relation.catalogTable.identifier
             }
    -        EliminateSubqueryAliases(catalog.lookupRelation(tableIdentWithDB)) match {
    +
    +        val tableRelation = df.sparkSession.table(tableIdentWithDB).queryExecution.analyzed
    --- End diff --
    
    yea, now `lookupRelation` will return `SimpleCatalogRelation` and other analyzer rules will convert it to `LogicalRelation` or `MetastoreRelation`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16621
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71574/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16621#discussion_r96565694
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
    @@ -513,8 +514,10 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
           isDataSourceTable: Boolean,
           format: String,
           userSpecifiedLocation: Option[String] = None): Unit = {
    -    val relation = EliminateSubqueryAliases(
    -      sessionState.catalog.lookupRelation(TableIdentifier(tableName)))
    +    var relation: LogicalPlan = null
    +    withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> "false") {
    --- End diff --
    
    I checked the test cases. `ORC` has the same issue, but the default value is `false` currently. Thus, I think we should set `CONVERT_METASTORE_ORC` to `false` too, in case we change the default value of `CONVERT_METASTORE_ORC` in the future.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16621#discussion_r96571473
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
    @@ -650,14 +659,21 @@ class SessionCatalog(
        * Refresh the cache entry for a metastore table, if any.
        */
       def refreshTable(name: TableIdentifier): Unit = synchronized {
    +    val dbName = formatDatabaseName(name.database.getOrElse(currentDb))
    +    val tableName = formatTableName(name.table)
    +
         // Go through temporary tables and invalidate them.
    -    // If the database is defined, this is definitely not a temp table.
    +    // If the database is defined, this may be a global temporary view.
         // If the database is not defined, there is a good chance this is a temp table.
         if (name.database.isEmpty) {
    -      tempTables.get(formatTableName(name.table)).foreach(_.refresh())
    -    } else if (formatDatabaseName(name.database.get) == globalTempViewManager.database) {
    -      globalTempViewManager.get(formatTableName(name.table)).foreach(_.refresh())
    +      tempTables.get(tableName).foreach(_.refresh())
    +    } else if (dbName == globalTempViewManager.database) {
    +      globalTempViewManager.get(tableName).foreach(_.refresh())
         }
    +
    +    // Also invalidate the table relation cache.
    --- End diff --
    
    After an offline discussion, I am fine to remove it. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16621
  
    Sure. No problem. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16621
  
    **[Test build #71549 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71549/testReport)** for PR 16621 at commit [`98a5483`](https://github.com/apache/spark/commit/98a54839994c5a990839f2b7a9116e6a79114862).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16621
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16621
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71567/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16621#discussion_r96563150
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -386,7 +386,9 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
               case relation: CatalogRelation if DDLUtils.isHiveTable(relation.catalogTable) =>
                 relation.catalogTable.identifier
             }
    -        EliminateSubqueryAliases(catalog.lookupRelation(tableIdentWithDB)) match {
    +
    +        val tableRelation = df.sparkSession.table(tableIdentWithDB).queryExecution.analyzed
    --- End diff --
    
    This change is to avoid overriding `lookupRelation` in `HiveMetastoreCatalog `, right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16621
  
    Could we rename `SimpleCatalogRelation` to `UnresolvedCatalogRelation`? The current name looks very confusing to me. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16621#discussion_r96567010
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
    @@ -650,14 +659,21 @@ class SessionCatalog(
        * Refresh the cache entry for a metastore table, if any.
        */
       def refreshTable(name: TableIdentifier): Unit = synchronized {
    +    val dbName = formatDatabaseName(name.database.getOrElse(currentDb))
    +    val tableName = formatTableName(name.table)
    +
         // Go through temporary tables and invalidate them.
    -    // If the database is defined, this is definitely not a temp table.
    +    // If the database is defined, this may be a global temporary view.
         // If the database is not defined, there is a good chance this is a temp table.
         if (name.database.isEmpty) {
    -      tempTables.get(formatTableName(name.table)).foreach(_.refresh())
    -    } else if (formatDatabaseName(name.database.get) == globalTempViewManager.database) {
    -      globalTempViewManager.get(formatTableName(name.table)).foreach(_.refresh())
    +      tempTables.get(tableName).foreach(_.refresh())
    +    } else if (dbName == globalTempViewManager.database) {
    +      globalTempViewManager.get(tableName).foreach(_.refresh())
         }
    +
    +    // Also invalidate the table relation cache.
    --- End diff --
    
    uh... you are changing how we use the cache. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16621
  
    Thanks! Merging to master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16621
  
    **[Test build #71567 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71567/testReport)** for PR 16621 at commit [`919aaa2`](https://github.com/apache/spark/commit/919aaa2fbdf21fb4760a855c538e4bc9efa25d4b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16621
  
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16621
  
    cc @yhuai @gatorsmile 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16621
  
    No more comments. It looks pretty good! Let us see whether all the test cases can pass. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16621#discussion_r96575520
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
    @@ -586,12 +594,12 @@ class SessionCatalog(
                 desc = metadata,
                 output = metadata.schema.toAttributes,
                 child = parser.parsePlan(viewText))
    -          SubqueryAlias(relationAlias, child, Option(name))
    +          SubqueryAlias(relationAlias, child, Some(name.copy(table = table, database = Some(db))))
             } else {
               SubqueryAlias(relationAlias, SimpleCatalogRelation(metadata), None)
             }
           } else {
    -        SubqueryAlias(relationAlias, tempTables(table), Option(name))
    +        SubqueryAlias(relationAlias, tempTables(table), None)
    --- End diff --
    
    Should we keep the existing way? This was introduced for the EXPLAIN command of view. See the PR: https://github.com/apache/spark/pull/14657


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16621
  
    **[Test build #71583 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71583/testReport)** for PR 16621 at commit [`096fcc8`](https://github.com/apache/spark/commit/096fcc82e7f77e8eec6c6caa04226e6ea09cb1b8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class QualifiedTableName(database: String, name: String)`
      * `class FindHiveSerdeTable(session: SparkSession) extends Rule[LogicalPlan] `


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16621
  
    **[Test build #71574 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71574/testReport)** for PR 16621 at commit [`bbccdae`](https://github.com/apache/spark/commit/bbccdae6640a5efe047dba2df569384a78bd986e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16621
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71572/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16621
  
    **[Test build #71572 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71572/testReport)** for PR 16621 at commit [`2883c8b`](https://github.com/apache/spark/commit/2883c8bc6c22bfde24711060b5110b896f4c8b4a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16621
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16621
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71576/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16621#discussion_r96577649
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---
    @@ -215,37 +215,43 @@ case class DataSourceAnalysis(conf: CatalystConf) extends Rule[LogicalPlan] {
     
     
     /**
    - * Replaces [[SimpleCatalogRelation]] with data source table if its table property contains data
    - * source information.
    + * Replaces [[SimpleCatalogRelation]] with data source table if its table provider is not hive.
      */
     class FindDataSourceTable(sparkSession: SparkSession) extends Rule[LogicalPlan] {
    -  private def readDataSourceTable(
    -      sparkSession: SparkSession,
    -      simpleCatalogRelation: SimpleCatalogRelation): LogicalPlan = {
    -    val table = simpleCatalogRelation.catalogTable
    -    val pathOption = table.storage.locationUri.map("path" -> _)
    -    val dataSource =
    -      DataSource(
    -        sparkSession,
    -        userSpecifiedSchema = Some(table.schema),
    -        partitionColumns = table.partitionColumnNames,
    -        bucketSpec = table.bucketSpec,
    -        className = table.provider.get,
    -        options = table.storage.properties ++ pathOption)
    -
    -    LogicalRelation(
    -      dataSource.resolveRelation(),
    -      expectedOutputAttributes = Some(simpleCatalogRelation.output),
    -      catalogTable = Some(table))
    +  private def readDataSourceTable(table: CatalogTable): LogicalPlan = {
    +    val qualifiedTableName = QualifiedTableName(table.database, table.identifier.table)
    +    val cache = sparkSession.sessionState.catalog.tableRelationCache
    +    val withHiveSupport =
    +      sparkSession.sparkContext.conf.get(StaticSQLConf.CATALOG_IMPLEMENTATION) == "hive"
    +
    +    cache.get(qualifiedTableName, new Callable[LogicalPlan]() {
    +      override def call(): LogicalPlan = {
    +        val pathOption = table.storage.locationUri.map("path" -> _)
    +        val dataSource =
    +          DataSource(
    +            sparkSession,
    +            // In older version(prior to 2.1) of Spark, the table schema can be empty and should be
    +            // inferred at runtime. We should still support it.
    +            userSpecifiedSchema = if (table.schema.isEmpty) None else Some(table.schema),
    +            partitionColumns = table.partitionColumnNames,
    +            bucketSpec = table.bucketSpec,
    +            className = table.provider.get,
    +            options = table.storage.properties ++ pathOption,
    +            // TODO: improve `InMemoryCatalog` and remove this limitation.
    +            catalogTable = if (withHiveSupport) Some(table) else None)
    +
    +        LogicalRelation(dataSource.resolveRelation(), catalogTable = Some(table))
    --- End diff --
    
    cc @wzhfy 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16621
  
    **[Test build #71576 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71576/testReport)** for PR 16621 at commit [`d636389`](https://github.com/apache/spark/commit/d636389947af3041832c63582f9073b92421d7f0).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16621#discussion_r96726793
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
    @@ -118,6 +118,14 @@ class SessionCatalog(
       }
     
       /**
    +   * A cache of qualified table name to table relation plan.
    +   */
    +  val tableRelationCache: Cache[QualifiedTableName, LogicalPlan] = {
    +    // TODO: create a config instead of hardcode 1000 here.
    --- End diff --
    
    Hi, @cloud-fan .
    Why not making this config in this PR? It seems to be easy.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16621
  
    Start the review. Will remove this comment when the review is finished. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org