Posted to reviews@spark.apache.org by liancheng <gi...@git.apache.org> on 2016/05/24 00:41:52 UTC

[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

GitHub user liancheng opened a pull request:

    https://github.com/apache/spark/pull/13270

    [SPARK-15269][SQL] Removes unexpected empty table directories created while creating external Spark SQL data source tables.

    This PR is an alternative to #13120 authored by @xwu0226.
    
    ## What changes were proposed in this pull request?
    
    When creating an external Spark SQL data source table and persisting its metadata to the Hive metastore, we don't use the standard Hive `Table.dataLocation` field because Hive only allows directory paths as data locations, while Spark SQL also allows file paths. However, if we don't set `Table.dataLocation`, Hive always creates an unexpected empty table directory under the database location, but doesn't remove it when the table is dropped (because the table is external).
    
    This PR works around this issue by explicitly setting `Table.dataLocation` and then manually removing the created directory after creating the external table.
    
    Please refer to [this JIRA comment][1] for more details about why we chose this approach as a workaround.
    
    [1]: https://issues.apache.org/jira/browse/SPARK-15269?focusedCommentId=15297408&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15297408
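
    The workaround above can be sketched as follows. This is not the actual Spark code, just a small self-contained Python illustration of the pattern, with a hypothetical `metastore_create_table` standing in for the Hive metastore (which insists on creating a directory for any table location it is given):

```python
import os
import shutil
import tempfile

def metastore_create_table(location: str) -> None:
    # Stand-in for the Hive metastore: like Hive, it always creates the
    # table's data directory when a location is supplied.
    os.makedirs(location, exist_ok=True)

def create_external_table(warehouse_dir: str, table_name: str) -> str:
    """Register an external table while working around the metastore's
    unwanted directory creation (the pattern used for SPARK-15269)."""
    # 1. Point the metastore at an explicit placeholder location ...
    placeholder = os.path.join(warehouse_dir, table_name + "-__PLACEHOLDER__")
    try:
        metastore_create_table(placeholder)
    finally:
        # 2. ... then delete the directory the metastore just created.
        #    This is safe because external tables never store data there.
        shutil.rmtree(placeholder, ignore_errors=True)
    return placeholder

warehouse = tempfile.mkdtemp()
path = create_external_table(warehouse, "t1")
assert not os.path.exists(path)  # no stray empty directory is left behind
```

    The real patch applies the same idea in Scala, deleting the placeholder path via Hadoop's `FileSystem.delete` after `externalCatalog.createTable` returns.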
    
    ## How was this patch tested?
    
    1. A new test case is added in `HiveQuerySuite` for this case
    2. Updated `ShowCreateTableSuite` to use the same table name in all test cases. (This is how I hit this issue in the first place.)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liancheng/spark spark-15269-unpleasant-fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13270.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13270
    
----
commit f376332b3c6eb59ddc4a45e0843285374ceaaace
Author: Cheng Lian <li...@databricks.com>
Date:   2016-05-24T00:30:37Z

    Fixes SPARK-15269

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-222603352
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-221741286
  
    **[Test build #59311 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59311/consoleFull)** for PR 13270 at commit [`db06f13`](https://github.com/apache/spark/commit/db06f13915ed0c416b3747ed90c98aba67518293).




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-222592978
  
    **[Test build #59635 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59635/consoleFull)** for PR 13270 at commit [`336fb55`](https://github.com/apache/spark/commit/336fb55406ad19eb7cc7276cd771ebd92ed8dec1).




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-221726339
  
    **[Test build #59295 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59295/consoleFull)** for PR 13270 at commit [`64a0cfd`](https://github.com/apache/spark/commit/64a0cfdd4684adb31a618ee468f9f01a6c7c2e22).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #13270: [SPARK-15269][SQL] Removes unexpected empty table direct...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the issue:

    https://github.com/apache/spark/pull/13270
  
    LGTM. @liancheng Can you merge this?




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-221739839
  
    @xwu0226 Regarding your [last comment][1] in your PR: while working on this PR, I also realized that we don't really use `CatalogTable.storage.locationUri` for data source tables.
    
    This means we can set that field to an arbitrary location URI, as long as the location:
    
    1. is an existing directory, or
    2. doesn't exist yet but can definitely be created as a new directory by Hive.
    
    This PR takes the second approach, namely creating a temporary directory and removing it afterwards. The reason is that I think it can be dangerous to point the location URI at an existing directory: Hive may try to delete that directory due to bugs on either the Hive side or the Spark side. In fact, while doing this PR I tried setting `/` (which definitely exists) as the location URI, and Hive tried to delete my root directory during unit test execution... (though that was probably caused by my own bugs).
    
    [1]: https://github.com/apache/spark/pull/13120#discussion_r64126341





[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-221144664
  
    **[Test build #59169 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59169/consoleFull)** for PR 13270 at commit [`f376332`](https://github.com/apache/spark/commit/f376332b3c6eb59ddc4a45e0843285374ceaaace).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #13270: [SPARK-15269][SQL] Removes unexpected empty table...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/13270




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13270#discussion_r64327851
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
    @@ -216,7 +216,25 @@ class SessionCatalog(
         val table = formatTableName(tableDefinition.identifier.table)
         val newTableDefinition = tableDefinition.copy(identifier = TableIdentifier(table, Some(db)))
         requireDbExists(db)
    -    externalCatalog.createTable(db, newTableDefinition, ignoreIfExists)
    +
    +    if (newTableDefinition.tableType == CatalogTableType.EXTERNAL) {
    +      // !! HACK ALERT !!
    +      //
    +      // See https://issues.apache.org/jira/browse/SPARK-15269 for more details about why we have to
    +      // set `locationUri` and then remove the directory after creating the external table:
    +      val tablePath = defaultTablePath(newTableDefinition.identifier)
    +      try {
    +        externalCatalog.createTable(
    +          db,
    +          newTableDefinition.withNewStorage(locationUri = Some(tablePath)),
    +          ignoreIfExists)
    +      } finally {
    +        val path = new Path(tablePath)
    +        FileSystem.get(path.toUri, hadoopConf).delete(path, true)
    +      }
    +    } else {
    +      externalCatalog.createTable(db, newTableDefinition, ignoreIfExists)
    +    }
    --- End diff --
    
    Yeah, thanks! I will add a check for the first case. The second case should be the reason why the Jenkins tests failed.




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-221136564
  
    **[Test build #59169 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59169/consoleFull)** for PR 13270 at commit [`f376332`](https://github.com/apache/spark/commit/f376332b3c6eb59ddc4a45e0843285374ceaaace).




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty table direct...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13270
  
    **[Test build #59687 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59687/consoleFull)** for PR 13270 at commit [`7d0122f`](https://github.com/apache/spark/commit/7d0122f9e59e83683b1539c9c8e2be3b2c6f5658).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty table direct...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13270#discussion_r65253802
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -68,12 +72,13 @@ private[spark] class HiveExternalCatalog(client: HiveClient) extends ExternalCat
           body
         } catch {
           case NonFatal(e) if isClientException(e) =>
    -        throw new AnalysisException(e.getClass.getCanonicalName + ": " + e.getMessage)
    +        throw new AnalysisException(
    +          e.getClass.getCanonicalName + ": " + e.getMessage, cause = Some(e))
         }
       }
     
       private def requireDbMatches(db: String, table: CatalogTable): Unit = {
    -    if (table.identifier.database != Some(db)) {
    +    if (!table.identifier.database.contains(db)) {
    --- End diff --
    
    Let's not change this, because `Option.contains` does not exist in Scala 2.10.




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13270#discussion_r64922718
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
    @@ -212,11 +212,46 @@ class SessionCatalog(
        * If no such database is specified, create it in the current database.
        */
       def createTable(tableDefinition: CatalogTable, ignoreIfExists: Boolean): Unit = {
    -    val db = formatDatabaseName(tableDefinition.identifier.database.getOrElse(getCurrentDatabase))
    -    val table = formatTableName(tableDefinition.identifier.table)
    +    val tableId = tableDefinition.identifier
    +    val db = formatDatabaseName(tableId.database.getOrElse(getCurrentDatabase))
    +    val table = formatTableName(tableId.table)
         val newTableDefinition = tableDefinition.copy(identifier = TableIdentifier(table, Some(db)))
         requireDbExists(db)
    -    externalCatalog.createTable(db, newTableDefinition, ignoreIfExists)
    +
    +    if (
    +      // If this is an external data source table...
    +      tableDefinition.properties.contains("spark.sql.sources.provider") &&
    +      newTableDefinition.tableType == CatalogTableType.EXTERNAL &&
    +      // ... that is not persisted as Hive compatible format (external tables in Hive compatible
    +      // format always set `locationUri` to the actual data location and should NOT be hacked as
    +      // following.)
    +      tableDefinition.storage.locationUri.isEmpty
    +    ) {
    +      // !! HACK ALERT !!
    +      //
    +      // Due to a restriction of Hive metastore, here we have to set `locationUri` to a temporary
    +      // directory that doesn't exist yet but can definitely be successfully created, and then
    +      // delete it right after creating the external data source table. This location will be
    +      // persisted to Hive metastore as standard Hive table location URI, but Spark SQL doesn't
    +      // really use it. Also, since we only do this workaround for external tables, deleting the
    +      // directory after the fact doesn't do any harm.
    +      //
    +      // Please refer to https://issues.apache.org/jira/browse/SPARK-15269 for more details.
    +
    +      val tempPath =
    +        new Path(defaultTablePath(tableId).stripSuffix(Path.SEPARATOR) + "-__PLACEHOLDER__")
    +
    +      try {
    +        externalCatalog.createTable(
    +          db,
    +          newTableDefinition.withNewStorage(locationUri = Some(tempPath.toString)),
    +          ignoreIfExists)
    +      } finally {
    +        FileSystem.get(tempPath.toUri, hadoopConf).delete(tempPath, true)
    +      }
    +    } else {
    +      externalCatalog.createTable(db, newTableDefinition, ignoreIfExists)
    +    }
    --- End diff --
    
    I think Hive-specific details/hacks should not be exposed in `SessionCatalog`. Let's move it into `HiveExternalCatalog`. 




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-222591642
  
    **[Test build #59632 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59632/consoleFull)** for PR 13270 at commit [`3830dbb`](https://github.com/apache/spark/commit/3830dbb646b0b076eb994ebaec1a14d8a8d502dd).




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-222592234
  
    **[Test build #59632 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59632/consoleFull)** for PR 13270 at commit [`3830dbb`](https://github.com/apache/spark/commit/3830dbb646b0b076eb994ebaec1a14d8a8d502dd).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-221754951
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-222603353
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59635/
    Test PASSed.




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by xwu0226 <gi...@git.apache.org>.
Github user xwu0226 commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-221741745
  
    @liancheng Thanks for the explanation!! It is safer this way. :)




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by xwu0226 <gi...@git.apache.org>.
Github user xwu0226 commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-221137330
  
    @liancheng sure. Thanks!




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-221754832
  
    **[Test build #59313 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59313/consoleFull)** for PR 13270 at commit [`1545f04`](https://github.com/apache/spark/commit/1545f043c146be8ec247e062a3dfdd66b75b8664).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-222592243
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty table direct...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13270#discussion_r65263694
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -368,14 +371,27 @@ private[hive] class HiveClientImpl(
             createTime = h.getTTable.getCreateTime.toLong * 1000,
             lastAccessTime = h.getLastAccessTime.toLong * 1000,
             storage = CatalogStorageFormat(
    -          locationUri = shim.getDataLocation(h),
    +          locationUri = shim.getDataLocation(h).filterNot { _ =>
    +            // SPARK-15269: Persisted data source tables always store the location URI as a SerDe
    +            // property named "path" instead of standard Hive `dataLocation`, because Hive only
    +            // allows directory paths as location URIs while Spark SQL data source tables also
    +            // allows file paths. So the standard Hive `dataLocation` is meaningless for Spark SQL
    +            // data source tables.
    +            DDLUtils.isDatasourceTable(properties) &&
    +              h.getTableType == HiveTableType.EXTERNAL_TABLE &&
    +              // Spark SQL may also save external data source in Hive compatible format when
    +              // possible, so that these tables can be directly accessed by Hive. For these tables,
    +              // `dataLocation` is still necessary. Here we also check for input format class
    +              // because only these Hive compatible tables set this field.
    +              h.getInputFormatClass == null
    +          },
    --- End diff --
    
    Why do we need this check?




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty table direct...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13270
  
    **[Test build #59687 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59687/consoleFull)** for PR 13270 at commit [`7d0122f`](https://github.com/apache/spark/commit/7d0122f9e59e83683b1539c9c8e2be3b2c6f5658).




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-222603171
  
    **[Test build #59635 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59635/consoleFull)** for PR 13270 at commit [`336fb55`](https://github.com/apache/spark/commit/336fb55406ad19eb7cc7276cd771ebd92ed8dec1).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty table direct...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13270
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59687/
    Test PASSed.




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-221137189
  
    @xwu0226 Would you please help review this one? It's based on our discussion in your PR (#13120). The benefit of this version is that it avoids the bad case mentioned in [this comment][1].
    
    [1]: https://github.com/apache/spark/pull/13120#discussion_r63869408




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-221428304
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-221428174
  
    **[Test build #59224 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59224/consoleFull)** for PR 13270 at commit [`7c77dc1`](https://github.com/apache/spark/commit/7c77dc12db47c2deddabe1365a325d1d5dc61fe6).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-221456714
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by xwu0226 <gi...@git.apache.org>.
Github user xwu0226 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13270#discussion_r64488208
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala ---
    @@ -385,7 +385,10 @@ object CreateDataSourceTableUtils extends Logging {
             schema = relation.schema.map { f =>
               CatalogColumn(f.name, f.dataType.catalogString)
             },
    -        properties = tableProperties.toMap,
    +        // Removes the provider property since we are going to save this table as a Hive compatible
    +        // one, and other places use this property to check whether a table is a data source table
    +        // (e.g. `DDLUtils.isDatasourceTable`).
    +        properties = (tableProperties - "spark.sql.sources.provider").toMap,
    --- End diff --
    
    I see. So the Hive-compatible data source table will not be resolved as a data source relation anymore. Correct?
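    For context, the check referenced above boils down to the presence of the provider property. A minimal sketch of the idea (simplified and hypothetical — the real `DDLUtils` lives in Spark's `execution/command` package and may differ):
    
    ```scala
    // Sketch only (not the actual Spark source): a table is treated as a Spark SQL
    // data source table iff the provider property is present, which is why removing
    // that property makes the table resolve through the plain Hive code path.
    object DDLUtilsSketch {
      val ProviderKey = "spark.sql.sources.provider"
    
      def isDatasourceTable(properties: Map[String, String]): Boolean =
        properties.contains(ProviderKey)
    }
    ```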




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-221742860
  
    **[Test build #59313 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59313/consoleFull)** for PR 13270 at commit [`1545f04`](https://github.com/apache/spark/commit/1545f043c146be8ec247e062a3dfdd66b75b8664).




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty table direct...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13270#discussion_r65271869
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -368,14 +371,27 @@ private[hive] class HiveClientImpl(
             createTime = h.getTTable.getCreateTime.toLong * 1000,
             lastAccessTime = h.getLastAccessTime.toLong * 1000,
             storage = CatalogStorageFormat(
    -          locationUri = shim.getDataLocation(h),
    +          locationUri = shim.getDataLocation(h).filterNot { _ =>
    +            // SPARK-15269: Persisted data source tables always store the location URI as a SerDe
    +            // property named "path" instead of standard Hive `dataLocation`, because Hive only
    +            // allows directory paths as location URIs while Spark SQL data source tables also
    +            // allow file paths. So the standard Hive `dataLocation` is meaningless for Spark SQL
    +            // data source tables.
    +            DDLUtils.isDatasourceTable(properties) &&
    +              h.getTableType == HiveTableType.EXTERNAL_TABLE &&
    +              // Spark SQL may also save external data source in Hive compatible format when
    +              // possible, so that these tables can be directly accessed by Hive. For these tables,
    +              // `dataLocation` is still necessary. Here we also check for input format class
    +              // because only these Hive compatible tables set this field.
    +              h.getInputFormatClass == null
    +          },
    --- End diff --
    
    Technically, it does no harm to keep this field since Spark SQL doesn't use it. But this placeholder location URI doesn't make sense anywhere, and it can be error-prone to keep it in the `locationUri` field since a future maintainer may think it is the real data location.




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13270#discussion_r64500603
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
    @@ -216,7 +216,25 @@ class SessionCatalog(
         val table = formatTableName(tableDefinition.identifier.table)
         val newTableDefinition = tableDefinition.copy(identifier = TableIdentifier(table, Some(db)))
         requireDbExists(db)
    -    externalCatalog.createTable(db, newTableDefinition, ignoreIfExists)
    +
    +    if (newTableDefinition.tableType == CatalogTableType.EXTERNAL) {
    +      // !! HACK ALERT !!
    +      //
    +      // See https://issues.apache.org/jira/browse/SPARK-15269 for more details about why we have to
    +      // set `locationUri` and then remove the directory after creating the external table:
    +      val tablePath = defaultTablePath(newTableDefinition.identifier)
    +      try {
    +        externalCatalog.createTable(
    +          db,
    +          newTableDefinition.withNewStorage(locationUri = Some(tablePath)),
    +          ignoreIfExists)
    +      } finally {
    +        val path = new Path(tablePath)
    +        FileSystem.get(path.toUri, hadoopConf).delete(path, true)
    +      }
    +    } else {
    +      externalCatalog.createTable(db, newTableDefinition, ignoreIfExists)
    +    }
    --- End diff --
    
    Correct. Because these tables are in standard Hive format.




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by xwu0226 <gi...@git.apache.org>.
Github user xwu0226 commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-221738175
  
    @liancheng I see. Thank you!! 




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-221726486
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13270#discussion_r64500677
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
    @@ -216,7 +216,25 @@ class SessionCatalog(
         val table = formatTableName(tableDefinition.identifier.table)
         val newTableDefinition = tableDefinition.copy(identifier = TableIdentifier(table, Some(db)))
         requireDbExists(db)
    -    externalCatalog.createTable(db, newTableDefinition, ignoreIfExists)
    +
    +    if (newTableDefinition.tableType == CatalogTableType.EXTERNAL) {
    +      // !! HACK ALERT !!
    +      //
    +      // See https://issues.apache.org/jira/browse/SPARK-15269 for more details about why we have to
    +      // set `locationUri` and then remove the directory after creating the external table:
    +      val tablePath = defaultTablePath(newTableDefinition.identifier)
    +      try {
    +        externalCatalog.createTable(
    +          db,
    +          newTableDefinition.withNewStorage(locationUri = Some(tablePath)),
    +          ignoreIfExists)
    +      } finally {
    +        val path = new Path(tablePath)
    +        FileSystem.get(path.toUri, hadoopConf).delete(path, true)
    +      }
    +    } else {
    +      externalCatalog.createTable(db, newTableDefinition, ignoreIfExists)
    +    }
    --- End diff --
    
    Actually it's `<warehouse-dir>/<table-name>-__PLACEHOLDER__`




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty table direct...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13270#discussion_r65270931
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -68,12 +72,13 @@ private[spark] class HiveExternalCatalog(client: HiveClient) extends ExternalCat
           body
         } catch {
           case NonFatal(e) if isClientException(e) =>
    -        throw new AnalysisException(e.getClass.getCanonicalName + ": " + e.getMessage)
    +        throw new AnalysisException(
    +          e.getClass.getCanonicalName + ": " + e.getMessage, cause = Some(e))
         }
       }
     
       private def requireDbMatches(db: String, table: CatalogTable): Unit = {
    -    if (table.identifier.database != Some(db)) {
    +    if (!table.identifier.database.contains(db)) {
    --- End diff --
    
    Oh... I should disable this check in IDEA then.




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-221756444
  
    cc @yhuai




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-221737110
  
    @xwu0226 I should probably add a comment to explain the `locationUri.isEmpty` thing since it's not quite intuitive.




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13270#discussion_r64674180
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/ShowCreateTableSuite.scala ---
    @@ -28,11 +28,11 @@ class ShowCreateTableSuite extends QueryTest with SQLTestUtils with TestHiveSing
       import testImplicits._
     
       test("data source table with user specified schema") {
    -    withTable("ddl_test1") {
    +    withTable("ddl_test") {
    --- End diff --
    
    I first noticed this bug while writing tests in this suite: test cases always fail if I reuse the same table name across multiple test cases. That's why I made the changes to this file as additional tests.




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-221753409
  
    **[Test build #59311 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59311/consoleFull)** for PR 13270 at commit [`db06f13`](https://github.com/apache/spark/commit/db06f13915ed0c416b3747ed90c98aba67518293).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13270#discussion_r64645557
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -68,7 +69,8 @@ private[spark] class HiveExternalCatalog(client: HiveClient) extends ExternalCat
           body
         } catch {
           case NonFatal(e) if isClientException(e) =>
    -        throw new AnalysisException(e.getClass.getCanonicalName + ": " + e.getMessage)
    +        throw new AnalysisException(
    +          e.getClass.getCanonicalName + ": " + e.getMessage, cause = Some(e))
    --- End diff --
    
    Preserve the original exception so that we can see Hive's internal stack trace.
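    The pattern can be sketched as follows (hypothetical helper name, simplified from the diff above — only the cause-chaining matters here):
    
    ```scala
    // Sketch: re-throwing with the original exception attached as the cause keeps
    // Hive's internal stack trace available, instead of flattening everything into
    // a message string that loses the original frames.
    def wrapClientException[A](body: => A): A =
      try body catch {
        case e: RuntimeException =>
          // Passing `e` as the second constructor argument preserves the full
          // original stack trace as the cause of the new exception.
          throw new RuntimeException(e.getClass.getCanonicalName + ": " + e.getMessage, e)
      }
    ```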




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-221144779
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-222592244
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59632/
    Test FAILed.




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by xwu0226 <gi...@git.apache.org>.
Github user xwu0226 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13270#discussion_r64318562
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
    @@ -216,7 +216,25 @@ class SessionCatalog(
         val table = formatTableName(tableDefinition.identifier.table)
         val newTableDefinition = tableDefinition.copy(identifier = TableIdentifier(table, Some(db)))
         requireDbExists(db)
    -    externalCatalog.createTable(db, newTableDefinition, ignoreIfExists)
    +
    +    if (newTableDefinition.tableType == CatalogTableType.EXTERNAL) {
    +      // !! HACK ALERT !!
    +      //
    +      // See https://issues.apache.org/jira/browse/SPARK-15269 for more details about why we have to
    +      // set `locationUri` and then remove the directory after creating the external table:
    +      val tablePath = defaultTablePath(newTableDefinition.identifier)
    +      try {
    +        externalCatalog.createTable(
    +          db,
    +          newTableDefinition.withNewStorage(locationUri = Some(tablePath)),
    +          ignoreIfExists)
    +      } finally {
    +        val path = new Path(tablePath)
    +        FileSystem.get(path.toUri, hadoopConf).delete(path, true)
    +      }
    +    } else {
    +      externalCatalog.createTable(db, newTableDefinition, ignoreIfExists)
    +    }
    --- End diff --
    
    I am wondering if we should worry about the case where `defaultTablePath` happens to be the same as the user-specified path of an external table that contains real data. It may delete the external table's data. For example:
    ```
    create table t10 (c1 int) using parquet options(path '/Users/xinwu/spark/spark-warehouse/t10');
    insert into t10 values (1);
    drop table t10;
    create table t10 (c1 int) using parquet options(path '/Users/xinwu/spark/spark-warehouse/t10');
    ```
    In the above case, my metastore warehouse dir is `/Users/xinwu/spark/spark-warehouse`, so `defaultTablePath` will return `/Users/xinwu/spark/spark-warehouse/t10`. Now, upon creation of the 2nd table, the data path will be deleted, right? Maybe this is a corner case that we may not need to worry about, but I thought I should bring it up.
    
    Another observation: in the Hive-compatible case (above), `createDataSourceTables` sets `locationURI` to the user-specified path, but it will be overridden by the code above. Then users will not be able to query anything back from the Hive shell, unless they don't expect to see the same results from the Hive shell for Hive-compatible tables. I am not sure about the semantics of Hive-compatible data source tables. Will this be a problem? Thanks!
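    One defensive variant that would sidestep the collision (hypothetical — this is NOT what the PR implements, just a sketch of the guard being discussed, using the Hadoop `FileSystem` API):
    
    ```scala
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}
    
    // Hypothetical defensive variant: delete the placeholder directory only when
    // it exists and is empty, so a user-specified data path that happens to
    // collide with defaultTablePath is never removed along with its real data.
    def deletePlaceholderIfEmpty(tablePath: String, hadoopConf: Configuration): Unit = {
      val path = new Path(tablePath)
      val fs = FileSystem.get(path.toUri, hadoopConf)
      if (fs.exists(path) && fs.listStatus(path).isEmpty) {
        fs.delete(path, true) // recursive flag is moot for an empty directory
      }
    }
    ```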




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-221456715
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59246/
    Test FAILed.




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by xwu0226 <gi...@git.apache.org>.
Github user xwu0226 commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-221722759
  
    @liancheng Thanks! I see what you mean about the code where you handle Hive-compatible tables. That covers table lookup. But for [creating a table](https://github.com/apache/spark/blob/193e0059be83006ddcce30a56c2b6ae09a2cac31/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala#L221-L248), we may still wrongly set `locationURI` to `<warehouse.dir>/tableName/__PLACEHOLDER__` for a Hive-compatible table, such that querying from the Hive shell will not return any results, right?
    This is one reason why I wanted to touch only the code path for creating non-Hive-compatible data source tables in [createDataSourceTables.scala -> CreateDataSourceTableUtils.createDataSourceTable](https://github.com/xwu0226/spark/blob/1c903d21e4e341b8e54f4a95c0c77e2b33c79297/sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala#L353-L379) in my PR.




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-221726488
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59295/
    Test FAILed.




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-221715803
  
    @xwu0226 Thanks. Handled Hive compatible tables [here][1]. `CatalogTable` should be implementation-agnostic, so it's infeasible to add a Hive-specific flag to `CatalogTable`.
    
    [1]: https://github.com/apache/spark/pull/13270/files#diff-6fd847124f8eae45ba2de1cf7d6296feR382




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by xwu0226 <gi...@git.apache.org>.
Github user xwu0226 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13270#discussion_r64485252
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
    @@ -216,7 +216,25 @@ class SessionCatalog(
         val table = formatTableName(tableDefinition.identifier.table)
         val newTableDefinition = tableDefinition.copy(identifier = TableIdentifier(table, Some(db)))
         requireDbExists(db)
    -    externalCatalog.createTable(db, newTableDefinition, ignoreIfExists)
    +
    +    if (newTableDefinition.tableType == CatalogTableType.EXTERNAL) {
    +      // !! HACK ALERT !!
    +      //
    +      // See https://issues.apache.org/jira/browse/SPARK-15269 for more details about why we have to
    +      // set `locationUri` and then remove the directory after creating the external table:
    +      val tablePath = defaultTablePath(newTableDefinition.identifier)
    +      try {
    +        externalCatalog.createTable(
    +          db,
    +          newTableDefinition.withNewStorage(locationUri = Some(tablePath)),
    +          ignoreIfExists)
    +      } finally {
    +        val path = new Path(tablePath)
    +        FileSystem.get(path.toUri, hadoopConf).delete(path, true)
    +      }
    +    } else {
    +      externalCatalog.createTable(db, newTableDefinition, ignoreIfExists)
    +    }
    --- End diff --
    
    So the Hive metastore will create the dummy location `<metastore warehouse dir>/__PLACEHOLDER__` when the data source table is created, and it will be removed right away. And for the Hive compatible case, the `locationURI` value set by `createDataSourceTables.newHiveCompatibleMetastoreTable` will not be overridden with `__PLACEHOLDER__`, correct?




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-221144781
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59169/
    Test FAILed.




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-221754953
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59313/
    Test PASSed.




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-221456607
  
    **[Test build #59246 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59246/consoleFull)** for PR 13270 at commit [`193e005`](https://github.com/apache/spark/commit/193e0059be83006ddcce30a56c2b6ae09a2cac31).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty table direct...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13270#discussion_r65271233
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -147,7 +152,41 @@ private[spark] class HiveExternalCatalog(client: HiveClient) extends ExternalCat
           ignoreIfExists: Boolean): Unit = withClient {
         requireDbExists(db)
         requireDbMatches(db, tableDefinition)
    -    client.createTable(tableDefinition, ignoreIfExists)
    +
    +    if (
    +    // If this is an external data source table...
    +      tableDefinition.properties.contains("spark.sql.sources.provider") &&
    +        tableDefinition.tableType == CatalogTableType.EXTERNAL &&
    +        // ... that is not persisted as Hive compatible format (external tables in Hive compatible
    +        // format always set `locationUri` to the actual data location and should NOT be hacked as
    +        // following.)
    +        tableDefinition.storage.locationUri.isEmpty
    +    ) {
    +      // !! HACK ALERT !!
    +      //
    +      // Due to a restriction of Hive metastore, here we have to set `locationUri` to a temporary
    +      // directory that doesn't exist yet but can definitely be successfully created, and then
    +      // delete it right after creating the external data source table. This location will be
    +      // persisted to Hive metastore as standard Hive table location URI, but Spark SQL doesn't
    +      // really use it. Also, since we only do this workaround for external tables, deleting the
    +      // directory after the fact doesn't do any harm.
    +      //
    +      // Please refer to https://issues.apache.org/jira/browse/SPARK-15269 for more details.
    +      val tempPath = {
    +        val dbLocation = getDatabase(tableDefinition.database).locationUri
    +        new Path(dbLocation, tableDefinition.identifier.table + "-__PLACEHOLDER__")
    +      }
    +
    +      try {
    +        client.createTable(
    +          tableDefinition.withNewStorage(locationUri = Some(tempPath.toString)),
    --- End diff --
    
    Yes.
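    The shape of the workaround being discussed can be sketched as follows. This is a hypothetical simplification, not the actual Spark code: `java.nio` stands in for Hadoop's `FileSystem`, and the `createTable` callback stands in for the metastore call. The point is the structure: create the table with a dummy location, then delete that directory in `finally` so it never leaks, even if table creation fails.

    ```scala
    import java.nio.file.{Files, Path}
    import java.util.Comparator

    // Simplified sketch of the SPARK-15269 workaround (assumptions: local
    // filesystem instead of HDFS; `createTable` is a stand-in for the
    // Hive metastore call).
    def createWithPlaceholder(dbLocation: Path, table: String)(
        createTable: Path => Unit): Unit = {
      // `-__PLACEHOLDER__` is a suffix of the directory name itself,
      // e.g. <db-location>/<table>-__PLACEHOLDER__
      val tempPath = dbLocation.resolve(table + "-__PLACEHOLDER__")
      try {
        createTable(tempPath) // persist the table with the dummy location
      } finally {
        // Recursively delete whatever was created at the dummy location.
        if (Files.exists(tempPath)) {
          Files.walk(tempPath)
            .sorted(Comparator.reverseOrder[Path]())
            .forEach(p => Files.delete(p))
        }
      }
    }
    ```

    Because the cleanup lives in `finally`, the empty directory is removed even when `createTable` throws, which mirrors the try/finally structure in the diff above.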




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13270#discussion_r64482317
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
    @@ -216,7 +216,25 @@ class SessionCatalog(
         val table = formatTableName(tableDefinition.identifier.table)
         val newTableDefinition = tableDefinition.copy(identifier = TableIdentifier(table, Some(db)))
         requireDbExists(db)
    -    externalCatalog.createTable(db, newTableDefinition, ignoreIfExists)
    +
    +    if (newTableDefinition.tableType == CatalogTableType.EXTERNAL) {
    +      // !! HACK ALERT !!
    +      //
    +      // See https://issues.apache.org/jira/browse/SPARK-15269 for more details about why we have to
    +      // set `locationUri` and then remove the directory after creating the external table:
    +      val tablePath = defaultTablePath(newTableDefinition.identifier)
    +      try {
    +        externalCatalog.createTable(
    +          db,
    +          newTableDefinition.withNewStorage(locationUri = Some(tablePath)),
    +          ignoreIfExists)
    +      } finally {
    +        val path = new Path(tablePath)
    +        FileSystem.get(path.toUri, hadoopConf).delete(path, true)
    +      }
    +    } else {
    +      externalCatalog.createTable(db, newTableDefinition, ignoreIfExists)
    +    }
    --- End diff --
    
    Added a `__PLACEHOLDER__` suffix to the dummy table location path to fix the first case. And made sure that we handle Hive compatible tables properly.




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty table direct...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13270
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty table direct...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13270#discussion_r65271411
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -368,14 +371,27 @@ private[hive] class HiveClientImpl(
             createTime = h.getTTable.getCreateTime.toLong * 1000,
             lastAccessTime = h.getLastAccessTime.toLong * 1000,
             storage = CatalogStorageFormat(
    -          locationUri = shim.getDataLocation(h),
    +          locationUri = shim.getDataLocation(h).filterNot { _ =>
    +            // SPARK-15269: Persisted data source tables always store the location URI as a SerDe
    +            // property named "path" instead of standard Hive `dataLocation`, because Hive only
    +            // allows directory paths as location URIs while Spark SQL data source tables also
    +            // allows file paths. So the standard Hive `dataLocation` is meaningless for Spark SQL
    +            // data source tables.
    +            DDLUtils.isDatasourceTable(properties) &&
    +              h.getTableType == HiveTableType.EXTERNAL_TABLE &&
    +              // Spark SQL may also save external data source in Hive compatible format when
    +              // possible, so that these tables can be directly accessed by Hive. For these tables,
    +              // `dataLocation` is still necessary. Here we also check for input format class
    +              // because only these Hive compatible tables set this field.
    +              h.getInputFormatClass == null
    +          },
    --- End diff --
    
    Because we have to store the placeholder location URI into metastore for external data source tables, and I'd like to avoid exposing it to user space.
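    The read-back filtering described above can be illustrated with a small hypothetical helper (not the actual Spark code; `isDatasourceTable`, `isExternal`, and `inputFormatClass` are stand-ins for the real checks in `HiveClientImpl`):

    ```scala
    // Hide the persisted placeholder `dataLocation` for non-Hive-compatible
    // external data source tables so it is never exposed to user space.
    def visibleLocation(
        rawLocation: Option[String],
        isDatasourceTable: Boolean,
        isExternal: Boolean,
        inputFormatClass: AnyRef): Option[String] =
      rawLocation.filterNot { _ =>
        // Hive compatible tables set the input format class and keep their
        // real data location, so only null `inputFormatClass` is filtered.
        isDatasourceTable && isExternal && (inputFormatClass == null)
      }
    ```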




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-221710446
  
    **[Test build #59295 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59295/consoleFull)** for PR 13270 at commit [`64a0cfd`](https://github.com/apache/spark/commit/64a0cfdd4684adb31a618ee468f9f01a6c7c2e22).




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13270#discussion_r64503026
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala ---
    @@ -385,7 +385,10 @@ object CreateDataSourceTableUtils extends Logging {
             schema = relation.schema.map { f =>
               CatalogColumn(f.name, f.dataType.catalogString)
             },
    -        properties = tableProperties.toMap,
    +        // Removes the provider property since we are gonna saving this table as a Hive compatible
    +        // one, and other places use this property to check whether a table is a data source table
    +        // (e.g. `DDLUtils.isDatasourceTable`).
    +        properties = (tableProperties - "spark.sql.sources.provider").toMap,
    --- End diff --
    
    Unfortunately this doesn't work, because we still want to recognize this table as data source table when saving data into the table using `CreateDataSourceTableAsSelectCommand`. My [last commit][1] tries to fix this by checking `.storage.locationUri.isEmpty`. This should work because data source tables never set this field.
    
    [1]: https://github.com/apache/spark/pull/13270/commits/193e0059be83006ddcce30a56c2b6ae09a2cac31




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-221428305
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59224/
    Test FAILed.




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty table direct...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13270#discussion_r65263898
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -147,7 +152,41 @@ private[spark] class HiveExternalCatalog(client: HiveClient) extends ExternalCat
           ignoreIfExists: Boolean): Unit = withClient {
         requireDbExists(db)
         requireDbMatches(db, tableDefinition)
    -    client.createTable(tableDefinition, ignoreIfExists)
    +
    +    if (
    +    // If this is an external data source table...
    +      tableDefinition.properties.contains("spark.sql.sources.provider") &&
    +        tableDefinition.tableType == CatalogTableType.EXTERNAL &&
    +        // ... that is not persisted as Hive compatible format (external tables in Hive compatible
    +        // format always set `locationUri` to the actual data location and should NOT be hacked as
    +        // following.)
    +        tableDefinition.storage.locationUri.isEmpty
    +    ) {
    +      // !! HACK ALERT !!
    +      //
    +      // Due to a restriction of Hive metastore, here we have to set `locationUri` to a temporary
    +      // directory that doesn't exist yet but can definitely be successfully created, and then
    +      // delete it right after creating the external data source table. This location will be
    +      // persisted to Hive metastore as standard Hive table location URI, but Spark SQL doesn't
    +      // really use it. Also, since we only do this workaround for external tables, deleting the
    +      // directory after the fact doesn't do any harm.
    +      //
    +      // Please refer to https://issues.apache.org/jira/browse/SPARK-15269 for more details.
    +      val tempPath = {
    +        val dbLocation = getDatabase(tableDefinition.database).locationUri
    +        new Path(dbLocation, tableDefinition.identifier.table + "-__PLACEHOLDER__")
    +      }
    +
    +      try {
    +        client.createTable(
    +          tableDefinition.withNewStorage(locationUri = Some(tempPath.toString)),
    --- End diff --
    
    So, this location is stored in the metastore?




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-221415230
  
    **[Test build #59224 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59224/consoleFull)** for PR 13270 at commit [`7c77dc1`](https://github.com/apache/spark/commit/7c77dc12db47c2deddabe1365a325d1d5dc61fe6).




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-221753531
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59311/
    Test PASSed.




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13270#discussion_r65123201
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
    @@ -212,11 +212,46 @@ class SessionCatalog(
        * If no such database is specified, create it in the current database.
        */
       def createTable(tableDefinition: CatalogTable, ignoreIfExists: Boolean): Unit = {
    -    val db = formatDatabaseName(tableDefinition.identifier.database.getOrElse(getCurrentDatabase))
    -    val table = formatTableName(tableDefinition.identifier.table)
    +    val tableId = tableDefinition.identifier
    +    val db = formatDatabaseName(tableId.database.getOrElse(getCurrentDatabase))
    +    val table = formatTableName(tableId.table)
         val newTableDefinition = tableDefinition.copy(identifier = TableIdentifier(table, Some(db)))
         requireDbExists(db)
    -    externalCatalog.createTable(db, newTableDefinition, ignoreIfExists)
    +
    +    if (
    +      // If this is an external data source table...
    +      tableDefinition.properties.contains("spark.sql.sources.provider") &&
    +      newTableDefinition.tableType == CatalogTableType.EXTERNAL &&
    +      // ... that is not persisted as Hive compatible format (external tables in Hive compatible
    +      // format always set `locationUri` to the actual data location and should NOT be hacked as
    +      // following.)
    +      tableDefinition.storage.locationUri.isEmpty
    +    ) {
    +      // !! HACK ALERT !!
    +      //
    +      // Due to a restriction of Hive metastore, here we have to set `locationUri` to a temporary
    +      // directory that doesn't exist yet but can definitely be successfully created, and then
    +      // delete it right after creating the external data source table. This location will be
    +      // persisted to Hive metastore as standard Hive table location URI, but Spark SQL doesn't
    +      // really use it. Also, since we only do this workaround for external tables, deleting the
    +      // directory after the fact doesn't do any harm.
    +      //
    +      // Please refer to https://issues.apache.org/jira/browse/SPARK-15269 for more details.
    +
    +      val tempPath =
    +        new Path(defaultTablePath(tableId).stripSuffix(Path.SEPARATOR) + "-__PLACEHOLDER__")
    +
    +      try {
    +        externalCatalog.createTable(
    +          db,
    +          newTableDefinition.withNewStorage(locationUri = Some(tempPath.toString)),
    +          ignoreIfExists)
    +      } finally {
    +        FileSystem.get(tempPath.toUri, hadoopConf).delete(tempPath, true)
    +      }
    +    } else {
    +      externalCatalog.createTable(db, newTableDefinition, ignoreIfExists)
    +    }
    --- End diff --
    
    I originally added these changes here mostly because `HiveExternalCatalog` doesn't have access to the Hadoop configuration, which is needed to instantiate the `FileSystem` instance. I've now added an extra constructor argument to `HiveExternalCatalog` and moved this change there.




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-221448188
  
    **[Test build #59246 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59246/consoleFull)** for PR 13270 at commit [`193e005`](https://github.com/apache/spark/commit/193e0059be83006ddcce30a56c2b6ae09a2cac31).




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-221736990
  
    @xwu0226 No, we shouldn't have problems with Hive compatible tables now, since we only add the placeholder location URI when `table.storage.locationUri` is empty (see [here][1]), while Hive compatible tables always set this field.
    
    (BTW, the placeholder path is `<warehouse-dir>/<tableName>-__PLACEHOLDER__`. The string `__PLACEHOLDER__` is part of the directory name rather than the name of a sub-directory.)
    
    [1]: https://github.com/apache/spark/pull/13270/files#diff-b3f9800839b9b9a1df9da9cbfc01adf8R224
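    That naming convention can be illustrated with a small hypothetical helper (not the actual Spark code):

    ```scala
    // `__PLACEHOLDER__` is joined to the table directory name with a `-`,
    // not nested under it as a sub-directory.
    def placeholderPath(warehouseDir: String, tableName: String): String =
      s"${warehouseDir.stripSuffix("/")}/$tableName-__PLACEHOLDER__"
    ```

    For example, a table `t1` under `/user/hive/warehouse` would get the dummy location `/user/hive/warehouse/t1-__PLACEHOLDER__`.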




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13270#discussion_r64500584
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala ---
    @@ -385,7 +385,10 @@ object CreateDataSourceTableUtils extends Logging {
             schema = relation.schema.map { f =>
               CatalogColumn(f.name, f.dataType.catalogString)
             },
    -        properties = tableProperties.toMap,
    +        // Removes the provider property since we are gonna saving this table as a Hive compatible
    +        // one, and other places use this property to check whether a table is a data source table
    +        // (e.g. `DDLUtils.isDatasourceTable`).
    +        properties = (tableProperties - "spark.sql.sources.provider").toMap,
    --- End diff --
    
    Correct.




[GitHub] spark issue #13270: [SPARK-15269][SQL] Removes unexpected empty table direct...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the issue:

    https://github.com/apache/spark/pull/13270
  
    Merging to master and branch-2.0.
    
    @xwu0226 @yhuai Thanks for the review!




[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13270#issuecomment-221753528
  
    Merged build finished. Test PASSed.

