Posted to reviews@spark.apache.org by windpiger <gi...@git.apache.org> on 2017/03/03 10:03:48 UTC

[GitHub] spark pull request #17149: [SPARK-19257][SQL][WIP]location for table/partiti...

GitHub user windpiger opened a pull request:

    https://github.com/apache/spark/pull/17149

    [SPARK-19257][SQL][WIP]location for table/partition/database should be java.net.URI

    ## What changes were proposed in this pull request?
    
    Currently we treat the location of a table/partition/database as a URI string.
    
    It would be safer to make the type of the location java.net.URI.
    
    In this PR, the following classes change:
    **1. CatalogDatabase**
    ```
    case class CatalogDatabase(
        name: String,
        description: String,
        locationUri: String,
        properties: Map[String, String])
    --->
    case class CatalogDatabase(
        name: String,
        description: String,
        locationUri: URI,
        properties: Map[String, String])
    ```
    **2. CatalogStorageFormat**
    ```
    case class CatalogStorageFormat(
        locationUri: Option[String],
        inputFormat: Option[String],
        outputFormat: Option[String],
        serde: Option[String],
        compressed: Boolean,
        properties: Map[String, String])
    ---->
    case class CatalogStorageFormat(
        locationUri: Option[URI],
        inputFormat: Option[String],
        outputFormat: Option[String],
        serde: Option[String],
        compressed: Boolean,
        properties: Map[String, String])
    ```
    
    Before and after this PR the change is transparent to users; there is nothing they need to be concerned about. The `String`-to-`URI` conversion happens only inside Spark SQL.
    
    Here are some location-related cases to consider (an illustrative sketch follows this list):
    **1. whitespace in the location**
    
    
    **2. colon(:) in the location**
    
    
    **3. percent sign(%) in the location**
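    
    As an aside, here is a minimal, purely illustrative sketch (assuming the `stringToURI`/`URIToString` helper names discussed later in this review; the actual implementation in this PR may differ, and hadoop-common must be on the classpath) of how such locations round-trip between `String` and `URI`:
    
    ```scala
    import java.net.URI
    import org.apache.hadoop.fs.Path
    
    // Illustrative helpers only, not necessarily the PR's final code.
    def stringToURI(str: String): URI = new Path(str).toUri    // escapes special chars once
    def URIToString(uri: URI): String = new Path(uri).toString // decodes back for display
    
    Seq("/tmp/a b", "/tmp/a:b", "/tmp/a%b").foreach { s =>
      val uri = stringToURI(s)
      println(s"$s -> $uri -> ${URIToString(uri)}")
    }
    ```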
    
    
    ## How was this patch tested?
    N/A

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/windpiger/spark changeStringToURI

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17149.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17149
    
----
commit 38436e84d4647d545e1ff44a5012de6d161c2af4
Author: windpiger <so...@outlook.com>
Date:   2017-03-03T07:23:47Z

    convert to URI

commit 69a16464dc179094af89cdc3077e279fe4656959
Author: windpiger <so...@outlook.com>
Date:   2017-03-03T09:46:34Z

    fix some sytle

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    **[Test build #74000 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74000/testReport)** for PR 17149 at commit [`681db88`](https://github.com/apache/spark/commit/681db88b90d71c2302bfa87dcaa6140afd1c8f40).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    @cnauroth Thank you so much for your help.




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r109892051
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -285,7 +285,7 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
             // compatible format, which means the data source is file-based and must have a `path`.
             require(table.storage.locationUri.isDefined,
               "External file-based data source table must have a `path` entry in storage properties.")
    -        Some(new Path(table.location).toUri.toString)
    --- End diff --
    
    I see, it seems the problem itself existed before.
    
    I found this while running related tests on Windows, and it looks related to this PR (the special-character handling), so I think this is the proper JIRA to make a follow-up under, and a PR to note this.
    
    It seems we already have tests that use URIs, whether that was mistakenly supported or not. Should we ask on the dev mailing list? I think this is an important decision.




[GitHub] spark issue #17149: [SPARK-19257][SQL][WIP]location for table/partition/data...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    **[Test build #73841 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73841/testReport)** for PR 17149 at commit [`890327a`](https://github.com/apache/spark/commit/890327a8ce4cb6629261750b5003f8643920c5e7).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17149: [SPARK-19257][SQL][WIP]location for table/partition/data...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73841/
    Test FAILed.




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r104485607
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
    @@ -386,7 +386,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
             "LOCATION and 'path' in OPTIONS are both used to indicate the custom table path, " +
               "you can only specify one of them.", ctx)
         }
    -    val customLocation = storage.locationUri.orElse(location)
    +    val customLocation = storage.locationUri.orElse(location.map(CatalogUtils.stringToURI(_)))
    --- End diff --
    
    nit: `xx.map(CatalogUtils.stringToURI)`
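    
    (A standalone illustration of the nit, not from the PR; `stringToURI` below is a local stand-in, not Spark's `CatalogUtils` method:)
    
    ```scala
    import java.net.URI
    
    def stringToURI(s: String): URI = new URI(s)  // stand-in for CatalogUtils.stringToURI
    
    val location: Option[String] = Some("file:/tmp/data")
    println(location.map(stringToURI(_)))  // Some(file:/tmp/data), explicit placeholder lambda
    println(location.map(stringToURI))     // same result; eta-expanded method value, the form the nit prefers
    ```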




[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    (I think I should cc @srowen too, FYI, because he reviewed all my PRs fixing the tests on Windows. The point here, it seems to me, is that it is okay to use URIs in general.)




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by windpiger <gi...@git.apache.org>.
Github user windpiger commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r104359591
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
    @@ -146,7 +147,7 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
         try {
           sql(s"CREATE DATABASE $dbName")
           val db1 = catalog.getDatabaseMetadata(dbName)
    -      val expectedLocation = makeQualifiedPath(s"spark-warehouse/$dbName.db")
    +      val expectedLocation = new Path(makeQualifiedPath(s"spark-warehouse/$dbName.db")).toUri
    --- End diff --
    
    I just implemented this function with an Option[String]. For a plain String without an Option it is already simple; do we still need to implement one?




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r110420176
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -285,7 +285,7 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
             // compatible format, which means the data source is file-based and must have a `path`.
             require(table.storage.locationUri.isDefined,
               "External file-based data source table must have a `path` entry in storage properties.")
    -        Some(new Path(table.location).toUri.toString)
    --- End diff --
    
    aah, paths, especially on Windows, make me weep. Note also there is lots of fun related to encoded secrets in s3n/s3a/s3 URLs (HADOOP-3733 and the associated regression HADOOP-14114).
    
    Because of those patches, and the fact that if you want to embed a / in an S3A URI you *must* escape it, I don't know what to say here.
    
    If you can come up with a simple set of questions to ask someone who doesn't know this code but could understand Path's logic, I could pass them on to some people I think may know the answer. Maybe @Cnauroth might have some insight into what Windows gets up to.
    
    BTW, Path in Hadoop 2.9 is Serializable; you can pass them down to closures working on RDDs, etc.




[GitHub] spark issue #17149: [SPARK-19257][SQL][WIP]location for table/partition/data...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    **[Test build #73840 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73840/testReport)** for PR 17149 at commit [`69a1646`](https://github.com/apache/spark/commit/69a16464dc179094af89cdc3077e279fe4656959).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r104357644
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala ---
    @@ -162,6 +164,28 @@ object CatalogUtils {
         BucketSpec(numBuckets, normalizedBucketCols, normalizedSortCols)
       }
     
    +  /**
    +   * Convert URI to String.
    +   * Since URI.toString does not decode for the uri string, we need to use
    +   * Path(uri).toString to decode it.
    +   * @param uri the URI of the path
    +   * @return the String of the path
    +   */
    +  def URIToString(uri: Option[URI]): Option[String] = {
    +    uri.map(new Path(_).toString)
    --- End diff --
    
    can you try `java.net.URLDecoder`? I'd like to avoid a Hadoop dependency for this kind of basic functionality.
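    
    (A quick, purely illustrative check of that suggestion; note `URLDecoder` also turns `+` into a space, which plain URI decoding does not, so it may not be a drop-in replacement:)
    
    ```scala
    import java.net.URLDecoder
    
    println(URLDecoder.decode("file:/tmp/a%20b", "UTF-8"))  // file:/tmp/a b
    println(URLDecoder.decode("file:/tmp/a+b", "UTF-8"))    // file:/tmp/a b  ('+' becomes a space)
    println(new java.net.URI("file:/tmp/a%20b").getPath)    // /tmp/a b  (URI decoding leaves '+' alone)
    ```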




[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    **[Test build #74012 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74012/testReport)** for PR 17149 at commit [`5b423f5`](https://github.com/apache/spark/commit/5b423f56b69a11b44cbc8ce6bf1e8821619ebf79).




[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74004/
    Test FAILed.




[GitHub] spark issue #17149: [SPARK-19257][SQL][WIP]location for table/partition/data...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r104260081
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
    @@ -397,7 +398,8 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
         val tableDesc = CatalogTable(
           identifier = table,
           tableType = tableType,
    -      storage = storage.copy(locationUri = customLocation),
    +      storage = storage.copy(locationUri = customLocation.map{ loc =>
    --- End diff --
    
    nit:
    ```
    storage.copy(
      locationUri = xxx)
    ```




[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    **[Test build #73979 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73979/testReport)** for PR 17149 at commit [`b6bc466`](https://github.com/apache/spark/commit/b6bc4668f4b9b4cd0161ec58e9dcd54527bed0b7).




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r109852878
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
    @@ -386,7 +386,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
             "LOCATION and 'path' in OPTIONS are both used to indicate the custom table path, " +
               "you can only specify one of them.", ctx)
         }
    -    val customLocation = storage.locationUri.orElse(location)
    +    val customLocation = storage.locationUri.orElse(location.map(CatalogUtils.stringToURI(_)))
     
    --- End diff --
    
    how could the input path be a URI string?




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r109827145
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
    @@ -386,7 +386,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
             "LOCATION and 'path' in OPTIONS are both used to indicate the custom table path, " +
               "you can only specify one of them.", ctx)
         }
    -    val customLocation = storage.locationUri.orElse(location)
    +    val customLocation = storage.locationUri.orElse(location.map(CatalogUtils.stringToURI(_)))
     
    --- End diff --
    
    > I did not replace it when I identified this because some existing tests in org.apache.spark.util.UtilsSuite were broken but I guess it would be fine if there is a coherent reason. These broken cases might be bugs.
    
    FYI.. cc @sarutak 




[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #17149: [SPARK-19257][SQL][WIP]location for table/partition/data...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73842/
    Test FAILed.




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r104357082
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala ---
    @@ -162,6 +164,28 @@ object CatalogUtils {
         BucketSpec(numBuckets, normalizedBucketCols, normalizedSortCols)
       }
     
    +  /**
    +   * Convert URI to String.
    +   * Since URI.toString does not decode for the uri string, we need to use
    --- End diff --
    
    `Since URI.toString does not decode the uri string, here we create a hadoop Path with the given URI, and rely on hadoop Path.toString to decode the uri`




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r110114252
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -285,7 +285,7 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
             // compatible format, which means the data source is file-based and must have a `path`.
             require(table.storage.locationUri.isDefined,
               "External file-based data source table must have a `path` entry in storage properties.")
    -        Some(new Path(table.location).toUri.toString)
    --- End diff --
    
    Actually, I did a bit more searching about what `org.apache.hadoop.fs.Path` expects. It seems the [documentation](https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/fs/Path.html) says:
    
    > Construct a path from a String. Path strings are URIs, but with unescaped elements and some additional normalization.
    
    It seems those strings are expected to be unescaped. So it seems we support URIs with unescaped characters, which is inherited from `Path`.
    
    I want to be sure about this because I have fixed many tests to use URIs so they pass on Windows, and I am about to fix more of them in this way. @steveloughran, do you mind if I cc you and ask you to take a look and help confirm this, please? I don't know anyone who knows Hadoop better.
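    
    (A small sketch of the behaviour described above, assuming a local filesystem path; illustrative only:)
    
    ```scala
    import org.apache.hadoop.fs.Path
    
    val p = new Path("/tmp/a b")  // unescaped input, as the Path javadoc expects
    println(p.toString)           // /tmp/a b    (stays unescaped)
    println(p.toUri.toString)     // /tmp/a%20b  (escaped once when converted to a URI)
    ```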
     




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r109827042
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
    @@ -386,7 +386,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
             "LOCATION and 'path' in OPTIONS are both used to indicate the custom table path, " +
               "you can only specify one of them.", ctx)
         }
    -    val customLocation = storage.locationUri.orElse(location)
    +    val customLocation = storage.locationUri.orElse(location.map(CatalogUtils.stringToURI(_)))
     
    --- End diff --
    
    It seems this has a double de/encoding problem when the input path is already a URI.
    
    ```scala
    scala> new org.apache.hadoop.fs.Path(new java.io.File("a b").toURI.toString).toUri.toString
    res1: String = file:/.../a%2520b
    ```
    
    A space character in a URI is encoded as `%20`, and the `%` character is encoded as `%25`. The URI already has `%20` in it, and it gets URL-encoded again, turning `%20` into `%2520`.
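    
    (For comparison, a purely illustrative sketch of why keeping the value as a `java.net.URI` end-to-end avoids the second encoding: the re-encoding only happens when an already-encoded URI string is parsed as a path again.)
    
    ```scala
    import org.apache.hadoop.fs.Path
    
    val uri = new java.io.File("a b").toURI        // file:/.../a%20b (already encoded)
    println(new Path(uri).toUri.toString)          // encoded once:  file:/.../a%20b
    println(new Path(uri.toString).toUri.toString) // encoded twice: file:/.../a%2520b
    ```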





[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    **[Test build #73995 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73995/testReport)** for PR 17149 at commit [`abfb6f5`](https://github.com/apache/spark/commit/abfb6f5431ae1fb9bfe073f4acdf65ed870f7fa8).




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r104358840
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
    @@ -66,7 +66,7 @@ case class CreateDatabaseCommand(
           CatalogDatabase(
             databaseName,
             comment.getOrElse(""),
    -        path.getOrElse(catalog.getDefaultDBPath(databaseName)),
    +        new Path(path.getOrElse(catalog.getDefaultDBPath(databaseName))).toUri,
    --- End diff --
    
    `getDefaultDBPath` should return URI now.




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r110560960
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -285,7 +285,7 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
             // compatible format, which means the data source is file-based and must have a `path`.
             require(table.storage.locationUri.isDefined,
               "External file-based data source table must have a `path` entry in storage properties.")
    -        Some(new Path(table.location).toUri.toString)
    --- End diff --
    
    I think I have two questions, as below:
    
    - Is it okay to use both URIs and local file paths as the input string for `org.apache.hadoop.fs.Path` in general (when they are expected to be unescaped)?
    
    - What does `org.apache.hadoop.fs.Path` expect as its input string? It seems to be URI strings but with unescaped characters per https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/fs/Path.html, and there are special cases that require escaped characters, such as s3n/s3a/s3 URLs. Is my understanding correct?





[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r104358810
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
    @@ -146,7 +146,7 @@ case class DescribeDatabaseCommand(
         val result =
           Row("Database Name", dbMetadata.name) ::
             Row("Description", dbMetadata.description) ::
    -        Row("Location", dbMetadata.locationUri) :: Nil
    +        Row("Location", new Path(dbMetadata.locationUri).toString) :: Nil
    --- End diff --
    
    use `CatalogUtil.URIToString`?




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17149




[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73855/
    Test PASSed.




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r104259181
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
    @@ -170,7 +170,7 @@ class SessionCatalog(
               "you cannot create a database with this name.")
         }
         validateName(dbName)
    -    val qualifiedPath = makeQualifiedPath(dbDefinition.locationUri).toString
    +    val qualifiedPath = makeQualifiedPath(new Path(dbDefinition.locationUri).toString).toUri
    --- End diff --
    
    `makeQualifiedPath` should accept `URI` now




[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    **[Test build #74012 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74012/testReport)** for PR 17149 at commit [`5b423f5`](https://github.com/apache/spark/commit/5b423f56b69a11b44cbc8ce6bf1e8821619ebf79).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r104359108
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
    @@ -169,7 +170,7 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
               val dbNameWithoutBackTicks = cleanIdentifier(dbName)
               sql(s"CREATE DATABASE $dbName Location '$path'")
               val db1 = catalog.getDatabaseMetadata(dbNameWithoutBackTicks)
    -          val expPath = makeQualifiedPath(tmpDir.toString)
    +          val expPath = new Path(makeQualifiedPath(tmpDir.toString)).toUri
    --- End diff --
    
    StringToURI?




[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r109859539
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
    @@ -386,7 +386,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
             "LOCATION and 'path' in OPTIONS are both used to indicate the custom table path, " +
               "you can only specify one of them.", ctx)
         }
    -    val customLocation = storage.locationUri.orElse(location)
    +    val customLocation = storage.locationUri.orElse(location.map(CatalogUtils.stringToURI(_)))
     
    --- End diff --
    
    It's ambiguous to support both: what if users really do want to create a path `/tmp/%25`?
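    
    (An illustrative view of that ambiguity, not from the PR: the same input string denotes different paths depending on whether it is read as an escaped URI or as a raw, unescaped path.)
    
    ```scala
    val raw = "/tmp/%25"
    println(new java.net.URI(raw).getPath)                    // /tmp/%    (read as an escaped URI)
    println(new org.apache.hadoop.fs.Path(raw).toUri.getPath) // /tmp/%25  (read as a raw path)
    ```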




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r104358914
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
    @@ -953,7 +955,7 @@ case class ShowCreateTableCommand(table: TableIdentifier) extends RunnableComman
             // when the table creation DDL contains the PATH option.
             None
           } else {
    -        Some(s"path '${escapeSingleQuotedString(location)}'")
    +        Some(s"path '${escapeSingleQuotedString(new Path(location).toString)}'")
    --- End diff --
    
    use `URIToString`?




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by windpiger <gi...@git.apache.org>.
Github user windpiger commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r104361050
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala ---
    @@ -162,6 +164,28 @@ object CatalogUtils {
         BucketSpec(numBuckets, normalizedBucketCols, normalizedSortCols)
       }
     
    +  /**
    +   * Convert URI to String.
    +   * Since URI.toString does not decode for the uri string, we need to use
    +   * Path(uri).toString to decode it.
    +   * @param uri the URI of the path
    +   * @return the String of the path
    +   */
    +  def URIToString(uri: Option[URI]): Option[String] = {
    +    uri.map(new Path(_).toString)
    --- End diff --
    
    What about StringToURI? It still imports Path.




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r104259986
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalog/interface.scala ---
    @@ -17,6 +17,7 @@
     
     package org.apache.spark.sql.catalog
     
    +import java.net.URI
    --- End diff --
    
    unnecessary import?




[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    **[Test build #73977 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73977/testReport)** for PR 17149 at commit [`dc0a37b`](https://github.com/apache/spark/commit/dc0a37b8860ff41fbb7511522d8ceff43965ddd8).




[GitHub] spark issue #17149: [SPARK-19257][SQL][WIP]location for table/partition/data...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    **[Test build #73855 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73855/testReport)** for PR 17149 at commit [`f2b9bd8`](https://github.com/apache/spark/commit/f2b9bd88398358fcc658cd98f54b6808a56c84b4).




[GitHub] spark issue #17149: [SPARK-19257][SQL][WIP]location for table/partition/data...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    **[Test build #73841 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73841/testReport)** for PR 17149 at commit [`890327a`](https://github.com/apache/spark/commit/890327a8ce4cb6629261750b5003f8643920c5e7).




[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    **[Test build #74004 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74004/testReport)** for PR 17149 at commit [`5b92620`](https://github.com/apache/spark/commit/5b92620d66a1863e92690a28a036597856287fa9).




[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73996/
    Test FAILed.




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r104358967
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala ---
    @@ -77,7 +79,7 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
         new Database(
           name = metadata.name,
           description = metadata.description,
    -      locationUri = metadata.locationUri)
    +      locationUri = new Path(metadata.locationUri).toString)
    --- End diff --
    
    URIToString?




[GitHub] spark issue #17149: [SPARK-19257][SQL][WIP]location for table/partition/data...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r104259863
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala ---
    @@ -883,7 +885,7 @@ abstract class CatalogTestUtils {
     
       def newFunc(): CatalogFunction = newFunc("funcName")
     
    -  def newUriForDatabase(): String = Utils.createTempDir().toURI.toString.stripSuffix("/")
    +  def newUriForDatabase(): URI = new URI(Utils.createTempDir().toURI.toString.stripSuffix("/"))
    --- End diff --
    
    we can simplify this and just write `Utils.createTempDir().toURI`




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by sarutak <gi...@git.apache.org>.
Github user sarutak commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r109866897
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
    @@ -386,7 +386,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
             "LOCATION and 'path' in OPTIONS are both used to indicate the custom table path, " +
               "you can only specify one of them.", ctx)
         }
    -    val customLocation = storage.locationUri.orElse(location)
    +    val customLocation = storage.locationUri.orElse(location.map(CatalogUtils.stringToURI(_)))
     
    --- End diff --
    
    > how could the input path be a URI string?

    How about an external table on S3?




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r104259448
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala ---
    @@ -340,8 +342,8 @@ abstract class ExternalCatalogSuite extends SparkFunSuite with BeforeAndAfterEac
           "db1",
           "tbl",
           Map("partCol1" -> "1", "partCol2" -> "2")).location
    -    val tableLocation = catalog.getTable("db1", "tbl").location
    -    val defaultPartitionLocation = new Path(new Path(tableLocation, "partCol1=1"), "partCol2=2")
    +    val tableLocationPath = new Path(catalog.getTable("db1", "tbl").location)
    --- End diff --
    
    `tableLocationPath` sounds weird, let's keep the previous name




[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73995/
    Test FAILed.




[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74012/
    Test PASSed.




[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    **[Test build #73976 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73976/testReport)** for PR 17149 at commit [`109e2b5`](https://github.com/apache/spark/commit/109e2b5641dc91157301563134e17670be83341a).




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r104261105
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
    @@ -397,7 +398,8 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
         val tableDesc = CatalogTable(
           identifier = table,
           tableType = tableType,
    -      storage = storage.copy(locationUri = customLocation),
    +      storage = storage.copy(locationUri = customLocation.map{ loc =>
    --- End diff --
    
    this pattern appears many times in this PR. How about we create a util method `toURI` in `CatalogUtils` and add comments there?
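    For illustration, a minimal sketch of what such a helper pair could look like (the names and the idea of pairing it with a reverse `URIToString` are only a suggestion here, not necessarily what the PR ends up with):

    ```scala
    import java.net.URI
    import org.apache.hadoop.fs.Path

    object CatalogUtils {
      // Convert a raw, possibly unescaped path string into a URI via Hadoop's Path
      def stringToURI(str: String): URI = new Path(str).toUri

      // Convert a URI back into the unescaped path string Hadoop/Hive APIs expect
      def URIToString(uri: URI): String = new Path(uri).toString
    }
    ```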




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r109826847
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
    @@ -386,7 +386,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
             "LOCATION and 'path' in OPTIONS are both used to indicate the custom table path, " +
               "you can only specify one of them.", ctx)
         }
    -    val customLocation = storage.locationUri.orElse(location)
    +    val customLocation = storage.locationUri.orElse(location.map(CatalogUtils.stringToURI(_)))
    --- End diff --
    
    It seems this has a double-encoding problem when the input path is already a URI.
    
    ```scala
    scala> new org.apache.hadoop.fs.Path(new java.io.File("a b").toURI.toString).toUri.toString
    res1: String = file:/.../a%2520b
    ```
    
    A space character in a URI is encoded as `%20`, and the `%` character is encoded as `%25`. A URI that already contains `%20` gets URL-encoded again, turning `%20` into `%2520`.
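    A minimal sketch of the difference, assuming a local file path (`<cwd>` in the comments stands for the current working directory): constructing the `Path` from a `java.net.URI` instead of the URI's string form avoids the second round of encoding.

    ```scala
    import java.net.URI
    import org.apache.hadoop.fs.Path

    val uriString = new java.io.File("a b").toURI.toString // e.g. file:/<cwd>/a%20b
    // String constructor treats the input as unescaped and escapes the existing %20 again
    new Path(uriString).toUri.toString                      // file:/<cwd>/a%2520b
    // URI constructor keeps the escaping as-is
    new Path(new URI(uriString)).toUri.toString             // file:/<cwd>/a%20b
    ```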





[GitHub] spark issue #17149: [SPARK-19257][SQL][WIP]location for table/partition/data...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73840/
    Test FAILed.




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r109888297
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -285,7 +285,7 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
             // compatible format, which means the data source is file-based and must have a `path`.
             require(table.storage.locationUri.isDefined,
               "External file-based data source table must have a `path` entry in storage properties.")
    -        Some(new Path(table.location).toUri.toString)
    --- End diff --
    
    @HyukjinKwon previously we also encoded the table location string; I think we never supported a URI path string before?




[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73976/
    Test FAILed.




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r110556382
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -285,7 +285,7 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
             // compatible format, which means the data source is file-based and must have a `path`.
             require(table.storage.locationUri.isDefined,
               "External file-based data source table must have a `path` entry in storage properties.")
    -        Some(new Path(table.location).toUri.toString)
    --- End diff --
    
    Thank you so much @steveloughran. Let me try to produce a simple set of questions purely about `org.apache.hadoop.fs.Path`.




[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    **[Test build #73855 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73855/testReport)** for PR 17149 at commit [`f2b9bd8`](https://github.com/apache/spark/commit/f2b9bd88398358fcc658cd98f54b6808a56c84b4).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by windpiger <gi...@git.apache.org>.
Github user windpiger commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r104358279
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala ---
    @@ -883,7 +885,7 @@ abstract class CatalogTestUtils {
     
       def newFunc(): CatalogFunction = newFunc("funcName")
     
    -  def newUriForDatabase(): String = Utils.createTempDir().toURI.toString.stripSuffix("/")
    +  def newUriForDatabase(): URI = new URI(Utils.createTempDir().toURI.toString.stripSuffix("/"))
    --- End diff --
    
    @cloud-fan after I tried to use Path to compare, I think stripSuffix here is the simpler way.
    
    Two Paths built with the `String` constructor compare equal when one has a trailing `/` and the other does not.
    ```
    scala> val x = new Path("/ab/c/")
    x: org.apache.hadoop.fs.Path = /ab/c
    
    scala> val y = new Path("/ab/c")
    y: org.apache.hadoop.fs.Path = /ab/c
    
    scala> x == y
    res0: Boolean = true
    ```
    
    Two Paths built with the `URI` constructor do not compare equal when one has a trailing `/` and the other does not.
    ```
    scala> val x =new URI("/a/b/c/")
    x: java.net.URI = /a/b/c/
    
    scala> val y =new URI("/a/b/c")
    y: java.net.URI = /a/b/c
    
    scala> x == y
    res1: Boolean = false
    
    scala> val x1 =new Path(x)
    x1: org.apache.hadoop.fs.Path = /a/b/c/
    
    scala> val y1 =new Path(y)
    y1: org.apache.hadoop.fs.Path = /a/b/c
    
    scala> x1 == y1
    res2: Boolean = false
    ```
    
    So only the `String`-based Path constructor ignores the trailing `/`. If we have a `URI` in hand and want to compare it with another `URI`, we would first have to convert both to `String`, build Paths from those Strings, and only then compare them while ignoring the trailing `/`.

    But I think that is more complicated; here we just normalize the URI with stripSuffix at the source and then compare the two URIs directly, which is simpler.

    Do we really have to convert to Path to compare?
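    (For reference, a sketch of what the two options boil down to; the URIs are illustrative:)

    ```scala
    import java.net.URI
    import org.apache.hadoop.fs.Path

    val a = new URI("file:/a/b/c/")
    val b = new URI("file:/a/b/c")

    // Option 1: round-trip through String-based Paths so the trailing slash is ignored
    new Path(a.toString) == new Path(b.toString)      // true

    // Option 2: normalize at the source, as done in newUriForDatabase()
    new URI(a.toString.stripSuffix("/")) == b         // true
    ```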





[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r104359085
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
    @@ -72,14 +73,14 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
     
       private def createDatabase(catalog: SessionCatalog, name: String): Unit = {
         catalog.createDatabase(
    -      CatalogDatabase(name, "", spark.sessionState.conf.warehousePath, Map()),
    +      CatalogDatabase(name, "", new Path(spark.sessionState.conf.warehousePath).toUri, Map()),
           ignoreIfExists = false)
       }
     
       private def generateTable(catalog: SessionCatalog, name: TableIdentifier): CatalogTable = {
         val storage =
           CatalogStorageFormat(
    -        locationUri = Some(catalog.defaultTablePath(name)),
    +        locationUri = Some(new Path(catalog.defaultTablePath(name)).toUri),
    --- End diff --
    
    StringToURI?




[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r104358192
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
    @@ -131,7 +132,7 @@ class SessionCatalog(
        * does not contain a scheme, this path will not be changed after the default
        * FileSystem is changed.
        */
    -  private def makeQualifiedPath(path: String): Path = {
    +  private def makeQualifiedPath(path: URI): Path = {
    --- End diff --
    
    this method can return `URI` too
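    A sketch of the adjusted helper, written here as a standalone function with the `Configuration` passed explicitly (the real method would keep using the catalog's own `hadoopConf`; the body is illustrative):

    ```scala
    import java.net.URI
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path

    def makeQualifiedPath(path: URI, hadoopConf: Configuration): URI = {
      val hadoopPath = new Path(path)
      val fs = hadoopPath.getFileSystem(hadoopConf)
      fs.makeQualified(hadoopPath).toUri
    }
    ```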




[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    **[Test build #73995 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73995/testReport)** for PR 17149 at commit [`abfb6f5`](https://github.com/apache/spark/commit/abfb6f5431ae1fb9bfe073f4acdf65ed870f7fa8).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r111104548
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -285,7 +285,7 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
             // compatible format, which means the data source is file-based and must have a `path`.
             require(table.storage.locationUri.isDefined,
               "External file-based data source table must have a `path` entry in storage properties.")
    -        Some(new Path(table.location).toUri.toString)
    --- End diff --
    
    there's actually some discussion [about defining this properly](https://issues.apache.org/jira/browse/HADOOP-14217), related to the problem of "colons in object store paths". That's not going to help directly, but it is a sign of something which has historically been underspecified.
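    The classic symptom, as a sketch (bucket and file names are made up; behavior as I understand current Hadoop `Path` parsing):

    ```scala
    import org.apache.hadoop.fs.Path

    // Fine: the colon sits in the path component of a fully qualified URI string
    new Path("s3a://bucket/warehouse/a:b")

    // Fails: a colon in a relative child segment is parsed as a scheme separator
    new Path(new Path("s3a://bucket/warehouse"), "a:b")
    // throws IllegalArgumentException: Relative path in absolute URI: a:b
    ```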





[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r109863558
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
    @@ -386,7 +386,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
             "LOCATION and 'path' in OPTIONS are both used to indicate the custom table path, " +
               "you can only specify one of them.", ctx)
         }
    -    val customLocation = storage.locationUri.orElse(location)
    +    val customLocation = storage.locationUri.orElse(location.map(CatalogUtils.stringToURI(_)))
     
    --- End diff --
    
    To my knowledge, HDFS's fully qualified path is in URI form. If we do not support this, that virtually means we are disallowing fully qualified paths. I understand it sounds ambiguous, but disallowing them does not look like a good solution.

    Also, if users want to access local files or S3 while the default scheme is `hdfs`, I guess it would require changing the default scheme every time.
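    For example (a sketch; host names and paths are made up), the same location string may arrive in any of these forms and go through the same conversion:

    ```scala
    import org.apache.hadoop.fs.Path

    // A bare path: no scheme, resolved against the default FileSystem later
    new Path("/user/hive/warehouse/t").toUri
    // A fully qualified HDFS location is already in URI form
    new Path("hdfs://namenode:8020/user/hive/warehouse/t").toUri
    // A location on another store carries its own scheme
    new Path("s3a://bucket/warehouse/t").toUri
    ```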




[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73979/
    Test FAILed.




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r109826890
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
    @@ -386,7 +386,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
             "LOCATION and 'path' in OPTIONS are both used to indicate the custom table path, " +
               "you can only specify one of them.", ctx)
         }
    -    val customLocation = storage.locationUri.orElse(location)
    +    val customLocation = storage.locationUri.orElse(location.map(CatalogUtils.stringToURI(_)))
    --- End diff --
    
    Also, it seems we have a util similar to `CatalogUtils.stringToURI` in `org.apache.spark.util.Utils.resolveURI`. Could we consolidate them? I did not replace it when I identified this, because some existing tests in `org.apache.spark.util.UtilsSuite` were broken, but I guess it would be fine if there is a coherent reason. Those broken cases might be bugs.

    @windpiger could you double check my comments and open a follow-up if I was not wrong?
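    For context, a sketch of the two entry points being compared (the behavior summarized in the comments is my reading, not a spec):

    ```scala
    // Both turn a String into a java.net.URI, but with different rules:
    // Utils.resolveURI treats scheme-less input as a local file path and makes it absolute,
    // while CatalogUtils.stringToURI goes through Hadoop's Path and leaves relative paths relative.
    val fromCore    = org.apache.spark.util.Utils.resolveURI("spark-warehouse/db1.db")
    val fromCatalog = org.apache.spark.sql.catalyst.catalog.CatalogUtils.stringToURI("spark-warehouse/db1.db")
    ```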




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r110060244
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -285,7 +285,7 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
             // compatible format, which means the data source is file-based and must have a `path`.
             require(table.storage.locationUri.isDefined,
               "External file-based data source table must have a `path` entry in storage properties.")
    -        Some(new Path(table.location).toUri.toString)
    --- End diff --
    
    Thank you for confirming. Let me send a mail soon.




[GitHub] spark issue #17149: [SPARK-19257][SQL][WIP]location for table/partition/data...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    **[Test build #73840 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73840/testReport)** for PR 17149 at commit [`69a1646`](https://github.com/apache/spark/commit/69a16464dc179094af89cdc3077e279fe4656959).




[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    thanks, merging to master!




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r109827045
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
    @@ -386,7 +386,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
             "LOCATION and 'path' in OPTIONS are both used to indicate the custom table path, " +
               "you can only specify one of them.", ctx)
         }
    -    val customLocation = storage.locationUri.orElse(location)
    +    val customLocation = storage.locationUri.orElse(location.map(CatalogUtils.stringToURI(_)))
     
    --- End diff --
    
    Also, it seems we have a util similar to `CatalogUtils.stringToURI` in `org.apache.spark.util.Utils.resolveURI`. Could we consolidate them? I did not replace it when I identified this, because some existing tests in `org.apache.spark.util.UtilsSuite` were broken, but I guess it would be fine if there is a coherent reason. Those broken cases might be bugs.

    @windpiger could you double check my comments and open a follow-up if I was not wrong?




[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    @gatorsmile, Thanks for your pointer. There is a good discussion there.




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r111153599
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -285,7 +285,7 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
             // compatible format, which means the data source is file-based and must have a `path`.
             require(table.storage.locationUri.isDefined,
               "External file-based data source table must have a `path` entry in storage properties.")
    -        Some(new Path(table.location).toUri.toString)
    --- End diff --
    
    Thank you. Your answer and the pointer helped me a lot in understanding this.

    So, to my understanding...

    There are a few exceptional cases, for example special characters or already-encoded characters in those URIs. Also, there are some corner cases where what `Path` expects is not clearly defined.

    Nevertheless, as the documentation says, the baseline is a fully qualified URI with unescaped characters, or the unescaped path part of a URI, though there may be some variants.
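    Concretely (a sketch; `/tmp` is just an example directory):

    ```scala
    import java.net.URI
    import org.apache.hadoop.fs.Path

    // String form: per the Path javadoc, characters are expected unescaped
    new Path("file:/tmp/a b").toUri.toString             // file:/tmp/a%20b
    // URI form: an already-escaped URI is taken as-is
    new Path(new URI("file:/tmp/a%20b")).toUri.toString  // file:/tmp/a%20b
    ```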




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by windpiger <gi...@git.apache.org>.
Github user windpiger commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r104361311
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
    @@ -146,7 +147,7 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
         try {
           sql(s"CREATE DATABASE $dbName")
           val db1 = catalog.getDatabaseMetadata(dbName)
    -      val expectedLocation = makeQualifiedPath(s"spark-warehouse/$dbName.db")
    +      val expectedLocation = new Path(makeQualifiedPath(s"spark-warehouse/$dbName.db")).toUri
    --- End diff --
    
    ok, maybe we should implement them all? `Option[String]` also exists in many places, and is more complicated




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r104358761
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala ---
    @@ -146,7 +150,7 @@ case class CreateDataSourceTableAsSelectCommand(
           assert(table.schema.isEmpty)
     
           val tableLocation = if (table.tableType == CatalogTableType.MANAGED) {
    -        Some(sessionState.catalog.defaultTablePath(table.identifier))
    +        Some(new Path(sessionState.catalog.defaultTablePath(table.identifier)).toUri)
    --- End diff --
    
    the `defaultTablePath` should also return URI now.
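    A sketch of what the adjusted signature could look like inside `SessionCatalog` (the body reuses its existing helpers and is only illustrative):

    ```scala
    def defaultTablePath(tableIdent: TableIdentifier): URI = {
      val dbName = formatDatabaseName(tableIdent.database.getOrElse(getCurrentDatabase))
      val dbLocation = getDatabaseMetadata(dbName).locationUri
      new Path(new Path(dbLocation), formatTableName(tableIdent.table)).toUri
    }
    ```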




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r104259621
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala ---
    @@ -365,10 +367,10 @@ abstract class ExternalCatalogSuite extends SparkFunSuite with BeforeAndAfterEac
     
         val partition1 =
           CatalogTablePartition(Map("partCol1" -> "1", "partCol2" -> "2"),
    -        storageFormat.copy(locationUri = Some(newLocationPart1)))
    +        storageFormat.copy(locationUri = Some(new Path(newLocationPart1).toUri)))
    --- End diff --
    
    isn't `newLocationPart1` already a `URI`?




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r104488815
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
    @@ -1686,4 +1689,162 @@ class HiveDDLSuite
           }
         }
       }
    +
    +  Seq("a b", "a:b", "a%b").foreach { specialChars =>
    --- End diff --
    
    please add a TODO to remove these duplicated tests when we merge DDLSuite and HiveDDLSuite




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r104261785
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -565,7 +565,8 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
     
             val newLocation = tableDefinition.storage.locationUri
             val storageWithPathOption = tableDefinition.storage.copy(
    -          properties = tableDefinition.storage.properties ++ newLocation.map("path" -> _))
    +          properties =
    +            tableDefinition.storage.properties ++newLocation.map("path" -> new Path(_).toString))
    --- End diff --
    
    we should also add a util method `URIToString` and add comments
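    As a sketch, with such a helper the quoted line could become (helper name as suggested; not necessarily the final form):

    ```scala
    val storageWithPathOption = tableDefinition.storage.copy(
      properties = tableDefinition.storage.properties ++
        newLocation.map(loc => "path" -> CatalogUtils.URIToString(loc)))
    ```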




[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73977/
    Test FAILed.




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r109865610
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
    @@ -386,7 +386,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
             "LOCATION and 'path' in OPTIONS are both used to indicate the custom table path, " +
               "you can only specify one of them.", ctx)
         }
    -    val customLocation = storage.locationUri.orElse(location)
    +    val customLocation = storage.locationUri.orElse(location.map(CatalogUtils.stringToURI(_)))
     
    --- End diff --
    
    In addition, I guess we already have a lot of tests with URI input paths, and I wrote many of them to make the tests pass on Windows, which I guess means some committers implicitly do not disagree with this.

    IMHO, URIs should be supported correctly first, and the local path form should be supported as an abbreviation, because a local path is an abbreviation of the fully qualified path.





[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    **[Test build #73996 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73996/testReport)** for PR 17149 at commit [`80f2c40`](https://github.com/apache/spark/commit/80f2c40aaeef4d4a00186107057e21ac7c3f9d4b).




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r104358453
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala ---
    @@ -553,7 +555,7 @@ abstract class ExternalCatalogSuite extends SparkFunSuite with BeforeAndAfterEac
       test("alter partitions") {
         val catalog = newBasicCatalog()
         try {
    -      val newLocation = newUriForDatabase()
    +      val newLocation = new Path(newUriForDatabase()).toUri
    --- End diff --
    
    unnecessary change?




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r110058436
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -285,7 +285,7 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
             // compatible format, which means the data source is file-based and must have a `path`.
             require(table.storage.locationUri.isDefined,
               "External file-based data source table must have a `path` entry in storage properties.")
    -        Some(new Path(table.location).toUri.toString)
    --- End diff --
    
    yea, please go ahead




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r104261398
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
    @@ -1843,28 +1846,28 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
                  |OPTIONS(path "$dir")
                """.stripMargin)
             val table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
    -        assert(table.location == dir.getAbsolutePath)
    +        assert(table.location == new URI(dir.getAbsolutePath))
     
             dir.delete
    -        val tableLocFile = new File(table.location)
    -        assert(!tableLocFile.exists)
    +//        val tableLocFile = new File(table.location)
    --- End diff --
    
    why comment it out?




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r104359071
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
    @@ -72,14 +73,14 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
     
       private def createDatabase(catalog: SessionCatalog, name: String): Unit = {
         catalog.createDatabase(
    -      CatalogDatabase(name, "", spark.sessionState.conf.warehousePath, Map()),
    +      CatalogDatabase(name, "", new Path(spark.sessionState.conf.warehousePath).toUri, Map()),
    --- End diff --
    
    StringToURI?




[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    yea, please go ahead




[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r109827813
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
    @@ -386,7 +386,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
             "LOCATION and 'path' in OPTIONS are both used to indicate the custom table path, " +
               "you can only specify one of them.", ctx)
         }
    -    val customLocation = storage.locationUri.orElse(location)
    +    val customLocation = storage.locationUri.orElse(location.map(CatalogUtils.stringToURI(_)))
     
    --- End diff --
    
    If you can reproduce it as a bug, please submit a PR to fix it.




[GitHub] spark issue #17149: [SPARK-19257][SQL][WIP]location for table/partition/data...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    **[Test build #73842 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73842/testReport)** for PR 17149 at commit [`e792cb6`](https://github.com/apache/spark/commit/e792cb694b37e9d24358f5aded60dc4f891949ad).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17149: [SPARK-19257][SQL][WIP]location for table/partition/data...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    **[Test build #73842 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73842/testReport)** for PR 17149 at commit [`e792cb6`](https://github.com/apache/spark/commit/e792cb694b37e9d24358f5aded60dc4f891949ad).




[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74000/
    Test FAILed.




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r104259724
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala ---
    @@ -553,21 +555,21 @@ abstract class ExternalCatalogSuite extends SparkFunSuite with BeforeAndAfterEac
       test("alter partitions") {
         val catalog = newBasicCatalog()
         try {
    -      val newLocation = newUriForDatabase()
    +      val newLocationUri = new Path(newUriForDatabase()).toUri
    --- End diff --
    
    keep the previous name




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by windpiger <gi...@git.apache.org>.
Github user windpiger commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r104266763
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala ---
    @@ -883,7 +885,7 @@ abstract class CatalogTestUtils {
     
       def newFunc(): CatalogFunction = newFunc("funcName")
     
    -  def newUriForDatabase(): String = Utils.createTempDir().toURI.toString.stripSuffix("/")
    +  def newUriForDatabase(): URI = new URI(Utils.createTempDir().toURI.toString.stripSuffix("/"))
    --- End diff --
    
    `Utils.createTempDir().toURI` ends with a trailing '/', so we have to strip it here.
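
    For illustration, a minimal sketch of the trailing-slash behaviour described above (assuming `Utils` is Spark's `org.apache.spark.util.Utils` test helper):

    ```scala
    import java.net.URI

    import org.apache.spark.util.Utils

    // File.toURI appends a trailing '/' for directories, so the suffix is
    // stripped before the URI is rebuilt for the test fixture.
    val raw: String = Utils.createTempDir().toURI.toString // e.g. "file:/tmp/spark-1234/"
    val uri: URI = new URI(raw.stripSuffix("/"))            // "file:/tmp/spark-1234"
    ```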




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r104259668
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala ---
    @@ -508,7 +510,7 @@ abstract class ExternalCatalogSuite extends SparkFunSuite with BeforeAndAfterEac
           partitionColumnNames = Seq("partCol1", "partCol2"))
         catalog.createTable(table, ignoreIfExists = false)
     
    -    val tableLocation = catalog.getTable("db1", "tbl").location
    +    val tableLocationPath = new Path(catalog.getTable("db1", "tbl").location)
    --- End diff --
    
    let's keep the previous name




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r104358996
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
    @@ -95,7 +96,7 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
       // Create the default database if it doesn't exist.
       {
         val defaultDbDefinition = CatalogDatabase(
    -      SessionCatalog.DEFAULT_DATABASE, "default database", warehousePath, Map())
    +      SessionCatalog.DEFAULT_DATABASE, "default database", new Path(warehousePath).toUri, Map())
    --- End diff --
    
    URIToString?




[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    Our parser might need a change regarding escape handling. We are having a related discussion in another PR: https://github.com/apache/spark/pull/15398




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r109854005
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
    @@ -386,7 +386,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
             "LOCATION and 'path' in OPTIONS are both used to indicate the custom table path, " +
               "you can only specify one of them.", ctx)
         }
    -    val customLocation = storage.locationUri.orElse(location)
    +    val customLocation = storage.locationUri.orElse(location.map(CatalogUtils.stringToURI(_)))
     
    --- End diff --
    
    Hm.. is the input path always expected to be a path? I thought we support both URI and path forms.
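
    As a rough sketch of the conversion under discussion (not necessarily the actual `CatalogUtils.stringToURI` implementation), Hadoop's `Path` accepts both bare paths and full URIs:

    ```scala
    import java.net.URI

    import org.apache.hadoop.fs.Path

    // Hypothetical helper: Path normalizes either form into a URI.
    def stringToURI(location: String): URI = new Path(location).toUri

    stringToURI("/user/hive/warehouse")          // bare path, no scheme in the result
    stringToURI("hdfs://nn:8020/user/warehouse") // full URI, scheme preserved
    ```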




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r104260144
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
    @@ -397,7 +398,8 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
         val tableDesc = CatalogTable(
           identifier = table,
           tableType = tableType,
    -      storage = storage.copy(locationUri = customLocation),
    +      storage = storage.copy(locationUri = customLocation.map{ loc =>
    --- End diff --
    
    also add comments to explain why we don't create `URI` directly.




[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r104358620
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala ---
    @@ -883,7 +885,7 @@ abstract class CatalogTestUtils {
     
       def newFunc(): CatalogFunction = newFunc("funcName")
     
    -  def newUriForDatabase(): String = Utils.createTempDir().toURI.toString.stripSuffix("/")
    +  def newUriForDatabase(): URI = new URI(Utils.createTempDir().toURI.toString.stripSuffix("/"))
    --- End diff --
    
    I think it's ok if we always compare URI with Path, instead of converting it to string.
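
    A hedged sketch of the comparison style suggested here: wrap both sides in Hadoop's `Path` so that trailing slashes and escaping differences do not break test assertions.

    ```scala
    import java.net.URI

    import org.apache.hadoop.fs.Path

    // Illustrative only: Path normalizes its path component on construction,
    // so comparing Path values is more robust than comparing raw strings.
    def sameLocation(actual: URI, expected: URI): Boolean =
      new Path(actual) == new Path(expected)
    ```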




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r104359096
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
    @@ -146,7 +147,7 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
         try {
           sql(s"CREATE DATABASE $dbName")
           val db1 = catalog.getDatabaseMetadata(dbName)
    -      val expectedLocation = makeQualifiedPath(s"spark-warehouse/$dbName.db")
    +      val expectedLocation = new Path(makeQualifiedPath(s"spark-warehouse/$dbName.db")).toUri
    --- End diff --
    
    StringToURI?




[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    **[Test build #74000 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74000/testReport)** for PR 17149 at commit [`681db88`](https://github.com/apache/spark/commit/681db88b90d71c2302bfa87dcaa6140afd1c8f40).




[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...

Posted by cnauroth <gi...@git.apache.org>.
Github user cnauroth commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    @HyukjinKwon , nice to meet you!  I see I got notified here for a bit of Hadoop `Path` knowledge, and particularly on Windows.
    
    > Is it okay to use both URIs and local file paths for the input string for org.apache.hadoop.fs.Path in general (when they are expected to be unescaped)?
    
    Yes, this is correct.
    
    Specifically on the topic of Windows, `Path` has special case logic for handling a Windows-specific local file path.  (This logic is only triggered if it detects the runtime OS is Windows.)  On Windows, I expect a call like `new Path("C:\\foo\\bar").toUri` to yield a correct `URI` pointing at that local file path, and further calling `toString` yields a correct `String` representation of the path.  Hadoop code often needs to take a path string that is possibly a relative path and pass it through `Path` to make it absolute and escape it according to Hadoop code expectations.
    
    The standard invocation for doing this in the Hadoop code is `new Path(...).toUri();` or `new Path(...).toUri().toString();`.  This works across all platforms.  I don't have any knowledge of the Spark codebase, but I see this patch uses similar invocations, so I expect it's good.
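
    A minimal sketch of that invocation (the path below is illustrative); on Windows the same calls also cover drive-letter paths such as `C:\foo\bar`.

    ```scala
    import java.net.URI

    import org.apache.hadoop.fs.Path

    // Convert a path string to a URI through Hadoop's Path; the URI's string
    // form escapes characters such as spaces.
    val asUri: URI = new Path("/user/hive/ware house").toUri
    val asString: String = asUri.toString // "/user/hive/ware%20house"
    ```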
    





[GitHub] spark issue #17149: [SPARK-19257][SQL][WIP]location for table/partition/data...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r104358658
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
    @@ -21,6 +21,7 @@ import scala.collection.JavaConverters._
     
     import org.antlr.v4.runtime.{ParserRuleContext, Token}
     import org.antlr.v4.runtime.tree.TerminalNode
    +import org.apache.hadoop.fs.Path
    --- End diff --
    
    we can remove this




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r104506871
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
    @@ -1686,4 +1689,162 @@ class HiveDDLSuite
           }
         }
       }
    +
    +  Seq("a b", "a:b", "a%b").foreach { specialChars =>
    --- End diff --
    
    This introduces a conflict with my PR https://github.com/apache/spark/pull/16592. Let me remove the duplicate there.




[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17149#discussion_r104360436
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
    @@ -146,7 +147,7 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
         try {
           sql(s"CREATE DATABASE $dbName")
           val db1 = catalog.getDatabaseMetadata(dbName)
    -      val expectedLocation = makeQualifiedPath(s"spark-warehouse/$dbName.db")
    +      val expectedLocation = new Path(makeQualifiedPath(s"spark-warehouse/$dbName.db")).toUri
    --- End diff --
    
    I think `StringToURI` should convert `String` to `URI`, not `Option[String]`.
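
    A tiny sketch of the signature distinction being made: keep the helper at `String => URI` and let callers map it over an `Option` (the implementation shown is illustrative).

    ```scala
    import java.net.URI

    import org.apache.hadoop.fs.Path

    def stringToURI(str: String): URI = new Path(str).toUri  // String => URI

    val location: Option[String] = Some("/tmp/data")
    val locationUri: Option[URI] = location.map(stringToURI) // callers handle the Option
    ```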




[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    **[Test build #74004 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74004/testReport)** for PR 17149 at commit [`5b92620`](https://github.com/apache/spark/commit/5b92620d66a1863e92690a28a036597856287fa9).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...

Posted by windpiger <gi...@git.apache.org>.
Github user windpiger commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    cc @cloud-fan @gatorsmile




[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17149
  
    **[Test build #73996 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73996/testReport)** for PR 17149 at commit [`80f2c40`](https://github.com/apache/spark/commit/80f2c40aaeef4d4a00186107057e21ac7c3f9d4b).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.

