You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by henryr <gi...@git.apache.org> on 2018/01/23 01:26:48 UTC

[GitHub] spark pull request #20355: SPARK-23148: [SQL] Allow pathnames with special c...

GitHub user henryr opened a pull request:

    https://github.com/apache/spark/pull/20355

    SPARK-23148: [SQL] Allow pathnames with special characters for CSV / …

    …JSON / text
    
    ## What changes were proposed in this pull request?
    
    Fix for JSON and CSV data sources when file names include characters
    that would be changed by URL encoding.
    
    ## How was this patch tested?
    
    New unit tests for JSON, CSV and text suites


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/henryr/spark spark-23148

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20355.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20355
    
----
commit 98da4cd4b865ef8944a10a24ee709ed9ee0a4721
Author: Henry Robinson <he...@...>
Date:   2018-01-19T22:55:53Z

    SPARK-23148: [SQL] Allow pathnames with special characters for CSV / JSON / text
    
    ## What changes were proposed in this pull request?
    
    Fix for JSON and CSV data sources when file names include characters
    that would be changed by URL encoding.
    
    ## How was this patch tested?
    
    New unit tests for JSON, CSV and text suites

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: SPARK-23148: [SQL] Allow pathnames with special characte...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86554/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: [SPARK-23148][SQL] Allow pathnames with special characte...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: [SPARK-23148][SQL] Allow pathnames with special characte...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    **[Test build #86577 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86577/testReport)** for PR 20355 at commit [`51b0db5`](https://github.com/apache/spark/commit/51b0db5272858995a3b3e389a851f4e4aa04c651).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: SPARK-23148: [SQL] Allow pathnames with special characte...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86502/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: SPARK-23148: [SQL] Allow pathnames with special characte...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: SPARK-23148: [SQL] Allow pathnames with special characte...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    **[Test build #86552 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86552/testReport)** for PR 20355 at commit [`740def4`](https://github.com/apache/spark/commit/740def4c9a96a7dba5a8f57c49042dee661608b6).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20355: SPARK-23148: [SQL] Allow pathnames with special c...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20355#discussion_r163137474
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/text/TextSuite.scala ---
    @@ -172,6 +172,14 @@ class TextSuite extends QueryTest with SharedSQLContext {
         }
       }
     
    +  test("SPARK-23148: test for spaces in file names") {
    --- End diff --
    
    Shall we remove this test case and only leave test CSV / JSON with `multiLine` enabled? I think other tests are basically duplicates of https://github.com/apache/spark/blob/a0aedb0ded4183cc33b27e369df1cbf862779e26/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala#L71.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: SPARK-23148: [SQL] Allow pathnames with special characte...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86503/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: [SPARK-23148][SQL] Allow pathnames with special characte...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    Merged to master and branch-2.3.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: SPARK-23148: [SQL] Allow pathnames with special characte...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    Let's don't forget to fix the PR title to `[SPARK-23148][SQL] ... `. This style is actually documented to be encouraged in this project contributing guide. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20355: SPARK-23148: [SQL] Allow pathnames with special c...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20355#discussion_r163434591
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
    @@ -23,6 +23,7 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext {
       import testImplicits._
     
       private val allFileBasedDataSources = Seq("orc", "parquet", "csv", "json", "text")
    +  private val nameWithSpecialChars = s"sp&cial%c hars"
    --- End diff --
    
    nit: seems prefix `s` can be removed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: SPARK-23148: [SQL] Allow pathnames with special characte...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    **[Test build #86552 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86552/testReport)** for PR 20355 at commit [`740def4`](https://github.com/apache/spark/commit/740def4c9a96a7dba5a8f57c49042dee661608b6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: SPARK-23148: [SQL] Allow pathnames with special characte...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: SPARK-23148: [SQL] Allow pathnames with special characte...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: SPARK-23148: [SQL] Allow pathnames with special characte...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/117/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: SPARK-23148: [SQL] Allow pathnames with special characte...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    **[Test build #86554 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86554/testReport)** for PR 20355 at commit [`1b9420a`](https://github.com/apache/spark/commit/1b9420a4d6fd2b202d6b076405621773d8f9e80d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: SPARK-23148: [SQL] Allow pathnames with special characte...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: SPARK-23148: [SQL] Allow pathnames with special characte...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    **[Test build #86505 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86505/testReport)** for PR 20355 at commit [`9c56736`](https://github.com/apache/spark/commit/9c567368105522537977437532c3e45c6fd07296).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20355: SPARK-23148: [SQL] Allow pathnames with special c...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20355#discussion_r163129347
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -2063,4 +2063,12 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
           )
         }
       }
    +
    +  test("SPARK-23148: test for spaces in file names") {
    +    withTempPath { dir =>
    --- End diff --
    
    Could we test `multiLine` here too?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20355: SPARK-23148: [SQL] Allow pathnames with special c...

Posted by henryr <gi...@git.apache.org>.
Github user henryr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20355#discussion_r163455727
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
    @@ -78,4 +78,20 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext {
           }
         }
       }
    +
    +  // Separate test case for text-based formats that support multiLine as an option.
    +  Seq("json", "csv", "text").foreach { format =>
    --- End diff --
    
    That sounds good - my main concern is to make sure that both `multiLine=true` _and_ `multiLine=false` have coverage with a space in the name, since they are such different paths. I'll keep the change that adds a space to `nameWithSpecialChars`, but otherwise have the tests as you suggest - let me know what you think of the next patch!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: SPARK-23148: [SQL] Allow pathnames with special characte...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    **[Test build #86502 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86502/testReport)** for PR 20355 at commit [`98da4cd`](https://github.com/apache/spark/commit/98da4cd4b865ef8944a10a24ee709ed9ee0a4721).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20355: SPARK-23148: [SQL] Allow pathnames with special c...

Posted by henryr <gi...@git.apache.org>.
Github user henryr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20355#discussion_r163411267
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/text/TextSuite.scala ---
    @@ -172,6 +172,14 @@ class TextSuite extends QueryTest with SharedSQLContext {
         }
       }
     
    +  test("SPARK-23148: test for spaces in file names") {
    --- End diff --
    
    In the end, to reduce code duplication, I made it so that orc and parquet run multiline as well (I tried to find a neat way to only run multiline if the format was csv, text or json without having a separate test case but it just complicated things). Let me know if you'd rather I have two separate test cases to avoid running the two redundant cases with orc / parquet.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20355: [SPARK-23148][SQL] Allow pathnames with special c...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20355


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: [SPARK-23148][SQL] Allow pathnames with special characte...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86577/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: [SPARK-23148][SQL] Allow pathnames with special characte...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    **[Test build #86577 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86577/testReport)** for PR 20355 at commit [`51b0db5`](https://github.com/apache/spark/commit/51b0db5272858995a3b3e389a851f4e4aa04c651).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: SPARK-23148: [SQL] Allow pathnames with special characte...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: SPARK-23148: [SQL] Allow pathnames with special characte...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    **[Test build #86503 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86503/testReport)** for PR 20355 at commit [`535234c`](https://github.com/apache/spark/commit/535234c57a9b2633c8dbd061577766c24bdbebca).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20355: SPARK-23148: [SQL] Allow pathnames with special c...

Posted by henryr <gi...@git.apache.org>.
Github user henryr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20355#discussion_r163146077
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/text/TextSuite.scala ---
    @@ -172,6 +172,14 @@ class TextSuite extends QueryTest with SharedSQLContext {
         }
       }
     
    +  test("SPARK-23148: test for spaces in file names") {
    --- End diff --
    
    @HyukjinKwon good idea, thanks for pointing out that test - how about I just add a space to the special characters string, and have the existing test also test multiline on/off for text, csv and json?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: [SPARK-23148][SQL] Allow pathnames with special characte...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    **[Test build #86564 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86564/testReport)** for PR 20355 at commit [`51b0db5`](https://github.com/apache/spark/commit/51b0db5272858995a3b3e389a851f4e4aa04c651).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: SPARK-23148: [SQL] Allow pathnames with special characte...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: SPARK-23148: [SQL] Allow pathnames with special characte...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86505/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: SPARK-23148: [SQL] Allow pathnames with special characte...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: SPARK-23148: [SQL] Allow pathnames with special characte...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86552/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: [SPARK-23148][SQL] Allow pathnames with special characte...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/171/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: [SPARK-23148][SQL] Allow pathnames with special characte...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: SPARK-23148: [SQL] Allow pathnames with special characte...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/116/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: SPARK-23148: [SQL] Allow pathnames with special characte...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    **[Test build #86554 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86554/testReport)** for PR 20355 at commit [`1b9420a`](https://github.com/apache/spark/commit/1b9420a4d6fd2b202d6b076405621773d8f9e80d).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: [SPARK-23148][SQL] Allow pathnames with special characte...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: [SPARK-23148][SQL] Allow pathnames with special characte...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: [SPARK-23148][SQL] Allow pathnames with special characte...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/183/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: SPARK-23148: [SQL] Allow pathnames with special characte...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: [SPARK-23148][SQL] Allow pathnames with special characte...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86564/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20355: SPARK-23148: [SQL] Allow pathnames with special c...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20355#discussion_r163414704
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
    @@ -68,13 +68,16 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext {
       }
     
       allFileBasedDataSources.foreach { format =>
    -    test(s"SPARK-22146 read files containing special characters using $format") {
    -      val nameWithSpecialChars = s"sp&cial%chars"
    -      withTempDir { dir =>
    -        val tmpFile = s"$dir/$nameWithSpecialChars"
    -        spark.createDataset(Seq("a", "b")).write.format(format).save(tmpFile)
    -        val fileContent = spark.read.format(format).load(tmpFile)
    -        checkAnswer(fileContent, Seq(Row("a"), Row("b")))
    +    test(s"SPARK-22146 / SPARK-23148 read files containing special characters using $format") {
    +      val nameWithSpecialChars = s"sp&cial%c hars"
    +      Seq(true, false).foreach { multiline =>
    --- End diff --
    
    Less dup is fine but this case slightly confuses like orc and parquet support multiline, and runs duplicated tests as you pointed out if I should nitpick. I think I prefer a separate test.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: SPARK-23148: [SQL] Allow pathnames with special characte...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    **[Test build #86503 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86503/testReport)** for PR 20355 at commit [`535234c`](https://github.com/apache/spark/commit/535234c57a9b2633c8dbd061577766c24bdbebca).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20355: SPARK-23148: [SQL] Allow pathnames with special c...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20355#discussion_r163152164
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/text/TextSuite.scala ---
    @@ -172,6 +172,14 @@ class TextSuite extends QueryTest with SharedSQLContext {
         }
       }
     
    +  test("SPARK-23148: test for spaces in file names") {
    --- End diff --
    
    Alternatively, adding the test like below in `FileBasedDataSourceSuite.scala`, instead of `CSVSuite` and `JsonSuite`:
    
    ```scala
    // Only CSV/JSON supports multiLine option and the code paths are different blabla ..
    Seq("csv", "json").foreach { format =>
      test(s"SPARK-23148 read files containing special characters using $format - multLine enabled") {
        val nameWithSpecialChars = s"sp&ci al%chars"
        withTempDir { dir =>
          ... spark.read.option("multuLine", true).format(format)...
        }
      }
    }
    ```
    
    could be also fine if possible. Either way is fine to me.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20355: SPARK-23148: [SQL] Allow pathnames with special c...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20355#discussion_r163149134
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/text/TextSuite.scala ---
    @@ -172,6 +172,14 @@ class TextSuite extends QueryTest with SharedSQLContext {
         }
       }
     
    +  test("SPARK-23148: test for spaces in file names") {
    --- End diff --
    
    Let's just add both `multiLine` specific tests for `CSVSuite` and `JsonSuite` and add a space in "SPARK-22146 read files containing special characters using ..." if you prefer this way.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20355: SPARK-23148: [SQL] Allow pathnames with special c...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20355#discussion_r163434524
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
    @@ -78,4 +78,20 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext {
           }
         }
       }
    +
    +  // Separate test case for text-based formats that support multiLine as an option.
    +  Seq("json", "csv", "text").foreach { format =>
    --- End diff --
    
    Actually "text" doesn't support `multiLine` but `wholetext` which runs another code path. Let's take this out.
    
    How about leaving out the tests for `SPARK-22146` as was, and just adding a test dedicated for `multiline` here like:
    
    ```scala
        Seq("json", "csv").foreach { format =>
          test("SPARK-23148 read files containing special characters " +
            s"using $format multiline enabled") {
            withTempDir { dir =>
              val tmpFile = s"$dir/$nameWithSpecialChars"
              spark.createDataset(Seq("a", "b")).write.format(format).save(tmpFile)
              val reader = spark.read.format(format).option("multiLine", true)
              val fileContent = reader.load(tmpFile)
              checkAnswer(fileContent, Seq(Row("a"), Row("b")))
            }
          }
        }
    ```
    
    This PR changes really fix the code paths for both CSV / JSON when `multiLine` is enabled only .. I am not sure why you don't like this suggestion .. one test less invasive and more targeted for one JIRA ... 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: SPARK-23148: [SQL] Allow pathnames with special characte...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/160/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20355: SPARK-23148: [SQL] Allow pathnames with special c...

Posted by henryr <gi...@git.apache.org>.
Github user henryr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20355#discussion_r163422934
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
    @@ -68,13 +68,16 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext {
       }
     
       allFileBasedDataSources.foreach { format =>
    -    test(s"SPARK-22146 read files containing special characters using $format") {
    -      val nameWithSpecialChars = s"sp&cial%chars"
    -      withTempDir { dir =>
    -        val tmpFile = s"$dir/$nameWithSpecialChars"
    -        spark.createDataset(Seq("a", "b")).write.format(format).save(tmpFile)
    -        val fileContent = spark.read.format(format).load(tmpFile)
    -        checkAnswer(fileContent, Seq(Row("a"), Row("b")))
    +    test(s"SPARK-22146 / SPARK-23148 read files containing special characters using $format") {
    +      val nameWithSpecialChars = s"sp&cial%c hars"
    +      Seq(true, false).foreach { multiline =>
    --- End diff --
    
    Sounds good to me.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: SPARK-23148: [SQL] Allow pathnames with special characte...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: [SPARK-23148][SQL] Allow pathnames with special characte...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    **[Test build #86564 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86564/testReport)** for PR 20355 at commit [`51b0db5`](https://github.com/apache/spark/commit/51b0db5272858995a3b3e389a851f4e4aa04c651).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: [SPARK-23148][SQL] Allow pathnames with special characte...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    Please fix the PR title.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: [SPARK-23148][SQL] Allow pathnames with special characte...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: SPARK-23148: [SQL] Allow pathnames with special characte...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    **[Test build #86505 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86505/testReport)** for PR 20355 at commit [`9c56736`](https://github.com/apache/spark/commit/9c567368105522537977437532c3e45c6fd07296).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: SPARK-23148: [SQL] Allow pathnames with special characte...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/119/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: SPARK-23148: [SQL] Allow pathnames with special characte...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: SPARK-23148: [SQL] Allow pathnames with special characte...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    **[Test build #86502 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86502/testReport)** for PR 20355 at commit [`98da4cd`](https://github.com/apache/spark/commit/98da4cd4b865ef8944a10a24ee709ed9ee0a4721).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20355: SPARK-23148: [SQL] Allow pathnames with special characte...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20355
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/162/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org