Posted to reviews@spark.apache.org by jurriaan <gi...@git.apache.org> on 2016/05/28 10:18:20 UTC

[GitHub] spark pull request: [SPARK-13638][SQL] Add quoteAll option to CSV ...

GitHub user jurriaan opened a pull request:

    https://github.com/apache/spark/pull/13374

    [SPARK-13638][SQL] Add quoteAll option to CSV DataFrameWriter

    ## What changes were proposed in this pull request?
    
    Adds a quoteAll option for writing CSV, which quotes all fields.
    See https://issues.apache.org/jira/browse/SPARK-13638
    
    ## How was this patch tested?
    
    Added a test to verify that the output columns are quoted for all fields in the DataFrame.
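
For readers unfamiliar with the proposed behaviour, here is a minimal sketch of what "quote all fields" means. It uses Python's standard `csv` module purely as an analogy and is not Spark's writer; the helper name `write_rows` is hypothetical.

```python
import csv
import io


def write_rows(rows, quote_all):
    """Serialize rows to CSV text, optionally quoting every field."""
    buf = io.StringIO()
    # QUOTE_MINIMAL quotes only fields that need it (e.g. ones containing the delimiter);
    # QUOTE_ALL quotes every field, which is the behaviour a quote-all option asks for.
    quoting = csv.QUOTE_ALL if quote_all else csv.QUOTE_MINIMAL
    writer = csv.writer(buf, quoting=quoting, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


rows = [["id", "name"], [1, "a,b"]]
print(write_rows(rows, quote_all=False))  # id,name / 1,"a,b"
print(write_rows(rows, quote_all=True))   # "id","name" / "1","a,b"
```

With quoting disabled only the field containing a comma is quoted; with it enabled every field, including the header and the number, is wrapped in quotes.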
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jurriaan/spark csv-quote-all

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13374.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13374
    
----
commit 5a64a3ad7e1ba524ca278dd6d2f0f2a46cd67cf6
Author: Jurriaan Pruis <em...@jurriaanpruis.nl>
Date:   2016-05-28T10:01:32Z

    [SPARK-13638][SQL] Add quoteAll option to CSV DataFrameWriter

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13374: [SPARK-13638][SQL] Add quoteAll option to CSV DataFrameW...

Posted by jurriaan <gi...@git.apache.org>.
Github user jurriaan commented on the issue:

    https://github.com/apache/spark/pull/13374
  
    Rebased, ping @rxin @HyukjinKwon 




[GitHub] spark issue #13374: [SPARK-13638][SQL] Add escapeAll option to CSV DataFrame...

Posted by jurriaan <gi...@git.apache.org>.
Github user jurriaan commented on the issue:

    https://github.com/apache/spark/pull/13374
  
    @rxin Sorry for the confusion, fixed it :)




[GitHub] spark pull request: [SPARK-13638][SQL] Add quoteAll option to CSV ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13374#issuecomment-222307250
  
    **[Test build #3031 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3031/consoleFull)** for PR 13374 at commit [`5a64a3a`](https://github.com/apache/spark/commit/5a64a3ad7e1ba524ca278dd6d2f0f2a46cd67cf6).




[GitHub] spark pull request #13374: [SPARK-13638][SQL] Add quoteAll option to CSV Dat...

Posted by jurriaan <gi...@git.apache.org>.
Github user jurriaan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13374#discussion_r69279736
  
    --- Diff: python/pyspark/sql/readwriter.py ---
    @@ -745,6 +748,8 @@ def csv(self, path, mode=None, compression=None, sep=None, quote=None, escape=No
                 self.option("nullValue", nullValue)
             if escapeQuotes is not None:
                 self.option("escapeQuotes", nullValue)
    +        if escapeAll is not None:
    +            self.option("escapeAll", nullValue)
    --- End diff --
    
    Wow! We should fix this `escapeQuotes` thing too.




[GitHub] spark issue #13374: [SPARK-13638][SQL] Add escapeAll option to CSV DataFrame...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/13374
  
    @jurriaan should this be called quoteAll rather than escapeAll?





[GitHub] spark issue #13374: [SPARK-13638][SQL] Add quoteAll option to CSV DataFrameW...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13374
  
    **[Test build #3171 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3171/consoleFull)** for PR 13374 at commit [`e42ab25`](https://github.com/apache/spark/commit/e42ab25c9d87d6ac504b419abfebcb295c3f0d44).




[GitHub] spark pull request #13374: [SPARK-13638][SQL] Add escapeAll option to CSV Da...

Posted by jurriaan <gi...@git.apache.org>.
Github user jurriaan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13374#discussion_r69283761
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala ---
    @@ -366,6 +366,32 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils {
         }
       }
     
    +  test("save csv with quoteAll enabled") {
    --- End diff --
    
    Fixed :)




[GitHub] spark pull request: [SPARK-13638][SQL] Add quoteAll option to CSV ...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13374#discussion_r64988710
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVParser.scala ---
    @@ -76,6 +76,7 @@ private[sql] class LineCsvWriter(params: CSVOptions, headers: Seq[String]) exten
       writerSettings.setQuoteAllFields(false)
    --- End diff --
    
    Shouldn't we remove this one?




[GitHub] spark pull request #13374: [SPARK-13638][SQL] Add quoteAll option to CSV Dat...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/13374




[GitHub] spark issue #13374: [SPARK-13638][SQL] Add quoteAll option to CSV DataFrameW...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13374
  
    **[Test build #3171 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3171/consoleFull)** for PR 13374 at commit [`e42ab25`](https://github.com/apache/spark/commit/e42ab25c9d87d6ac504b419abfebcb295c3f0d44).
     * This patch passes all tests.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-13638][SQL] Add quoteAll option to CSV ...

Posted by jurriaan <gi...@git.apache.org>.
Github user jurriaan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13374#discussion_r64992937
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVParser.scala ---
    @@ -76,6 +76,7 @@ private[sql] class LineCsvWriter(params: CSVOptions, headers: Seq[String]) exten
       writerSettings.setQuoteAllFields(false)
    --- End diff --
    
    Fixed :)




[GitHub] spark pull request: [SPARK-13638][SQL] Add quoteAll option to CSV ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13374#issuecomment-222307277
  
    **[Test build #3031 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3031/consoleFull)** for PR 13374 at commit [`5a64a3a`](https://github.com/apache/spark/commit/5a64a3ad7e1ba524ca278dd6d2f0f2a46cd67cf6).
     * This patch **fails RAT tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #13374: [SPARK-13638][SQL] Add quoteAll option to CSV Dat...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13374#discussion_r69266623
  
    --- Diff: python/pyspark/sql/readwriter.py ---
    @@ -745,6 +748,8 @@ def csv(self, path, mode=None, compression=None, sep=None, quote=None, escape=No
                 self.option("nullValue", nullValue)
             if escapeQuotes is not None:
                 self.option("escapeQuotes", nullValue)
    +        if escapeAll is not None:
    +            self.option("escapeAll", nullValue)
    --- End diff --
    
    I guess this should be...
    
    ```
    self.option("escapeAll", escapeAll)
    ```
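
The same copy-paste slip affects `escapeQuotes` in the quoted diff, where `nullValue` is forwarded instead of the option's own value. A minimal sketch of the corrected forwarding pattern follows, assuming the option is named `quoteAll` as in the pull request title; `FakeWriter` and `set_csv_options` are hypothetical stand-ins for illustration, not PySpark's actual classes.

```python
class FakeWriter:
    """Hypothetical stand-in for a DataFrameWriter; it only records options."""

    def __init__(self):
        self.options = {}

    def option(self, key, value):
        self.options[key] = value
        return self


def set_csv_options(writer, nullValue=None, escapeQuotes=None, quoteAll=None):
    """Forward each keyword to the writer under its own name and with its own value."""
    if nullValue is not None:
        writer.option("nullValue", nullValue)
    if escapeQuotes is not None:
        # The quoted diff passes nullValue here; the option's own value is what is intended.
        writer.option("escapeQuotes", escapeQuotes)
    if quoteAll is not None:
        writer.option("quoteAll", quoteAll)
    return writer


w = set_csv_options(FakeWriter(), escapeQuotes=False, quoteAll=True)
print(w.options)  # {'escapeQuotes': False, 'quoteAll': True}
```

Checking `is not None` rather than truthiness matters here, since `False` is a meaningful value for a boolean option and must still be forwarded.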





[GitHub] spark issue #13374: [SPARK-13638][SQL] Add quoteAll option to CSV DataFrameW...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/13374
  
    Merging in master. Thanks.





[GitHub] spark issue #13374: [SPARK-13638][SQL] Add escapeAll option to CSV DataFrame...

Posted by jurriaan <gi...@git.apache.org>.
Github user jurriaan commented on the issue:

    https://github.com/apache/spark/pull/13374
  
    I thought it should be named in line with the escapeQuotes method, but what it's doing is more like quoting all values than escaping all. So I guess that name could make sense after all.
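
The distinction drawn here can be sketched in a few lines. This is an illustration only, with hypothetical helper names and a backslash assumed as the escape character; it is not how Spark or its CSV library implements either option.

```python
def escape_quotes(value, quote='"', escape="\\"):
    """Escaping: prefix each embedded quote character with the escape character."""
    return value.replace(quote, escape + quote)


def quote_field(value, quote='"'):
    """Quoting: wrap the whole field in quote characters."""
    return quote + value + quote


print(escape_quotes('say "hi"'))  # say \"hi\"
print(quote_field("plain"))       # "plain"
```

Escaping changes characters inside a field, while quoting wraps the field as a whole, which is why `quoteAll` describes the new option more accurately than `escapeAll`.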




[GitHub] spark pull request: [SPARK-13638][SQL] Add quoteAll option to CSV ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13374#issuecomment-222301359
  
    Can one of the admins verify this patch?




[GitHub] spark pull request #13374: [SPARK-13638][SQL] Add quoteAll option to CSV Dat...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13374#discussion_r69266538
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala ---
    @@ -366,6 +366,32 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils {
         }
       }
     
    +  test("save csv with quoteAll enabled") {
    --- End diff --
    
    Hey, it seems `escapeAll` and `quoteAll` are mixed up.




[GitHub] spark issue #13374: [SPARK-13638][SQL] Add escapeAll option to CSV DataFrame...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/13374
  
    Yup... would be great if you can update this. Otherwise LGTM.





[GitHub] spark pull request: [SPARK-13638][SQL] Add quoteAll option to CSV ...

Posted by jurriaan <gi...@git.apache.org>.
Github user jurriaan commented on the pull request:

    https://github.com/apache/spark/pull/13374#issuecomment-222301261
  
    cc @rxin @HyukjinKwon 




[GitHub] spark issue #13374: [SPARK-13638][SQL] Add quoteAll option to CSV DataFrameW...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13374
  
    **[Test build #3172 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3172/consoleFull)** for PR 13374 at commit [`e42ab25`](https://github.com/apache/spark/commit/e42ab25c9d87d6ac504b419abfebcb295c3f0d44).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #13374: [SPARK-13638][SQL] Add quoteAll option to CSV DataFrameW...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13374
  
    **[Test build #3172 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3172/consoleFull)** for PR 13374 at commit [`e42ab25`](https://github.com/apache/spark/commit/e42ab25c9d87d6ac504b419abfebcb295c3f0d44).

