You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by HyukjinKwon <gi...@git.apache.org> on 2016/05/01 05:46:31 UTC

[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/12818

    [MINOR][SQL] Remove not affected settings for writing in CSV.

    ## What changes were proposed in this pull request?
    
    This PR removes not affected settings for writing CSV files.
    
    - `setComment()` : It seems Univocity parser supports to write comments but only by both [commentRow()](https://github.com/uniVocity/univocity-parsers/blob/93d1fb6437bdeb1531f27156e18e8d8ca7af572f/src/main/java/com/univocity/parsers/common/AbstractWriter.java#L751-L753) and [commentRowToString()](https://github.com/uniVocity/univocity-parsers/blob/93d1fb6437bdeb1531f27156e18e8d8ca7af572f/src/main/java/com/univocity/parsers/common/AbstractWriter.java#L1392-L1394). Both methods are not used in Spark.
    
    - `setLineSeparator()` : It seems this setting is only enabled when writing multiple lines and comments. However, CSV datasource does not write comments and write line by line by Hadoop's `LineRecordWriter`. Spark only uses this library with `\n` line separator with `LineRecordWriter`. Also, the end of each CSV line produced by Univocity is trimmed. (See [CSVParser.scala#L89](https://github.com/apache/spark/blob/73b56a3c6c5c590219b42884c8bbe88b0a236987/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVParser.scala#L89))
    ## How was this patch tested?
    
    Unittests in `CSVSuite` and `./build/sbt scalastyle`
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark minor-remove-unused

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12818.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12818
    
----
commit 9b570db3e777e728ea5215cf4affef6dfe1464e5
Author: hyukjinkwon <gu...@gmail.com>
Date:   2016-05-01T05:28:16Z

    Remove setting options not affected for writing in CSV.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12818#issuecomment-216024726
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

Posted by jbax <gi...@git.apache.org>.
Github user jbax commented on the pull request:

    https://github.com/apache/spark/pull/12818#issuecomment-216738292
  
    I just read the rest of this ticket. Be careful with the `setLineSeparator()`. It uses the default OS line separator but that's not always desired.
    
    By default, this is used to transform the `normalizedLineSeparator` when writing. If you are running on Windows this will be:
    
    ```
    normalizedLineSeparator = '\n'
    lineSeparator = '\r\n'
    ```
    
    Then write a value such as `"my \n multi line \n value"`
    
    You will end up with `"my \r\n multi line \r\n value"`
    
    If you actually want to have  `"my \n multi line \n value"` you must either:
    - set the line separator explicitly to `\n`
    - set `normalizeLineEndingsWithinQuotes` to `false`, so whatever is in the input will be written to the output, without any line ending transformation.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/12818#issuecomment-216023502
  
    cc @falaki


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

Posted by jbax <gi...@git.apache.org>.
Github user jbax commented on the pull request:

    https://github.com/apache/spark/pull/12818#issuecomment-216747285
  
    Foo and bar are part of the same value, they just happen to have a line ending in between. And yes `setLineSeparator()` it is related to the values themselves when writing, unless `normalizeLineEndingsWithinQuotes=false`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/12818#issuecomment-216734653
  
    Please allow me cc you, @jbax, who I guess the author of Univocity parser. 
    Could you please confirm that `Format.setComment()` is not affected if we only calls `CsvWriter.writeRow(...)` and `CsvWriter.writeHeaders()` for writing CSV files?
    
    I think `setComment()` can be confirmed by you. 
    (For `setLineSeparator()` I think  this can be checked enough within Spark code.)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12818#issuecomment-216024696
  
    **[Test build #57468 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57468/consoleFull)** for PR 12818 at commit [`9b570db`](https://github.com/apache/spark/commit/9b570db3e777e728ea5215cf4affef6dfe1464e5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/12818#issuecomment-216743678
  
    @jbax Ah, I guess `foo` and `bar` are separate rows, right? `stripLineEnd` will be applied for each row.
    
    If I got you wrong and `setLineSeparator()` is related with the values themselves, then I should not take that away.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12818#issuecomment-216761919
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57720/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12818#issuecomment-216761918
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12818#issuecomment-216749347
  
    **[Test build #57718 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57718/consoleFull)** for PR 12818 at commit [`3b289d9`](https://github.com/apache/spark/commit/3b289d9084f8c49e1c8418d7636c5b8df07c1463).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/12818#issuecomment-216733974
  
    Hi @falaki, could you take a quick look? it won't be too long!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/12818#issuecomment-216019892
  
    test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12818#issuecomment-216749947
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57718/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12818#issuecomment-216761768
  
    **[Test build #57720 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57720/consoleFull)** for PR 12818 at commit [`3b289d9`](https://github.com/apache/spark/commit/3b289d9084f8c49e1c8418d7636c5b8df07c1463).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12818#issuecomment-216024734
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57470/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/12818#issuecomment-216765934
  
    Closing this. I will bring up this again maybe  in https://github.com/apache/spark/pull/12268.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12818#issuecomment-216020146
  
    **[Test build #57468 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57468/consoleFull)** for PR 12818 at commit [`9b570db`](https://github.com/apache/spark/commit/9b570db3e777e728ea5215cf4affef6dfe1464e5).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12818#issuecomment-216020147
  
    **[Test build #57470 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57470/consoleFull)** for PR 12818 at commit [`9b570db`](https://github.com/apache/spark/commit/9b570db3e777e728ea5215cf4affef6dfe1464e5).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

Posted by jbax <gi...@git.apache.org>.
Github user jbax commented on the pull request:

    https://github.com/apache/spark/pull/12818#issuecomment-216743260
  
    What happens if you do this:
    ```
    scala> "foo\r\nbar\r\n".stripLineEnd
    ```
    Shouldn't the result be this?
    ```
    res0: String = foo\r\n
    bar
    ```



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/12818#issuecomment-216765450
  
    @rxin If it is too minor to merge, I can close and then do this in another PR maybe after investigating the newline stuff discussed above.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

Posted by jbax <gi...@git.apache.org>.
Github user jbax commented on the pull request:

    https://github.com/apache/spark/pull/12818#issuecomment-216737346
  
    Confirmed. It is only used if you call `CsvWriter.commentRow()` or  `CsvWriter.commentRowToString()` to write comments to the output.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12818#issuecomment-216024733
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/12818#issuecomment-216764368
  
    @rxin in terms of funtionalities and performance, No. 
    
    But it shortens codes and I thought it is confusing whether `comment` option in `CSVOptions` affects writing or not when I look at the codes because it is setting the `comment` from the `CSVOptions` to `setComment()`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

Posted by jbax <gi...@git.apache.org>.
Github user jbax commented on the pull request:

    https://github.com/apache/spark/pull/12818#issuecomment-216770729
  
    By the way, may I suggest you guys to upgrade to version 2.1.0 as it comes with substantial performance improvements for parsing and writing CSV.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/12818#issuecomment-216765525
  
    Yea I haven't spent much time looking but it doesn't seem worth to me to remove this. We might also use comment in the future.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/12818#issuecomment-216763362
  
    Are there any benefits to removing this?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12818#issuecomment-216024727
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57468/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12818#issuecomment-216749940
  
    **[Test build #57718 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57718/consoleFull)** for PR 12818 at commit [`3b289d9`](https://github.com/apache/spark/commit/3b289d9084f8c49e1c8418d7636c5b8df07c1463).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/12818#issuecomment-216748780
  
    Oh, I misunderstood your first comment.I think I should not take out `setLineSeparator()` here but maybe I should open another issue ticket to set `normalizeLineEndingsWithinQuotes=false` with some tests. Thank you for correcting me!  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/12818#issuecomment-216750176
  
    retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/12818#issuecomment-216763118
  
    @rxin To cut it short, I got a confirm, from the original author of Univocity, `setComment()` has no effect as long as Spark does not write comments from `DataFrame`. `CsvWriter.commentRow()` and `CsvWriter.commentRowToString()` are not called in Spark.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/12818#issuecomment-216740931
  
    @jbax Cool! Thank you for detailed explanation.
    
    So, this uses OS default newline without `setLineSeparator()`, which is trimmed [here](https://github.com/apache/spark/blob/73b56a3c6c5c590219b42884c8bbe88b0a236987/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVParser.scala#L89) for each row in Spark.
    
    ```scala
    scala> "foo\n".stripLineEnd
    res0: String = foo
    
    scala> "foo\r\n".stripLineEnd
    res1: String = foo
    ```
    
    
    (FYI, actually, currently Saprk writes line by line but [it opens and closes `CSVWriter` for each row](https://github.com/apache/spark/blob/73b56a3c6c5c590219b42884c8bbe88b0a236987/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVParser.scala#L79-L91), which definitely should be refactored. This uses another Haddop thridparty library `LineRecordWriter` to actually write the each line parsed to a string by Univocity. So, I noticed `setLineSeparator()` can be ignorable.)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/12818#issuecomment-216772060
  
    Thank you @jbax. I will try to do so after checking If I can identify any useful changes with Spark.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12818#issuecomment-216750238
  
    **[Test build #57720 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57720/consoleFull)** for PR 12818 at commit [`3b289d9`](https://github.com/apache/spark/commit/3b289d9084f8c49e1c8418d7636c5b8df07c1463).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon closed the pull request at:

    https://github.com/apache/spark/pull/12818


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12818#issuecomment-216749945
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12818#issuecomment-216024701
  
    **[Test build #57470 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57470/consoleFull)** for PR 12818 at commit [`9b570db`](https://github.com/apache/spark/commit/9b570db3e777e728ea5215cf4affef6dfe1464e5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org