You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by HyukjinKwon <gi...@git.apache.org> on 2016/05/01 05:46:31 UTC
[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...
GitHub user HyukjinKwon opened a pull request:
https://github.com/apache/spark/pull/12818
[MINOR][SQL] Remove not affected settings for writing in CSV.
## What changes were proposed in this pull request?
This PR removes not affected settings for writing CSV files.
- `setComment()` : It seems Univocity parser supports to write comments but only by both [commentRow()](https://github.com/uniVocity/univocity-parsers/blob/93d1fb6437bdeb1531f27156e18e8d8ca7af572f/src/main/java/com/univocity/parsers/common/AbstractWriter.java#L751-L753) and [commentRowToString()](https://github.com/uniVocity/univocity-parsers/blob/93d1fb6437bdeb1531f27156e18e8d8ca7af572f/src/main/java/com/univocity/parsers/common/AbstractWriter.java#L1392-L1394). Both methods are not used in Spark.
- `setLineSeparator()` : It seems this setting is only enabled when writing multiple lines and comments. However, CSV datasource does not write comments and write line by line by Hadoop's `LineRecordWriter`. Spark only uses this library with `\n` line separator with `LineRecordWriter`. Also, the end of each CSV line produced by Univocity is trimmed. (See [CSVParser.scala#L89](https://github.com/apache/spark/blob/73b56a3c6c5c590219b42884c8bbe88b0a236987/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVParser.scala#L89))
## How was this patch tested?
Unittests in `CSVSuite` and `./build/sbt scalastyle`
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/HyukjinKwon/spark minor-remove-unused
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/12818.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #12818
----
commit 9b570db3e777e728ea5215cf4affef6dfe1464e5
Author: hyukjinkwon <gu...@gmail.com>
Date: 2016-05-01T05:28:16Z
Remove setting options not affected for writing in CSV.
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12818#issuecomment-216024726
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...
Posted by jbax <gi...@git.apache.org>.
Github user jbax commented on the pull request:
https://github.com/apache/spark/pull/12818#issuecomment-216738292
I just read the rest of this ticket. Be careful with the `setLineSeparator()`. It uses the default OS line separator but that's not always desired.
By default, this is used to transform the `normalizedLineSeparator` when writing. If you are running on Windows this will be:
```
normalizedLineSeparator = '\n'
lineSeparator = '\r\n'
```
Then write a value such as `"my \n multi line \n value"`
You will end up with `"my \r\n multi line \r\n value"`
If you actually want to have `"my \n multi line \n value"` you must either:
- set the line separator explicitly to `\n`
- set `normalizeLineEndingsWithinQuotes` to `false`, so whatever is in the input will be written to the output, without any line ending transformation.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/12818#issuecomment-216023502
cc @falaki
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...
Posted by jbax <gi...@git.apache.org>.
Github user jbax commented on the pull request:
https://github.com/apache/spark/pull/12818#issuecomment-216747285
Foo and bar are part of the same value, they just happen to have a line ending in between. And yes `setLineSeparator()` it is related to the values themselves when writing, unless `normalizeLineEndingsWithinQuotes=false`
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:
https://github.com/apache/spark/pull/12818#issuecomment-216734653
Please allow me cc you, @jbax, who I guess the author of Univocity parser.
Could you please confirm that `Format.setComment()` is not affected if we only calls `CsvWriter.writeRow(...)` and `CsvWriter.writeHeaders()` for writing CSV files?
I think `setComment()` can be confirmed by you.
(For `setLineSeparator()` I think this can be checked enough within Spark code.)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12818#issuecomment-216024696
**[Test build #57468 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57468/consoleFull)** for PR 12818 at commit [`9b570db`](https://github.com/apache/spark/commit/9b570db3e777e728ea5215cf4affef6dfe1464e5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:
https://github.com/apache/spark/pull/12818#issuecomment-216743678
@jbax Ah, I guess `foo` and `bar` are separate rows, right? `stripLineEnd` will be applied for each row.
If I got you wrong and `setLineSeparator()` is related with the values themselves, then I should not take that away.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12818#issuecomment-216761919
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57720/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12818#issuecomment-216761918
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12818#issuecomment-216749347
**[Test build #57718 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57718/consoleFull)** for PR 12818 at commit [`3b289d9`](https://github.com/apache/spark/commit/3b289d9084f8c49e1c8418d7636c5b8df07c1463).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:
https://github.com/apache/spark/pull/12818#issuecomment-216733974
Hi @falaki, could you take a quick look? it won't be too long!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:
https://github.com/apache/spark/pull/12818#issuecomment-216019892
test this please
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12818#issuecomment-216749947
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57718/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12818#issuecomment-216761768
**[Test build #57720 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57720/consoleFull)** for PR 12818 at commit [`3b289d9`](https://github.com/apache/spark/commit/3b289d9084f8c49e1c8418d7636c5b8df07c1463).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12818#issuecomment-216024734
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57470/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:
https://github.com/apache/spark/pull/12818#issuecomment-216765934
Closing this. I will bring up this again maybe in https://github.com/apache/spark/pull/12268.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12818#issuecomment-216020146
**[Test build #57468 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57468/consoleFull)** for PR 12818 at commit [`9b570db`](https://github.com/apache/spark/commit/9b570db3e777e728ea5215cf4affef6dfe1464e5).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12818#issuecomment-216020147
**[Test build #57470 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57470/consoleFull)** for PR 12818 at commit [`9b570db`](https://github.com/apache/spark/commit/9b570db3e777e728ea5215cf4affef6dfe1464e5).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...
Posted by jbax <gi...@git.apache.org>.
Github user jbax commented on the pull request:
https://github.com/apache/spark/pull/12818#issuecomment-216743260
What happens if you do this:
```
scala> "foo\r\nbar\r\n".stripLineEnd
```
Shouldn't the result be this?
```
res0: String = foo\r\n
bar
```
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:
https://github.com/apache/spark/pull/12818#issuecomment-216765450
@rxin If it is too minor to merge, I can close and then do this in another PR maybe after investigating the newline stuff discussed above.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...
Posted by jbax <gi...@git.apache.org>.
Github user jbax commented on the pull request:
https://github.com/apache/spark/pull/12818#issuecomment-216737346
Confirmed. It is only used if you call `CsvWriter.commentRow()` or `CsvWriter.commentRowToString()` to write comments to the output.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12818#issuecomment-216024733
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:
https://github.com/apache/spark/pull/12818#issuecomment-216764368
@rxin in terms of funtionalities and performance, No.
But it shortens codes and I thought it is confusing whether `comment` option in `CSVOptions` affects writing or not when I look at the codes because it is setting the `comment` from the `CSVOptions` to `setComment()`.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...
Posted by jbax <gi...@git.apache.org>.
Github user jbax commented on the pull request:
https://github.com/apache/spark/pull/12818#issuecomment-216770729
By the way, may I suggest you guys to upgrade to version 2.1.0 as it comes with substantial performance improvements for parsing and writing CSV.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/12818#issuecomment-216765525
Yea I haven't spent much time looking but it doesn't seem worth to me to remove this. We might also use comment in the future.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/12818#issuecomment-216763362
Are there any benefits to removing this?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12818#issuecomment-216024727
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57468/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12818#issuecomment-216749940
**[Test build #57718 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57718/consoleFull)** for PR 12818 at commit [`3b289d9`](https://github.com/apache/spark/commit/3b289d9084f8c49e1c8418d7636c5b8df07c1463).
* This patch **fails MiMa tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:
https://github.com/apache/spark/pull/12818#issuecomment-216748780
Oh, I misunderstood your first comment.I think I should not take out `setLineSeparator()` here but maybe I should open another issue ticket to set `normalizeLineEndingsWithinQuotes=false` with some tests. Thank you for correcting me!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:
https://github.com/apache/spark/pull/12818#issuecomment-216750176
retest this please
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:
https://github.com/apache/spark/pull/12818#issuecomment-216763118
@rxin To cut it short, I got a confirm, from the original author of Univocity, `setComment()` has no effect as long as Spark does not write comments from `DataFrame`. `CsvWriter.commentRow()` and `CsvWriter.commentRowToString()` are not called in Spark.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:
https://github.com/apache/spark/pull/12818#issuecomment-216740931
@jbax Cool! Thank you for detailed explanation.
So, this uses OS default newline without `setLineSeparator()`, which is trimmed [here](https://github.com/apache/spark/blob/73b56a3c6c5c590219b42884c8bbe88b0a236987/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVParser.scala#L89) for each row in Spark.
```scala
scala> "foo\n".stripLineEnd
res0: String = foo
scala> "foo\r\n".stripLineEnd
res1: String = foo
```
(FYI, actually, currently Saprk writes line by line but [it opens and closes `CSVWriter` for each row](https://github.com/apache/spark/blob/73b56a3c6c5c590219b42884c8bbe88b0a236987/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVParser.scala#L79-L91), which definitely should be refactored. This uses another Haddop thridparty library `LineRecordWriter` to actually write the each line parsed to a string by Univocity. So, I noticed `setLineSeparator()` can be ignorable.)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:
https://github.com/apache/spark/pull/12818#issuecomment-216772060
Thank you @jbax. I will try to do so after checking If I can identify any useful changes with Spark.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12818#issuecomment-216750238
**[Test build #57720 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57720/consoleFull)** for PR 12818 at commit [`3b289d9`](https://github.com/apache/spark/commit/3b289d9084f8c49e1c8418d7636c5b8df07c1463).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon closed the pull request at:
https://github.com/apache/spark/pull/12818
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12818#issuecomment-216749945
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12818#issuecomment-216024701
**[Test build #57470 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57470/consoleFull)** for PR 12818 at commit [`9b570db`](https://github.com/apache/spark/commit/9b570db3e777e728ea5215cf4affef6dfe1464e5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org